Re: [PATCH v3 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-11-26 Thread Jianguo Wu
On 2012/11/27 14:49, Wen Congyang wrote:

> At 11/27/2012 01:47 PM, Jianguo Wu Wrote:
>> On 2012/11/1 17:44, Wen Congyang wrote:
>>
>>> From: Yasuaki Ishimatsu 
>>>
>>> Not all pages of the virtual mapping in removed memory can be freed, since
>>> some pages used as PGD/PUD cover not only removed memory but also other
>>> memory. So the patch checks whether a page can be freed or not.
>>>
>>> How to check whether a page can be freed or not?
>>>  1. When removing memory, the page structs of the removed memory are filled
>>> with 0xFD.
>>>  2. When all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
>>> cleared. In this case, the page used as PT/PMD can be freed.
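The two-step check described above can be sketched in user-space C. This is only an illustration under stated assumptions: the page size and the names `sketch_mark_removed()` and `sketch_page_can_be_freed()` are hypothetical and do not come from the patch; only the 0xFD marker value is taken from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */
#define SKETCH_MARKER    0xFD   /* marker byte named in the patch description */

/* Step 1: fill the part of a backing page that described removed memory. */
static void sketch_mark_removed(unsigned char *page, size_t off, size_t len)
{
    assert(off <= SKETCH_PAGE_SIZE && len <= SKETCH_PAGE_SIZE - off);
    memset(page + off, SKETCH_MARKER, len);
}

/* Step 2: the backing page may be freed only if every byte is the marker. */
static bool sketch_page_can_be_freed(const unsigned char *page)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (page[i] != SKETCH_MARKER)
            return false;
    }
    return true;
}
```

A page that is only half marked still backs live page structs and must be kept; once the second half is marked too, the check passes and the page could be released.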
>>>
>>> With this patch applied, the two versions of __remove_section() are
>>> integrated into one, so the CONFIG_SPARSEMEM_VMEMMAP version of
>>> __remove_section() is deleted.
>>>
>>> Note: vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for
>>> ia64, ppc, s390, and sparc.
>>>
>>> CC: David Rientjes 
>>> CC: Jiang Liu 
>>> CC: Len Brown 
>>> CC: Christoph Lameter 
>>> Cc: Minchan Kim 
>>> CC: Andrew Morton 
>>> CC: KOSAKI Motohiro 
>>> CC: Wen Congyang 
>>> Signed-off-by: Yasuaki Ishimatsu 
>>> ---
>>>  arch/ia64/mm/discontig.c  |   8 
>>>  arch/powerpc/mm/init_64.c |   8 
>>>  arch/s390/mm/vmem.c   |   8 
>>>  arch/sparc/mm/init_64.c   |   8 
>>>  arch/x86/mm/init_64.c | 119 
>>> ++
>>>  include/linux/mm.h|   2 +
>>>  mm/memory_hotplug.c   |  17 +--
>>>  mm/sparse.c   |   5 +-
>>>  8 files changed, 158 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
>>> index 33943db..0d23b69 100644
>>> --- a/arch/ia64/mm/discontig.c
>>> +++ b/arch/ia64/mm/discontig.c
>>> @@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct page *start_page,
>>> return vmemmap_populate_basepages(start_page, size, node);
>>>  }
>>>  
>>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>>> +{
>>> +}
>>> +
>>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>>> +{
>>> +}
>>> +
>>>  void register_page_bootmem_memmap(unsigned long section_nr,
>>>   struct page *start_page, unsigned long size)
>>>  {
>>> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
>>> index 6466440..df7d155 100644
>>> --- a/arch/powerpc/mm/init_64.c
>>> +++ b/arch/powerpc/mm/init_64.c
>>> @@ -298,6 +298,14 @@ int __meminit vmemmap_populate(struct page *start_page,
>>> return 0;
>>>  }
>>>  
>>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>>> +{
>>> +}
>>> +
>>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>>> +{
>>> +}
>>> +
>>>  void register_page_bootmem_memmap(unsigned long section_nr,
>>>   struct page *start_page, unsigned long size)
>>>  {
>>> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
>>> index 4f4803a..ab69c34 100644
>>> --- a/arch/s390/mm/vmem.c
>>> +++ b/arch/s390/mm/vmem.c
>>> @@ -236,6 +236,14 @@ out:
>>> return ret;
>>>  }
>>>  
>>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>>> +{
>>> +}
>>> +
>>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>>> +{
>>> +}
>>> +
>>>  void register_page_bootmem_memmap(unsigned long section_nr,
>>>   struct page *start_page, unsigned long size)
>>>  {
>>> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
>>> index 75a984b..546855d 100644
>>> --- a/arch/sparc/mm/init_64.c
>>> +++ b/arch/sparc/mm/init_64.c
>>> @@ -2232,6 +2232,14 @@ void __meminit vmemmap_populate_print_last(void)
>>> }
>>>  }
>>>  
>>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>>> +{
>>> +}
>>> +
>>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>>> +{
>>> +}
>>> +
>>>  void register_page_bootmem_memmap(unsigned long section_nr,
>>>   struct page *start_page, unsigned long size)
>>>  {
>>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>>> index 795dae3..e85626d 100644
>>> --- a/arch/x86/mm/init_64.c
>>> +++ b/arch/x86/mm/init_64.c
>>> @@ -998,6 +998,125 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
>>> return 0;
>>>  }
>>>  
>>> +#define PAGE_INUSE 0xFD
>>> +
>>> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
>>> +   struct page **pp, int *page_size)
>>> +{
>>> +   pgd_t *pgd;
>>> +   pud_t *pud;
>>> +   pmd_t *pmd;
>>> +   pte_t *pte = NULL;
>>> +   void *page_addr;
>>> +   unsigned long next;
>>> +
>>> +   *pp = NULL;
>>> +
>>> +   pgd = pgd_offset_k(addr);
>>> +   if (pgd_none(*pgd))
>>> +   return pgd_addr_end(addr, end);
>>> +
>>> +   pud = pud_offset(pgd, addr);
>>> +   if (pud_none(*pud))
>>> +   return pud_addr_end(addr, end);
>>> +
>>> +   if (!cp

Re: [PATCH v3 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-11-26 Thread Wen Congyang
At 11/27/2012 01:47 PM, Jianguo Wu Wrote:
> On 2012/11/1 17:44, Wen Congyang wrote:
> 
>> From: Yasuaki Ishimatsu 
>>
>> Not all pages of the virtual mapping in removed memory can be freed, since
>> some pages used as PGD/PUD cover not only removed memory but also other
>> memory. So the patch checks whether a page can be freed or not.
>>
>> How to check whether a page can be freed or not?
>>  1. When removing memory, the page structs of the removed memory are filled
>> with 0xFD.
>>  2. When all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
>> cleared. In this case, the page used as PT/PMD can be freed.
>>
>> With this patch applied, the two versions of __remove_section() are
>> integrated into one, so the CONFIG_SPARSEMEM_VMEMMAP version of
>> __remove_section() is deleted.
>>
>> Note: vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for
>> ia64, ppc, s390, and sparc.
>>
>> CC: David Rientjes 
>> CC: Jiang Liu 
>> CC: Len Brown 
>> CC: Christoph Lameter 
>> Cc: Minchan Kim 
>> CC: Andrew Morton 
>> CC: KOSAKI Motohiro 
>> CC: Wen Congyang 
>> Signed-off-by: Yasuaki Ishimatsu 
>> ---
>>  arch/ia64/mm/discontig.c  |   8 
>>  arch/powerpc/mm/init_64.c |   8 
>>  arch/s390/mm/vmem.c   |   8 
>>  arch/sparc/mm/init_64.c   |   8 
>>  arch/x86/mm/init_64.c | 119 
>> ++
>>  include/linux/mm.h|   2 +
>>  mm/memory_hotplug.c   |  17 +--
>>  mm/sparse.c   |   5 +-
>>  8 files changed, 158 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
>> index 33943db..0d23b69 100644
>> --- a/arch/ia64/mm/discontig.c
>> +++ b/arch/ia64/mm/discontig.c
>> @@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct page *start_page,
>>  return vmemmap_populate_basepages(start_page, size, node);
>>  }
>>  
>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>> +{
>> +}
>> +
>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>> +{
>> +}
>> +
>>  void register_page_bootmem_memmap(unsigned long section_nr,
>>struct page *start_page, unsigned long size)
>>  {
>> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
>> index 6466440..df7d155 100644
>> --- a/arch/powerpc/mm/init_64.c
>> +++ b/arch/powerpc/mm/init_64.c
>> @@ -298,6 +298,14 @@ int __meminit vmemmap_populate(struct page *start_page,
>>  return 0;
>>  }
>>  
>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>> +{
>> +}
>> +
>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>> +{
>> +}
>> +
>>  void register_page_bootmem_memmap(unsigned long section_nr,
>>struct page *start_page, unsigned long size)
>>  {
>> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
>> index 4f4803a..ab69c34 100644
>> --- a/arch/s390/mm/vmem.c
>> +++ b/arch/s390/mm/vmem.c
>> @@ -236,6 +236,14 @@ out:
>>  return ret;
>>  }
>>  
>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>> +{
>> +}
>> +
>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>> +{
>> +}
>> +
>>  void register_page_bootmem_memmap(unsigned long section_nr,
>>struct page *start_page, unsigned long size)
>>  {
>> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
>> index 75a984b..546855d 100644
>> --- a/arch/sparc/mm/init_64.c
>> +++ b/arch/sparc/mm/init_64.c
>> @@ -2232,6 +2232,14 @@ void __meminit vmemmap_populate_print_last(void)
>>  }
>>  }
>>  
>> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
>> +{
>> +}
>> +
>> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
>> +{
>> +}
>> +
>>  void register_page_bootmem_memmap(unsigned long section_nr,
>>struct page *start_page, unsigned long size)
>>  {
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 795dae3..e85626d 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -998,6 +998,125 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
>>  return 0;
>>  }
>>  
>> +#define PAGE_INUSE 0xFD
>> +
>> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
>> +struct page **pp, int *page_size)
>> +{
>> +pgd_t *pgd;
>> +pud_t *pud;
>> +pmd_t *pmd;
>> +pte_t *pte = NULL;
>> +void *page_addr;
>> +unsigned long next;
>> +
>> +*pp = NULL;
>> +
>> +pgd = pgd_offset_k(addr);
>> +if (pgd_none(*pgd))
>> +return pgd_addr_end(addr, end);
>> +
>> +pud = pud_offset(pgd, addr);
>> +if (pud_none(*pud))
>> +return pud_addr_end(addr, end);
>> +
>> +if (!cpu_has_pse) {
>> +next = (addr + PAGE_SIZE) & PAGE_MASK;
>> +pmd = pmd_offset(pud, addr);
>> +if (pmd_none(*pmd))
>> +ret

Re: [PATCH] cpuidle: Measure idle state durations with monotonic clock

2012-11-26 Thread Len Brown

>>  drivers/idle/intel_idle.c   |   14 +-

Acked-by: Len Brown 

thanks!
-Len

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v3 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-11-26 Thread Jianguo Wu
On 2012/11/1 17:44, Wen Congyang wrote:

> From: Yasuaki Ishimatsu 
> 
> Not all pages of the virtual mapping in removed memory can be freed, since
> some pages used as PGD/PUD cover not only removed memory but also other
> memory. So the patch checks whether a page can be freed or not.
> 
> How to check whether a page can be freed or not?
>  1. When removing memory, the page structs of the removed memory are filled
> with 0xFD.
>  2. When all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
> cleared. In this case, the page used as PT/PMD can be freed.
> 
> With this patch applied, the two versions of __remove_section() are
> integrated into one, so the CONFIG_SPARSEMEM_VMEMMAP version of
> __remove_section() is deleted.
> 
> Note: vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for
> ia64, ppc, s390, and sparc.
> 
> CC: David Rientjes 
> CC: Jiang Liu 
> CC: Len Brown 
> CC: Christoph Lameter 
> Cc: Minchan Kim 
> CC: Andrew Morton 
> CC: KOSAKI Motohiro 
> CC: Wen Congyang 
> Signed-off-by: Yasuaki Ishimatsu 
> ---
>  arch/ia64/mm/discontig.c  |   8 
>  arch/powerpc/mm/init_64.c |   8 
>  arch/s390/mm/vmem.c   |   8 
>  arch/sparc/mm/init_64.c   |   8 
>  arch/x86/mm/init_64.c | 119 
> ++
>  include/linux/mm.h|   2 +
>  mm/memory_hotplug.c   |  17 +--
>  mm/sparse.c   |   5 +-
>  8 files changed, 158 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
> index 33943db..0d23b69 100644
> --- a/arch/ia64/mm/discontig.c
> +++ b/arch/ia64/mm/discontig.c
> @@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct page *start_page,
>   return vmemmap_populate_basepages(start_page, size, node);
>  }
>  
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
>  void register_page_bootmem_memmap(unsigned long section_nr,
> struct page *start_page, unsigned long size)
>  {
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index 6466440..df7d155 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -298,6 +298,14 @@ int __meminit vmemmap_populate(struct page *start_page,
>   return 0;
>  }
>  
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
>  void register_page_bootmem_memmap(unsigned long section_nr,
> struct page *start_page, unsigned long size)
>  {
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index 4f4803a..ab69c34 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -236,6 +236,14 @@ out:
>   return ret;
>  }
>  
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
>  void register_page_bootmem_memmap(unsigned long section_nr,
> struct page *start_page, unsigned long size)
>  {
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index 75a984b..546855d 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -2232,6 +2232,14 @@ void __meminit vmemmap_populate_print_last(void)
>   }
>  }
>  
> +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
> +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
> +{
> +}
> +
>  void register_page_bootmem_memmap(unsigned long section_nr,
> struct page *start_page, unsigned long size)
>  {
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 795dae3..e85626d 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -998,6 +998,125 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
>   return 0;
>  }
>  
> +#define PAGE_INUSE 0xFD
> +
> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
> + struct page **pp, int *page_size)
> +{
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *pte = NULL;
> + void *page_addr;
> + unsigned long next;
> +
> + *pp = NULL;
> +
> + pgd = pgd_offset_k(addr);
> + if (pgd_none(*pgd))
> + return pgd_addr_end(addr, end);
> +
> + pud = pud_offset(pgd, addr);
> + if (pud_none(*pud))
> + return pud_addr_end(addr, end);
> +
> + if (!cpu_has_pse) {
> + next = (addr + PAGE_SIZE) & PAGE_MASK;
> + pmd = pmd_offset(pud, addr);
> + if (pmd_none(*pmd))
> + return next;
> +
> + pte = pte_offset_kernel(pmd, addr);
> + if (pte_none(*pte))
> + return next;
> +
> + *page_size

Re: [PATCH] cpuidle: Measure idle state durations with monotonic clock

2012-11-26 Thread Len Brown
On 11/15/2012 04:04 AM, Preeti Murthy wrote:
> Hi all,
> 
> The code looks correct and inviting to me as it has led to good cleanups.
> I don't think passing 0 as the argument to the function
> sched_clock_idle_wakeup_event() should lead to problems, as it does not do
> anything useful with the passed arguments.
> 
> My only curiosity is what was the purpose of passing idle residency time to
> sched_clock_idle_wakeup_event() when this data could always be retrieved from
> dev->last_residency for each cpu, which gets almost immediately updated.

sched_clock_idle_wakeup_event() is part of the scheduler.
The scheduler doesn't know what a cpuidle_device is, and
probably should not grow such a dependency.

cheers,
-Len Brown, Intel Open Source Technology Center
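As background to the subject of this thread, the following is a user-space analogue of measuring a duration with a monotonic clock. It is a sketch under assumptions: it uses POSIX `clock_gettime(CLOCK_MONOTONIC)` rather than any kernel interface, and `sketch_now()` and `sketch_elapsed_us()` are hypothetical helper names, not functions from the patch. The caller computes the duration itself and hands over a plain number, so the consumer needs no knowledge of the structure the timestamps came from.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock; it is not affected by wall-clock adjustments. */
static struct timespec sketch_now(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return ts;
}

/* Elapsed time between two readings, in whole microseconds. */
static int64_t sketch_elapsed_us(struct timespec start, struct timespec end)
{
    int64_t ns = ((int64_t)end.tv_sec - (int64_t)start.tv_sec) * 1000000000LL
               + ((int64_t)end.tv_nsec - (int64_t)start.tv_nsec);

    return ns / 1000;
}
```

Because the clock is monotonic, the difference between a later and an earlier reading is never negative, which is the property that makes it suitable for residency-style measurements.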

> But this does not seem to come in the way of this patch for now. Anyway, I
> have added Peter to the list so that he can opine about this issue if
> possible and needed.
> 
> Reviewed-by: Preeti U Murthy 
> 
> 
> Regards
> Preeti U Murthy
> 


Re: [PATCH 1/3] powerpc: Relocate prom_init.c on 64bit

2012-11-26 Thread Benjamin Herrenschmidt
On Tue, 2012-11-27 at 14:39 +1100, Anton Blanchard wrote:
> The ppc64 kernel can get loaded at any address which means
> our very early init code in prom_init.c must be relocatable. We do
> this with a pretty nasty RELOC() macro that we wrap accesses of
> variables with. It is very fragile and sometimes we forget to add a
> RELOC() to an uncommon path or sometimes a compiler change breaks it.
> 
> 32bit has a much more elegant solution where we build prom_init.c
> with -mrelocatable and then process the relocations manually.
> Unfortunately we can't do the equivalent on 64bit and we would
> have to build the entire kernel relocatable (-pie), resulting in a
> large increase in kernel footprint (megabytes of relocation data).
> The relocation data will be marked __initdata but it still creates
> more pressure on our already tight memory layout at boot.
> 
> Alan Modra pointed out that the 64bit ABI is relocatable even
> if we don't build with -pie, we just need to relocate the TOC.
> This patch implements that idea and relocates the TOC entries of
> prom_init.c. An added bonus is there are very few relocations to
> process which helps keep boot times on simulators down.
> 
> gcc does not put 64bit integer constants into the TOC but to be
> safe we may want a build time script which passes through the
> prom_init.c TOC entries to make sure everything looks reasonable.

My only potential objection was that it might have been cleaner to
actually build prom_init.c as a separate binary altogether and
piggyback it...

Ben.
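The idea in the quoted description, relocating a table of addresses by the load offset, can be sketched in plain C. This is an illustration under assumptions, not the patch's code: the patch's own `__reloc_toc()` is cut off in this archive, and `sketch_reloc_table()` and `sketch_unreloc_table()` are hypothetical names operating on an ordinary array rather than a real TOC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the load offset to every address slot in the table. */
static void sketch_reloc_table(uintptr_t *table, size_t nr_entries,
                               uintptr_t offset)
{
    assert(table != NULL || nr_entries == 0);
    for (size_t i = 0; i < nr_entries; i++)
        table[i] += offset;
}

/* Undo the relocation by adding the negated offset (unsigned wraparound). */
static void sketch_unreloc_table(uintptr_t *table, size_t nr_entries,
                                 uintptr_t offset)
{
    sketch_reloc_table(table, nr_entries, (uintptr_t)0 - offset);
}
```

Only the table entries are touched, which is why this kind of scheme needs far fewer fixups than relocating every reference in the object.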

> Signed-off-by: Anton Blanchard 
> --- 
> 
> To keep the patch small and reviewable, I separated the removal
> of the RELOC macro into a follow up patch.
> 
> For simplicity I do the relocation in C but if self brain surgery
> keeps people up at night we can move it into assembly.
> 
> Index: b/arch/powerpc/kernel/prom_init.c
> ===
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -66,8 +66,8 @@
>   * is running at whatever address it has been loaded at.
>   * On ppc32 we compile with -mrelocatable, which means that references
>   * to extern and static variables get relocated automatically.
> - * On ppc64 we have to relocate the references explicitly with
> - * RELOC.  (Note that strings count as static variables.)
> + * ppc64 objects are always relocatable, we just need to relocate the
> + * TOC.
>   *
>   * Because OF may have mapped I/O devices into the area starting at
>   * KERNELBASE, particularly on CHRP machines, we can't safely call
> @@ -79,13 +79,12 @@
>   * On ppc64, 64 bit values are truncated to 32 bits (and
>   * fortunately don't get interpreted as two arguments).
>   */
> +#define RELOC(x) (x)
> +#define ADDR(x)  (u32)(unsigned long)(x)
> +
>  #ifdef CONFIG_PPC64
> -#define RELOC(x)(*PTRRELOC(&(x)))
> -#define ADDR(x)  (u32) add_reloc_offset((unsigned long)(x))
>  #define OF_WORKAROUNDS   0
>  #else
> -#define RELOC(x) (x)
> -#define ADDR(x)  (u32) (x)
>  #define OF_WORKAROUNDS   of_workarounds
>  int of_workarounds;
>  #endif
> @@ -334,9 +333,6 @@ static void __init prom_printf(const cha
>   struct prom_t *_prom = &RELOC(prom);
>  
>   va_start(args, format);
> -#ifdef CONFIG_PPC64
> - format = PTRRELOC(format);
> -#endif
>   for (p = format; *p != 0; p = q) {
>   for (q = p; *q != 0 && *q != '\n' && *q != '%'; ++q)
>   ;
> @@ -437,9 +433,6 @@ static unsigned int __init prom_claim(un
>  
>  static void __init __attribute__((noreturn)) prom_panic(const char *reason)
>  {
> -#ifdef CONFIG_PPC64
> - reason = PTRRELOC(reason);
> -#endif
>   prom_print(reason);
>   /* Do not call exit because it clears the screen on pmac
>* it also causes some sort of double-fault on early pmacs */
> @@ -929,7 +922,7 @@ static void __init prom_send_capabilitie
>* (we assume this is the same for all cores) and use it to
>* divide NR_CPUS.
>*/
> - cores = (u32 *)PTRRELOC(&ibm_architecture_vec[IBM_ARCH_VEC_NRCORES_OFFSET]);
> + cores = (u32 *)&ibm_architecture_vec[IBM_ARCH_VEC_NRCORES_OFFSET];
>   if (*cores != NR_CPUS) {
>   prom_printf("WARNING ! "
>   "ibm_architecture_vec structure 
> inconsistent: %lu!\n",
> @@ -2850,6 +2843,53 @@ static void __init prom_check_initrd(uns
>  #endif /* CONFIG_BLK_DEV_INITRD */
>  }
>  
> +#ifdef CONFIG_PPC64
> +#ifdef CONFIG_RELOCATABLE
> +static void reloc_toc(void)
> +{
> +}
> +
> +static void unreloc_toc(void)
> +{
> +}
> +#else
> +static void __reloc_toc(void *tocstart, unsigned long offset,
> + unsigned long nr_entries)
> +{
> + unsigned long i;
> + unsigned long *toc_entry = (unsigned long *)tocstart;
> +
> + for (i = 0; i < nr_entries; i++) {
> + *

Re: [PATCH 1/2] vfio powerpc: implemented IOMMU driver for VFIO

2012-11-26 Thread David Gibson
On Tue, Nov 27, 2012 at 03:58:14PM +1100, Alexey Kardashevskiy wrote:
> On 27/11/12 15:29, Alex Williamson wrote:
> >On Tue, 2012-11-27 at 15:06 +1100, Alexey Kardashevskiy wrote:
> >>On 27/11/12 05:20, Alex Williamson wrote:
> >>>On Fri, 2012-11-23 at 20:03 +1100, Alexey Kardashevskiy wrote:
> VFIO implements platform independent stuff such as
> a PCI driver, BAR access (via read/write on a file descriptor
> or direct mapping when possible) and IRQ signaling.
> 
> The platform dependent part includes IOMMU initialization
> and handling. This patch implements an IOMMU driver for VFIO
> which does mapping/unmapping pages for the guest IO and
> provides information about DMA window (required by a POWERPC
> guest).
> 
> The counterpart in QEMU is required to support this functionality.
> 
> Cc: David Gibson 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>    drivers/vfio/Kconfig|6 +
>    drivers/vfio/Makefile   |1 +
>    drivers/vfio/vfio_iommu_spapr_tce.c |  247 
>  +++
>    include/linux/vfio.h|   20 +++
>    4 files changed, 274 insertions(+)
>    create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 7cd5dec..b464687 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1
>   depends on VFIO
>   default n
> 
> +config VFIO_IOMMU_SPAPR_TCE
> + tristate
> + depends on VFIO && SPAPR_TCE_IOMMU
> + default n
> +
>    menuconfig VFIO
>   tristate "VFIO Non-Privileged userspace driver framework"
>   depends on IOMMU_API
>   select VFIO_IOMMU_TYPE1 if X86
> + select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV
>   help
> VFIO provides a framework for secure userspace device drivers.
> See Documentation/vfio.txt for more details.
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 2398d4a..72bfabc 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -1,3 +1,4 @@
>    obj-$(CONFIG_VFIO) += vfio.o
>    obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
>    obj-$(CONFIG_VFIO_PCI) += pci/
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> new file mode 100644
> index 000..46a6298
> --- /dev/null
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -0,0 +1,247 @@
> +/*
> + * VFIO: IOMMU DMA mapping support for TCE on POWER
> + *
> + * Copyright (C) 2012 IBM Corp.  All rights reserved.
> + * Author: Alexey Kardashevskiy 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Derived from original vfio_iommu_type1.c:
> + * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
> + * Author: Alex Williamson 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DRIVER_VERSION  "0.1"
> +#define DRIVER_AUTHOR   "a...@ozlabs.ru"
> +#define DRIVER_DESC "VFIO IOMMU SPAPR TCE"
> +
> +static void tce_iommu_detach_group(void *iommu_data,
> + struct iommu_group *iommu_group);
> +
> +/*
> + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
> + */
> +
> +/*
> + * The container descriptor supports only a single group per container.
> + * Required by the API as the container is not supplied with the IOMMU group
> + * at the moment of initialization.
> + */
> +struct tce_container {
> + struct mutex lock;
> + struct iommu_table *tbl;
> +};
> +
> +static void *tce_iommu_open(unsigned long arg)
> +{
> + struct tce_container *container;
> +
> + if (arg != VFIO_SPAPR_TCE_IOMMU) {
> + printk(KERN_ERR "tce_vfio: Wrong IOMMU type\n");
> + return ERR_PTR(-EINVAL);
> + }
> +
> + container = kzalloc(sizeof(*container), GFP_KERNEL);
> + if (!container)
> + return ERR_PTR(-ENOMEM);
> +
> + mutex_init(&container->lock);
> +
> + return container;
> +}
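The open function above reports failure by returning an error code encoded in the pointer itself. A user-space imitation of that idiom is sketched below. It is an assumption-laden illustration: the `sketch_*` helpers are hypothetical stand-ins written for this example (the kernel's real helpers live in `linux/err.h`), and `calloc()` stands in for `kzalloc()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095  /* the top 4095 addresses encode error codes */

static inline void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }

static inline bool sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_container {
    int type;
};

/* Return a zeroed container, or an encoded negative errno on failure. */
static void *sketch_open(int type, int expected_type)
{
    struct sketch_container *c;

    if (type != expected_type)
        return sketch_err_ptr(-EINVAL);

    c = calloc(1, sizeof(*c));
    if (!c)
        return sketch_err_ptr(-ENOMEM);

    c->type = type;
    return c;
}
```

The caller tests the result with `sketch_is_err()` before dereferencing it, so a single return value carries either a valid object or the reason it could not be created.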
> +
> +static void tce_iommu_release(void *iommu_data)
> +{
> + struct tce_container *container = iommu_data;
> +
> + WARN_ON(container->tbl && !container->tbl->it_group);
> >>>
> >>>I think your patch ordering is backwards here.  it_group isn't added
> >>>until 2/2.  I'd really like to see the arch/powerpc code approved and
> >

Re: [PATCH 1/2] vfio powerpc: implemented IOMMU driver for VFIO

2012-11-26 Thread Alex Williamson
On Tue, 2012-11-27 at 15:58 +1100, Alexey Kardashevskiy wrote:
> On 27/11/12 15:29, Alex Williamson wrote:
> > On Tue, 2012-11-27 at 15:06 +1100, Alexey Kardashevskiy wrote:
> >> On 27/11/12 05:20, Alex Williamson wrote:
> >>> On Fri, 2012-11-23 at 20:03 +1100, Alexey Kardashevskiy wrote:
>  VFIO implements platform independent stuff such as
>  a PCI driver, BAR access (via read/write on a file descriptor
>  or direct mapping when possible) and IRQ signaling.
> 
>  The platform dependent part includes IOMMU initialization
>  and handling. This patch implements an IOMMU driver for VFIO
>  which does mapping/unmapping pages for the guest IO and
>  provides information about DMA window (required by a POWERPC
>  guest).
> 
>  The counterpart in QEMU is required to support this functionality.
> 
>  Cc: David Gibson 
>  Signed-off-by: Alexey Kardashevskiy 
>  ---
> drivers/vfio/Kconfig|6 +
> drivers/vfio/Makefile   |1 +
> drivers/vfio/vfio_iommu_spapr_tce.c |  247 
>  +++
> include/linux/vfio.h|   20 +++
> 4 files changed, 274 insertions(+)
> create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> 
>  diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
>  index 7cd5dec..b464687 100644
>  --- a/drivers/vfio/Kconfig
>  +++ b/drivers/vfio/Kconfig
>  @@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1
>   depends on VFIO
>   default n
> 
>  +config VFIO_IOMMU_SPAPR_TCE
>  +tristate
>  +depends on VFIO && SPAPR_TCE_IOMMU
>  +default n
>  +
> menuconfig VFIO
>   tristate "VFIO Non-Privileged userspace driver framework"
>   depends on IOMMU_API
>   select VFIO_IOMMU_TYPE1 if X86
>  +select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV
>   help
> VFIO provides a framework for secure userspace device drivers.
> See Documentation/vfio.txt for more details.
>  diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
>  index 2398d4a..72bfabc 100644
>  --- a/drivers/vfio/Makefile
>  +++ b/drivers/vfio/Makefile
>  @@ -1,3 +1,4 @@
> obj-$(CONFIG_VFIO) += vfio.o
> obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
>  +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> obj-$(CONFIG_VFIO_PCI) += pci/
>  diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
>  b/drivers/vfio/vfio_iommu_spapr_tce.c
>  new file mode 100644
>  index 000..46a6298
>  --- /dev/null
>  +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>  @@ -0,0 +1,247 @@
>  +/*
>  + * VFIO: IOMMU DMA mapping support for TCE on POWER
>  + *
>  + * Copyright (C) 2012 IBM Corp.  All rights reserved.
>  + * Author: Alexey Kardashevskiy 
>  + *
>  + * This program is free software; you can redistribute it and/or modify
>  + * it under the terms of the GNU General Public License version 2 as
>  + * published by the Free Software Foundation.
>  + *
>  + * Derived from original vfio_iommu_type1.c:
>  + * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
>  + * Author: Alex Williamson 
>  + */
>  +
>  +#include 
>  +#include 
>  +#include 
>  +#include 
>  +#include 
>  +#include 
>  +#include 
>  +
>  +#define DRIVER_VERSION  "0.1"
>  +#define DRIVER_AUTHOR   "a...@ozlabs.ru"
>  +#define DRIVER_DESC "VFIO IOMMU SPAPR TCE"
>  +
>  +static void tce_iommu_detach_group(void *iommu_data,
>  +struct iommu_group *iommu_group);
>  +
>  +/*
>  + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
>  + */
>  +
>  +/*
>  + * The container descriptor supports only a single group per container.
>  + * Required by the API as the container is not supplied with the IOMMU group
>  + * at the moment of initialization.
>  + */
>  +struct tce_container {
>  +struct mutex lock;
>  +struct iommu_table *tbl;
>  +};
>  +
>  +static void *tce_iommu_open(unsigned long arg)
>  +{
>  +struct tce_container *container;
>  +
>  +if (arg != VFIO_SPAPR_TCE_IOMMU) {
>  +printk(KERN_ERR "tce_vfio: Wrong IOMMU type\n");
>  +return ERR_PTR(-EINVAL);
>  +}
>  +
>  +container = kzalloc(sizeof(*container), GFP_KERNEL);
>  +if (!container)
>  +return ERR_PTR(-ENOMEM);
>  +
>  +mutex_init(&container->lock);
>  +
>  +return container;
>  +}
>  +
>  +static void tce_iommu_release(void *iommu_data)
>  +{
>  +struct tce_container *container = iommu_

Re: [PATCH 1/2] vfio powerpc: implemented IOMMU driver for VFIO

2012-11-26 Thread Alexey Kardashevskiy

On 27/11/12 15:29, Alex Williamson wrote:

On Tue, 2012-11-27 at 15:06 +1100, Alexey Kardashevskiy wrote:

On 27/11/12 05:20, Alex Williamson wrote:

On Fri, 2012-11-23 at 20:03 +1100, Alexey Kardashevskiy wrote:

VFIO implements platform independent stuff such as
a PCI driver, BAR access (via read/write on a file descriptor
or direct mapping when possible) and IRQ signaling.

The platform dependent part includes IOMMU initialization
and handling. This patch implements an IOMMU driver for VFIO
which does mapping/unmapping pages for the guest IO and
provides information about DMA window (required by a POWERPC
guest).

The counterpart in QEMU is required to support this functionality.

Cc: David Gibson 
Signed-off-by: Alexey Kardashevskiy 
---
   drivers/vfio/Kconfig|6 +
   drivers/vfio/Makefile   |1 +
   drivers/vfio/vfio_iommu_spapr_tce.c |  247 +++
   include/linux/vfio.h|   20 +++
   4 files changed, 274 insertions(+)
   create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 7cd5dec..b464687 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1
depends on VFIO
default n

+config VFIO_IOMMU_SPAPR_TCE
+   tristate
+   depends on VFIO && SPAPR_TCE_IOMMU
+   default n
+
   menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
select VFIO_IOMMU_TYPE1 if X86
+   select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV
help
  VFIO provides a framework for secure userspace device drivers.
  See Documentation/vfio.txt for more details.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 2398d4a..72bfabc 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,3 +1,4 @@
   obj-$(CONFIG_VFIO) += vfio.o
   obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
+obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
   obj-$(CONFIG_VFIO_PCI) += pci/
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
new file mode 100644
index 000..46a6298
--- /dev/null
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -0,0 +1,247 @@
+/*
+ * VFIO: IOMMU DMA mapping support for TCE on POWER
+ *
+ * Copyright (C) 2012 IBM Corp.  All rights reserved.
+ * Author: Alexey Kardashevskiy 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio_iommu_type1.c:
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "a...@ozlabs.ru"
+#define DRIVER_DESC "VFIO IOMMU SPAPR TCE"
+
+static void tce_iommu_detach_group(void *iommu_data,
+   struct iommu_group *iommu_group);
+
+/*
+ * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
+ */
+
+/*
+ * The container descriptor supports only a single group per container.
+ * Required by the API as the container is not supplied with the IOMMU group
+ * at the moment of initialization.
+ */
+struct tce_container {
+   struct mutex lock;
+   struct iommu_table *tbl;
+};
+
+static void *tce_iommu_open(unsigned long arg)
+{
+   struct tce_container *container;
+
+   if (arg != VFIO_SPAPR_TCE_IOMMU) {
+   printk(KERN_ERR "tce_vfio: Wrong IOMMU type\n");
+   return ERR_PTR(-EINVAL);
+   }
+
+   container = kzalloc(sizeof(*container), GFP_KERNEL);
+   if (!container)
+   return ERR_PTR(-ENOMEM);
+
+   mutex_init(&container->lock);
+
+   return container;
+}
+
+static void tce_iommu_release(void *iommu_data)
+{
+   struct tce_container *container = iommu_data;
+
+   WARN_ON(container->tbl && !container->tbl->it_group);


I think your patch ordering is backwards here.  it_group isn't added
until 2/2.  I'd really like to see the arch/powerpc code approved and
merged by the powerpc maintainer before we add the code that makes use
of it into vfio.  Otherwise we just get lots of churn if interfaces
change or they disapprove of it altogether.



Makes sense, thanks.



+   if (container->tbl && container->tbl->it_group)
+   tce_iommu_detach_group(iommu_data, container->tbl->it_group);
+
+   mutex_destroy(&container->lock);
+
+   kfree(container);
+}
+
+static long tce_iommu_ioctl(void *iommu_data,
+unsigned int cmd, unsigned long arg)
+{
+   struct tce_container *container = iommu_data;
+   unsigned long minsz;
+
+   switch (cmd) {
+   case VFIO_CHECK_EXTENSION: {
+   return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0;
+   }
+   cas

Re: [PATCH 2/2] vfio powerpc: enabled on powernv platform

2012-11-26 Thread Alex Williamson
On Fri, 2012-11-23 at 20:03 +1100, Alexey Kardashevskiy wrote:
> This patch initializes IOMMU groups based on the IOMMU
> configuration discovered during the PCI scan on POWERNV
> (POWER non virtualized) platform. The IOMMU groups are
> to be used later by VFIO driver (PCI pass through).
> 
> It also implements an API for mapping/unmapping pages for
> guest PCI drivers and providing DMA window properties.
> This API is going to be used later by QEMU-VFIO to handle
> h_put_tce hypercalls from the KVM guest.
> 
> Although this driver has been tested only on the POWERNV
> platform, it should work on any platform which supports
> TCE tables.
> 
> To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config
> option and configure VFIO as required.
> 
> Cc: David Gibson 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/include/asm/iommu.h |6 ++
>  arch/powerpc/kernel/iommu.c  |  141 ++
>  arch/powerpc/platforms/powernv/pci.c |  135 
>  drivers/iommu/Kconfig|8 ++
>  4 files changed, 290 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index cbfe678..5ba66cb 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -76,6 +76,9 @@ struct iommu_table {
>   struct iommu_pool large_pool;
>   struct iommu_pool pools[IOMMU_NR_POOLS];
>   unsigned long *it_map;   /* A simple allocation bitmap for now */
> +#ifdef CONFIG_IOMMU_API
> + struct iommu_group *it_group;
> +#endif
>  };
>  
>  struct scatterlist;
> @@ -147,5 +150,8 @@ static inline void iommu_restore(void)
>  }
>  #endif
>  
> +extern long iommu_put_tces(struct iommu_table *tbl, unsigned long entry, uint64_t tce,
> + enum dma_data_direction direction, unsigned long pages);
> +
>  #endif /* __KERNEL__ */
>  #endif /* _ASM_IOMMU_H */
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index ff5a6ce..c8dad1f 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -44,6 +44,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define DBG(...)
>  
> @@ -856,3 +857,143 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
>   free_pages((unsigned long)vaddr, get_order(size));
>   }
>  }
> +
> +#ifdef CONFIG_IOMMU_API
> +/*
> + * SPAPR TCE API
> + */
> +static struct page *free_tce(struct iommu_table *tbl, unsigned long entry)
> +{
> + struct page *page;
> + unsigned long oldtce;
> +
> + oldtce = ppc_md.tce_get(tbl, entry);
> +
> + if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
> + return NULL;
> +
> + page = pfn_to_page(oldtce >> PAGE_SHIFT);
> +
> + WARN_ON(!page);
> + if (page && (oldtce & TCE_PCI_WRITE))
> + SetPageDirty(page);
> + ppc_md.tce_free(tbl, entry, 1);
> +
> + return page;
> +}
> +
> +static int put_tce(struct iommu_table *tbl, unsigned long entry,
> + uint64_t tce, enum dma_data_direction direction)
> +{
> + int ret;
> + struct page *page = NULL;
> + unsigned long kva, offset;
> +
> + /* Map new TCE */
> + offset = (tce & IOMMU_PAGE_MASK) - (tce & PAGE_MASK);
> + ret = get_user_pages_fast(tce & PAGE_MASK, 1,
> + direction != DMA_TO_DEVICE, &page);

We're locking memory here on behalf of the user, but I don't see where
rlimit gets checked to verify the user has privileges to lock the pages.
I know you're locking a much smaller set of memory than x86 does, but
are we just foregoing that added security?
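
For illustration only (this is not part of the posted patch): a minimal userspace sketch of the kind of locked-memory accounting being asked about. The helper name and its parameters are hypothetical; an in-kernel version would presumably read the limit with rlimit(RLIMIT_MEMLOCK) and the privilege with capable(CAP_IPC_LOCK) rather than take them as arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: would pinning new_pages more pages push the task
 * past its RLIMIT_MEMLOCK? Every input is passed in explicitly so the
 * sketch can run outside the kernel.
 */
static bool lock_would_exceed_limit(uint64_t locked_pages, uint64_t new_pages,
				    uint64_t limit_bytes, unsigned int page_shift,
				    bool has_ipc_lock)
{
	uint64_t limit_pages;

	assert(page_shift < 64);

	/* Privileged tasks are not bound by the limit. */
	if (has_ipc_lock)
		return false;

	limit_pages = limit_bytes >> page_shift;

	/* Compare by subtraction so locked_pages + new_pages cannot overflow. */
	if (locked_pages > limit_pages)
		return true;
	return new_pages > limit_pages - locked_pages;
}
```

With a 64 KiB limit and 4 KiB pages, pinning 16 pages from zero is allowed, while pinning 16 pages on top of one already locked page is refused unless the caller is privileged.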

> + if (ret < 1) {
> + printk(KERN_ERR "tce_vfio: get_user_pages_fast failed tce=%llx ioba=%lx ret=%d\n",
> + tce, entry << IOMMU_PAGE_SHIFT, ret);
> + if (!ret)
> + ret = -EFAULT;
> + return ret;
> + }
> +
> + kva = (unsigned long) page_address(page);
> + kva += offset;
> +
> + /* tce_build receives a virtual address */
> + entry += tbl->it_offset; /* Offset into real TCE table */
> + ret = ppc_md.tce_build(tbl, entry, 1, kva, direction, NULL);
> +
> + /* tce_build() only returns non-zero for transient errors */
> + if (unlikely(ret)) {
> + printk(KERN_ERR "tce_vfio: tce_put failed on tce=%llx ioba=%lx kva=%lx ret=%d\n",
> + tce, entry << IOMMU_PAGE_SHIFT, kva, ret);
> + put_page(page);
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +static void tce_flush(struct iommu_table *tbl)
> +{
> + /* Flush/invalidate TLB caches if necessary */
> + if (ppc_md.tce_flush)
> + ppc_md.tce_flush(tbl);
> +
> + /* Make sure updates are seen by hardware */
> + mb();
> +}
> +
> +long iommu_put_tces(struct iommu_table *tbl, unsigned long entry, uint64_t tce,
> + enum dma_data_direction direction,

Re: [PATCH 1/2] vfio powerpc: implemented IOMMU driver for VFIO

2012-11-26 Thread Alex Williamson
On Tue, 2012-11-27 at 15:06 +1100, Alexey Kardashevskiy wrote:
> On 27/11/12 05:20, Alex Williamson wrote:
> > On Fri, 2012-11-23 at 20:03 +1100, Alexey Kardashevskiy wrote:
> >> VFIO implements platform independent stuff such as
> >> a PCI driver, BAR access (via read/write on a file descriptor
> >> or direct mapping when possible) and IRQ signaling.
> >>
> >> The platform dependent part includes IOMMU initialization
> >> and handling. This patch implements an IOMMU driver for VFIO
> >> which does mapping/unmapping pages for the guest IO and
> >> provides information about DMA window (required by a POWERPC
> >> guest).
> >>
> >> The counterpart in QEMU is required to support this functionality.
> >>
> >> Cc: David Gibson 
> >> Signed-off-by: Alexey Kardashevskiy 
> >> ---
> >>   drivers/vfio/Kconfig|6 +
> >>   drivers/vfio/Makefile   |1 +
> >>   drivers/vfio/vfio_iommu_spapr_tce.c |  247 +++
> >>   include/linux/vfio.h|   20 +++
> >>   4 files changed, 274 insertions(+)
> >>   create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> >>
> >> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> >> index 7cd5dec..b464687 100644
> >> --- a/drivers/vfio/Kconfig
> >> +++ b/drivers/vfio/Kconfig
> >> @@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1
> >>depends on VFIO
> >>default n
> >>
> >> +config VFIO_IOMMU_SPAPR_TCE
> >> +  tristate
> >> +  depends on VFIO && SPAPR_TCE_IOMMU
> >> +  default n
> >> +
> >>   menuconfig VFIO
> >>tristate "VFIO Non-Privileged userspace driver framework"
> >>depends on IOMMU_API
> >>select VFIO_IOMMU_TYPE1 if X86
> >> +  select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV
> >>help
> >>  VFIO provides a framework for secure userspace device drivers.
> >>  See Documentation/vfio.txt for more details.
> >> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> >> index 2398d4a..72bfabc 100644
> >> --- a/drivers/vfio/Makefile
> >> +++ b/drivers/vfio/Makefile
> >> @@ -1,3 +1,4 @@
> >>   obj-$(CONFIG_VFIO) += vfio.o
> >>   obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> >> +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> >>   obj-$(CONFIG_VFIO_PCI) += pci/
> >> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> new file mode 100644
> >> index 000..46a6298
> >> --- /dev/null
> >> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> @@ -0,0 +1,247 @@
> >> +/*
> >> + * VFIO: IOMMU DMA mapping support for TCE on POWER
> >> + *
> >> + * Copyright (C) 2012 IBM Corp.  All rights reserved.
> >> + * Author: Alexey Kardashevskiy 
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * Derived from original vfio_iommu_type1.c:
> >> + * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
> >> + * Author: Alex Williamson 
> >> + */
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +#define DRIVER_VERSION  "0.1"
> >> +#define DRIVER_AUTHOR   "a...@ozlabs.ru"
> >> +#define DRIVER_DESC "VFIO IOMMU SPAPR TCE"
> >> +
> >> +static void tce_iommu_detach_group(void *iommu_data,
> >> +  struct iommu_group *iommu_group);
> >> +
> >> +/*
> >> + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
> >> + */
> >> +
> >> +/*
> >> + * The container descriptor supports only a single group per container.
> >> + * Required by the API as the container is not supplied with the IOMMU group
> >> + * at the moment of initialization.
> >> + */
> >> +struct tce_container {
> >> +  struct mutex lock;
> >> +  struct iommu_table *tbl;
> >> +};
> >> +
> >> +static void *tce_iommu_open(unsigned long arg)
> >> +{
> >> +  struct tce_container *container;
> >> +
> >> +  if (arg != VFIO_SPAPR_TCE_IOMMU) {
> >> +  printk(KERN_ERR "tce_vfio: Wrong IOMMU type\n");
> >> +  return ERR_PTR(-EINVAL);
> >> +  }
> >> +
> >> +  container = kzalloc(sizeof(*container), GFP_KERNEL);
> >> +  if (!container)
> >> +  return ERR_PTR(-ENOMEM);
> >> +
> >> +  mutex_init(&container->lock);
> >> +
> >> +  return container;
> >> +}
> >> +
> >> +static void tce_iommu_release(void *iommu_data)
> >> +{
> >> +  struct tce_container *container = iommu_data;
> >> +
> >> +  WARN_ON(container->tbl && !container->tbl->it_group);
> >
> > I think your patch ordering is backwards here.  it_group isn't added
> > until 2/2.  I'd really like to see the arch/powerpc code approved and
> > merged by the powerpc maintainer before we add the code that makes use
> > of it into vfio.  Otherwise we just get lots of churn if interfaces
> > change or they disapprove of it altogether.
> 
> 
> Makes sense, thanks.
> 
> 
> >> +  if (container->tbl && container->tbl->it_group)
> >> +  tce_io

Re: [PATCH] vfio powerpc: enabled and supported on powernv platform

2012-11-26 Thread Alex Williamson
On Tue, 2012-11-27 at 14:28 +1100, Alexey Kardashevskiy wrote:
> On 27/11/12 05:04, Alex Williamson wrote:
> > On Mon, 2012-11-26 at 08:18 -0700, Alex Williamson wrote:
> >> On Fri, 2012-11-23 at 13:02 +1100, Alexey Kardashevskiy wrote:
> >>> On 22/11/12 22:56, Sethi Varun-B16395 wrote:
> 
> 
> > -Original Message-
> > From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> > ow...@vger.kernel.org] On Behalf Of Alex Williamson
> > Sent: Tuesday, November 20, 2012 11:50 PM
> > To: Alexey Kardashevskiy
> > Cc: Benjamin Herrenschmidt; Paul Mackerras; linuxppc-
> > d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; 
> > k...@vger.kernel.org;
> > David Gibson
> > Subject: Re: [PATCH] vfio powerpc: enabled and supported on powernv
> > platform
> >
> > On Tue, 2012-11-20 at 11:48 +1100, Alexey Kardashevskiy wrote:
> >> VFIO implements platform independent stuff such as a PCI driver, BAR
> >> access (via read/write on a file descriptor or direct mapping when
> >> possible) and IRQ signaling.
> >> The platform dependent part includes IOMMU initialization and
> >> handling.
> >>
> >> This patch initializes IOMMU groups based on the IOMMU configuration
> >> discovered during the PCI scan, only POWERNV platform is supported at
> >> the moment.
> >>
> >> Also the patch implements an VFIO-IOMMU driver which manages DMA
> >> mapping/unmapping requests coming from the client (now QEMU). It also
> >> returns a DMA window information to let the guest initialize the
> >> device tree for a guest OS properly. Although this driver has been
> >> tested only on POWERNV, it should work on any platform supporting TCE
> >> tables.
> >>
> >> To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config option.
> >>
> >> Cc: David Gibson 
> >> Signed-off-by: Alexey Kardashevskiy 
> >> ---
> >>arch/powerpc/include/asm/iommu.h |6 +
> >>arch/powerpc/kernel/iommu.c  |  140 +++
> >>arch/powerpc/platforms/powernv/pci.c |  135 +++
> >>drivers/iommu/Kconfig|8 ++
> >>drivers/vfio/Kconfig |6 +
> >>drivers/vfio/Makefile|1 +
> >>drivers/vfio/vfio_iommu_spapr_tce.c  |  247
> > ++
> >>include/linux/vfio.h |   20 +++
> >>8 files changed, 563 insertions(+)
> >>create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> >>
> >> diff --git a/arch/powerpc/include/asm/iommu.h
> >> b/arch/powerpc/include/asm/iommu.h
> >> index cbfe678..5ba66cb 100644
> >> --- a/arch/powerpc/include/asm/iommu.h
> >> +++ b/arch/powerpc/include/asm/iommu.h
> >> @@ -64,30 +64,33 @@ struct iommu_pool {  }
> >> cacheline_aligned_in_smp;
> >>
> >>struct iommu_table {
> >>unsigned long  it_busno; /* Bus number this table belongs 
> >> to */
> >>unsigned long  it_size;  /* Size of iommu table in entries 
> >> */
> >>unsigned long  it_offset;/* Offset into global table */
> >>unsigned long  it_base;  /* mapped address of tce table */
> >>unsigned long  it_index; /* which iommu table this is */
> >>unsigned long  it_type;  /* type: PCI or Virtual Bus */
> >>unsigned long  it_blocksize; /* Entries in each block 
> >> (cacheline)
> > */
> >>unsigned long  poolsize;
> >>unsigned long  nr_pools;
> >>struct iommu_pool large_pool;
> >>struct iommu_pool pools[IOMMU_NR_POOLS];
> >>unsigned long *it_map;   /* A simple allocation bitmap for 
> >> now
> > */
> >> +#ifdef CONFIG_IOMMU_API
> >> +  struct iommu_group *it_group;
> >> +#endif
> >>};
> >>
> >>struct scatterlist;
> >>
> >>static inline void set_iommu_table_base(struct device *dev, void
> >> *base)  {
> >>dev->archdata.dma_data.iommu_table_base = base;  }
> >>
> >>static inline void *get_iommu_table_base(struct device *dev)  {
> >>return dev->archdata.dma_data.iommu_table_base;
> >>}
> >>
> >>/* Frees table for an individual device node */ @@ -135,17 +138,20 
> >> @@
> >> static inline void pci_iommu_init(void) { }  extern void
> >> alloc_dart_table(void);  #if defined(CONFIG_PPC64) &&
> >> defined(CONFIG_PM)  static inline void iommu_save(void)  {
> >>if (ppc_md.iommu_save)
> >>ppc_md.iommu_save();
> >>}
> >>
> >>static inline void iommu_restore(void)  {
> >>if (ppc_md.iommu_restore)
> >>ppc_md.iommu_restore();
> >>}
> >>#endif
> >>
> >> +extern long iommu_put_tces(struct iommu

Re: [PATCH 1/2] vfio powerpc: implemented IOMMU driver for VFIO

2012-11-26 Thread Alexey Kardashevskiy

On 27/11/12 05:20, Alex Williamson wrote:

On Fri, 2012-11-23 at 20:03 +1100, Alexey Kardashevskiy wrote:

VFIO implements platform independent stuff such as
a PCI driver, BAR access (via read/write on a file descriptor
or direct mapping when possible) and IRQ signaling.

The platform dependent part includes IOMMU initialization
and handling. This patch implements an IOMMU driver for VFIO
which does mapping/unmapping pages for the guest IO and
provides information about DMA window (required by a POWERPC
guest).

The counterpart in QEMU is required to support this functionality.

Cc: David Gibson 
Signed-off-by: Alexey Kardashevskiy 
---
  drivers/vfio/Kconfig|6 +
  drivers/vfio/Makefile   |1 +
  drivers/vfio/vfio_iommu_spapr_tce.c |  247 +++
  include/linux/vfio.h|   20 +++
  4 files changed, 274 insertions(+)
  create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 7cd5dec..b464687 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1
depends on VFIO
default n

+config VFIO_IOMMU_SPAPR_TCE
+   tristate
+   depends on VFIO && SPAPR_TCE_IOMMU
+   default n
+
  menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
select VFIO_IOMMU_TYPE1 if X86
+   select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV
help
  VFIO provides a framework for secure userspace device drivers.
  See Documentation/vfio.txt for more details.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 2398d4a..72bfabc 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,3 +1,4 @@
  obj-$(CONFIG_VFIO) += vfio.o
  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
+obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
  obj-$(CONFIG_VFIO_PCI) += pci/
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
new file mode 100644
index 000..46a6298
--- /dev/null
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -0,0 +1,247 @@
+/*
+ * VFIO: IOMMU DMA mapping support for TCE on POWER
+ *
+ * Copyright (C) 2012 IBM Corp.  All rights reserved.
+ * Author: Alexey Kardashevskiy 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio_iommu_type1.c:
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "a...@ozlabs.ru"
+#define DRIVER_DESC "VFIO IOMMU SPAPR TCE"
+
+static void tce_iommu_detach_group(void *iommu_data,
+   struct iommu_group *iommu_group);
+
+/*
+ * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
+ */
+
+/*
+ * The container descriptor supports only a single group per container.
+ * Required by the API as the container is not supplied with the IOMMU group
+ * at the moment of initialization.
+ */
+struct tce_container {
+   struct mutex lock;
+   struct iommu_table *tbl;
+};
+
+static void *tce_iommu_open(unsigned long arg)
+{
+   struct tce_container *container;
+
+   if (arg != VFIO_SPAPR_TCE_IOMMU) {
+   printk(KERN_ERR "tce_vfio: Wrong IOMMU type\n");
+   return ERR_PTR(-EINVAL);
+   }
+
+   container = kzalloc(sizeof(*container), GFP_KERNEL);
+   if (!container)
+   return ERR_PTR(-ENOMEM);
+
+   mutex_init(&container->lock);
+
+   return container;
+}
+
+static void tce_iommu_release(void *iommu_data)
+{
+   struct tce_container *container = iommu_data;
+
+   WARN_ON(container->tbl && !container->tbl->it_group);


I think your patch ordering is backwards here.  it_group isn't added
until 2/2.  I'd really like to see the arch/powerpc code approved and
merged by the powerpc maintainer before we add the code that makes use
of it into vfio.  Otherwise we just get lots of churn if interfaces
change or they disapprove of it altogether.



Makes sense, thanks.



+   if (container->tbl && container->tbl->it_group)
+   tce_iommu_detach_group(iommu_data, container->tbl->it_group);
+
+   mutex_destroy(&container->lock);
+
+   kfree(container);
+}
+
+static long tce_iommu_ioctl(void *iommu_data,
+unsigned int cmd, unsigned long arg)
+{
+   struct tce_container *container = iommu_data;
+   unsigned long minsz;
+
+   switch (cmd) {
+   case VFIO_CHECK_EXTENSION: {
+   return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0;
+   }
+   case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
+   struct vfio_iommu_spapr_tce_info info;
+   struct iommu

[PATCH 3/3] powerpc: Build kernel with -mcmodel=medium

2012-11-26 Thread Anton Blanchard

Finally remove the two level TOC and build with -mcmodel=medium.

Unfortunately we can't build modules with -mcmodel=medium due to
the tricks the kernel module loader plays with percpu data:

# -mcmodel=medium breaks modules because it uses 32bit offsets from
# the TOC pointer to create pointers where possible. Pointers into the
# percpu data area are created by this method.
#
# The kernel module loader relocates the percpu data section from the
# original location (starting with 0xd...) to somewhere in the base
# kernel percpu data space (starting with 0xc...). We need a full
# 64bit relocation for this to work, hence -mcmodel=large.
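
To make the range problem concrete, here is a small sketch (not taken from the patch) that models a medium-code-model access as "target must sit within a signed 32-bit displacement of the TOC pointer". The function name and the example addresses are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a medium-code-model access: the target is only
 * addressable if it lies within a signed 32-bit offset of the TOC pointer.
 */
static bool toc_reachable(uint64_t toc, uint64_t target)
{
	/* Unsigned subtraction wraps; reinterpret the result as signed. */
	int64_t disp = (int64_t)(target - toc);

	return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

A module TOC at 0xd000000000008000 can reach percpu data linked at 0xd000000000010000, but not the same data once it has been moved to, say, 0xc000000001000000, which is why a full 64bit relocation is needed for modules.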

On older compilers we fall back to the two level TOC (-mminimal-toc).

Signed-off-by: Anton Blanchard 
--- 

Index: b/arch/powerpc/Makefile
===
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -67,7 +67,24 @@ LDFLAGS_vmlinux-y := -Bstatic
 LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
 LDFLAGS_vmlinux:= $(LDFLAGS_vmlinux-y)
 
-CFLAGS-$(CONFIG_PPC64) := -mminimal-toc -mtraceback=no -mcall-aixdesc
+ifeq ($(CONFIG_PPC64),y)
+ifeq ($(call cc-option-yn,-mcmodel=medium),y)
+   # -mcmodel=medium breaks modules because it uses 32bit offsets from
+   # the TOC pointer to create pointers where possible. Pointers into the
+   # percpu data area are created by this method.
+   #
+   # The kernel module loader relocates the percpu data section from the
+   # original location (starting with 0xd...) to somewhere in the base
+   # kernel percpu data space (starting with 0xc...). We need a full
+   # 64bit relocation for this to work, hence -mcmodel=large.
+   KBUILD_CFLAGS_MODULE += -mcmodel=large
+else
+   export NO_MINIMAL_TOC := -mno-minimal-toc
+endif
+endif
+
+CFLAGS-$(CONFIG_PPC64) := -mtraceback=no -mcall-aixdesc
+CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,-mminimal-toc)
 CFLAGS-$(CONFIG_PPC32) := -ffixed-r2 -mmultiple
 
 CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=power7,-mtune=power4)
Index: b/arch/powerpc/kernel/Makefile
===
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -7,7 +7,7 @@ CFLAGS_ptrace.o += -DUTS_MACHINE='"$(UT
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
 ifeq ($(CONFIG_PPC64),y)
-CFLAGS_prom_init.o += -mno-minimal-toc
+CFLAGS_prom_init.o += $(NO_MINIMAL_TOC)
 endif
 ifeq ($(CONFIG_PPC32),y)
 CFLAGS_prom_init.o  += -fPIC
Index: b/arch/powerpc/lib/Makefile
===
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -4,7 +4,7 @@
 
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-ccflags-$(CONFIG_PPC64):= -mno-minimal-toc
+ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 
 CFLAGS_REMOVE_code-patching.o = -pg
 CFLAGS_REMOVE_feature-fixups.o = -pg
Index: b/arch/powerpc/mm/Makefile
===
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -4,7 +4,7 @@
 
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-ccflags-$(CONFIG_PPC64):= -mno-minimal-toc
+ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 
 obj-y  := fault.o mem.o pgtable.o gup.o \
   init_$(CONFIG_WORD_SIZE).o \
Index: b/arch/powerpc/oprofile/Makefile
===
--- a/arch/powerpc/oprofile/Makefile
+++ b/arch/powerpc/oprofile/Makefile
@@ -1,6 +1,6 @@
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-ccflags-$(CONFIG_PPC64):= -mno-minimal-toc
+ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 
 obj-$(CONFIG_OPROFILE) += oprofile.o
 
Index: b/arch/powerpc/platforms/pseries/Makefile
===
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -1,4 +1,4 @@
-ccflags-$(CONFIG_PPC64):= -mno-minimal-toc
+ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 ccflags-$(CONFIG_PPC_PSERIES_DEBUG)+= -DDEBUG
 
 obj-y  := lpar.o hvCall.o nvram.o reconfig.o \
Index: b/arch/powerpc/sysdev/Makefile
===
--- a/arch/powerpc/sysdev/Makefile
+++ b/arch/powerpc/sysdev/Makefile
@@ -1,6 +1,6 @@
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-ccflags-$(CONFIG_PPC64):= -mno-minimal-toc
+ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 
 mpic-msi-obj-$(CONFIG_PCI_MSI) += mpic_msi.o mpic_u3msi.o mpic_pasemi_msi.o
 obj-$(CONFIG_MPIC) += mpic.o $(mpic-msi-obj-y)
Index: b/arch/powerpc/xmon/Makefile
===
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@

[PATCH 2/3] powerpc: Remove RELOC() macro

2012-11-26 Thread Anton Blanchard

Now that we relocate prom_init.c on 64bit, we can finally remove the
nasty RELOC() macro.

Finally a patch that I can claim has a net positive effect on
the kernel. It doesn't happen very often.

Signed-off-by: Anton Blanchard 
--- 

Index: b/arch/powerpc/kernel/prom_init.c
===
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -79,7 +79,6 @@
  * On ppc64, 64 bit values are truncated to 32 bits (and
  * fortunately don't get interpreted as two arguments).
  */
-#define RELOC(x)   (x)
 #define ADDR(x)(u32)(unsigned long)(x)
 
 #ifdef CONFIG_PPC64
@@ -94,7 +93,7 @@ int of_workarounds;
 
 #define PROM_BUG() do {\
 prom_printf("kernel BUG at %s line 0x%x!\n",   \
-   RELOC(__FILE__), __LINE__); \
+   __FILE__, __LINE__);\
 __asm__ __volatile__(".long " BUG_ILLEGAL_INSTR);  \
 } while (0)
 
@@ -232,7 +231,7 @@ static int __init call_prom(const char *
for (i = 0; i < nret; i++)
args.args[nargs+i] = 0;
 
-   if (enter_prom(&args, RELOC(prom_entry)) < 0)
+   if (enter_prom(&args, prom_entry) < 0)
return PROM_ERROR;
 
return (nret > 0) ? args.args[nargs] : 0;
@@ -257,7 +256,7 @@ static int __init call_prom_ret(const ch
for (i = 0; i < nret; i++)
args.args[nargs+i] = 0;
 
-   if (enter_prom(&args, RELOC(prom_entry)) < 0)
+   if (enter_prom(&args, prom_entry) < 0)
return PROM_ERROR;
 
if (rets != NULL)
@@ -271,20 +270,19 @@ static int __init call_prom_ret(const ch
 static void __init prom_print(const char *msg)
 {
const char *p, *q;
-   struct prom_t *_prom = &RELOC(prom);
 
-   if (_prom->stdout == 0)
+   if (prom.stdout == 0)
return;
 
for (p = msg; *p != 0; p = q) {
for (q = p; *q != 0 && *q != '\n'; ++q)
;
if (q > p)
-   call_prom("write", 3, 1, _prom->stdout, p, q - p);
+   call_prom("write", 3, 1, prom.stdout, p, q - p);
if (*q == 0)
break;
++q;
-   call_prom("write", 3, 1, _prom->stdout, ADDR("\r\n"), 2);
+   call_prom("write", 3, 1, prom.stdout, ADDR("\r\n"), 2);
}
 }
 
@@ -293,7 +291,6 @@ static void __init prom_print_hex(unsign
 {
int i, nibbles = sizeof(val)*2;
char buf[sizeof(val)*2+1];
-   struct prom_t *_prom = &RELOC(prom);
 
for (i = nibbles-1;  i >= 0;  i--) {
buf[i] = (val & 0xf) + '0';
@@ -302,7 +299,7 @@ static void __init prom_print_hex(unsign
val >>= 4;
}
buf[nibbles] = '\0';
-   call_prom("write", 3, 1, _prom->stdout, buf, nibbles);
+   call_prom("write", 3, 1, prom.stdout, buf, nibbles);
 }
 
 /* max number of decimal digits in an unsigned long */
@@ -311,7 +308,6 @@ static void __init prom_print_dec(unsign
 {
int i, size;
char buf[UL_DIGITS+1];
-   struct prom_t *_prom = &RELOC(prom);
 
for (i = UL_DIGITS-1; i >= 0;  i--) {
buf[i] = (val % 10) + '0';
@@ -321,7 +317,7 @@ static void __init prom_print_dec(unsign
}
/* shift stuff down */
size = UL_DIGITS - i;
-   call_prom("write", 3, 1, _prom->stdout, buf+i, size);
+   call_prom("write", 3, 1, prom.stdout, buf+i, size);
 }
 
 static void __init prom_printf(const char *format, ...)
@@ -330,19 +326,18 @@ static void __init prom_printf(const cha
va_list args;
unsigned long v;
long vs;
-   struct prom_t *_prom = &RELOC(prom);
 
va_start(args, format);
for (p = format; *p != 0; p = q) {
for (q = p; *q != 0 && *q != '\n' && *q != '%'; ++q)
;
if (q > p)
-   call_prom("write", 3, 1, _prom->stdout, p, q - p);
+   call_prom("write", 3, 1, prom.stdout, p, q - p);
if (*q == 0)
break;
if (*q == '\n') {
++q;
-   call_prom("write", 3, 1, _prom->stdout,
+   call_prom("write", 3, 1, prom.stdout,
  ADDR("\r\n"), 2);
continue;
}
@@ -364,7 +359,7 @@ static void __init prom_printf(const cha
++q;
vs = va_arg(args, int);
if (vs < 0) {
-   prom_print(RELOC("-"));
+   prom_print("-");
vs = -vs;
}
prom_print_dec(vs);
@@ -385,7 +380,7 @@ static void __init prom_printf(const cha
   

[PATCH 1/3] powerpc: Relocate prom_init.c on 64bit

2012-11-26 Thread Anton Blanchard

The ppc64 kernel can get loaded at any address which means
our very early init code in prom_init.c must be relocatable. We do
this with a pretty nasty RELOC() macro that we wrap accesses of
variables with. It is very fragile and sometimes we forget to add a
RELOC() to an uncommon path or sometimes a compiler change breaks it.

32bit has a much more elegant solution where we build prom_init.c
with -mrelocatable and then process the relocations manually.
Unfortunately we can't do the equivalent on 64bit and we would
have to build the entire kernel relocatable (-pie), resulting in a
large increase in kernel footprint (megabytes of relocation data).
The relocation data will be marked __initdata but it still creates
more pressure on our already tight memory layout at boot.

Alan Modra pointed out that the 64bit ABI is relocatable even
if we don't build with -pie, we just need to relocate the TOC.
This patch implements that idea and relocates the TOC entries of
prom_init.c. An added bonus is there are very few relocations to
process which helps keep boot times on simulators down.
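
As a rough illustration of the idea above (a sketch, not the code in this
patch): the TOC fix-up amounts to adding the load offset to every entry in a
table of link-time addresses. The names toc_add_offset() and toc_sub_offset()
are made up for this example, and a plain array stands in for the real TOC
section.

```c
#include <stddef.h>

/*
 * Hypothetical stand-alone model of a TOC fix-up. Each entry holds a
 * link-time address; adding the load offset turns it into the address
 * the object actually has at run time.
 */
static void toc_add_offset(unsigned long *toc, size_t nr_entries,
			   unsigned long offset)
{
	size_t i;

	for (i = 0; i < nr_entries; i++)
		toc[i] += offset;
}

/* Reverse of the above: restore the link-time addresses. */
static void toc_sub_offset(unsigned long *toc, size_t nr_entries,
			   unsigned long offset)
{
	size_t i;

	for (i = 0; i < nr_entries; i++)
		toc[i] -= offset;
}
```

Because only the table entries are touched, the cost scales with the number
of TOC entries rather than with the size of the code, which matches the
"very few relocations" point above.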

gcc does not put 64bit integer constants into the TOC but to be
safe we may want a build time script which passes through the
prom_init.c TOC entries to make sure everything looks reasonable.

Signed-off-by: Anton Blanchard 
--- 

To keep the patch small and reviewable, I separated the removal
of the RELOC macro into a follow up patch.

For simplicity I do the relocation in C but if self brain surgery
keeps people up at night we can move it into assembly.

Index: b/arch/powerpc/kernel/prom_init.c
===
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -66,8 +66,8 @@
  * is running at whatever address it has been loaded at.
  * On ppc32 we compile with -mrelocatable, which means that references
  * to extern and static variables get relocated automatically.
- * On ppc64 we have to relocate the references explicitly with
- * RELOC.  (Note that strings count as static variables.)
+ * ppc64 objects are always relocatable, we just need to relocate the
+ * TOC.
  *
  * Because OF may have mapped I/O devices into the area starting at
  * KERNELBASE, particularly on CHRP machines, we can't safely call
@@ -79,13 +79,12 @@
  * On ppc64, 64 bit values are truncated to 32 bits (and
  * fortunately don't get interpreted as two arguments).
  */
+#define RELOC(x)   (x)
+#define ADDR(x)(u32)(unsigned long)(x)
+
 #ifdef CONFIG_PPC64
-#define RELOC(x)(*PTRRELOC(&(x)))
-#define ADDR(x)(u32) add_reloc_offset((unsigned long)(x))
 #define OF_WORKAROUNDS 0
 #else
-#define RELOC(x)   (x)
-#define ADDR(x)(u32) (x)
 #define OF_WORKAROUNDS of_workarounds
 int of_workarounds;
 #endif
@@ -334,9 +333,6 @@ static void __init prom_printf(const cha
struct prom_t *_prom = &RELOC(prom);
 
va_start(args, format);
-#ifdef CONFIG_PPC64
-   format = PTRRELOC(format);
-#endif
for (p = format; *p != 0; p = q) {
for (q = p; *q != 0 && *q != '\n' && *q != '%'; ++q)
;
@@ -437,9 +433,6 @@ static unsigned int __init prom_claim(un
 
 static void __init __attribute__((noreturn)) prom_panic(const char *reason)
 {
-#ifdef CONFIG_PPC64
-   reason = PTRRELOC(reason);
-#endif
prom_print(reason);
/* Do not call exit because it clears the screen on pmac
 * it also causes some sort of double-fault on early pmacs */
@@ -929,7 +922,7 @@ static void __init prom_send_capabilitie
 * (we assume this is the same for all cores) and use it to
 * divide NR_CPUS.
 */
-   cores = (u32 *)PTRRELOC(&ibm_architecture_vec[IBM_ARCH_VEC_NRCORES_OFFSET]);
+   cores = (u32 *)&ibm_architecture_vec[IBM_ARCH_VEC_NRCORES_OFFSET];
if (*cores != NR_CPUS) {
prom_printf("WARNING ! "
"ibm_architecture_vec structure inconsistent: %lu!\n",
@@ -2850,6 +2843,53 @@ static void __init prom_check_initrd(uns
 #endif /* CONFIG_BLK_DEV_INITRD */
 }
 
+#ifdef CONFIG_PPC64
+#ifdef CONFIG_RELOCATABLE
+static void reloc_toc(void)
+{
+}
+
+static void unreloc_toc(void)
+{
+}
+#else
+static void __reloc_toc(void *tocstart, unsigned long offset,
+   unsigned long nr_entries)
+{
+   unsigned long i;
+   unsigned long *toc_entry = (unsigned long *)tocstart;
+
+   for (i = 0; i < nr_entries; i++) {
+   *toc_entry = *toc_entry + offset;
+   toc_entry++;
+   }
+}
+
+static void reloc_toc(void)
+{
+   unsigned long offset = reloc_offset();
+   unsigned long nr_entries =
+   (__prom_init_toc_end - __prom_init_toc_start) / sizeof(long);
+
+   /* Need to add offset to get at __prom_init_toc_start */
+   __reloc_toc(__prom_init_toc_start + offset, offset, nr_entrie

Re: [PATCH] vfio powerpc: enabled and supported on powernv platform

2012-11-26 Thread Alexey Kardashevskiy

On 27/11/12 05:04, Alex Williamson wrote:

On Mon, 2012-11-26 at 08:18 -0700, Alex Williamson wrote:

On Fri, 2012-11-23 at 13:02 +1100, Alexey Kardashevskiy wrote:

On 22/11/12 22:56, Sethi Varun-B16395 wrote:




-Original Message-
From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
ow...@vger.kernel.org] On Behalf Of Alex Williamson
Sent: Tuesday, November 20, 2012 11:50 PM
To: Alexey Kardashevskiy
Cc: Benjamin Herrenschmidt; Paul Mackerras; linuxppc-
d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; k...@vger.kernel.org;
David Gibson
Subject: Re: [PATCH] vfio powerpc: enabled and supported on powernv
platform

On Tue, 2012-11-20 at 11:48 +1100, Alexey Kardashevskiy wrote:

VFIO implements platform independent stuff such as a PCI driver, BAR
access (via read/write on a file descriptor or direct mapping when
possible) and IRQ signaling.
The platform dependent part includes IOMMU initialization and
handling.

This patch initializes IOMMU groups based on the IOMMU configuration
discovered during the PCI scan, only POWERNV platform is supported at
the moment.

Also the patch implements an VFIO-IOMMU driver which manages DMA
mapping/unmapping requests coming from the client (now QEMU). It also
returns a DMA window information to let the guest initialize the
device tree for a guest OS properly. Although this driver has been
tested only on POWERNV, it should work on any platform supporting TCE
tables.

To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config option.

Cc: David Gibson 
Signed-off-by: Alexey Kardashevskiy 
---
   arch/powerpc/include/asm/iommu.h |6 +
   arch/powerpc/kernel/iommu.c  |  140 +++
   arch/powerpc/platforms/powernv/pci.c |  135 +++
   drivers/iommu/Kconfig|8 ++
   drivers/vfio/Kconfig |6 +
   drivers/vfio/Makefile|1 +
   drivers/vfio/vfio_iommu_spapr_tce.c  |  247 ++
   include/linux/vfio.h |   20 +++
   8 files changed, 563 insertions(+)
   create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c

diff --git a/arch/powerpc/include/asm/iommu.h
b/arch/powerpc/include/asm/iommu.h
index cbfe678..5ba66cb 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -64,30 +64,33 @@ struct iommu_pool {
   } cacheline_aligned_in_smp;

   struct iommu_table {
unsigned long  it_busno; /* Bus number this table belongs to */
unsigned long  it_size;  /* Size of iommu table in entries */
unsigned long  it_offset;/* Offset into global table */
unsigned long  it_base;  /* mapped address of tce table */
unsigned long  it_index; /* which iommu table this is */
unsigned long  it_type;  /* type: PCI or Virtual Bus */
unsigned long  it_blocksize; /* Entries in each block (cacheline) */
unsigned long  poolsize;
unsigned long  nr_pools;
struct iommu_pool large_pool;
struct iommu_pool pools[IOMMU_NR_POOLS];
unsigned long *it_map;   /* A simple allocation bitmap for now */
+#ifdef CONFIG_IOMMU_API
+   struct iommu_group *it_group;
+#endif
   };

   struct scatterlist;

   static inline void set_iommu_table_base(struct device *dev, void *base)
   {
dev->archdata.dma_data.iommu_table_base = base;
   }

   static inline void *get_iommu_table_base(struct device *dev)  {
return dev->archdata.dma_data.iommu_table_base;
   }

   /* Frees table for an individual device node */
@@ -135,17 +138,20 @@ static inline void pci_iommu_init(void) { }
   extern void alloc_dart_table(void);
   #if defined(CONFIG_PPC64) && defined(CONFIG_PM)
   static inline void iommu_save(void)
   {
if (ppc_md.iommu_save)
ppc_md.iommu_save();
   }

   static inline void iommu_restore(void)  {
if (ppc_md.iommu_restore)
ppc_md.iommu_restore();
   }
   #endif

+extern long iommu_put_tces(struct iommu_table *tbl, unsigned long entry, uint64_t tce,
+   enum dma_data_direction direction, unsigned long pages);
+
   #endif /* __KERNEL__ */
   #endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ff5a6ce..94f614b 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -32,30 +32,31 @@
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
+#include 

   #define DBG(...)

   static int novmerge;

   static void __iommu_free(struct iommu_table *, dma_addr_t, unsigned
int);

   static int __init setup_iommu(char *str)  {
if (!strcmp(str, "novmerge"))
novmerge = 1;
else if (!strcmp(str, "vmerge"))
novmerge = 0;
return 1;
   }
@@ -844,15 +845,154 @@ void *iommu_alloc_coherent(struct device *dev,
st

[PATCH 16/16] powerpc: Documentation for transactional memory on powerpc

2012-11-26 Thread Michael Neuling
Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 Documentation/powerpc/transactional_memory.txt |  119 
 1 file changed, 119 insertions(+)
 create mode 100644 Documentation/powerpc/transactional_memory.txt

diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt
new file mode 100644
index 000..881df6b
--- /dev/null
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -0,0 +1,119 @@
+Transactional Memory support
+
+
+POWER kernel support for this feature is currently limited to supporting
+its use by user programs.  It is not currently used by the kernel itself.
+
+This file aims to sum up how it is supported by Linux and what behaviour you
+can expect from your user programs.
+
+
+Basic overview
+==
+
+Hardware Transactional Memory is supported on POWER8 processors, and is a
+feature that enables a different form of atomic memory access.  Several new
+instructions are presented to delimit transactions; transactions are
+guaranteed to either complete atomically or roll back and undo any partial
+changes.
+
+A simple transaction looks like this:
+
+begin_move_money:
+  tbegin
+  beq   abort_handler
+
+  ld    r4, SAVINGS_ACCT(r3)
+  ld    r5, CURRENT_ACCT(r3)
+  subi  r5, r5, 1
+  addi  r4, r4, 1
+  std   r4, SAVINGS_ACCT(r3)
+  std   r5, CURRENT_ACCT(r3)
+
+  tend
+
+  b continue
+
+abort_handler:
+  ... test for odd failures ...
+
+  /* Retry the transaction if it failed because it conflicted with
+   * someone else: */
+  b begin_move_money
+
+
+The 'tbegin' instruction denotes the start point, and 'tend' the end point.
+Between these points the processor is in 'Transactional' state; any memory
+references will complete in one go if there are no conflicts with other
+transactional or non-transactional accesses within the system.  In this
+example, the transaction completes as though it were normal straight-line code
+IF no other processor has touched SAVINGS_ACCT(r3) or CURRENT_ACCT(r3); an
+atomic move of money from the current account to the savings account has been
+performed.  Even though the normal ld/std instructions are used (note no
+lwarx/stwcx), either *both* SAVINGS_ACCT(r3) and CURRENT_ACCT(r3) will be
+updated, or neither will be updated.
+
+If, in the meantime, there is a conflict with the locations accessed by the
+transaction, the transaction will be aborted by the CPU.  Register and memory
+state will roll back to that at the 'tbegin', and control will continue from
+'tbegin+4'.  The branch to abort_handler will be taken this second time; the
+abort handler can check the cause of the failure, and retry.
+
+Checkpointed registers include all GPRs, FPRs, VRs/VSRs, LR, CCR/CR, CTR, FPSCR
+and a few other status/flag regs; see the ISA for details.
+
+Causes of transaction aborts
+
+
+- Conflicts with cache lines used by other processors
+- Signals
+- Context switches
+- See the ISA for full documentation of everything that will abort transactions.
+
+
+Syscalls
+
+
+Performing syscalls from within transaction is not recommended, and can lead
+to unpredictable results.
+
+Syscalls do not by design abort transactions, but beware: The kernel code will
+not be running in transactional state.  The effect of syscalls will always
+remain visible, but depending on the call they may abort your transaction as a
+side-effect, read soon-to-be-aborted transactional data that should not remain
+invisible, etc.  If you constantly retry a transaction that constantly aborts
+itself by calling a syscall, you'll have a livelock & make no progress.
+
+Simple syscalls (e.g. sigprocmask()) "could" be OK.  Even things like write()
+from, say, printf() should be OK as long as the kernel does not access any
+memory that was accessed transactionally.
+
+Consider any syscalls that happen to work as debug-only -- not recommended for
+production use.  Best to queue them up till after the transaction is over.
+
+
+Failure cause codes used by kernel
+==
+
+These are defined in , and distinguish different reasons why the
+kernel aborted a transaction:
+
+ TM_CAUSE_RESCHED   Thread was rescheduled.
+ TM_CAUSE_FAC_UNAV  FP/VEC/VSX unavailable trap.
+ TM_CAUSE_SYSCALL   Currently unused; future syscalls that must abort
+transactions for consistency will use this.
+ TM_CAUSE_SIGNALSignal delivered.
+ TM_CAUSE_MISC  Currently unused.
+
+These can be checked by the user program's abort handler as TEXASR[0:7].
+
+
+GDB
+===
+
+GDB and ptrace are not currently TM-aware.  If one stops during a transaction,
+it looks like the transaction has just started (the checkpointed state is
+presented).  The transaction cannot then be continued and will take the failure
+handler route.  Furthermore, the transactional 2nd register state will be
+inaccessible.  GDB can currently be

[PATCH 15/16] powerpc: Add transactional memory to pseries and ppc64 defconfigs

2012-11-26 Thread Michael Neuling
Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/configs/ppc64_defconfig   |1 +
 arch/powerpc/configs/pseries_defconfig |1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index 6d03530..26000f6 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -46,6 +46,7 @@ CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_HZ_100=y
 CONFIG_BINFMT_MISC=m
+CONFIG_TRANSACTIONAL_MEM=y
 CONFIG_HOTPLUG_CPU=y
 CONFIG_KEXEC=y
 CONFIG_IRQ_ALL_CPUS=y
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index 5b8e1e5..b5f94b7 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -40,6 +40,7 @@ CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_HZ_100=y
 CONFIG_BINFMT_MISC=m
+CONFIG_TRANSACTIONAL_MEM=y
 CONFIG_HOTPLUG_CPU=y
 CONFIG_KEXEC=y
 CONFIG_IRQ_ALL_CPUS=y
-- 
1.7.9.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 14/16] powerpc: Add config option for transactional memory

2012-11-26 Thread Michael Neuling
Kconfig option for transactional memory on powerpc.

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/Kconfig |8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a902a5c..ece67ca 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -308,6 +308,14 @@ config MATH_EMULATION
  unit, which will allow programs that use floating-point
  instructions to run.
 
+config TRANSACTIONAL_MEM
+   bool "Transactional Memory support"
+   depends on PPC64
+   depends on SMP
+   default n
+   ---help---
+ Support user-mode Transactional Memory.
+
 config 8XX_MINIMAL_FPEMU
bool "Minimal math emulation for 8xx"
depends on 8xx && !MATH_EMULATION
-- 
1.7.9.5



[PATCH 13/16] powerpc: Add transactional memory to POWER8 cpu features

2012-11-26 Thread Michael Neuling
Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/cputable.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index f2163da..74458e69 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -417,7 +417,7 @@ extern const char *powerpc_base_platform;
CPU_FTR_DSCR | CPU_FTR_SAO  | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
-   CPU_FTR_DBELL)
+   CPU_FTR_DBELL | CPU_FTR_TM_COMP)
 #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
-- 
1.7.9.5



[PATCH 12/16] powerpc: Hook in new transactional memory code

2012-11-26 Thread Michael Neuling
This hooks the new transactional memory code into context switching, the
FP/VMX/VSX unavailable handlers and exception return.

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/entry_64.S   |   22 
 arch/powerpc/kernel/exceptions-64s.S |   48 --
 arch/powerpc/kernel/fpu.S|1 -
 arch/powerpc/kernel/process.c|   15 +--
 arch/powerpc/kernel/traps.c  |   32 +++
 arch/powerpc/kernel/vector.S |1 -
 6 files changed, 113 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 5ae8e51..b3590c3 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -602,7 +602,29 @@ _GLOBAL(ret_from_except_lite)
beq 1f
bl  .restore_interrupts
bl  .schedule
+#ifdef CONFIG_TRANSACTIONAL_MEM
+BEGIN_FTR_SECTION
+   /* If TIF_RESTOREALL was set by switch_to, we MUST clear it before
+* any return to userspace -- no one else is going to.  TRAP.0 has been
+* cleared to flag full regs to ret_from_except.
+* Try to avoid the slow atomic clear if the flag isn't set.
+* (This is OK as no one else will be clearing this flag.)
+ */
+   clrrdi  r9,r1,THREAD_SHIFT  /* current_thread_info() */
+   li  r4,_TIF_RESTOREALL
+   addi    r9, r9, TI_FLAGS
+   ld      r3, 0(r9)   /* Test TIF_RESTOREALL first! */
+   and.    r0, r3, r4
+   beq     .ret_from_except
+3: ldarx   r10, 0, r9  /* If set, clear. */
+   andc    r10, r10, r4
+   stdcx.  r10, 0, r9
+   bne     3b
+   b       .ret_from_except    /* Not _lite; we may have full regs! */
+END_FTR_SECTION_IFSET(CPU_FTR_TM)
+#else
b   .ret_from_except_lite
+#endif
 
 1: bl  .save_nvgprs
bl  .restore_interrupts
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index d97cea4..220b896 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1144,9 +1144,24 @@ fp_unavailable_common:
addi    r3,r1,STACK_FRAME_OVERHEAD
bl  .kernel_fp_unavailable_exception
BUG_OPCODE
-1: bl  .load_up_fpu
+1:
+#ifdef CONFIG_TRANSACTIONAL_MEM
+BEGIN_FTR_SECTION
+   srdi    r0, r12, MSR_TS_LG
+   andi.   r0, r0, 3
+   bne-    2f
+END_FTR_SECTION_IFSET(CPU_FTR_TM)
+#endif
+   bl  .load_up_fpu
+   std r12,_MSR(r1)
b   fast_exception_return
-
+#ifdef CONFIG_TRANSACTIONAL_MEM
+2: /* User process was in a transaction */
+   bl  .save_nvgprs
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  .fp_unavailable_tm
+   b   .ret_from_except
+#endif
.align  7
.globl altivec_unavailable_common
 altivec_unavailable_common:
@@ -1154,8 +1169,23 @@ altivec_unavailable_common:
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
beq 1f
+#ifdef CONFIG_TRANSACTIONAL_MEM
+  BEGIN_FTR_SECTION_NESTED(69)
+   srdi    r0, r12, MSR_TS_LG
+   andi.   r0, r0, 3
+   bne-    2f
+  END_FTR_SECTION_NESTED(CPU_FTR_TM, CPU_FTR_TM, 69)
+#endif
bl  .load_up_altivec
+   std r12,_MSR(r1)
b   fast_exception_return
+#ifdef CONFIG_TRANSACTIONAL_MEM
+2: /* User process was in a transaction */
+   bl  .save_nvgprs
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  .altivec_unavailable_tm
+   b   .ret_from_except
+#endif
 1:
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif
@@ -1172,7 +1202,21 @@ vsx_unavailable_common:
 #ifdef CONFIG_VSX
 BEGIN_FTR_SECTION
beq 1f
+#ifdef CONFIG_TRANSACTIONAL_MEM
+  BEGIN_FTR_SECTION_NESTED(69)
+   srdi    r0, r12, MSR_TS_LG
+   andi.   r0, r0, 3
+   bne-    2f
+  END_FTR_SECTION_NESTED(CPU_FTR_TM, CPU_FTR_TM, 69)
+#endif
b   .load_up_vsx
+#ifdef CONFIG_TRANSACTIONAL_MEM
+2: /* User process was in a transaction */
+   bl  .save_nvgprs
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  .vsx_unavailable_tm
+   b   .ret_from_except
+#endif
 1:
 END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 #endif
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index 6ab0e87..08b6a12f 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -170,7 +170,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
lwz r4,THREAD_FPEXC_MODE(r5)
ori r12,r12,MSR_FP
or  r12,r12,r4
-   std r12,_MSR(r1)
 #endif
lfd fr0,THREAD_FPSCR(r5)
MTFSF_L(fr0)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 1bf2c6c7..a0bfd97 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -455,7 +455,7 @@ out_and_saveregs:
tm_save_sprs(thr);
 }
 
-static inline void __maybe_unused tm_recheckpoint_new_task(struct task

[PATCH 11/16] powerpc: Assembler routines for FP/VSX/VMX unavailable during a transaction

2012-11-26 Thread Michael Neuling
We do lazy FP but not lazy TM (ie. userspace starts with MSR TM=1 FP=0).  Hence
if userspace does an FP instruction during a transaction, we'll take an
fp unavailable exception.

This adds functions needed to handle this case.  We have to inject the current
FP state into the checkpoint so that the hardware can decide what to do with
the transaction.  We can't inject only the FP so we have to do a full treclaim
and recheckpoint to inject just the FP state.  This will cause the transaction
to be marked as aborted by the hardware.
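
A toy model of that sequence may help when reading the handlers below. It is
only a sketch under stated assumptions: the struct, the sk_ names and the bit
positions are stand-ins invented for this example, and the real work is done
by tm_reclaim(), tm_recheckpoint() and the hardware.

```c
#include <stdint.h>

/* Bit positions assumed for illustration; check the kernel headers. */
#define SK_MSR_FP	(1ULL << 13)
#define SK_MSR_TS_LG	33
#define SK_MSR_TS_MASK	(3ULL << SK_MSR_TS_LG)

/* Hypothetical, heavily simplified view of a thread's TM-related state. */
struct sk_thread {
	uint64_t msr;		/* user MSR image */
	int tx_doomed;		/* transaction has been marked as failed */
	int fp_checkpointed;	/* FP state is part of the checkpoint */
};

/* Non-zero when either transaction-state bit is set in the MSR image. */
static int sk_msr_tm_active(uint64_t msr)
{
	return (msr & SK_MSR_TS_MASK) != 0;
}

/* Returns 1 if the transactional (slow) path was taken, 0 otherwise. */
static int sk_fp_unavailable(struct sk_thread *t)
{
	if (!sk_msr_tm_active(t->msr)) {
		/* Ordinary lazy FP: just enable FP for the task. */
		t->msr |= SK_MSR_FP;
		return 0;
	}

	t->tx_doomed = 1;	/* models treclaim: the transaction will abort */
	t->msr |= SK_MSR_FP;	/* enable FP for the task */
	t->fp_checkpointed = 1;	/* models trecheckpoint with FP state loaded */
	return 1;
}
```

The point of the model is the ordering: reclaim first, then enable FP, then
recheckpoint, so that the retry of the transaction starts with FP available.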

This just adds the routines needed to do this for FP, VMX and VSX.  It doesn't
hook them into the rest of the code yet.

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/traps.c |   95 +++
 1 file changed, 95 insertions(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 4acd98d..7b9f160 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -58,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
@@ -1192,6 +1193,100 @@ void tm_unavailable_exception(struct pt_regs *regs)
die("Unexpected TM unavailable exception", regs, SIGABRT);
 }
 
+#ifdef CONFIG_TRANSACTIONAL_MEM
+
+extern void do_load_up_fpu(struct pt_regs *regs);
+
+void fp_unavailable_tm(struct pt_regs *regs)
+{
+   /* Note:  This does not handle any kind of FP laziness. */
+
+   /* We restore the interrupt state now */
+   if (!arch_irq_disabled_regs(regs))
+   local_irq_enable();
+
+   TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
+regs->nip, regs->msr);
+   tm_enable();
+
+/* We can only have got here if the task started using FP after
+ * beginning the transaction.  So, the transactional regs are just a
+ * copy of the checkpointed ones.  But, we still need to recheckpoint
+ * as we're enabling FP for the process; it will return, abort the
+ * transaction, and probably retry but now with FP enabled.  So the
+ * checkpointed FP registers need to be loaded.
+*/
+   tm_reclaim(&current->thread, current->thread.regs->msr,
+  TM_CAUSE_FAC_UNAV);
+   /* Reclaim didn't save out any FPRs to transact_fprs. */
+
+   /* Enable FP for the task: */
+   regs->msr |= (MSR_FP | current->thread.fpexc_mode);
+
+   /* This loads and recheckpoints the FP registers from
+* thread.fpr[].  They will remain in registers after the
+* checkpoint so we don't need to reload them after.
+*/
+   tm_recheckpoint(&current->thread, regs->msr);
+}
+
+#ifdef CONFIG_ALTIVEC
+extern void do_load_up_altivec(struct pt_regs *regs);
+
+void altivec_unavailable_tm(struct pt_regs *regs)
+{
+   /* See the comments in fp_unavailable_tm().  This function operates
+* the same way.
+*/
+
+   /* We restore the interrupt state now */
+   if (!arch_irq_disabled_regs(regs))
+   local_irq_enable();
+
+   TM_DEBUG("Vector Unavailable trap whilst transactional at 0x%lx,"
+"MSR=%lx\n",
+regs->nip, regs->msr);
+   tm_enable();
+   tm_reclaim(&current->thread, current->thread.regs->msr,
+  TM_CAUSE_FAC_UNAV);
+   regs->msr |= MSR_VEC;
+   tm_recheckpoint(&current->thread, regs->msr);
+   current->thread.used_vr = 1;
+}
+#endif
+
+#ifdef CONFIG_VSX
+void vsx_unavailable_tm(struct pt_regs *regs)
+{
+   /* See the comments in fp_unavailable_tm().  This works similarly,
+* though we're loading both FP and VEC registers in here.
+*
+* If FP isn't in use, load FP regs.  If VEC isn't in use, load VEC
+* regs.  Either way, set MSR_VSX.
+*/
+
+   /* We restore the interrupt state now */
+   if (!arch_irq_disabled_regs(regs))
+   local_irq_enable();
+
+   TM_DEBUG("VSX Unavailable trap whilst transactional at 0x%lx,"
+"MSR=%lx\n",
+regs->nip, regs->msr);
+
+   tm_enable();
+   /* This reclaims FP and/or VR regs if they're already enabled */
+   tm_reclaim(&current->thread, current->thread.regs->msr,
+  TM_CAUSE_FAC_UNAV);
+
+   regs->msr |= MSR_VEC | MSR_FP | current->thread.fpexc_mode |
+   MSR_VSX;
+   /* This loads & recheckpoints FP and VRs. */
+   tm_recheckpoint(&current->thread, regs->msr);
+   current->thread.used_vsr = 1;
+}
+#endif
+#endif /* CONFIG_TRANSACTIONAL_MEM */
+
 void performance_monitor_exception(struct pt_regs *regs)
 {
__get_cpu_var(irq_stat).pmu_irqs++;
-- 
1.7.9.5



[PATCH 10/16] powerpc: Add transactional memory unavailable exception handler

2012-11-26 Thread Michael Neuling
These should never happen since we always turn on MSR TM when in userspace. We
don't do lazy TM.

Hence if we hit this, we barf and kill the task as something's gone horribly
wrong.

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/exceptions-64s.S |   19 +++
 arch/powerpc/kernel/traps.c  |   21 +
 2 files changed, 40 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 32fc04f..d97cea4 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -313,6 +313,9 @@ vsx_unavailable_pSeries_1:
. = 0xf40
b   vsx_unavailable_pSeries
 
+   . = 0xf60
+   b   tm_unavailable_pSeries
+
 #ifdef CONFIG_CBE_RAS
STD_EXCEPTION_HV(0x1200, 0x1202, cbe_system_error)
KVM_HANDLER_SKIP(PACA_EXGEN, EXC_HV, 0x1202)
@@ -526,6 +529,8 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_206)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf20)
STD_EXCEPTION_PSERIES(., 0xf40, vsx_unavailable)
KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf40)
+   STD_EXCEPTION_PSERIES(., 0xf60, tm_unavailable)
+   KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xf60)
 
 /*
  * An interrupt came in while soft-disabled. We set paca->irq_happened, then:
@@ -815,6 +820,10 @@ vsx_unavailable_relon_pSeries_1:
. = 0x4f40
b   vsx_unavailable_relon_pSeries
 
+tm_unavailable_relon_pSeries_1:
+   . = 0x4f60
+   b   tm_unavailable_relon_pSeries
+
 #ifdef CONFIG_CBE_RAS
STD_RELON_EXCEPTION_HV(0x5200, 0x1202, cbe_system_error)
 #endif /* CONFIG_CBE_RAS */
@@ -1174,6 +1183,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
b   .ret_from_except
 
.align  7
+   .globl tm_unavailable_common
+tm_unavailable_common:
+   EXCEPTION_PROLOG_COMMON(0xf60, PACA_EXGEN)
+   bl  .save_nvgprs
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  .tm_unavailable_exception
+   b   .ret_from_except
+
+   .align  7
.globl  __end_handlers
 __end_handlers:
 
@@ -1387,6 +1405,7 @@ _GLOBAL(do_stab_bolted)
STD_RELON_EXCEPTION_PSERIES(., 0xf00, performance_monitor)
STD_RELON_EXCEPTION_PSERIES(., 0xf20, altivec_unavailable)
STD_RELON_EXCEPTION_PSERIES(., 0xf40, vsx_unavailable)
+   STD_RELON_EXCEPTION_PSERIES(., 0xf60, tm_unavailable)
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
 /*
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 8fed874..4acd98d 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1171,6 +1171,27 @@ void vsx_unavailable_exception(struct pt_regs *regs)
die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
 }
 
+void tm_unavailable_exception(struct pt_regs *regs)
+{
+   /* We restore the interrupt state now */
+   if (!arch_irq_disabled_regs(regs))
+   local_irq_enable();
+
+   /* Currently we never expect a TMU exception.  Catch
+* this and kill the process!
+*/
+   printk(KERN_EMERG "Unexpected TM unavailable exception at %lx "
+  "(msr %lx)\n",
+  regs->nip, regs->msr);
+
+   if (user_mode(regs)) {
+   _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+   return;
+   }
+
+   die("Unexpected TM unavailable exception", regs, SIGABRT);
+}
+
 void performance_monitor_exception(struct pt_regs *regs)
 {
__get_cpu_var(irq_stat).pmu_irqs++;
-- 
1.7.9.5



[PATCH 09/16] powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes

2012-11-26 Thread Michael Neuling
When we switch out a task, we need to save both the checkpointed and the
speculated state into the thread struct.

Similarly when we are switching in a task we need to load both the checkpointed
and speculated state.  If the task was using FP, we non-lazily reload both the
original and the speculative FP register states.  This is because the kernel
doesn't see if/when a TM rollback occurs, so if we take an FP unavailable
exception later, we are unable to determine which set of FP regs needs to be
restored.

This simply adds these functions.  It doesn't hook them into the existing code
yet.
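
To make the "two register states" point concrete, here is a small model of
switching a transactional task out and back in. It is a sketch only: the
sk_ types and functions are invented for this example, four GPRs stand in for
the full register file, and none of this is the kernel's actual layout.

```c
/* Hypothetical miniature register file. */
struct sk_regs {
	unsigned long gpr[4];
};

/* Per-task save area: both states must survive a context switch. */
struct sk_task {
	struct sk_regs ckpt;		/* checkpointed state (as of tbegin) */
	struct sk_regs transact;	/* speculative (current) state */
	int in_tx;			/* task was inside a transaction */
};

/* What the CPU holds while the task runs. */
struct sk_cpu {
	struct sk_regs ckpt;
	struct sk_regs live;
	int in_tx;
};

/* Switch out: save the live state, plus the checkpoint if transactional. */
static void sk_switch_out(struct sk_cpu *cpu, struct sk_task *t)
{
	t->in_tx = cpu->in_tx;
	t->transact = cpu->live;
	if (cpu->in_tx)
		t->ckpt = cpu->ckpt;	/* models the reclaim step */
	cpu->in_tx = 0;
}

/* Switch in: restore the checkpoint first, then the live state. */
static void sk_switch_in(struct sk_cpu *cpu, const struct sk_task *t)
{
	cpu->in_tx = t->in_tx;
	if (t->in_tx)
		cpu->ckpt = t->ckpt;	/* models the recheckpoint step */
	cpu->live = t->transact;
}
```

Restoring both copies eagerly mirrors the non-lazy FP reload described above:
once the task is running again, a rollback can happen without the kernel
noticing, so both states have to be in place beforehand.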

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/process.c |  113 +
 1 file changed, 113 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fd5ce1b..1bf2c6c7 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #ifdef CONFIG_PPC64
 #include 
@@ -407,6 +408,118 @@ int set_dabr(unsigned long dabr, unsigned long dabrx)
 DEFINE_PER_CPU(struct cpu_usage, cpu_usage_array);
 #endif
 
+#ifdef CONFIG_TRANSACTIONAL_MEM
+static inline void tm_reclaim_task(struct task_struct *tsk)
+{
+   /* We have to work out if we're switching from/to a task that's in the
+* middle of a transaction.
+*
+* In switching we need to maintain a 2nd register state as
+* oldtask->thread.ckpt_regs.  We tm_reclaim(oldproc); this saves the
+* checkpointed (tbegin) state in ckpt_regs and saves the transactional
+* (current) FPRs into oldtask->thread.transact_fpr[].
+*
+* We also context switch (save) TFHAR/TEXASR/TFIAR in here.
+*/
+   struct thread_struct *thr = &tsk->thread;
+
+   if (!thr->regs)
+   return;
+
+   if (!MSR_TM_ACTIVE(thr->regs->msr))
+   goto out_and_saveregs;
+
+   /* Stash the original thread MSR, as giveup_fpu et al will
+* modify it.  We hold onto it to see whether the task used
+* FP & vector regs.
+*/
+   thr->tm_orig_msr = thr->regs->msr;
+
+   TM_DEBUG("--- tm_reclaim on pid %d (NIP=%lx, "
+"ccr=%lx, msr=%lx, trap=%lx)\n",
+tsk->pid, thr->regs->nip,
+thr->regs->ccr, thr->regs->msr,
+thr->regs->trap);
+
+   tm_reclaim(thr, thr->regs->msr, TM_CAUSE_RESCHED);
+
+   TM_DEBUG("--- tm_reclaim on pid %d complete\n",
+tsk->pid);
+
+out_and_saveregs:
+   /* Always save the regs here, even if a transaction's not active.
+* This context-switches a thread's TM info SPRs.  We do it here to
+* be consistent with the restore path (in recheckpoint) which
+* cannot happen later in _switch().
+*/
+   tm_save_sprs(thr);
+}
+
+static inline void __maybe_unused tm_recheckpoint_new_task(struct task_struct *new)
+{
+   unsigned long msr;
+
+   if (!cpu_has_feature(CPU_FTR_TM))
+   return;
+
+   /* Recheckpoint the registers of the thread we're about to switch to.
+*
+* If the task was using FP, we non-lazily reload both the original and
+* the speculative FP register states.  This is because the kernel
+* doesn't see if/when a TM rollback occurs, so if we take an FP
+* unavailable exception later, we are unable to determine which set of FP
+* regs need to be restored.
+*/
+   if (!new->thread.regs)
+   return;
+
+   /* The TM SPRs are restored here, so that TEXASR.FS can be set
+* before the trecheckpoint and no explosion occurs.
+*/
+   tm_restore_sprs(&new->thread);
+
+   if (!MSR_TM_ACTIVE(new->thread.regs->msr))
+   return;
+   msr = new->thread.tm_orig_msr;
+   /* Recheckpoint to restore original checkpointed register state. */
+   TM_DEBUG("*** tm_recheckpoint of pid %d "
+"(new->msr 0x%lx, new->origmsr 0x%lx)\n",
+new->pid, new->thread.regs->msr, msr);
+
+   /* This loads the checkpointed FP/VEC state, if used */
+   tm_recheckpoint(&new->thread, msr);
+
+   /* This loads the speculative FP/VEC state, if used */
+   if (msr & MSR_FP) {
+   do_load_up_transact_fpu(&new->thread);
+   new->thread.regs->msr |=
+   (MSR_FP | new->thread.fpexc_mode);
+   }
+   if (msr & MSR_VEC) {
+   do_load_up_transact_altivec(&new->thread);
+   new->thread.regs->msr |= MSR_VEC;
+   }
+   /* We may as well turn on VSX too since all the state is restored now */
+   if (msr & MSR_VSX)
+   new->thread.regs->msr |= MSR_VSX;
+
+   TM_DEBUG("*** tm_recheckpoint of pid %d complete "
+"(kernel msr 0x%lx)\n",
+new->pid, mfmsr());
+}
+
+static inline void __switch_to_tm(struct task_str

[PATCH 08/16] powerpc: Add FP/VSX and VMX register load functions for transactional memory

2012-11-26 Thread Michael Neuling
This adds functions to restore the state of the FP/VSX registers from
what's stored in the thread_struct.  Two versions for FP/VSX are required
since one restores them from transactional/checkpoint side of the
thread_struct and the other from the speculated side.

Similar functions are added for VMX registers.

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/fpu.S|   54 ++
 arch/powerpc/kernel/vector.S |   51 +++
 2 files changed, 105 insertions(+)

diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index adb1551..6ab0e87 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -62,6 +62,60 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX);  
\
__REST_32FPVSRS_TRANSACT(n,__REG_##c,__REG_##base)
 #define SAVE_32FPVSRS(n,c,base) __SAVE_32FPVSRS(n,__REG_##c,__REG_##base)
 
+#ifdef CONFIG_TRANSACTIONAL_MEM
+/*
+ * Wrapper to call load_up_fpu from C.
+ * void do_load_up_fpu(struct pt_regs *regs);
+ */
+_GLOBAL(do_load_up_fpu)
+   mflr    r0
+   std r0, 16(r1)
+   stdu    r1, -112(r1)
+
+   subi    r6, r3, STACK_FRAME_OVERHEAD
+   /* load_up_fpu expects r12=MSR, r13=PACA, and returns
+* with r12 = new MSR.
+*/
+   ld  r12,_MSR(r6)
+   GET_PACA(r13)
+
+   bl  load_up_fpu
+   std r12,_MSR(r6)
+
+   ld  r0, 112+16(r1)
+   addi    r1, r1, 112
+   mtlr    r0
+   blr
+
+
+/* void do_load_up_transact_fpu(struct thread_struct *thread)
+ *
+ * This is similar to load_up_fpu but for the transactional version of the FP
+ * register set.  It doesn't mess with the task MSR or valid flags.
+ * Furthermore, we don't do lazy FP with TM currently.
+ */
+_GLOBAL(do_load_up_transact_fpu)
+   mfmsr   r6
+   ori r5,r6,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+   oris    r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+   SYNC
+   MTMSRD(r5)
+
+   lfd fr0,THREAD_TRANSACT_FPSCR(r3)
+   MTFSF_L(fr0)
+   REST_32FPVSRS_TRANSACT(0, R4, R3)
+
+   /* FP/VSX off again */
+   MTMSRD(r6)
+   SYNC
+
+   blr
+#endif /* CONFIG_TRANSACTIONAL_MEM */
+
 /*
  * This task wants to use the FPU now.
  * On UP, disable FP for the task which had the FPU previously,
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index e830289..330fc8c 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -7,6 +7,57 @@
 #include 
 #include 
 
+#ifdef CONFIG_TRANSACTIONAL_MEM
+/*
+ * Wrapper to call load_up_altivec from C.
+ * void do_load_up_altivec(struct pt_regs *regs);
+ */
+_GLOBAL(do_load_up_altivec)
+   mflr    r0
+   std r0, 16(r1)
+   stdu    r1, -112(r1)
+
+   subi    r6, r3, STACK_FRAME_OVERHEAD
+   /* load_up_altivec expects r12=MSR, r13=PACA, and returns
+* with r12 = new MSR.
+*/
+   ld  r12,_MSR(r6)
+   GET_PACA(r13)
+   bl  load_up_altivec
+   std r12,_MSR(r6)
+
+   ld  r0, 112+16(r1)
+   addi    r1, r1, 112
+   mtlr    r0
+   blr
+
+/* void do_load_up_transact_altivec(struct thread_struct *thread)
+ *
+ * This is similar to load_up_altivec but for the transactional version of the
+ * vector regs.  It doesn't mess with the task MSR or valid flags.
+ * Furthermore, VEC laziness is not supported with TM currently.
+ */
+_GLOBAL(do_load_up_transact_altivec)
+   mfmsr   r6
+   oris    r5,r6,MSR_VEC@h
+   MTMSRD(r5)
+   isync
+
+   li  r4,1
+   stw r4,THREAD_USED_VR(r3)
+
+   li  r10,THREAD_TRANSACT_VSCR
+   lvx vr0,r10,r3
+   mtvscr  vr0
+   REST_32VRS_TRANSACT(0,r4,r3)
+
+   /* Disable VEC again. */
+   MTMSRD(r6)
+   isync
+
+   blr
+#endif
+
 /*
  * load_up_altivec(unused, unused, tsk)
  * Disable VMX for the task which had it previously,
-- 
1.7.9.5



[PATCH 07/16] powerpc: Add helper functions for transactional memory context switching

2012-11-26 Thread Michael Neuling
Here we add the helper functions to be used when context switching.  These
allow us to fully reclaim and recheckpoint a transaction.

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/reg.h |2 +-
 arch/powerpc/include/asm/tm.h  |   19 ++
 arch/powerpc/kernel/Makefile   |2 +
 arch/powerpc/kernel/tm.S   |  378 
 4 files changed, 400 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/tm.h
 create mode 100644 arch/powerpc/kernel/tm.S

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index bea823f..895020f 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -784,7 +784,7 @@
  *HV mode in which case it is HSPRG0
  *
  * 64-bit server:
- * - SPRG0 unused (reserved for HV on Power4)
+ * - SPRG0 scratch for TM recheckpoint/reclaim (reserved for HV on Power4)
  * - SPRG2 scratch for exception vectors
  * - SPRG3 CPU and NUMA node for VDSO getcpu (user visible)
  *  - HSPRG0 stores PACA in HV mode
diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
new file mode 100644
index 000..18f6bf7
--- /dev/null
+++ b/arch/powerpc/include/asm/tm.h
@@ -0,0 +1,19 @@
+/*
+ * Transactional memory support routines to reclaim and recheckpoint
+ * transactional process state.
+ *
+ * Copyright 2012 Matt Evans & Michael Neuling, IBM Corporation.
+ */
+
+#ifdef CONFIG_TRANSACTIONAL_MEM
+extern void do_load_up_transact_fpu(struct thread_struct *thread);
+extern void do_load_up_transact_altivec(struct thread_struct *thread);
+#endif
+
+extern void tm_enable(void);
+extern void tm_reclaim(struct thread_struct *thread,
+  unsigned long orig_msr, uint8_t cause);
+extern void tm_recheckpoint(struct thread_struct *thread,
+   unsigned long orig_msr);
+extern void tm_save_sprs(struct thread_struct *thread);
+extern void tm_restore_sprs(struct thread_struct *thread);
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 44fbbea..abf8469 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -122,6 +122,8 @@ ifneq ($(CONFIG_PPC_INDIRECT_IO),y)
 obj-y  += iomap.o
 endif
 
+obj64-$(CONFIG_TRANSACTIONAL_MEM)  += tm.o
+
 obj-$(CONFIG_PPC64)+= $(obj64-y)
 obj-$(CONFIG_PPC32)+= $(obj32-y)
 
diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
new file mode 100644
index 000..b368e4a
--- /dev/null
+++ b/arch/powerpc/kernel/tm.S
@@ -0,0 +1,378 @@
+/*
+ * Transactional memory support routines to reclaim and recheckpoint
+ * transactional process state.
+ *
+ * Copyright 2012 Matt Evans & Michael Neuling, IBM Corporation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_VSX
+/* See fpu.S, this is very similar but to save/restore checkpointed FPRs/VSRs */
+#define __SAVE_32FPRS_VSRS_TRANSACT(n,c,base)  \
+BEGIN_FTR_SECTION  \
+   b   2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);\
+   SAVE_32FPRS_TRANSACT(n,base);   \
+   b   3f; \
+2: SAVE_32VSRS_TRANSACT(n,c,base); \
+3:
+/* ...and this is just plain borrowed from there. */
+#define __REST_32FPRS_VSRS(n,c,base)   \
+BEGIN_FTR_SECTION  \
+   b   2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);\
+   REST_32FPRS(n,base);\
+   b   3f; \
+2: REST_32VSRS(n,c,base);  \
+3:
+#else
+#define __SAVE_32FPRS_VSRS_TRANSACT(n,c,base) SAVE_32FPRS_TRANSACT(n, base)
+#define __REST_32FPRS_VSRS(n,c,base) REST_32FPRS(n, base)
+#endif
+#define SAVE_32FPRS_VSRS_TRANSACT(n,c,base) \
+   __SAVE_32FPRS_VSRS_TRANSACT(n,__REG_##c,__REG_##base)
+#define REST_32FPRS_VSRS(n,c,base) \
+   __REST_32FPRS_VSRS(n,__REG_##c,__REG_##base)
+
+/* Stack frame offsets for local variables. */
+#define TM_FRAME_L0    TM_FRAME_SIZE-16
+#define TM_FRAME_L1    TM_FRAME_SIZE-8
+#define STACK_PARAM(x) (48+((x)*8))
+
+
+/* In order to access the TM SPRs, TM must be enabled.  So, do so: */
+_GLOBAL(tm_enable)
+   mfmsr   r4
+   li  r3, MSR_TM >> 32
+   sldi    r3, r3, 32
+   and.    r0, r4, r3
+   bne 1f
+   or  r4, r4, r3
+   mtmsrd  r4
+1: blr
+
+_GLOBAL(tm_save_sprs)
+   mfspr   r0, SPRN_TFHAR
+   std r0, THREAD_TM_TFHAR(r3)
+   mfspr   r0, SPRN_TEXASR
+   std r0, THREAD_TM_TEXASR(r3)
+   mfspr   r0, SPRN_TFIAR
+   std r0, THREAD_TM_TFIAR(r3)
+   blr
+
+_GLOBAL(tm_restore_sprs)
+   ld  r0, THREAD_TM_TFHAR(r3)
+   mtspr   SPRN_TFHAR, r0
+   ld  r0, THREAD_TM_TEXASR(r3)
+   mtspr   SPRN_TEXASR, r0
+   ld  r0, THREAD_TM_TFIAR(r

[PATCH 06/16] powerpc: Add transactional memory paca scratch register to show_regs

2012-11-26 Thread Michael Neuling
Add transactional memory paca scratch register to show_regs.  This is useful
for debugging.

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/paca.h   |1 +
 arch/powerpc/kernel/asm-offsets.c |1 +
 arch/powerpc/kernel/entry_64.S|4 
 arch/powerpc/kernel/process.c |3 +++
 4 files changed, 9 insertions(+)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index e9e7a69..0168516 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -137,6 +137,7 @@ struct paca_struct {
u8 irq_work_pending;/* IRQ_WORK interrupt while 
soft-disable */
u8 nap_state_lost;  /* NV GPR values lost in power7_idle */
u64 sprg3;  /* Saved user-visible sprg */
+   u64 tm_scratch; /* TM scratch area for reclaim */
 
 #ifdef CONFIG_PPC_POWERNV
/* Pointer to OPAL machine check event structure set by the
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 1a70f02..42a4243 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -122,6 +122,7 @@ int main(void)
 #endif
 
 #ifdef CONFIG_TRANSACTIONAL_MEM
+   DEFINE(PACATMSCRATCH, offsetof(struct paca_struct, tm_scratch));
DEFINE(THREAD_TM_TFHAR, offsetof(struct thread_struct, tm_tfhar));
DEFINE(THREAD_TM_TEXASR, offsetof(struct thread_struct, tm_texasr));
DEFINE(THREAD_TM_TFIAR, offsetof(struct thread_struct, tm_tfiar));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index df6857f..5ae8e51 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -755,6 +755,10 @@ fast_exception_return:
andc    r4,r4,r0 /* r0 contains MSR_RI here */
mtmsrd  r4,1
 
+#ifdef CONFIG_TRANSACTIONAL_MEM
+   /* TM debug */
+   std r3, PACATMSCRATCH(r13) /* Stash returned-to MSR */
+#endif
/*
 * r13 is our per cpu area, only restore it if we are returning to
 * userspace the value stored in the stack frame may belong to
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 6d66a68..fd5ce1b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -693,6 +693,9 @@ void show_regs(struct pt_regs * regs)
printk("NIP ["REG"] %pS\n", regs->nip, (void *)regs->nip);
printk("LR ["REG"] %pS\n", regs->link, (void *)regs->link);
 #endif
+#ifdef CONFIG_TRANSACTIONAL_MEM
+   printk("PACATMSCRATCH [%llx]\n", get_paca()->tm_scratch);
+#endif
show_stack(current, (unsigned long *) regs->gpr[1]);
if (!user_mode(regs))
show_instructions(regs);
-- 
1.7.9.5



[PATCH 05/16] powerpc: Register defines for various transactional memory registers

2012-11-26 Thread Michael Neuling
Defines for MSR bits and transactional memory related SPRs TFIAR, TEXASR and
TEXASRU.

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/reg.h |   21 +
 1 file changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 736c6af..bea823f 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -29,6 +29,8 @@
 #define MSR_SF_LG  63  /* Enable 64 bit mode */
 #define MSR_ISF_LG 61  /* Interrupt 64b mode valid on 630 */
 #define MSR_HV_LG  60  /* Hypervisor state */
+#define MSR_TS_LG  33  /* Transactional Mem State (2 bits) */
+#define MSR_TM_LG  32  /* Transactional Mem Available */
 #define MSR_VEC_LG 25  /* Enable AltiVec */
 #define MSR_VSX_LG 23  /* Enable VSX */
 #define MSR_POW_LG 18  /* Enable Power Management */
@@ -98,6 +100,21 @@
 #define MSR_RI __MASK(MSR_RI_LG)   /* Recoverable Exception */
 #define MSR_LE __MASK(MSR_LE_LG)   /* Little Endian */
 
+#define MSR_TM __MASK(MSR_TM_LG)   /* Transactional Mem Available */
+#define MSR_TS_MASK (__MASK(MSR_TS_LG) | \
+__MASK(MSR_TS_LG+1))   /* Transaction State bits */
+#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */
+
+/* Reason codes describing kernel causes for transaction aborts.  By
+   convention, bit0 is copied to TEXASR[56] (IBM bit 7) which is set if
+   the failure is persistent.
+*/
+#define TM_CAUSE_RESCHED   0xfe
+#define TM_CAUSE_TLBI  0xfc
+#define TM_CAUSE_FAC_UNAV  0xfa
+#define TM_CAUSE_SYSCALL   0xf9 /* Persistent */
+#define TM_CAUSE_MISC  0xf6
+
 #if defined(CONFIG_PPC_BOOK3S_64)
 #define MSR_64BIT  MSR_SF
 
@@ -193,6 +210,10 @@
 #define SPRN_UAMOR 0x9d/* User Authority Mask Override Register */
 #define SPRN_AMOR  0x15d   /* Authority Mask Override Register */
 #define SPRN_ACOP  0x1F/* Available Coprocessor Register */
+#define SPRN_TFIAR     0x81    /* Transaction Failure Inst Addr   */
+#define SPRN_TEXASR    0x82    /* Transaction EXception & Summary */
+#define SPRN_TEXASRU   0x83    /* ''  ''  ''Upper 32  */
+#define SPRN_TFHAR     0x80    /* Transaction Failure Handler Addr */
 #define SPRN_CTRLF 0x088
 #define SPRN_CTRLT 0x098
 #define   CTRL_CT  0xc000  /* current thread */
-- 
1.7.9.5
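
As a quick sanity check of the definitions in this patch, here is a small
userspace sketch (not part of the patch).  It assumes a fixed 64-bit mask
helper, MASK64(), standing in for the kernel's __MASK(); msr_tm_active() and
tm_cause_is_persistent() are hypothetical names used only for illustration.
It is meant to show that MSR_TM_ACTIVE() looks only at the two TS bits, and
that of the cause codes listed only TM_CAUSE_SYSCALL has bit 0 (the
"persistent" bit) set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions exactly as given in the patch. */
#define MSR_TS_LG 33 /* Transactional Mem State (2 bits) */
#define MSR_TM_LG 32 /* Transactional Mem Available */

/* Assumption: a fixed-width stand-in for the kernel's __MASK(). */
#define MASK64(n) (UINT64_C(1) << (n))

#define MSR_TM      MASK64(MSR_TM_LG)
#define MSR_TS_MASK (MASK64(MSR_TS_LG) | MASK64(MSR_TS_LG + 1))

/* Two of the kernel abort cause codes from the patch. */
#define TM_CAUSE_RESCHED 0xfe
#define TM_CAUSE_SYSCALL 0xf9 /* Persistent */

/* Same test as MSR_TM_ACTIVE(x): is either transaction-state bit set? */
static bool msr_tm_active(uint64_t msr)
{
    return (msr & MSR_TS_MASK) != 0;
}

/* By the patch's convention, bit 0 of the cause marks a persistent failure. */
static bool tm_cause_is_persistent(uint8_t cause)
{
    return (cause & 1) != 0;
}
```

Note that having MSR_TM set on its own does not make a transaction "active";
only the TS field does.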



[PATCH 04/16] powerpc: New macros for transactional memory support

2012-11-26 Thread Michael Neuling
This adds new macros for saving and restoring checkpointed architected state
from and to the thread_struct.

It also adds some debugging macros for when your brain explodes trying to debug
your transactional memory enabled kernel.

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/ppc_asm.h   |   83 ++
 arch/powerpc/include/asm/processor.h |1 +
 arch/powerpc/kernel/asm-offsets.c|   24 ++
 arch/powerpc/kernel/fpu.S|   12 +
 arch/powerpc/kernel/process.c|   10 
 arch/powerpc/kernel/traps.c  |   11 +
 6 files changed, 141 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index ea2a86e..a17a598 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -125,6 +125,89 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 #define REST_16VRS(n,b,base)   REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
 #define REST_32VRS(n,b,base)   REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
 
+/* Save/restore FPRs, VRs and VSRs from their checkpointed backups in
+ * thread_struct:
+ */
+#define SAVE_FPR_TRANSACT(n, base) stfd n,THREAD_TRANSACT_FPR0+\
+   8*TS_FPRWIDTH*(n)(base)
+#define SAVE_2FPRS_TRANSACT(n, base)   SAVE_FPR_TRANSACT(n, base); \
+   SAVE_FPR_TRANSACT(n+1, base)
+#define SAVE_4FPRS_TRANSACT(n, base)   SAVE_2FPRS_TRANSACT(n, base);   \
+   SAVE_2FPRS_TRANSACT(n+2, base)
+#define SAVE_8FPRS_TRANSACT(n, base)   SAVE_4FPRS_TRANSACT(n, base);   \
+   SAVE_4FPRS_TRANSACT(n+4, base)
+#define SAVE_16FPRS_TRANSACT(n, base)  SAVE_8FPRS_TRANSACT(n, base);   \
+   SAVE_8FPRS_TRANSACT(n+8, base)
+#define SAVE_32FPRS_TRANSACT(n, base)  SAVE_16FPRS_TRANSACT(n, base);  \
+   SAVE_16FPRS_TRANSACT(n+16, base)
+
+#define REST_FPR_TRANSACT(n, base) lfd n,THREAD_TRANSACT_FPR0+ \
+   8*TS_FPRWIDTH*(n)(base)
+#define REST_2FPRS_TRANSACT(n, base)   REST_FPR_TRANSACT(n, base); \
+   REST_FPR_TRANSACT(n+1, base)
+#define REST_4FPRS_TRANSACT(n, base)   REST_2FPRS_TRANSACT(n, base);   \
+   REST_2FPRS_TRANSACT(n+2, base)
+#define REST_8FPRS_TRANSACT(n, base)   REST_4FPRS_TRANSACT(n, base);   \
+   REST_4FPRS_TRANSACT(n+4, base)
+#define REST_16FPRS_TRANSACT(n, base)  REST_8FPRS_TRANSACT(n, base);   \
+   REST_8FPRS_TRANSACT(n+8, base)
+#define REST_32FPRS_TRANSACT(n, base)  REST_16FPRS_TRANSACT(n, base);  \
+   REST_16FPRS_TRANSACT(n+16, base)
+
+
+#define SAVE_VR_TRANSACT(n,b,base) li b,THREAD_TRANSACT_VR0+(16*(n)); \
+   stvx n,b,base
+#define SAVE_2VRS_TRANSACT(n,b,base)   SAVE_VR_TRANSACT(n,b,base); \
+   SAVE_VR_TRANSACT(n+1,b,base)
+#define SAVE_4VRS_TRANSACT(n,b,base)   SAVE_2VRS_TRANSACT(n,b,base);   \
+   SAVE_2VRS_TRANSACT(n+2,b,base)
+#define SAVE_8VRS_TRANSACT(n,b,base)   SAVE_4VRS_TRANSACT(n,b,base);   \
+   SAVE_4VRS_TRANSACT(n+4,b,base)
+#define SAVE_16VRS_TRANSACT(n,b,base)  SAVE_8VRS_TRANSACT(n,b,base);   \
+   SAVE_8VRS_TRANSACT(n+8,b,base)
+#define SAVE_32VRS_TRANSACT(n,b,base)  SAVE_16VRS_TRANSACT(n,b,base);  \
+   SAVE_16VRS_TRANSACT(n+16,b,base)
+
+#define REST_VR_TRANSACT(n,b,base) li b,THREAD_TRANSACT_VR0+(16*(n)); \
+   lvx n,b,base
+#define REST_2VRS_TRANSACT(n,b,base)   REST_VR_TRANSACT(n,b,base); \
+   REST_VR_TRANSACT(n+1,b,base)
+#define REST_4VRS_TRANSACT(n,b,base)   REST_2VRS_TRANSACT(n,b,base);   \
+   REST_2VRS_TRANSACT(n+2,b,base)
+#define REST_8VRS_TRANSACT(n,b,base)   REST_4VRS_TRANSACT(n,b,base);   \
+   REST_4VRS_TRANSACT(n+4,b,base)
+#define REST_16VRS_TRANSACT(n,b,base)  REST_8VRS_TRANSACT(n,b,base);   \
+   REST_8VRS_TRANSACT(n+8,b,base)
+#define REST_32VRS_TRANSACT(n,b,base)  REST_16VRS_TRANSACT(n,b,base);  \
+   REST_16VRS_TRANSACT(n+16,b,base)
+
+
+#define SAVE_VSR_TRANSACT(n,b,base)    li b,THREAD_TRANSACT_VSR0+(16*(n)); \
+   STXVD2X(n,R##base,R##b)
+#define SAVE_2VSRS_TRANSACT(n,b,base)  SAVE_VSR_TRANSACT(n,b,base);\
+   SAVE_VSR_TRANSACT(n+1,b,base)
+#define SAVE_4VSRS_TRANSACT(n,b,base)  SAVE_2VSRS_TRANSACT(n,b,base);  \
+ 

[PATCH 03/16] powerpc: Add additional state needed for transactional memory to thread struct

2012-11-26 Thread Michael Neuling
Add the new architected state that must be saved away on context switch.

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/processor.h |   28 
 1 file changed, 28 insertions(+)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 8750204..0d1c188 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -236,6 +236,34 @@ struct thread_struct {
unsigned long   spefscr;/* SPE & eFP status */
int used_spe;   /* set if process has used spe */
 #endif /* CONFIG_SPE */
+#ifdef CONFIG_TRANSACTIONAL_MEM
+   u64 tm_tfhar;   /* Transaction fail handler addr */
+   u64 tm_texasr;  /* Transaction exception & summary */
+   u64 tm_tfiar;   /* Transaction fail instr address reg */
+   unsigned long   tm_orig_msr;/* Thread's MSR on ctx switch */
+   struct pt_regs  ckpt_regs;  /* Checkpointed registers */
+
+   /*
+* Transactional FP and VSX 0-31 register set.
+* NOTE: the sense of these is the opposite of the integer ckpt_regs!
+*
+* When a transaction is active/signalled/scheduled etc., *regs is the
+* most recent set of/speculated GPRs with ckpt_regs being the older
+* checkpointed regs to which we roll back if transaction aborts.
+*
+* However, fpr[] is the checkpointed 'base state' of FP regs, and
+* transact_fpr[] is the new set of transactional values.
+* VRs work the same way.
+*/
+   double  transact_fpr[32][TS_FPRWIDTH];
+   struct {
+   unsigned int pad;
+   unsigned int val;   /* Floating point status */
+   } transact_fpscr;
+   vector128   transact_vr[32] __attribute__((aligned(16)));
+   vector128   transact_vscr __attribute__((aligned(16)));
+   unsigned long   transact_vrsave;
+#endif /* CONFIG_TRANSACTIONAL_MEM */
 #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
void*   kvm_shadow_vcpu; /* KVM internal data */
 #endif /* CONFIG_KVM_BOOK3S_32_HANDLER */
-- 
1.7.9.5



[PATCH 02/16] powerpc: Add new instructions for transactional memory

2012-11-26 Thread Michael Neuling
Here we define the new instructions we need for transactional memory in the
kernel.  This is so we can support compiling with binutils that don't support
the new transactional memory instructions.

Transactional memory results in two sets of architected state (GPRs/VSRs
etc).

treclaim allows us to read the checkpointed state (from the tbegin) so that we
can store it away on a context switch.  It does this by overwriting the existing
architected state, so you have to save that away before you treclaim.  treclaim
will also abort a transaction, so you can give a register value which contains
an abort reason.

trecheckpoint allows us to inject into the checkpointed state as if it were at
the tbegin.  It does this by copying the current architected state into the
checkpointed state.

Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/ppc-opcode.h |7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 45fd394..3674ffc 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -129,6 +129,8 @@
 #define PPC_INST_TLBSRX_DOT    0x7c0006a5
 #define PPC_INST_XXLOR 0xf510
 #define PPC_INST_XVCPSGNDP 0xf780
+#define PPC_INST_TRECHKPT  0x7c0007dd
+#define PPC_INST_TRECLAIM  0x7c00075d
 
 #define PPC_INST_NAP   0x4c000364
 #define PPC_INST_SLEEP 0x4c0003a4
@@ -291,4 +293,9 @@
 #define PPC_NAP        stringify_in_c(.long PPC_INST_NAP)
 #define PPC_SLEEP  stringify_in_c(.long PPC_INST_SLEEP)
 
+/* Transactional memory instructions */
+#define TRECHKPT   stringify_in_c(.long PPC_INST_TRECHKPT)
+#define TRECLAIM(r)    stringify_in_c(.long PPC_INST_TRECLAIM \
+  | __PPC_RA(r))
+
 #endif /* _ASM_POWERPC_PPC_OPCODE_H */
-- 
1.7.9.5
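
To illustrate how TRECLAIM(r) builds its instruction word, here is a minimal
userspace sketch (not part of the patch).  It assumes __PPC_RA() places the
register number in a 5-bit field starting at bit 16 of the word;
ppc_ra_field() and encode_treclaim() are hypothetical names introduced only
for this example.

```c
#include <stdint.h>

/* Base opcodes exactly as defined in the patch. */
#define PPC_INST_TRECHKPT 0x7c0007ddu
#define PPC_INST_TRECLAIM 0x7c00075du

/* Assumption: RA is a 5-bit field shifted left by 16, like __PPC_RA(). */
static uint32_t ppc_ra_field(unsigned int reg)
{
    return ((uint32_t)reg & 0x1fu) << 16;
}

/* OR the RA field into the base opcode, as TRECLAIM(r) does. */
static uint32_t encode_treclaim(unsigned int ra)
{
    return PPC_INST_TRECLAIM | ppc_ra_field(ra);
}
```

Under that assumption, "treclaim. r3" would be emitted as
.long 0x7c03075d, which is what lets the kernel build with a binutils that
does not know the mnemonic.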



[PATCH 01/16] powerpc: Add new CPU feature bit for transactional memory

2012-11-26 Thread Michael Neuling
Signed-off-by: Matt Evans 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/cputable.h |8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index fc4d2c5..f2163da 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -171,6 +171,7 @@ extern const char *powerpc_base_platform;
 #define CPU_FTR_POPCNTD        LONG_ASM_CONST(0x0800)
 #define CPU_FTR_ICSWX  LONG_ASM_CONST(0x1000)
 #define CPU_FTR_VMX_COPY   LONG_ASM_CONST(0x2000)
+#define CPU_FTR_TM LONG_ASM_CONST(0x4000)
 
 #ifndef __ASSEMBLY__
 
@@ -216,6 +217,13 @@ extern const char *powerpc_base_platform;
 #define PPC_FEATURE_HAS_EFP_DOUBLE_COMP 0
 #endif
 
+/* We only set the TM feature if the kernel was compiled with TM support */
+#ifdef CONFIG_TRANSACTIONAL_MEM
+#define CPU_FTR_TM_COMP    CPU_FTR_TM
+#else
+#define CPU_FTR_TM_COMP    0
+#endif
+
 /* We need to mark all pages as being coherent if we're SMP or we have a
  * 74[45]x and an MPC107 host bridge. Also 83xx and PowerQUICC II
  * require it for PCI "streaming/prefetch" to work properly.
-- 
1.7.9.5



[PATCH 00/16] powerpc: Hardware transactional memory support for POWER8

2012-11-26 Thread Michael Neuling
POWER8 implements hardware transactional memory support.  This patch series
adds kernel support so that user programs can use this hardware transactional
memory and the new state is properly context switched.  It is not currently
used by the kernel itself.

This patch series was originally developed by Matt Evans.

Basic overview of POWER8 hardware transactional memory
======================================================
Hardware transactional memory is a feature that enables a different form of
atomic memory access.  Several new instructions are presented to delimit
transactions; transactions are guaranteed to either complete atomically or roll
back and undo any partial changes.

A simple transaction looks like this:

begin_move_money:
  tbegin
  beq   abort_handler

  ld    r4, SAVINGS_ACCT(r3)
  ld    r5, CURRENT_ACCT(r3)
  subi  r5, r5, 1
  addi  r4, r4, 1
  std   r4, SAVINGS_ACCT(r3)
  std   r5, CURRENT_ACCT(r3)

  tend

  b continue

abort_handler:
  ... test for odd failures ...

  /* Retry the transaction if it failed because it conflicted with
   * someone else: */
  b begin_move_money


The 'tbegin' instruction denotes the start point, and 'tend' the end point.
Between these points the processor is in 'Transactional' state; any memory
references will complete in one go if there are no conflicts with other
transactional or non-transactional accesses within the system.  In this
example, the transaction completes as though it were normal straight-line code
IF no other processor has touched SAVINGS_ACCT(r3) or CURRENT_ACCT(r3); an
atomic move of money from the current account to the savings account has been
performed.  Even though the normal ld/std instructions are used (note no
lwarx/stwcx), either *both* SAVINGS_ACCT(r3) and CURRENT_ACCT(r3) will be
updated, or neither will be updated.

If, in the meantime, there is a conflict with the locations accessed by the
transaction, the transaction will be aborted by the CPU.  Register and memory
state will roll back to that at the 'tbegin', and control will continue from
'tbegin+4'.  The branch to abort_handler will be taken this second time; the
abort handler can check the cause of the failure, and retry.
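
The checkpoint/rollback/retry behaviour described above can be sketched as a
plain C software model.  This is only an illustration under stated
assumptions: it uses no TM instructions, the conflict is injected by a
hypothetical callback, and struct accounts, move_money() and
conflict_on_first_attempt() are names made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

struct accounts {
    long savings;
    long current;
};

/* Hypothetical hook: returns true if this attempt hits a conflict. */
typedef bool (*conflict_fn)(int attempt);

/* Move 'amount' from current to savings; returns the number of attempts. */
static int move_money(struct accounts *acct, long amount, conflict_fn conflicts)
{
    int attempt = 0;

    for (;;) {
        /* Models 'tbegin': remember the state to roll back to. */
        struct accounts checkpoint = *acct;

        attempt++;
        acct->current -= amount;
        acct->savings += amount;

        if (conflicts != NULL && conflicts(attempt)) {
            /* Models an abort: both updates are undone, then retry. */
            *acct = checkpoint;
            continue;
        }

        /* Models 'tend': both updates become visible together. */
        return attempt;
    }
}

/* Example conflict source: only the first attempt is aborted. */
static bool conflict_on_first_attempt(int attempt)
{
    return attempt == 1;
}
```

In the real hardware the rollback is done by the CPU, not by copying a
structure; the model is only meant to show that either both accounts change
or neither does.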

Checkpointed registers include all GPRs, FPRs, VRs/VSRs, LR, CCR/CR, CTR, FPSCR
and a few other status/flag regs; 

-- 
1.7.9.5



Re: [PATCH v3 11/12] memory-hotplug: remove sysfs file of node

2012-11-26 Thread Wen Congyang
At 11/26/2012 10:27 PM, Jianguo Wu Wrote:
> On 2012/11/1 17:44, Wen Congyang wrote:
>> This patch introduces a new function try_offline_node() to
>> remove sysfs file of node when all memory sections of this
>> node are removed. If some memory sections of this node are
>> not removed, this function does nothing.
>>
>> CC: David Rientjes 
>> CC: Jiang Liu 
>> CC: Len Brown 
>> CC: Christoph Lameter 
>> Cc: Minchan Kim 
>> CC: Andrew Morton 
>> CC: KOSAKI Motohiro 
>> CC: Yasuaki Ishimatsu 
>> Signed-off-by: Wen Congyang 
>> ---
>>  drivers/acpi/acpi_memhotplug.c |  8 +-
>>  include/linux/memory_hotplug.h |  2 +-
>>  mm/memory_hotplug.c| 58 
>> --
>>  3 files changed, 64 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 24c807f..0780f99 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -310,7 +310,9 @@ static int acpi_memory_disable_device(struct 
>> acpi_memory_device *mem_device)
>>  {
>>  int result;
>>  struct acpi_memory_info *info, *n;
>> +int node;
>>  
>> +node = acpi_get_node(mem_device->device->handle);
>>  
>>  /*
>>   * Ask the VM to offline this memory range.
>> @@ -318,7 +320,11 @@ static int acpi_memory_disable_device(struct 
>> acpi_memory_device *mem_device)
>>   */
>>  list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
>>  if (info->enabled) {
>> -result = remove_memory(info->start_addr, info->length);
>> +if (node < 0)
>> +node = memory_add_physaddr_to_nid(
>> +info->start_addr);
>> +result = remove_memory(node, info->start_addr,
>> +info->length);
>>  if (result)
>>  return result;
>>  }
>> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>> index d4c4402..7b4cfe6 100644
>> --- a/include/linux/memory_hotplug.h
>> +++ b/include/linux/memory_hotplug.h
>> @@ -231,7 +231,7 @@ extern int arch_add_memory(int nid, u64 start, u64 size);
>>  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>>  extern int offline_memory_block(struct memory_block *mem);
>>  extern bool is_memblock_offlined(struct memory_block *mem);
>> -extern int remove_memory(u64 start, u64 size);
>> +extern int remove_memory(int node, u64 start, u64 size);
>>  extern int sparse_add_one_section(struct zone *zone, unsigned long 
>> start_pfn,
>>  int nr_pages);
>>  extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
>> *ms);
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 7bcced0..d965da3 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -29,6 +29,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include 
>>  
>> @@ -1299,7 +1300,58 @@ static int is_memblock_offlined_cb(struct 
>> memory_block *mem, void *arg)
>>  return ret;
>>  }
>>  
>> -int __ref remove_memory(u64 start, u64 size)
>> +static int check_cpu_on_node(void *data)
>> +{
>> +struct pglist_data *pgdat = data;
>> +int cpu;
>> +
>> +for_each_present_cpu(cpu) {
>> +if (cpu_to_node(cpu) == pgdat->node_id)
>> +/*
>> + * the cpu on this node isn't removed, and we can't
>> + * offline this node.
>> + */
>> +return -EBUSY;
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +/* offline the node if all memory sections of this node are removed */
>> +static void try_offline_node(int nid)
>> +{
>> +unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
>> +unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
>> +unsigned long pfn;
>> +
>> +for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>> +unsigned long section_nr = pfn_to_section_nr(pfn);
>> +
>> +if (!present_section_nr(section_nr))
>> +continue;
>> +
>> +if (pfn_to_nid(pfn) != nid)
>> +continue;
>> +
>> +/*
>> + * some memory sections of this node are not removed, and we
>> + * can't offline node now.
>> + */
>> +return;
>> +}
>> +
>> +if (stop_machine(check_cpu_on_node, NODE_DATA(nid), NULL))
>> +return;
> 
> how about:
>   if (nr_cpus_node(nid))

If all cpus on the node are offlined but not removed, nr_cpus_node(nid) will
return 0. In this case, we still can't offline the node.

Another reason to use stop_machine() is to prevent cpu hotplug while we check.
We can't take the cpu hotplug lock here.
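
To make the difference concrete, here is a minimal userspace sketch of the two
checks. It is not kernel code: struct model_cpu and the model_* helpers are
hypothetical names invented for illustration, and the "online count" helper
only mirrors the behaviour of nr_cpus_node() as described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one CPU; not a kernel structure. */
struct model_cpu {
	int node;	/* NUMA node the CPU belongs to */
	bool present;	/* still physically plugged in */
	bool online;	/* currently online */
};

/* Counts only online CPUs on a node (the nr_cpus_node()-style check). */
int model_online_cpus_on_node(const struct model_cpu *cpus, size_t n, int nid)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if (cpus[i].node == nid && cpus[i].online)
			count++;
	return count;
}

/* Walks present CPUs, in the spirit of check_cpu_on_node() in the patch. */
int model_check_cpu_on_node(const struct model_cpu *cpus, size_t n, int nid)
{
	for (size_t i = 0; i < n; i++)
		if (cpus[i].present && cpus[i].node == nid)
			return -EBUSY;	/* a cpu is still plugged into this node */
	return 0;
}

/* Self-check for the case discussed: cpu offlined but not removed. */
void model_self_check(void)
{
	const struct model_cpu cpus[] = {
		{ .node = 0, .present = true, .online = true  },
		{ .node = 1, .present = true, .online = false },
	};

	/* The online count says node 1 is empty ... */
	assert(model_online_cpus_on_node(cpus, 2, 1) == 0);
	/* ... but the present-cpu walk still refuses to offline it. */
	assert(model_check_cpu_on_node(cpus, 2, 1) == -EBUSY);
}
```

The sketch says nothing about the hotplug race; that part is what
stop_machine() is for in the patch.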

Thanks
Wen Congyang

>   return;
>> +
>> +/*
>> + * all memory/cpu of t

Re: [PATCH 5/6] powerpc: Macros for saving/restore PPR

2012-11-26 Thread Haren Myneni
On 11/22/2012 07:39 PM, Michael Neuling wrote:
> Haren Myneni  wrote:
> 
>> [PATCH 5/6] powerpc: Macros for saving/restore PPR
>>
>> Several macros are defined for saving and restoring the user-defined PPR value.
>>
>> Signed-off-by: Haren Myneni 
>> ---
>>  arch/powerpc/include/asm/exception-64s.h |   29 
>> +
>>  arch/powerpc/include/asm/ppc_asm.h   |   25 +
>>  arch/powerpc/include/asm/reg.h   |1 +
>>  3 files changed, 55 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
>> index bfd3f1f..880ef7d 100644
>> --- a/arch/powerpc/include/asm/exception-64s.h
>> +++ b/arch/powerpc/include/asm/exception-64s.h
>> @@ -62,6 +62,35 @@
>>  #define EXC_HV  H
>>  #define EXC_STD
>>  
>> +/*
>> + * PPR save/restore macros used in exceptions_64s.S  
>> + * Used for P7 or later processors
>> + */
>> +#define SAVE_PPR(area, ra, rb)                                  \
>> +BEGIN_FTR_SECTION_NESTED(940)                                   \
>> +ld  ra,PACACURRENT(r13);                                        \
>> +ld  rb,area+EX_PPR(r13);        /* Read PPR from paca */        \
>> +std rb,TASKTHREADPPR(ra);                                       \
>> +END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940)
>> +
>> +#define RESTORE_PPR_PACA(area, ra)                              \
>> +BEGIN_FTR_SECTION_NESTED(941)                                   \
>> +ld  ra,area+EX_PPR(r13);                                        \
>> +mtspr   SPRN_PPR,ra;                                            \
>> +END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,941)
>> +
> 
> Can you add some documentation here on why we should use the different
> versions.
> 
> I'm trying to read the next patch and it's not clear to me why some have
> HMT_MEDIUM_NO_PPR and other times HMT_MEDIUM and others
> HMT_MEDIUM_HAS_PPR.
> 
> Looks like HMT_MEDIUM_NO_PPR sets the priority to medium on systems
> where we can't save/restore the PPR, hence it can be called earlier in
> the exception handler before we have free GPRs.  HMT_MEDIUM_HAS_PPR
> saves the priority on systems where it can, and then sets the priority
> to medium.
> 
> Maybe we should change the names 
>   HMT_MEDIUM_NO_PPR  => HMT_MEDIUM_PPR_DISCARD   and
>   HMT_MEDIUM_HAS_PPR => HMT_MEDIUM_PPR_SAVE
> But now I'm heading into bike shedding territory... plus I think I
> suggested the names you have currently, so I'm feeling a bit dumb now
> :-)

No problem. We can change these macro names if HMT_MEDIUM_PPR_DISCARD/
HMT_MEDIUM_PPR_SAVE give a better description.

Right, HMT_MEDIUM_NO_PPR is used on systems where we do not save/restore
the PPR, so the behaviour is the same as before: it just increases the
priority. HMT_MEDIUM_HAS_PPR will be executed on systems where
CPU_FTR_HAS_PPR is enabled. I will write some comments around these macros
to make it clear.
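
As a rough illustration of how the two variants are selected, here is a small
C model. It assumes, as far as I understand the feature-fixup mechanism, that
a nested feature section is left in place when (features & mask) == value and
is patched out otherwise. MODEL_FTR_HAS_PPR and the model_* helpers are
hypothetical names; the bit value is made up and is not the real
CPU_FTR_HAS_PPR constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bit for illustration only. */
#define MODEL_FTR_HAS_PPR 0x1UL

/* Assumed rule: the section stays when (features & mask) == value. */
bool model_section_kept(unsigned long features, unsigned long mask,
			unsigned long value)
{
	return (features & mask) == value;
}

/* Models END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR, 0, 942). */
bool model_no_ppr_runs(unsigned long features)
{
	return model_section_kept(features, MODEL_FTR_HAS_PPR, 0);
}

/* Models END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR, CPU_FTR_HAS_PPR, 943). */
bool model_has_ppr_runs(unsigned long features)
{
	return model_section_kept(features, MODEL_FTR_HAS_PPR,
				  MODEL_FTR_HAS_PPR);
}

/* Exactly one of the two variants is active for any feature mask. */
void model_self_check(void)
{
	assert(model_no_ppr_runs(0) && !model_has_ppr_runs(0));
	assert(!model_no_ppr_runs(MODEL_FTR_HAS_PPR) &&
	       model_has_ppr_runs(MODEL_FTR_HAS_PPR));
}
```

Under that assumption the two macros are mutually exclusive, so placing both in
one handler costs only one of them at run time.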

We can also name them HMT_MEDIUM_CPU_NO_PPR_SAVE and
HMT_MEDIUM_CPU_HAS_PPR_SAVE, since we are enabling PPR save/restore using
the CPU_FTR macro.

Otherwise I will follow your suggestions.


Thanks
Haren

> Mikey
> 
>> +#define HMT_MEDIUM_NO_PPR                                       \
>> +BEGIN_FTR_SECTION_NESTED(942)                                   \
>> +HMT_MEDIUM;                                                     \
>> +END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,0,942)  /*non P7*/
>> +
>> +#define HMT_MEDIUM_HAS_PPR(area, ra)                            \
>> +BEGIN_FTR_SECTION_NESTED(943)                                   \
>> +mfspr   ra,SPRN_PPR;                                            \
>> +std ra,area+EX_PPR(r13);                                        \
>> +HMT_MEDIUM;                                                     \
>> +END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,943)
>> +
>>  #define __EXCEPTION_PROLOG_1(area, extra, vec)                  \
>>  GET_PACA(r13);                                                  \
>>  std r9,area+EX_R9(r13); /* save r9 - r12 */                     \
> 
> 
>> diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
>> index 376e36d..52649cc 100644
>> --- a/arch/powerpc/include/asm/ppc_asm.h
>> +++ b/arch/powerpc/include/asm/ppc_asm.h
>> @@ -389,6 +389,31 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
>>  FTR_SECTION_ELSE_NESTED(848);   \
>>  mtocrf (FXM), RS;   \
>>  ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_NOEXECUTE, 848)
>> +
>> +/*
>> + * PPR restore macros used in entry_64.S
>> + * Used for P7 or later processors
>> + */
>> +#define HMT_MEDIUM_LOW_HAS_PPR                                  \
>> +BEGIN_FTR_SECTION_NESTED(944)                                   \
>> +HMT_MEDIUM_LOW;

Re: [PATCH 2/6] powerpc: Define CPU_FTR_HAS_PPR

2012-11-26 Thread Haren Myneni
On 11/22/2012 08:01 PM, Michael Neuling wrote:
> Haren Myneni  wrote:
> 
>> [PATCH 2/6] powerpc: Define CPU_FTR_HAS_PPR
>>
>> CPU_FTR_HAS_PPR is defined for POWER7.
>>
>> Signed-off-by: Haren Myneni 
>> ---
>>  arch/powerpc/include/asm/cputable.h |6 --
>>  1 files changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
>> index 21a0687..12e3a1d 100644
>> --- a/arch/powerpc/include/asm/cputable.h
>> +++ b/arch/powerpc/include/asm/cputable.h
>> @@ -171,6 +171,7 @@ extern const char *powerpc_base_platform;
>>  #define CPU_FTR_POPCNTD         LONG_ASM_CONST(0x0800)
>>  #define CPU_FTR_ICSWX           LONG_ASM_CONST(0x1000)
>>  #define CPU_FTR_VMX_COPY        LONG_ASM_CONST(0x2000)
>> +#define CPU_FTR_HAS_PPR         LONG_ASM_CONST(0x4000)
>>  
>>  #ifndef __ASSEMBLY__
>>  
>> @@ -400,7 +401,8 @@ extern const char *powerpc_base_platform;
>>  CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
>>  CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
>>  CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
>> -CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY)
>> +CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | \
>> +CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR)
>>  #define CPU_FTRS_CELL   (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
>>  CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
>>  CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
>> @@ -422,7 +424,7 @@ extern const char *powerpc_base_platform;
>>  (CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 |\
>>  CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 |   \
>>  CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T |   \
>> -CPU_FTR_VSX)
>> +CPU_FTR_VSX | CPU_FTR_HAS_PPR)
> 
> FYI, there is no need to add this to POSSIBLE, since you are adding it
> to POWER7 anyway.

Will remove CPU_FTR_HAS_PPR from the POSSIBLE macro. I added it in the second
version to enable this feature with a command-line parameter, but forgot to
remove it.
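
For anyone following along, a tiny sketch of why the extra bit is redundant:
POSSIBLE already ORs in the POWER7 mask, so any bit in POWER7 is in POSSIBLE.
All MODEL_* names and bit values below are made up for illustration and are not
the real cputable.h definitions.

```c
#include <assert.h>

/* Hypothetical feature bits; not the real cputable.h values. */
#define MODEL_FTR_SMT		0x1UL
#define MODEL_FTR_VSX		0x2UL
#define MODEL_FTR_HAS_PPR	0x4UL

#define MODEL_FTRS_POWER6	(MODEL_FTR_SMT)
#define MODEL_FTRS_POWER7	(MODEL_FTR_SMT | MODEL_FTR_HAS_PPR)

/* POSSIBLE is the union of every per-CPU mask plus a few extras. */
unsigned long model_possible(void)
{
	return MODEL_FTRS_POWER6 | MODEL_FTRS_POWER7 | MODEL_FTR_VSX;
}

/* The same union with the bit ORed in explicitly a second time. */
unsigned long model_possible_redundant(void)
{
	return model_possible() | MODEL_FTR_HAS_PPR;
}

/* Both masks are identical, so the explicit OR adds nothing. */
void model_self_check(void)
{
	assert(model_possible() == model_possible_redundant());
	assert(model_possible() & MODEL_FTR_HAS_PPR);
}
```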

Thanks
Haren


> 
> Mikey
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] vfio powerpc: enabled and supported on powernv platform

2012-11-26 Thread Benjamin Herrenschmidt
On Mon, 2012-11-26 at 11:04 -0700, Alex Williamson wrote:
> Ok, I see tces are put on shutdown via tce_iommu_detach_group, so you're
> more concerned about the guest simply mapping over top of its own
> mappings.  Is that common?  Is it common enough for every multi-page
> mapping to assume it will happen?  I know this is a performance
> sensitive path for you and it seems like a map-only w/ fallback to
> unmap, remap would be better in the general case.
> 
> On x86 we do exactly that, but we do the unmap, remap from userspace
> when we get an EBUSY.  Thanks, 

Right, Linux as a guest at least will never map "over" an existing
mapping. It will always unmap first, i.e. the only transitions we do on
H_PUT_TCE are 0 -> valid and valid -> 0.

So it would be fine to simplify the code and keep the "map over map" as
a slow fallback. I can't speak for other operating systems but we don't
care about those at this point :-)
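
A minimal userspace sketch of that "map-only fast path, unmap + remap as slow
fallback" idea follows. It is only a model under stated assumptions: the table
is a plain array where 0 means "no mapping", and struct model_tce_table and the
model_* functions are hypothetical names, not the interface from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_TCE_ENTRIES 8

/* Hypothetical table model: entry value 0 means "not mapped". */
struct model_tce_table {
	uint64_t entry[MODEL_TCE_ENTRIES];
};

static int model_range_ok(size_t first, size_t n)
{
	return first <= MODEL_TCE_ENTRIES && n <= MODEL_TCE_ENTRIES - first;
}

/* Fast path: only allows the 0 -> valid transition. */
int model_map(struct model_tce_table *t, size_t first, size_t n, uint64_t tce)
{
	if (tce == 0 || !model_range_ok(first, n))
		return -EINVAL;

	/* Refuse the whole request if any entry is already valid. */
	for (size_t i = 0; i < n; i++)
		if (t->entry[first + i] != 0)
			return -EBUSY;

	for (size_t i = 0; i < n; i++)
		t->entry[first + i] = tce;
	return 0;
}

/* valid -> 0 transition. */
int model_unmap(struct model_tce_table *t, size_t first, size_t n)
{
	if (!model_range_ok(first, n))
		return -EINVAL;

	for (size_t i = 0; i < n; i++)
		t->entry[first + i] = 0;
	return 0;
}

/* Slow fallback: on -EBUSY, unmap the range and try the map again. */
int model_map_with_fallback(struct model_tce_table *t, size_t first,
			    size_t n, uint64_t tce)
{
	int rc = model_map(t, first, n, tce);

	if (rc == -EBUSY) {
		rc = model_unmap(t, first, n);
		if (rc == 0)
			rc = model_map(t, first, n, tce);
	}
	return rc;
}

/* A well-behaved guest never hits the fallback; an overlapping map does. */
void model_self_check(void)
{
	struct model_tce_table t = { { 0 } };

	assert(model_map(&t, 0, 2, 0x1000) == 0);
	assert(model_map(&t, 1, 2, 0x2000) == -EBUSY);
	assert(model_map_with_fallback(&t, 1, 2, 0x2000) == 0);
}
```

Whether the fallback lives in the kernel or in userspace (as on x86, per the
mail quoted above) is a separate design choice this sketch does not settle.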

Cheers,
Ben.




Re: Build regressions/improvements in v3.7-rc7

2012-11-26 Thread Geert Uytterhoeven
On Mon, Nov 26, 2012 at 9:54 PM, Geert Uytterhoeven
 wrote:
> JFYI, when comparing v3.7-rc7 to v3.7-rc6[3], the summaries are:
>   - build errors: +4/-8

  + error: "__sync_fetch_and_and_4" [drivers/staging/line6/line6usb.ko] undefined!:  => N/A
  + error: "__sync_fetch_and_or_4" [drivers/staging/line6/line6usb.ko] undefined!:  => N/A

sh4/sh-randconfig

  + error: "smp_send_reschedule" [arch/powerpc/kvm/kvm.ko] undefined!:  => N/A

powerpc/powerpc-randconfig

> [1] http://kisskb.ellerman.id.au/kisskb/head/5646/ (all 117 configs)
> [3] http://kisskb.ellerman.id.au/kisskb/head/5628/ (all 117 configs)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 437/493] char: remove use of __devexit

2012-11-26 Thread Kent Yoder
On Mon, Nov 19, 2012 at 01:26:26PM -0500, Bill Pemberton wrote:
> CONFIG_HOTPLUG is going away as an option so __devexit is no
> longer needed.
> 
> Signed-off-by: Bill Pemberton 
> Cc: David Airlie  
> Cc: Olof Johansson  
> Cc: Mattia Dongili  
> Cc: Kent Yoder  
> Cc: Rajiv Andrade  
> Cc: Marcel Selhorst  
> Cc: Sirrix AG  
> Cc: linuxppc-dev@lists.ozlabs.org 
> Cc: linux-arm-ker...@lists.infradead.org 
> Cc: openipmi-develo...@lists.sourceforge.net 
> Cc: platform-driver-...@vger.kernel.org 
> Cc: tpmdd-de...@lists.sourceforge.net 

Acked-by: Kent Yoder 

> ---
>  drivers/char/agp/ali-agp.c | 2 +-
>  drivers/char/agp/amd-k7-agp.c  | 2 +-
>  drivers/char/agp/amd64-agp.c   | 2 +-
>  drivers/char/agp/ati-agp.c | 2 +-
>  drivers/char/agp/efficeon-agp.c| 2 +-
>  drivers/char/agp/i460-agp.c| 2 +-
>  drivers/char/agp/intel-agp.c   | 2 +-
>  drivers/char/agp/nvidia-agp.c  | 2 +-
>  drivers/char/agp/sgi-agp.c | 2 +-
>  drivers/char/agp/sis-agp.c | 2 +-
>  drivers/char/agp/sworks-agp.c  | 2 +-
>  drivers/char/agp/uninorth-agp.c| 2 +-
>  drivers/char/agp/via-agp.c | 2 +-
>  drivers/char/hw_random/atmel-rng.c | 2 +-
>  drivers/char/hw_random/bcm63xx-rng.c   | 2 +-
>  drivers/char/hw_random/exynos-rng.c| 2 +-
>  drivers/char/hw_random/n2-drv.c| 2 +-
>  drivers/char/hw_random/pasemi-rng.c| 2 +-
>  drivers/char/hw_random/picoxcell-rng.c | 2 +-
>  drivers/char/hw_random/ppc4xx-rng.c| 2 +-
>  drivers/char/hw_random/timeriomem-rng.c| 2 +-
>  drivers/char/hw_random/virtio-rng.c| 2 +-
>  drivers/char/ipmi/ipmi_si_intf.c   | 6 +++---
>  drivers/char/sonypi.c  | 2 +-
>  drivers/char/tb0219.c  | 2 +-
>  drivers/char/tpm/tpm_i2c_infineon.c| 2 +-
>  drivers/char/tpm/tpm_ibmvtpm.c | 2 +-
>  drivers/char/tpm/tpm_infineon.c| 2 +-
>  drivers/char/tpm/tpm_tis.c | 2 +-
>  drivers/char/xilinx_hwicap/xilinx_hwicap.c | 4 ++--
>  30 files changed, 33 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/char/agp/ali-agp.c b/drivers/char/agp/ali-agp.c
> index 9eb629b..0e69e60 100644
> --- a/drivers/char/agp/ali-agp.c
> +++ b/drivers/char/agp/ali-agp.c
> @@ -374,7 +374,7 @@ found:
>   return agp_add_bridge(bridge);
>  }
> 
> -static void __devexit agp_ali_remove(struct pci_dev *pdev)
> +static void agp_ali_remove(struct pci_dev *pdev)
>  {
>   struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
> 
> diff --git a/drivers/char/agp/amd-k7-agp.c b/drivers/char/agp/amd-k7-agp.c
> index 2e1efa9..cb9c9f9 100644
> --- a/drivers/char/agp/amd-k7-agp.c
> +++ b/drivers/char/agp/amd-k7-agp.c
> @@ -480,7 +480,7 @@ static int agp_amdk7_probe(struct pci_dev *pdev,
>   return agp_add_bridge(bridge);
>  }
> 
> -static void __devexit agp_amdk7_remove(struct pci_dev *pdev)
> +static void agp_amdk7_remove(struct pci_dev *pdev)
>  {
>   struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
> 
> diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c
> index f4086c5..280c0d5 100644
> --- a/drivers/char/agp/amd64-agp.c
> +++ b/drivers/char/agp/amd64-agp.c
> @@ -579,7 +579,7 @@ static int agp_amd64_probe(struct pci_dev *pdev,
>   return 0;
>  }
> 
> -static void __devexit agp_amd64_remove(struct pci_dev *pdev)
> +static void agp_amd64_remove(struct pci_dev *pdev)
>  {
>   struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
> 
> diff --git a/drivers/char/agp/ati-agp.c b/drivers/char/agp/ati-agp.c
> index 6aeb624..485b15c 100644
> --- a/drivers/char/agp/ati-agp.c
> +++ b/drivers/char/agp/ati-agp.c
> @@ -533,7 +533,7 @@ found:
>   return agp_add_bridge(bridge);
>  }
> 
> -static void __devexit agp_ati_remove(struct pci_dev *pdev)
> +static void agp_ati_remove(struct pci_dev *pdev)
>  {
>   struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
> 
> diff --git a/drivers/char/agp/efficeon-agp.c b/drivers/char/agp/efficeon-agp.c
> index bf80cf1..08a465e 100644
> --- a/drivers/char/agp/efficeon-agp.c
> +++ b/drivers/char/agp/efficeon-agp.c
> @@ -407,7 +407,7 @@ static int agp_efficeon_probe(struct pci_dev *pdev,
>   return agp_add_bridge(bridge);
>  }
> 
> -static void __devexit agp_efficeon_remove(struct pci_dev *pdev)
> +static void agp_efficeon_remove(struct pci_dev *pdev)
>  {
>   struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
> 
> diff --git a/drivers/char/agp/i460-agp.c b/drivers/char/agp/i460-agp.c
> index bda2215..0436bf1 100644
> --- a/drivers/char/agp/i460-agp.c
> +++ b/drivers/char/agp/i460-agp.c
> @@ -611,7 +611,7 @@ static int agp_intel_i460_probe(struct pci_dev *pdev,
>   return agp_add_bridge(bridge);
>  }
> 
> -static void __devexit agp_intel_i460_remove(struct pci_dev *pdev)
> +static void agp_intel_i460_remove(struct pci_dev *pdev)
>  

Re: [PATCH 1/2] vfio powerpc: implemented IOMMU driver for VFIO

2012-11-26 Thread Alex Williamson
On Fri, 2012-11-23 at 20:03 +1100, Alexey Kardashevskiy wrote:
> VFIO implements platform independent stuff such as
> a PCI driver, BAR access (via read/write on a file descriptor
> or direct mapping when possible) and IRQ signaling.
> 
> The platform dependent part includes IOMMU initialization
> and handling. This patch implements an IOMMU driver for VFIO
> which does mapping/unmapping pages for the guest IO and
> provides information about DMA window (required by a POWERPC
> guest).
> 
> The counterpart in QEMU is required to support this functionality.
> 
> Cc: David Gibson 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  drivers/vfio/Kconfig|6 +
>  drivers/vfio/Makefile   |1 +
>  drivers/vfio/vfio_iommu_spapr_tce.c |  247 
> +++
>  include/linux/vfio.h|   20 +++
>  4 files changed, 274 insertions(+)
>  create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 7cd5dec..b464687 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1
>   depends on VFIO
>   default n
>  
> +config VFIO_IOMMU_SPAPR_TCE
> + tristate
> + depends on VFIO && SPAPR_TCE_IOMMU
> + default n
> +
>  menuconfig VFIO
>   tristate "VFIO Non-Privileged userspace driver framework"
>   depends on IOMMU_API
>   select VFIO_IOMMU_TYPE1 if X86
> + select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV
>   help
> VFIO provides a framework for secure userspace device drivers.
> See Documentation/vfio.txt for more details.
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 2398d4a..72bfabc 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -1,3 +1,4 @@
>  obj-$(CONFIG_VFIO) += vfio.o
>  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
>  obj-$(CONFIG_VFIO_PCI) += pci/
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> new file mode 100644
> index 000..46a6298
> --- /dev/null
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -0,0 +1,247 @@
> +/*
> + * VFIO: IOMMU DMA mapping support for TCE on POWER
> + *
> + * Copyright (C) 2012 IBM Corp.  All rights reserved.
> + * Author: Alexey Kardashevskiy 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Derived from original vfio_iommu_type1.c:
> + * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
> + * Author: Alex Williamson 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DRIVER_VERSION  "0.1"
> +#define DRIVER_AUTHOR   "a...@ozlabs.ru"
> +#define DRIVER_DESC "VFIO IOMMU SPAPR TCE"
> +
> +static void tce_iommu_detach_group(void *iommu_data,
> + struct iommu_group *iommu_group);
> +
> +/*
> + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
> + */
> +
> +/*
> + * The container descriptor supports only a single group per container.
> + * Required by the API as the container is not supplied with the IOMMU group
> + * at the moment of initialization.
> + */
> +struct tce_container {
> + struct mutex lock;
> + struct iommu_table *tbl;
> +};
> +
> +static void *tce_iommu_open(unsigned long arg)
> +{
> + struct tce_container *container;
> +
> + if (arg != VFIO_SPAPR_TCE_IOMMU) {
> + printk(KERN_ERR "tce_vfio: Wrong IOMMU type\n");
> + return ERR_PTR(-EINVAL);
> + }
> +
> + container = kzalloc(sizeof(*container), GFP_KERNEL);
> + if (!container)
> + return ERR_PTR(-ENOMEM);
> +
> + mutex_init(&container->lock);
> +
> + return container;
> +}
> +
> +static void tce_iommu_release(void *iommu_data)
> +{
> + struct tce_container *container = iommu_data;
> +
> + WARN_ON(container->tbl && !container->tbl->it_group);

I think your patch ordering is backwards here.  it_group isn't added
until 2/2.  I'd really like to see the arch/powerpc code approved and
merged by the powerpc maintainer before we add the code that makes use
of it into vfio.  Otherwise we just get lots of churn if interfaces
change or they disapprove of it altogether.

> + if (container->tbl && container->tbl->it_group)
> + tce_iommu_detach_group(iommu_data, container->tbl->it_group);
> +
> + mutex_destroy(&container->lock);
> +
> + kfree(container);
> +}
> +
> +static long tce_iommu_ioctl(void *iommu_data,
> +  unsigned int cmd, unsigned long arg)
> +{
> + struct tce_container *container = iommu_data;
> + unsigned long minsz;
> +
> + switch (cmd) {
> + case VFIO_CHECK_EXTENSION: {
> + return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0;
> +  

Re: [PATCH] vfio powerpc: enabled and supported on powernv platform

2012-11-26 Thread Alex Williamson
On Mon, 2012-11-26 at 08:18 -0700, Alex Williamson wrote:
> On Fri, 2012-11-23 at 13:02 +1100, Alexey Kardashevskiy wrote:
> > On 22/11/12 22:56, Sethi Varun-B16395 wrote:
> > >
> > >
> > >> -Original Message-
> > >> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> > >> ow...@vger.kernel.org] On Behalf Of Alex Williamson
> > >> Sent: Tuesday, November 20, 2012 11:50 PM
> > >> To: Alexey Kardashevskiy
> > >> Cc: Benjamin Herrenschmidt; Paul Mackerras; linuxppc-
> > >> d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; 
> > >> k...@vger.kernel.org;
> > >> David Gibson
> > >> Subject: Re: [PATCH] vfio powerpc: enabled and supported on powernv
> > >> platform
> > >>
> > >> On Tue, 2012-11-20 at 11:48 +1100, Alexey Kardashevskiy wrote:
> > >>> VFIO implements platform independent stuff such as a PCI driver, BAR
> > >>> access (via read/write on a file descriptor or direct mapping when
> > >>> possible) and IRQ signaling.
> > >>> The platform dependent part includes IOMMU initialization and
> > >>> handling.
> > >>>
> > >>> This patch initializes IOMMU groups based on the IOMMU configuration
> > >>> discovered during the PCI scan, only POWERNV platform is supported at
> > >>> the moment.
> > >>>
> > >>> Also the patch implements a VFIO-IOMMU driver which manages DMA
> > >>> mapping/unmapping requests coming from the client (now QEMU). It also
> > >>> returns DMA window information to let the guest initialize the
> > >>> device tree for a guest OS properly. Although this driver has been
> > >>> tested only on POWERNV, it should work on any platform supporting TCE
> > >>> tables.
> > >>>
> > >>> To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config option.
> > >>>
> > >>> Cc: David Gibson 
> > >>> Signed-off-by: Alexey Kardashevskiy 
> > >>> ---
> > >>>   arch/powerpc/include/asm/iommu.h |6 +
> > >>>   arch/powerpc/kernel/iommu.c  |  140 +++
> > >>>   arch/powerpc/platforms/powernv/pci.c |  135 +++
> > >>>   drivers/iommu/Kconfig|8 ++
> > >>>   drivers/vfio/Kconfig |6 +
> > >>>   drivers/vfio/Makefile|1 +
> > >>>   drivers/vfio/vfio_iommu_spapr_tce.c  |  247
> > >> ++
> > >>>   include/linux/vfio.h |   20 +++
> > >>>   8 files changed, 563 insertions(+)
> > >>>   create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> > >>>
> > >>> diff --git a/arch/powerpc/include/asm/iommu.h
> > >>> b/arch/powerpc/include/asm/iommu.h
> > >>> index cbfe678..5ba66cb 100644
> > >>> --- a/arch/powerpc/include/asm/iommu.h
> > >>> +++ b/arch/powerpc/include/asm/iommu.h
> > >>> @@ -64,30 +64,33 @@ struct iommu_pool {  }
> > >>> cacheline_aligned_in_smp;
> > >>>
> > >>>   struct iommu_table {
> > >>> unsigned long  it_busno; /* Bus number this table belongs 
> > >>> to */
> > >>> unsigned long  it_size;  /* Size of iommu table in entries 
> > >>> */
> > >>> unsigned long  it_offset;/* Offset into global table */
> > >>> unsigned long  it_base;  /* mapped address of tce table */
> > >>> unsigned long  it_index; /* which iommu table this is */
> > >>> unsigned long  it_type;  /* type: PCI or Virtual Bus */
> > >>> unsigned long  it_blocksize; /* Entries in each block 
> > >>> (cacheline)
> > >> */
> > >>> unsigned long  poolsize;
> > >>> unsigned long  nr_pools;
> > >>> struct iommu_pool large_pool;
> > >>> struct iommu_pool pools[IOMMU_NR_POOLS];
> > >>> unsigned long *it_map;   /* A simple allocation bitmap for 
> > >>> now
> > >> */
> > >>> +#ifdef CONFIG_IOMMU_API
> > >>> +   struct iommu_group *it_group;
> > >>> +#endif
> > >>>   };
> > >>>
> > >>>   struct scatterlist;
> > >>>
> > >>>   static inline void set_iommu_table_base(struct device *dev, void
> > >>> *base)  {
> > >>> dev->archdata.dma_data.iommu_table_base = base;  }
> > >>>
> > >>>   static inline void *get_iommu_table_base(struct device *dev)  {
> > >>> return dev->archdata.dma_data.iommu_table_base;
> > >>>   }
> > >>>
> > >>>   /* Frees table for an individual device node */ @@ -135,17 +138,20 @@
> > >>> static inline void pci_iommu_init(void) { }  extern void
> > >>> alloc_dart_table(void);  #if defined(CONFIG_PPC64) &&
> > >>> defined(CONFIG_PM)  static inline void iommu_save(void)  {
> > >>> if (ppc_md.iommu_save)
> > >>> ppc_md.iommu_save();
> > >>>   }
> > >>>
> > >>>   static inline void iommu_restore(void)  {
> > >>> if (ppc_md.iommu_restore)
> > >>> ppc_md.iommu_restore();
> > >>>   }
> > >>>   #endif
> > >>>
> > >>> +extern long iommu_put_tces(struct iommu_table *tbl, unsigned long
> > >> entry, uint64_t tce,
> > >>> +   enum dma_data_direction direction, unsigned long pages);
> > >>> +
> > >>>   #endif /* __KERNEL__ */
> > >>>   #endif /* _ASM_IOMMU_H */
> > >>> diff --git

Re: [Pv-drivers] [PATCH 192/493] scsi: remove use of __devinit

2012-11-26 Thread Dmitry Torokhov
On Mon, Nov 19, 2012 at 01:22:21PM -0500, Bill Pemberton wrote:
> CONFIG_HOTPLUG is going away as an option so __devinit is no longer
> needed.
> 

...

>  drivers/scsi/vmw_pvscsi.c |  6 +-

For vmw_pvscsi:

Acked-by: Dmitry Torokhov 

Thanks,
Dmitry




Re: [PATCH 0/9] dma_debug: add debug_dma_mapping_error support to architectures that support DMA_DEBUG_API

2012-11-26 Thread Shuah Khan
On Mon, 2012-11-26 at 12:22 +0100, Joerg Roedel wrote:
> Hi Shuah,
> 
> On Fri, Nov 23, 2012 at 02:29:02PM -0700, Shuah Khan wrote:
> > x86 - done in the first patch that added the feature.
> > 
> > ARM64: dma_debug: add debug_dma_mapping_error support
> > c6x: dma_debug: add debug_dma_mapping_error support
> > ia64: dma_debug: add debug_dma_mapping_error support
> > microblaze: dma-mapping: support debug_dma_mapping_error
> > mips: dma_debug: add debug_dma_mapping_error support
> > powerpc: dma_debug: add debug_dma_mapping_error support
> > sh: dma_debug: add debug_dma_mapping_error support
> > sparc: dma_debug: add debug_dma_mapping_error support
> > tile: dma_debug: add debug_dma_mapping_error support
> 
> Have you compile-tested the individual archs you are changing here?
> 

Joerg,

Yes, I compile-tested all of them (except microblaze) against the Nov 20th
linux-next git tree. The microblaze patch was already in linux-next when I
tried to apply it to the Nov 20th tree, so I figured that one was already
covered and skipped it.

-- Shuah



Re: [PATCH] vfio powerpc: enabled and supported on powernv platform

2012-11-26 Thread Alex Williamson
On Fri, 2012-11-23 at 13:02 +1100, Alexey Kardashevskiy wrote:
> On 22/11/12 22:56, Sethi Varun-B16395 wrote:
> >
> >
> >> -Original Message-
> >> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> >> ow...@vger.kernel.org] On Behalf Of Alex Williamson
> >> Sent: Tuesday, November 20, 2012 11:50 PM
> >> To: Alexey Kardashevskiy
> >> Cc: Benjamin Herrenschmidt; Paul Mackerras; linuxppc-
> >> d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; k...@vger.kernel.org;
> >> David Gibson
> >> Subject: Re: [PATCH] vfio powerpc: enabled and supported on powernv
> >> platform
> >>
> >> On Tue, 2012-11-20 at 11:48 +1100, Alexey Kardashevskiy wrote:
> >>> VFIO implements platform independent stuff such as a PCI driver, BAR
> >>> access (via read/write on a file descriptor or direct mapping when
> >>> possible) and IRQ signaling.
> >>> The platform dependent part includes IOMMU initialization and
> >>> handling.
> >>>
> >>> This patch initializes IOMMU groups based on the IOMMU configuration
> >>> discovered during the PCI scan, only POWERNV platform is supported at
> >>> the moment.
> >>>
> >>> Also the patch implements a VFIO-IOMMU driver which manages DMA
> >>> mapping/unmapping requests coming from the client (now QEMU). It also
> >>> returns DMA window information to let the guest initialize the
> >>> device tree for a guest OS properly. Although this driver has been
> >>> tested only on POWERNV, it should work on any platform supporting TCE
> >>> tables.
> >>>
> >>> To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config option.
> >>>
> >>> Cc: David Gibson 
> >>> Signed-off-by: Alexey Kardashevskiy 
> >>> ---
> >>>   arch/powerpc/include/asm/iommu.h |6 +
> >>>   arch/powerpc/kernel/iommu.c  |  140 +++
> >>>   arch/powerpc/platforms/powernv/pci.c |  135 +++
> >>>   drivers/iommu/Kconfig|8 ++
> >>>   drivers/vfio/Kconfig |6 +
> >>>   drivers/vfio/Makefile|1 +
> >>>   drivers/vfio/vfio_iommu_spapr_tce.c  |  247
> >> ++
> >>>   include/linux/vfio.h |   20 +++
> >>>   8 files changed, 563 insertions(+)
> >>>   create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> >>>
> >>> diff --git a/arch/powerpc/include/asm/iommu.h
> >>> b/arch/powerpc/include/asm/iommu.h
> >>> index cbfe678..5ba66cb 100644
> >>> --- a/arch/powerpc/include/asm/iommu.h
> >>> +++ b/arch/powerpc/include/asm/iommu.h
> >>> @@ -64,30 +64,33 @@ struct iommu_pool {  }
> >>> cacheline_aligned_in_smp;
> >>>
> >>>   struct iommu_table {
> >>>   unsigned long  it_busno; /* Bus number this table belongs 
> >>> to */
> >>>   unsigned long  it_size;  /* Size of iommu table in entries 
> >>> */
> >>>   unsigned long  it_offset;/* Offset into global table */
> >>>   unsigned long  it_base;  /* mapped address of tce table */
> >>>   unsigned long  it_index; /* which iommu table this is */
> >>>   unsigned long  it_type;  /* type: PCI or Virtual Bus */
> >>>   unsigned long  it_blocksize; /* Entries in each block 
> >>> (cacheline)
> >> */
> >>>   unsigned long  poolsize;
> >>>   unsigned long  nr_pools;
> >>>   struct iommu_pool large_pool;
> >>>   struct iommu_pool pools[IOMMU_NR_POOLS];
> >>>   unsigned long *it_map;   /* A simple allocation bitmap for 
> >>> now
> >> */
> >>> +#ifdef CONFIG_IOMMU_API
> >>> + struct iommu_group *it_group;
> >>> +#endif
> >>>   };
> >>>
> >>>   struct scatterlist;
> >>>
> >>>   static inline void set_iommu_table_base(struct device *dev, void
> >>> *base)  {
> >>>   dev->archdata.dma_data.iommu_table_base = base;  }
> >>>
> >>>   static inline void *get_iommu_table_base(struct device *dev)  {
> >>>   return dev->archdata.dma_data.iommu_table_base;
> >>>   }
> >>>
> >>>   /* Frees table for an individual device node */ @@ -135,17 +138,20 @@
> >>> static inline void pci_iommu_init(void) { }  extern void
> >>> alloc_dart_table(void);  #if defined(CONFIG_PPC64) &&
> >>> defined(CONFIG_PM)  static inline void iommu_save(void)  {
> >>>   if (ppc_md.iommu_save)
> >>>   ppc_md.iommu_save();
> >>>   }
> >>>
> >>>   static inline void iommu_restore(void)  {
> >>>   if (ppc_md.iommu_restore)
> >>>   ppc_md.iommu_restore();
> >>>   }
> >>>   #endif
> >>>
> >>> +extern long iommu_put_tces(struct iommu_table *tbl, unsigned long
> >> entry, uint64_t tce,
> >>> + enum dma_data_direction direction, unsigned long pages);
> >>> +
> >>>   #endif /* __KERNEL__ */
> >>>   #endif /* _ASM_IOMMU_H */
> >>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> >>> index ff5a6ce..94f614b 100644
> >>> --- a/arch/powerpc/kernel/iommu.c
> >>> +++ b/arch/powerpc/kernel/iommu.c
> >>> @@ -32,30 +32,31 @@
> >>>   #include 
> >>>   #include 
> >>>   #include 
> >>>   #in

Re: [PATCH] vfio powerpc: enabled and supported on powernv platform

2012-11-26 Thread Alex Williamson
On Thu, 2012-11-22 at 11:56 +, Sethi Varun-B16395 wrote:
> 
> > -Original Message-
> > From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> > ow...@vger.kernel.org] On Behalf Of Alex Williamson
> > Sent: Tuesday, November 20, 2012 11:50 PM
> > To: Alexey Kardashevskiy
> > Cc: Benjamin Herrenschmidt; Paul Mackerras; linuxppc-
> > d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; k...@vger.kernel.org;
> > David Gibson
> > Subject: Re: [PATCH] vfio powerpc: enabled and supported on powernv
> > platform
> > 
> > On Tue, 2012-11-20 at 11:48 +1100, Alexey Kardashevskiy wrote:
> > > VFIO implements platform independent stuff such as a PCI driver, BAR
> > > access (via read/write on a file descriptor or direct mapping when
> > > possible) and IRQ signaling.
> > > The platform dependent part includes IOMMU initialization and
> > > handling.
> > >
> > > This patch initializes IOMMU groups based on the IOMMU configuration
> > > discovered during the PCI scan, only POWERNV platform is supported at
> > > the moment.
> > >
> > > Also the patch implements a VFIO-IOMMU driver which manages DMA
> > > mapping/unmapping requests coming from the client (now QEMU). It also
> > > returns DMA window information to let the guest initialize the
> > > device tree for a guest OS properly. Although this driver has been
> > > tested only on POWERNV, it should work on any platform supporting TCE
> > > tables.
> > >
> > > To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config option.
> > >
> > > Cc: David Gibson 
> > > Signed-off-by: Alexey Kardashevskiy 
> > > ---
> > >  arch/powerpc/include/asm/iommu.h |6 +
> > >  arch/powerpc/kernel/iommu.c  |  140 +++
> > >  arch/powerpc/platforms/powernv/pci.c |  135 +++
> > >  drivers/iommu/Kconfig|8 ++
> > >  drivers/vfio/Kconfig |6 +
> > >  drivers/vfio/Makefile|1 +
> > >  drivers/vfio/vfio_iommu_spapr_tce.c  |  247
> > ++
> > >  include/linux/vfio.h |   20 +++
> > >  8 files changed, 563 insertions(+)
> > >  create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> > >
> > > > diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> > > index cbfe678..5ba66cb 100644
> > > --- a/arch/powerpc/include/asm/iommu.h
> > > +++ b/arch/powerpc/include/asm/iommu.h
> > > @@ -64,30 +64,33 @@ struct iommu_pool {  }
> > > cacheline_aligned_in_smp;
> > >
> > >  struct iommu_table {
> > >   unsigned long  it_busno; /* Bus number this table belongs to */
> > >   unsigned long  it_size;  /* Size of iommu table in entries */
> > >   unsigned long  it_offset;/* Offset into global table */
> > >   unsigned long  it_base;  /* mapped address of tce table */
> > >   unsigned long  it_index; /* which iommu table this is */
> > >   unsigned long  it_type;  /* type: PCI or Virtual Bus */
> > > >   unsigned long  it_blocksize; /* Entries in each block (cacheline) */
> > >   unsigned long  poolsize;
> > >   unsigned long  nr_pools;
> > >   struct iommu_pool large_pool;
> > >   struct iommu_pool pools[IOMMU_NR_POOLS];
> > > >   unsigned long *it_map;   /* A simple allocation bitmap for now */
> > > +#ifdef CONFIG_IOMMU_API
> > > + struct iommu_group *it_group;
> > > +#endif
> > >  };
> > >
> > >  struct scatterlist;
> > >
> > > >  static inline void set_iommu_table_base(struct device *dev, void *base)
> > > >  {
> > > >   dev->archdata.dma_data.iommu_table_base = base;
> > > >  }
> > > >
> > > >  static inline void *get_iommu_table_base(struct device *dev)
> > > >  {
> > > >   return dev->archdata.dma_data.iommu_table_base;
> > > >  }
> > > >
> > > >  /* Frees table for an individual device node */
> > > > @@ -135,17 +138,20 @@ static inline void pci_iommu_init(void) { }
> > > >  extern void alloc_dart_table(void);
> > > >  #if defined(CONFIG_PPC64) && defined(CONFIG_PM)
> > > >  static inline void iommu_save(void)
> > > >  {
> > > >   if (ppc_md.iommu_save)
> > > >    ppc_md.iommu_save();
> > > >  }
> > > >
> > > >  static inline void iommu_restore(void)
> > > >  {
> > > >   if (ppc_md.iommu_restore)
> > > >    ppc_md.iommu_restore();
> > > >  }
> > >  #endif
> > >
> > > > +extern long iommu_put_tces(struct iommu_table *tbl, unsigned long entry, uint64_t tce,
> > > > + enum dma_data_direction direction, unsigned long pages);
> > > +
> > >  #endif /* __KERNEL__ */
> > >  #endif /* _ASM_IOMMU_H */
> > > diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> > > index ff5a6ce..94f614b 100644
> > > --- a/arch/powerpc/kernel/iommu.c
> > > +++ b/arch/powerpc/kernel/iommu.c
> > > @@ -32,30 +32,31 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >
> > >  #define DBG(...)
> > >
> > >  static int nov

Re: [PATCH v3 11/12] memory-hotplug: remove sysfs file of node

2012-11-26 Thread Jianguo Wu
On 2012/11/1 17:44, Wen Congyang wrote:
> This patch introduces a new function try_offline_node() to
> remove sysfs file of node when all memory sections of this
> node are removed. If some memory sections of this node are
> not removed, this function does nothing.
> 
> CC: David Rientjes 
> CC: Jiang Liu 
> CC: Len Brown 
> CC: Christoph Lameter 
> Cc: Minchan Kim 
> CC: Andrew Morton 
> CC: KOSAKI Motohiro 
> CC: Yasuaki Ishimatsu 
> Signed-off-by: Wen Congyang 
> ---
>  drivers/acpi/acpi_memhotplug.c |  8 +-
>  include/linux/memory_hotplug.h |  2 +-
>  mm/memory_hotplug.c| 58 
> --
>  3 files changed, 64 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 24c807f..0780f99 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -310,7 +310,9 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
>  {
>   int result;
>   struct acpi_memory_info *info, *n;
> + int node;
>  
> + node = acpi_get_node(mem_device->device->handle);
>  
>   /*
>* Ask the VM to offline this memory range.
> @@ -318,7 +320,11 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
>*/
>   list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
>   if (info->enabled) {
> - result = remove_memory(info->start_addr, info->length);
> + if (node < 0)
> + node = memory_add_physaddr_to_nid(
> + info->start_addr);
> + result = remove_memory(node, info->start_addr,
> + info->length);
>   if (result)
>   return result;
>   }
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index d4c4402..7b4cfe6 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -231,7 +231,7 @@ extern int arch_add_memory(int nid, u64 start, u64 size);
>  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>  extern int offline_memory_block(struct memory_block *mem);
>  extern bool is_memblock_offlined(struct memory_block *mem);
> -extern int remove_memory(u64 start, u64 size);
> +extern int remove_memory(int node, u64 start, u64 size);
>  extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
>   int nr_pages);
>  extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 7bcced0..d965da3 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -1299,7 +1300,58 @@ static int is_memblock_offlined_cb(struct memory_block *mem, void *arg)
>   return ret;
>  }
>  
> -int __ref remove_memory(u64 start, u64 size)
> +static int check_cpu_on_node(void *data)
> +{
> + struct pglist_data *pgdat = data;
> + int cpu;
> +
> + for_each_present_cpu(cpu) {
> + if (cpu_to_node(cpu) == pgdat->node_id)
> + /*
> +  * the cpu on this node isn't removed, and we can't
> +  * offline this node.
> +  */
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> +
> +/* offline the node if all memory sections of this node are removed */
> +static void try_offline_node(int nid)
> +{
> + unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
> + unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
> + unsigned long pfn;
> +
> + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> + unsigned long section_nr = pfn_to_section_nr(pfn);
> +
> + if (!present_section_nr(section_nr))
> + continue;
> +
> + if (pfn_to_nid(pfn) != nid)
> + continue;
> +
> + /*
> +  * some memory sections of this node are not removed, and we
> +  * can't offline node now.
> +  */
> + return;
> + }
> +
> + if (stop_machine(check_cpu_on_node, NODE_DATA(nid), NULL))
> + return;

How about:

	if (nr_cpus_node(nid))
		return;
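
A rough userspace sketch of how the two checks can differ; this is not kernel code, and every model_* name below is made up for illustration. It assumes nr_cpus_node() counts the CPUs set in the node's cpumask. Whether that mask covers all present CPUs or only online ones depends on the architecture and is not settled in this thread, so the sketch keeps the two notions separate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; the kernel keeps this in cpumasks instead. */
struct model_cpu {
	bool present;       /* CPU is physically present */
	bool in_node_mask;  /* CPU is set in its node's cpumask */
	int node;           /* NUMA node the CPU belongs to */
};

/* Mirrors the patch's check_cpu_on_node(): any present CPU left on @nid? */
bool model_node_has_present_cpu(const struct model_cpu *cpus, size_t nr, int nid)
{
	size_t i;

	assert(cpus != NULL);
	for (i = 0; i < nr; i++) {
		if (cpus[i].present && cpus[i].node == nid)
			return true;
	}
	return false;
}

/* Mirrors the suggested check: number of CPUs in the cpumask of @nid. */
unsigned int model_nr_cpus_node(const struct model_cpu *cpus, size_t nr, int nid)
{
	unsigned int count = 0;
	size_t i;

	assert(cpus != NULL);
	for (i = 0; i < nr; i++) {
		if (cpus[i].in_node_mask && cpus[i].node == nid)
			count++;
	}
	return count;
}
```

In this model the counting check needs no stop_machine()-style callback, but it only agrees with the loop over present CPUs when the node mask covers every present CPU on that node.
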
> +
> + /*
> +  * all memory/cpu of this node are removed, we can offline this
> +  * node now.
> +  */
> + node_set_offline(nid);
> + unregister_one_node(nid);
> +}
> +
> +int __ref remove_memory(int nid, u64 start, u64 size)
>  {
>   unsigned long start_pfn, end_pfn;
>   int ret = 0;
> @@ -1346,6 +1398,8 @@ repeat:
>  
>   arch_remove_memory(start, size);
>  
> + try_offline_node(n

Re: [PATCH 065/493] i2c: remove use of __devexit_p

2012-11-26 Thread Bill Pemberton
Wolfram Sang writes:
> 
> 
> On Tue, Nov 20, 2012 at 02:46:21PM +0100, Jean Delvare wrote:
> > On Mon, 19 Nov 2012 13:20:14 -0500, Bill Pemberton wrote:
> > > CONFIG_HOTPLUG is going away as an option so __devexit_p is no longer
> > > needed.
> >=20
> > As mentioned on the lm-sensors list for hwmon patches already, I think
> > it would be much clearer to not split __devexit, __devexit_p, __devinit
> > etc. removal into separate patches. One patch per subsystem would be
> > easier to review and apply. If patches grow too large then you'd rather
> > split in a different direction, for example drivers/i2c/muxes vs.
> > drivers/i2c/busses or even grouped by related bus drivers (see entries
> > "I2C OVER PARALLEL PORT" and "I2C/SMBUS CONTROLLER DRIVERS FOR PC" in
> > MAINTAINERS for examples of meaningful groups.)
> 
> I agree with Jean here. Is there a V2 planned? With a change like this?
> 

Yes, my plan is to redo the patches for the i2c subsystem.

-- 
Bill
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/9] dma_debug: add debug_dma_mapping_error support to architectures that support DMA_DEBUG_API

2012-11-26 Thread Joerg Roedel
Hi Shuah,

On Fri, Nov 23, 2012 at 02:29:02PM -0700, Shuah Khan wrote:
> x86 - done in the first patch that added the feature.
> 
> ARM64: dma_debug: add debug_dma_mapping_error support
> c6x: dma_debug: add debug_dma_mapping_error support
> ia64: dma_debug: add debug_dma_mapping_error support
> microblaze: dma-mapping: support debug_dma_mapping_error
> mips: dma_debug: add debug_dma_mapping_error support
> powerpc: dma_debug: add debug_dma_mapping_error support
> sh: dma_debug: add debug_dma_mapping_error support
> sparc: dma_debug: add debug_dma_mapping_error support
> tile: dma_debug: add debug_dma_mapping_error support

Have you compile-tested the individual archs you are changing here?


Joerg




Re: [PATCH 0/9] dma_debug: add debug_dma_mapping_error support to architectures that support DMA_DEBUG_API

2012-11-26 Thread Joerg Roedel
On Mon, Nov 26, 2012 at 11:57:19AM +0100, Marek Szyprowski wrote:
> I've taken all the patches into the next-dma-debug branch in my tree; I'm
> sorry that you had to wait so long for it. My branch is based on Joerg's
> dma-debug branch, and I've included it for testing in the linux-next branch.
> 
> Joerg: would you mind if I handle pushing the whole branch to v3.8
> via my kernel tree? Those changes should be kept close together to
> avoid build breaks when bisecting.

I'll apply the patches to my tree soon enough. But before that I'll wait
a little bit longer to give the arch maintainers the chance to add the
missing Acked-bys.


Joerg




Re: [PATCH 0/9] dma_debug: add debug_dma_mapping_error support to architectures that support DMA_DEBUG_API

2012-11-26 Thread Marek Szyprowski

Hello,

On 11/23/2012 10:29 PM, Shuah Khan wrote:

> An earlier patch added dma mapping error debug feature to dma_debug
> infrastructure. References:
>
> https://lkml.org/lkml/2012/10/8/296
> https://lkml.org/lkml/2012/11/3/219
>
> The following series of patches adds the call to debug_dma_mapping_error() to
> architecture specific dma_mapping_error() interfaces on the following
> architectures that support CONFIG_DMA_API_DEBUG.
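
For readers following along, a minimal userspace sketch of the idea behind the series: the arch-level dma_mapping_error() first tells the debug layer that the driver checked the address, then performs the real test. This is only a model under stated assumptions; all model_* names, the fixed-size table and the error-code value are hypothetical and are not the kernel's dma-debug implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_dma_addr_t;

#define MODEL_DMA_ERROR_CODE ((model_dma_addr_t)~0ULL) /* assumed error value */
#define MODEL_MAX_ENTRIES 16

/* One tracked mapping; 'checked' records whether the driver tested it. */
struct model_dma_entry {
	model_dma_addr_t addr;
	bool in_use;
	bool checked;
};

struct model_dma_debug {
	struct model_dma_entry entries[MODEL_MAX_ENTRIES];
	unsigned int unchecked_unmaps; /* mappings released without a check */
};

/* Record a new mapping; returns false if the table is full. */
bool model_debug_dma_map(struct model_dma_debug *dbg, model_dma_addr_t addr)
{
	size_t i;

	assert(dbg != NULL);
	for (i = 0; i < MODEL_MAX_ENTRIES; i++) {
		if (!dbg->entries[i].in_use) {
			dbg->entries[i].addr = addr;
			dbg->entries[i].in_use = true;
			dbg->entries[i].checked = false;
			return true;
		}
	}
	return false;
}

/* Debug hook: note that the caller checked this address for errors. */
void model_debug_dma_mapping_error(struct model_dma_debug *dbg, model_dma_addr_t addr)
{
	size_t i;

	assert(dbg != NULL);
	for (i = 0; i < MODEL_MAX_ENTRIES; i++) {
		if (dbg->entries[i].in_use && dbg->entries[i].addr == addr)
			dbg->entries[i].checked = true;
	}
}

/* Arch-level wrapper: call the debug hook first, then do the real test. */
bool model_dma_mapping_error(struct model_dma_debug *dbg, model_dma_addr_t addr)
{
	model_debug_dma_mapping_error(dbg, addr);
	return addr == MODEL_DMA_ERROR_CODE;
}

/* Unmap: count the event if the mapping was never checked. */
void model_debug_dma_unmap(struct model_dma_debug *dbg, model_dma_addr_t addr)
{
	size_t i;

	assert(dbg != NULL);
	for (i = 0; i < MODEL_MAX_ENTRIES; i++) {
		if (dbg->entries[i].in_use && dbg->entries[i].addr == addr) {
			if (!dbg->entries[i].checked)
				dbg->unchecked_unmaps++;
			dbg->entries[i].in_use = false;
			return;
		}
	}
}
```

An architecture whose dma_mapping_error() skips the hook would, in this model, have every unmap counted as unchecked, which is why each arch wrapper needs the extra call.
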


I've taken all the patches into the next-dma-debug branch in my tree; I'm
sorry that you had to wait so long for it. My branch is based on Joerg's
dma-debug branch, and I've included it for testing in the linux-next branch.

Joerg: would you mind if I handle pushing the whole branch to v3.8
via my kernel tree? Those changes should be kept close together to
avoid build breaks when bisecting.

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

