Re: [PATCH v12 13/14] mm/vmalloc: Hugepage vmalloc mappings

2021-02-19 Thread Ding Tianhong
On 2021/2/19 15:45, Nicholas Piggin wrote:
> Excerpts from Ding Tianhong's message of February 19, 2021 1:45 pm:
>> Hi Nicholas:
>>
>> I ran into a problem with this patch, like this:
>>
>> kva = vmalloc(3*1024k);
>>
>> remap_vmalloc_range(xxx, kva, xxx)
>>
>> It fails because the page_count(page) check sees zero and returns an error,
>> which breaks the logic of some existing modules, because the new huge page
>> is not a compound page.
> 
> Hey Ding, that's a good catch. How are you testing this stuff, do you 
> have a particular driver that does this?
> 

Yes. The driver gets memory from vmalloc in kernel space, and then the same
physical memory is mmapped to user space. The driver could not work with this
patch applied.

>> I think some users are not used to the vmalloc change where small pages are
>> transparently replaced by huge pages when the size is bigger than PMD_SIZE.
> 
> I think in this case vmalloc could allocate the large page as a compound
> page which would solve this problem I think? (without having actually 
> tested it)
> 

Yes, I think the __GFP_COMP flag could fix this.
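
For readers following the compound-page point, here is a minimal userspace sketch of why the tail pages are rejected. It is a toy model, not kernel code: struct toy_page and every toy_* helper are hypothetical names invented for illustration. It assumes that only the first page of a non-compound high-order allocation holds a reference, and that the page count of a compound tail page is read from its head page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct page; not the real layout. */
struct toy_page {
	int refcount;            /* reference count stored in this page */
	struct toy_page *head;   /* head page for a compound tail, else NULL */
};

/* Model of page_count(): the count is read from the compound head. */
static int toy_page_count(const struct toy_page *page)
{
	const struct toy_page *p = page->head ? page->head : page;

	return p->refcount;
}

/* Model of a high-order allocation of nr contiguous base pages. */
static void toy_alloc_pages(struct toy_page *pages, size_t nr, bool compound)
{
	assert(nr > 0);
	for (size_t i = 0; i < nr; i++) {
		pages[i].refcount = (i == 0) ? 1 : 0;
		pages[i].head = (compound && i > 0) ? &pages[0] : NULL;
	}
}

/* Model of the insert-path check: pages with a zero count are rejected. */
static int toy_insert_page(const struct toy_page *page)
{
	return toy_page_count(page) ? 0 : -EINVAL;
}

/* Returns how many of the nr base pages could be mapped to userspace. */
static size_t toy_count_mappable(bool compound, size_t nr)
{
	struct toy_page pages[512];
	size_t ok = 0;

	assert(nr <= 512);
	toy_alloc_pages(pages, nr, compound);
	for (size_t i = 0; i < nr; i++)
		if (toy_insert_page(&pages[i]) == 0)
			ok++;
	return ok;
}
```

In this model, a non-compound 512-page block lets only the first base page through the check, while the compound variant lets all 512 through, which matches the behaviour discussed above.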

>> Can we think about providing a new static huge page interface to fix it? For
>> example, a new vmalloc_huge_xxx function to distinguish it from the current
>> function, so the user could choose between transparent and static hugepages
>> for vmalloc.
> 
> Yeah that's a good question, there are a few things in the huge vmalloc 
> code that accounts things as small pages and you can't assume large or 
> small. If there is benefit from forcing large pages that could certainly
> be added.
> 

Transparent huge vmalloc is good, but it does not fit every use case; some users
prefer a deterministic function for performance-critical areas.

Thanks
Ding

> Interestingly, remap_vmalloc_range in theory could map the pages as 
> large in userspace as well. That takes more work but if something
> really needs that for performance, it could be done.
> 
> Thanks,
> Nick
> .
> 



Re: [PATCH v12 13/14] mm/vmalloc: Hugepage vmalloc mappings

2021-02-18 Thread Ding Tianhong
Hi Nicholas:

I ran into a problem with this patch, like this:

kva = vmalloc(3*1024k);

remap_vmalloc_range(xxx, kva, xxx)

It fails because the page_count(page) check sees zero and returns an error,
which breaks the logic of some existing modules, because the new huge page
is not a compound page.

I think some users are not used to the vmalloc change where small pages are
transparently replaced by huge pages when the size is bigger than PMD_SIZE.

Can we think about providing a new static huge page interface to fix it? For
example, a new vmalloc_huge_xxx function to distinguish it from the current
function, so the user could choose between transparent and static hugepages
for vmalloc.

Thanks
Ding


On 2021/2/2 19:05, Nicholas Piggin wrote:
> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
> supports PMD sized vmap mappings.
> 
> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
> or larger, and fall back to small pages if that was unsuccessful.
> 
> Architectures must ensure that any arch specific vmalloc allocations
> that require PAGE_SIZE mappings (e.g., module allocations vs strict
> module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
> 
> This can result in more internal fragmentation and memory overhead for a
> given allocation, an option nohugevmalloc is added to disable at boot.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/Kconfig            |  11 ++
>  include/linux/vmalloc.h |  21 
>  mm/page_alloc.c         |   5 +-
>  mm/vmalloc.c            | 215 +++-
>  4 files changed, 205 insertions(+), 47 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..eef170e0c9b8 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>  config HAVE_ARCH_HUGE_VMAP
>   bool
>  
> +#
> +#  Archs that select this would be capable of PMD-sized vmaps (i.e.,
> +#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
> +#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
> +#  can be used to prohibit arch-specific allocations from using hugepages to
> +#  help with this (e.g., modules may require it).
> +#
> +config HAVE_ARCH_HUGE_VMALLOC
> + depends on HAVE_ARCH_HUGE_VMAP
> + bool
> +
>  config ARCH_WANT_HUGE_PMD_SHARE
>   bool
>  
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 99ea72d547dc..93270adf5db5 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -25,6 +25,7 @@ struct notifier_block;  /* in notifier.h */
>  #define VM_NO_GUARD  0x0040  /* don't add guard page */
>  #define VM_KASAN 0x0080  /* has allocated kasan shadow memory */
>  #define VM_MAP_PUT_PAGES 0x0100  /* put pages and free array in vfree */
> +#define VM_NO_HUGE_VMAP  0x0200  /* force PAGE_SIZE pte mapping */
>  
>  /*
>   * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
> @@ -59,6 +60,9 @@ struct vm_struct {
>   unsigned long   size;
>   unsigned long   flags;
>   struct page **pages;
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
> + unsigned int    page_order;
> +#endif
>   unsigned int    nr_pages;
>   phys_addr_t phys_addr;
>   const void  *caller;
> @@ -193,6 +197,22 @@ void free_vm_area(struct vm_struct *area);
>  extern struct vm_struct *remove_vm_area(const void *addr);
>  extern struct vm_struct *find_vm_area(const void *addr);
>  
> +static inline bool is_vm_area_hugepages(const void *addr)
> +{
> + /*
> +  * This may not 100% tell if the area is mapped with > PAGE_SIZE
> +  * page table entries, if for some reason the architecture indicates
> +  * larger sizes are available but decides not to use them, nothing
> +  * prevents that. This only indicates the size of the physical page
> +  * allocated in the vmalloc layer.
> +  */
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
> + return find_vm_area(addr)->page_order > 0;
> +#else
> + return false;
> +#endif
> +}
> +
>  #ifdef CONFIG_MMU
>  int vmap_range(unsigned long addr, unsigned long end,
>   phys_addr_t phys_addr, pgprot_t prot,
> @@ -210,6 +230,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
>   if (vm)
>   vm->flags |= VM_FLUSH_RESET_PERMS;
>  }
> +
>  #else
>  static inline int
>  map_kernel_range_noflush(unsigned long start, unsigned long size,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 519a60d5b6f7..1116ce45744b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -72,6 +72,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -8240,6 +8241,7 @@ void *__init 

Re: [PATCH v12 01/14] ARM: mm: add missing pud_page define to 2-level page tables

2021-02-02 Thread Ding Tianhong
On 2021/2/2 19:13, Russell King - ARM Linux admin wrote:
> On Tue, Feb 02, 2021 at 09:05:02PM +1000, Nicholas Piggin wrote:
>> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
>> index c02f24400369..d63a5bb6bd0c 100644
>> --- a/arch/arm/include/asm/pgtable.h
>> +++ b/arch/arm/include/asm/pgtable.h
>> @@ -166,6 +166,9 @@ extern struct page *empty_zero_page;
>>  
>>  extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
>>  
>> +#define pud_page(pud)   pmd_page(__pmd(pud_val(pud)))
>> +#define pud_write(pud)  pmd_write(__pmd(pud_val(pud)))
> 
> As there is no PUD, does it really make sense to return a valid
> struct page (which will be the PTE page) for pud_page(), which is
> several tables above?
> 
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h

+static inline int pud_none(pud_t pud)
+{
+  return 0;
+}

I think it could be fixed like this.

Ding


Re: [PATCH v12 01/14] ARM: mm: add missing pud_page define to 2-level page tables

2021-02-02 Thread Ding Tianhong
On 2021/2/2 19:47, Ding Tianhong wrote:
> On 2021/2/2 19:13, Russell King - ARM Linux admin wrote:
>> On Tue, Feb 02, 2021 at 09:05:02PM +1000, Nicholas Piggin wrote:
>>> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
>>> index c02f24400369..d63a5bb6bd0c 100644
>>> --- a/arch/arm/include/asm/pgtable.h
>>> +++ b/arch/arm/include/asm/pgtable.h
>>> @@ -166,6 +166,9 @@ extern struct page *empty_zero_page;
>>>  
>>>  extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
>>>  
>>> +#define pud_page(pud)  pmd_page(__pmd(pud_val(pud)))
>>> +#define pud_write(pud) pmd_write(__pmd(pud_val(pud)))
>>
>> As there is no PUD, does it really make sense to return a valid
>> struct page (which will be the PTE page) for pud_page(), which is
>> several tables above?
>>
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> 
> +static inline int pud_none(pud_t pud)
> +{
> +  return 0;
> +}
> 
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h

+static inline int pud_page(pud_t pud)
+{
+  return 0;
+}

> I think it could be fixed like this.
> 
> Ding
> 



Re: [PATCH v11 01/13] mm/vmalloc: fix HUGE_VMAP regression by enabling huge pages in vmalloc_to_page

2021-01-27 Thread Ding Tianhong
On 2021/1/26 12:44, Nicholas Piggin wrote:
> vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
> Whether or not a vmap is huge depends on the architecture details,
> alignments, boot options, etc., which the caller can not be expected
> to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.
> 
> This change teaches vmalloc_to_page about larger pages, and returns
> the struct page that corresponds to the offset within the large page.
> This makes the API agnostic to mapping implementation details.
> 
> [*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
> fail gracefully on unexpected huge vmap mappings")
> 
> Reviewed-by: Christoph Hellwig 
> Signed-off-by: Nicholas Piggin 
> ---
>  mm/vmalloc.c | 41 ++---
>  1 file changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e6f352bf0498..62372f9e0167 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -34,7 +34,7 @@
>  #include 
>  #include 
>  #include 
> -
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -343,7 +343,9 @@ int is_vmalloc_or_module_addr(const void *x)
>  }
>  
>  /*
> - * Walk a vmap address to the struct page it maps.
> + * Walk a vmap address to the struct page it maps. Huge vmap mappings will
> + * return the tail page that corresponds to the base page address, which
> + * matches small vmap mappings.
>   */
>  struct page *vmalloc_to_page(const void *vmalloc_addr)
>  {
> @@ -363,25 +365,33 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>  
>   if (pgd_none(*pgd))
>   return NULL;
> + if (WARN_ON_ONCE(pgd_leaf(*pgd)))
> + return NULL; /* XXX: no allowance for huge pgd */
> + if (WARN_ON_ONCE(pgd_bad(*pgd)))
> + return NULL;
> +
>   p4d = p4d_offset(pgd, addr);
>   if (p4d_none(*p4d))
>   return NULL;
> - pud = pud_offset(p4d, addr);
> + if (p4d_leaf(*p4d))
> + return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
> + if (WARN_ON_ONCE(p4d_bad(*p4d)))
> + return NULL;
>  
> - /*
> -  * Don't dereference bad PUD or PMD (below) entries. This will also
> -  * identify huge mappings, which we may encounter on architectures
> -  * that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
> -  * identified as vmalloc addresses by is_vmalloc_addr(), but are
> -  * not [unambiguously] associated with a struct page, so there is
> -  * no correct value to return for them.
> -  */
> - WARN_ON_ONCE(pud_bad(*pud));
> - if (pud_none(*pud) || pud_bad(*pud))
> + pud = pud_offset(p4d, addr);
> + if (pud_none(*pud))
> + return NULL;
> + if (pud_leaf(*pud))
> + return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);

Hi Nicholas:

/builds/1mzfdQzleCy69KZFb5qHNSEgabZ/mm/vmalloc.c: In function 'vmalloc_to_page':
/builds/1mzfdQzleCy69KZFb5qHNSEgabZ/include/asm-generic/pgtable-nop4d-hack.h:48:27: error: implicit declaration of function 'pud_page'; did you mean 'put_page'? [-Werror=implicit-function-declaration]
   48 | #define pgd_page(pgd) (pud_page((pud_t){ pgd }))
      |                           ^~~~

pud_page is not defined for aarch32 when the 2-level page table config is
enabled, so this breaks the build.


> + if (WARN_ON_ONCE(pud_bad(*pud)))
>   return NULL;
> +
>   pmd = pmd_offset(pud, addr);
> - WARN_ON_ONCE(pmd_bad(*pmd));
> - if (pmd_none(*pmd) || pmd_bad(*pmd))
> + if (pmd_none(*pmd))
> + return NULL;
> + if (pmd_leaf(*pmd))
> + return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> + if (WARN_ON_ONCE(pmd_bad(*pmd)))
>   return NULL;
>  
>   ptep = pte_offset_map(pmd, addr);
> @@ -389,6 +399,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>   if (pte_present(pte))
>   page = pte_page(pte);
>   pte_unmap(ptep);
> +
>   return page;
>  }
>  EXPORT_SYMBOL(vmalloc_to_page);
> 
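
As a worked example of the leaf-entry arithmetic in the patch above, the sketch below computes which base page inside a PMD-sized mapping an address falls in, mirroring the expression (addr & ~PMD_MASK) >> PAGE_SHIFT. The TOY_* constants are assumptions chosen for illustration (4 KiB base pages and 2 MiB PMD mappings); real values depend on the architecture and configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for illustration: 4 KiB base pages, 2 MiB PMD mappings. */
#define TOY_PAGE_SHIFT 12
#define TOY_PMD_SHIFT  21
#define TOY_PMD_SIZE   (UINT64_C(1) << TOY_PMD_SHIFT)
#define TOY_PMD_MASK   (~(TOY_PMD_SIZE - 1))

static_assert(TOY_PMD_SHIFT > TOY_PAGE_SHIFT, "PMD must cover several pages");

/*
 * Index of the base page that addr falls in, counted from the first base
 * page of the PMD-sized mapping. Adding this index to the struct page of
 * the first base page gives the page that backs addr.
 */
static uint64_t toy_subpage_index(uint64_t addr)
{
	return (addr & ~TOY_PMD_MASK) >> TOY_PAGE_SHIFT;
}
```

With these assumed sizes, an address 0x3000 bytes into a 2 MiB mapping lands on base page 3, and the last byte of the mapping lands on base page 511.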



Re: [PATCH v11 05/13] mm: HUGE_VMAP arch support cleanup

2021-01-26 Thread Ding Tianhong
Reviewed-by: Ding Tianhong 

On 2021/1/26 12:45, Nicholas Piggin wrote:
> This changes the awkward approach where architectures provide init
> functions to determine which levels they can provide large mappings for,
> to one where the arch is queried for each call.
> 
> This removes code and indirection, and allows constant-folding of dead
> code for unsupported levels.
> 
> This also adds a prot argument to the arch query. This is unused
> currently but could help with some architectures (e.g., some powerpc
> processors can't map uncacheable memory with large pages).
> 
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: x...@kernel.org
> Cc: "H. Peter Anvin" 
> Acked-by: Catalin Marinas  [arm64]
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/arm64/include/asm/vmalloc.h         |  8 ++
>  arch/arm64/mm/mmu.c                      | 10 +--
>  arch/powerpc/include/asm/vmalloc.h       |  8 ++
>  arch/powerpc/mm/book3s64/radix_pgtable.c |  8 +-
>  arch/x86/include/asm/vmalloc.h           |  7 ++
>  arch/x86/mm/ioremap.c                    | 12 +--
>  include/linux/io.h                       |  9 ---
>  include/linux/vmalloc.h                  |  6 ++
>  init/main.c                              |  1 -
>  mm/ioremap.c                             | 94 ++--
>  10 files changed, 85 insertions(+), 78 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
> index 2ca708ab9b20..597b40405319 100644
> --- a/arch/arm64/include/asm/vmalloc.h
> +++ b/arch/arm64/include/asm/vmalloc.h
> @@ -1,4 +1,12 @@
>  #ifndef _ASM_ARM64_VMALLOC_H
>  #define _ASM_ARM64_VMALLOC_H
>  
> +#include 
> +
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +bool arch_vmap_p4d_supported(pgprot_t prot);
> +bool arch_vmap_pud_supported(pgprot_t prot);
> +bool arch_vmap_pmd_supported(pgprot_t prot);
> +#endif
> +
>  #endif /* _ASM_ARM64_VMALLOC_H */
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ae0c3d023824..1613d290cbd1 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1313,12 +1313,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
>   return dt_virt;
>  }
>  
> -int __init arch_ioremap_p4d_supported(void)
> +bool arch_vmap_p4d_supported(pgprot_t prot)
>  {
> - return 0;
> + return false;
>  }
>  
> -int __init arch_ioremap_pud_supported(void)
> +bool arch_vmap_pud_supported(pgprot_t prot)
>  {
>   /*
>* Only 4k granule supports level 1 block mappings.
> @@ -1328,9 +1328,9 @@ int __init arch_ioremap_pud_supported(void)
>  !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
>  }
>  
> -int __init arch_ioremap_pmd_supported(void)
> +bool arch_vmap_pmd_supported(pgprot_t prot)
>  {
> - /* See arch_ioremap_pud_supported() */
> + /* See arch_vmap_pud_supported() */
>   return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
>  }
>  
> diff --git a/arch/powerpc/include/asm/vmalloc.h b/arch/powerpc/include/asm/vmalloc.h
> index b992dfaaa161..105abb73f075 100644
> --- a/arch/powerpc/include/asm/vmalloc.h
> +++ b/arch/powerpc/include/asm/vmalloc.h
> @@ -1,4 +1,12 @@
>  #ifndef _ASM_POWERPC_VMALLOC_H
>  #define _ASM_POWERPC_VMALLOC_H
>  
> +#include 
> +
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +bool arch_vmap_p4d_supported(pgprot_t prot);
> +bool arch_vmap_pud_supported(pgprot_t prot);
> +bool arch_vmap_pmd_supported(pgprot_t prot);
> +#endif
> +
>  #endif /* _ASM_POWERPC_VMALLOC_H */
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 98f0b243c1ab..743807fc210f 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -1082,13 +1082,13 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
>   set_pte_at(mm, addr, ptep, pte);
>  }
>  
> -int __init arch_ioremap_pud_supported(void)
> +bool arch_vmap_pud_supported(pgprot_t prot)
>  {
>   /* HPT does not cope with large pages in the vmalloc area */
>   return radix_enabled();
>  }
>  
> -int __init arch_ioremap_pmd_supported(void)
> +bool arch_vmap_pmd_supported(pgprot_t prot)
>  {
>   return radix_enabled();
>  }
> @@ -1182,7 +1182,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
>   return 1;
>  }
>  
> -int __init arch_ioremap_p4d_supported(void)
> +bool arch_vmap_p4d_supported(pgprot_t prot)
>  {
> - return 0;
> + return false;
>  }
> diff 

Re: [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings

2021-01-26 Thread Ding Tianhong
On 2021/1/26 17:47, Nicholas Piggin wrote:
> Excerpts from Ding Tianhong's message of January 26, 2021 4:59 pm:
>> On 2021/1/26 12:45, Nicholas Piggin wrote:
>>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>>> supports PMD sized vmap mappings.
>>>
>>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>>> or larger, and fall back to small pages if that was unsuccessful.
>>>
>>> Architectures must ensure that any arch specific vmalloc allocations
>>> that require PAGE_SIZE mappings (e.g., module allocations vs strict
>>> module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
>>>
>>> When hugepage vmalloc mappings are enabled in the next patch, this
>>> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
>>> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
>>>
>>> This can result in more internal fragmentation and memory overhead for a
>>> given allocation, an option nohugevmalloc is added to disable at boot.
>>>
>>> Signed-off-by: Nicholas Piggin 
>>> ---
>>>  arch/Kconfig            |  11 ++
>>>  include/linux/vmalloc.h |  21 
>>>  mm/page_alloc.c         |   5 +-
>>>  mm/vmalloc.c            | 215 +++-
>>>  4 files changed, 205 insertions(+), 47 deletions(-)
>>>
>>> diff --git a/arch/Kconfig b/arch/Kconfig
>>> index 24862d15f3a3..eef170e0c9b8 100644
>>> --- a/arch/Kconfig
>>> +++ b/arch/Kconfig
>>> @@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>  config HAVE_ARCH_HUGE_VMAP
>>> bool
>>>  
>>> +#
>>> +#  Archs that select this would be capable of PMD-sized vmaps (i.e.,
>>> +#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
>>> +#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
>>> +#  can be used to prohibit arch-specific allocations from using hugepages to
>>> +#  help with this (e.g., modules may require it).
>>> +#
>>> +config HAVE_ARCH_HUGE_VMALLOC
>>> +   depends on HAVE_ARCH_HUGE_VMAP
>>> +   bool
>>> +
>>>  config ARCH_WANT_HUGE_PMD_SHARE
>>> bool
>>>  
>>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>>> index 99ea72d547dc..93270adf5db5 100644
>>> --- a/include/linux/vmalloc.h
>>> +++ b/include/linux/vmalloc.h
>>> @@ -25,6 +25,7 @@ struct notifier_block;/* in notifier.h */
>>>  #define VM_NO_GUARD        0x0040  /* don't add guard page */
>>>  #define VM_KASAN           0x0080  /* has allocated kasan shadow memory */
>>>  #define VM_MAP_PUT_PAGES   0x0100  /* put pages and free array in vfree */
>>> +#define VM_NO_HUGE_VMAP    0x0200  /* force PAGE_SIZE pte mapping */
>>>
>>>  /*
>>>   * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
>>> @@ -59,6 +60,9 @@ struct vm_struct {
>>> unsigned long   size;
>>> unsigned long   flags;
>>> struct page **pages;
>>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
>>> +   unsigned int    page_order;
>>> +#endif
>>> unsigned int    nr_pages;
>>> phys_addr_t phys_addr;
>>> const void  *caller;
>> Hi Nicholas:
>>
>> A suggestion :)
>>
>> The page order is only used to indicate whether the vm area uses huge pages,
>> and it is only valid when the size is bigger than PMD_SIZE. Can we use the vm
>> flags instead, for example by defining a new flag named VM_HUGEPAGE? It would
>> not change struct vm_struct, and it would be easier for me to backport the
>> patch series to our own branches (based on the LTS version).
> 
> Hmm, it might be possible. I'm not sure if 1GB vmallocs will be used any 
> time soon (or maybe they will for edge case configurations? It would be 
> trivial to add support for).
> 

1GB vmallocs are really crazy, but they may be used in the future. :)

> The other concern I have is that Christophe IIRC was asking about 
> implementing a mapping for PPC which used TLB mappings that were 
> different than kernel page table tree size. Although I guess we could 
> deal with that when it comes.
> 

I didn't check the PPC platform, but I agree with you.

> I like the flexibility of page_order though. How hard would it be for 
> you to do the backport with VM_HUGEPAGE yourself?
> 

Yes, I can fix it with VM_HUGEPAGE in my own branch.
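
To make the trade-off between a VM_HUGEPAGE flag and page_order concrete, here is a small sketch of the two encodings. It is illustrative only: TOY_VM_HUGEPAGE and its value are hypothetical (no such flag is defined in the patch), and the shifts assume 4 KiB base pages with 2 MiB PMD mappings.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT   12        /* assumed 4 KiB base pages */
#define TOY_PMD_SHIFT    21        /* assumed 2 MiB PMD mappings */
#define TOY_VM_HUGEPAGE  0x0400UL  /* hypothetical flag value, not from the patch */

static_assert(TOY_PMD_SHIFT > TOY_PAGE_SHIFT, "PMD must cover several pages");

/* Variant 1: a single flag bit can only say "PMD-sized" or "base page". */
static unsigned int toy_shift_from_flag(unsigned long flags)
{
	return (flags & TOY_VM_HUGEPAGE) ? TOY_PMD_SHIFT : TOY_PAGE_SHIFT;
}

/* Variant 2: page_order encodes any power-of-two multiple of the base page. */
static unsigned int toy_shift_from_order(unsigned int page_order)
{
	return TOY_PAGE_SHIFT + page_order;
}
```

Under these assumptions, an order of 9 describes the same 2 MiB size as the flag, while an order of 18 describes a 1 GiB size that a single flag bit cannot express. That is the flexibility argument for page_order; the flag has the advantage of leaving the struct layout unchanged for a backport.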

> I should also say, thanks for all the review and testing from the Huawei 
> team. Do you have an x86 patch?
I have only enabled and used it on the x86 and aarch64 platforms; this patch
series really helps us a lot. Thanks.

Ding

> Thanks,
> Nick
> .
> 



Re: [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings

2021-01-26 Thread Ding Tianhong
On 2021/1/26 12:45, Nicholas Piggin wrote:
> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
> supports PMD sized vmap mappings.
> 
> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
> or larger, and fall back to small pages if that was unsuccessful.
> 
> Architectures must ensure that any arch specific vmalloc allocations
> that require PAGE_SIZE mappings (e.g., module allocations vs strict
> module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
> 
> When hugepage vmalloc mappings are enabled in the next patch, this
> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
> 
> This can result in more internal fragmentation and memory overhead for a
> given allocation, an option nohugevmalloc is added to disable at boot.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/Kconfig            |  11 ++
>  include/linux/vmalloc.h |  21 
>  mm/page_alloc.c         |   5 +-
>  mm/vmalloc.c            | 215 +++-
>  4 files changed, 205 insertions(+), 47 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..eef170e0c9b8 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>  config HAVE_ARCH_HUGE_VMAP
>   bool
>  
> +#
> +#  Archs that select this would be capable of PMD-sized vmaps (i.e.,
> +#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
> +#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
> +#  can be used to prohibit arch-specific allocations from using hugepages to
> +#  help with this (e.g., modules may require it).
> +#
> +config HAVE_ARCH_HUGE_VMALLOC
> + depends on HAVE_ARCH_HUGE_VMAP
> + bool
> +
>  config ARCH_WANT_HUGE_PMD_SHARE
>   bool
>  
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 99ea72d547dc..93270adf5db5 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -25,6 +25,7 @@ struct notifier_block;  /* in notifier.h */
>  #define VM_NO_GUARD  0x0040  /* don't add guard page */
>  #define VM_KASAN 0x0080  /* has allocated kasan shadow memory */
>  #define VM_MAP_PUT_PAGES 0x0100  /* put pages and free array in vfree */
> +#define VM_NO_HUGE_VMAP  0x0200  /* force PAGE_SIZE pte mapping */
> 
>  /*
>   * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
> @@ -59,6 +60,9 @@ struct vm_struct {
>   unsigned long   size;
>   unsigned long   flags;
>   struct page **pages;
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
> + unsigned int    page_order;
> +#endif
>   unsigned int    nr_pages;
>   phys_addr_t phys_addr;
>   const void  *caller;
Hi Nicholas:

A suggestion :)

The page order is only used to indicate whether the vm area uses huge pages, and
it is only valid when the size is bigger than PMD_SIZE. Can we use the vm flags
instead, for example by defining a new flag named VM_HUGEPAGE? It would not
change struct vm_struct, and it would be easier for me to backport the patch
series to our own branches (based on the LTS version).

Tianhong

> @@ -193,6 +197,22 @@ void free_vm_area(struct vm_struct *area);
>  extern struct vm_struct *remove_vm_area(const void *addr);
>  extern struct vm_struct *find_vm_area(const void *addr);
>  
> +static inline bool is_vm_area_hugepages(const void *addr)
> +{
> + /*
> +  * This may not 100% tell if the area is mapped with > PAGE_SIZE
> +  * page table entries, if for some reason the architecture indicates
> +  * larger sizes are available but decides not to use them, nothing
> +  * prevents that. This only indicates the size of the physical page
> +  * allocated in the vmalloc layer.
> +  */
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
> + return find_vm_area(addr)->page_order > 0;
> +#else
> + return false;
> +#endif
> +}
> +
>  #ifdef CONFIG_MMU
>  int vmap_range(unsigned long addr, unsigned long end,
>   phys_addr_t phys_addr, pgprot_t prot,
> @@ -210,6 +230,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
>   if (vm)
>   vm->flags |= VM_FLUSH_RESET_PERMS;
>  }
> +
>  #else
>  static inline int
>  map_kernel_range_noflush(unsigned long start, unsigned long size,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 027f6481ba59..b7a9661fa232 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -72,6 +72,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -8238,6 +8239,7 @@ void *__init alloc_large_system_hash(const char *tablename,
>   void *table = NULL;
>   gfp_t gfp_flags;
>   bool 

Re: [PATCH v9 05/12] mm: HUGE_VMAP arch support cleanup

2021-01-04 Thread Ding Tianhong
On 2020/12/5 14:57, Nicholas Piggin wrote:
> This changes the awkward approach where architectures provide init
> functions to determine which levels they can provide large mappings for,
> to one where the arch is queried for each call.
> 
> This removes code and indirection, and allows constant-folding of dead
> code for unsupported levels.
> 
> This also adds a prot argument to the arch query. This is unused
> currently but could help with some architectures (e.g., some powerpc
> processors can't map uncacheable memory with large pages).
> 
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: x...@kernel.org
> Cc: "H. Peter Anvin" 
> Acked-by: Catalin Marinas  [arm64]
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/arm64/include/asm/vmalloc.h         |  8 +++
>  arch/arm64/mm/mmu.c                      | 10 +--
>  arch/powerpc/include/asm/vmalloc.h       |  8 +++
>  arch/powerpc/mm/book3s64/radix_pgtable.c |  8 +--
>  arch/x86/include/asm/vmalloc.h           |  7 ++
>  arch/x86/mm/ioremap.c                    | 10 +--
>  include/linux/io.h                       |  9 ---
>  include/linux/vmalloc.h                  |  6 ++
>  init/main.c                              |  1 -
>  mm/ioremap.c                             | 88 +---
>  10 files changed, 77 insertions(+), 78 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
> index 2ca708ab9b20..597b40405319 100644
> --- a/arch/arm64/include/asm/vmalloc.h
> +++ b/arch/arm64/include/asm/vmalloc.h
> @@ -1,4 +1,12 @@
>  #ifndef _ASM_ARM64_VMALLOC_H
>  #define _ASM_ARM64_VMALLOC_H
>  
> +#include 
> +
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +bool arch_vmap_p4d_supported(pgprot_t prot);
> +bool arch_vmap_pud_supported(pgprot_t prot);
> +bool arch_vmap_pmd_supported(pgprot_t prot);
> +#endif
> +
>  #endif /* _ASM_ARM64_VMALLOC_H */
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ca692a815731..1b60079c1cef 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1315,12 +1315,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
>   return dt_virt;
>  }
>  
> -int __init arch_ioremap_p4d_supported(void)
> +bool arch_vmap_p4d_supported(pgprot_t prot)
>  {
> - return 0;
> + return false;
>  }
>  

I think you should put this function inside #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP;
otherwise it may break the build when CONFIG_HAVE_ARCH_HUGE_VMAP is disabled.
The same applies to x86 and ppc.

Ding
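
One way to read the suggestion above is sketched here in userspace C. It is an assumption about the intended shape, not the actual kernel headers: toy_pgprot_t, TOY_HAVE_ARCH_HUGE_VMAP and the toy_* functions are hypothetical stand-ins. The idea is that the arch definition only exists when the option is enabled, and a constant stub takes its place otherwise, so callers build either way and the compiler can drop the huge-mapping path.

```c
#include <stdbool.h>

/* Toy stand-in for the kernel's pgprot_t. */
typedef struct { unsigned long pgprot; } toy_pgprot_t;

/* Define TOY_HAVE_ARCH_HUGE_VMAP at build time to model an arch that opts in. */
#ifdef TOY_HAVE_ARCH_HUGE_VMAP
/* Provided by arch code, which must sit under the same guard. */
bool toy_arch_vmap_pmd_supported(toy_pgprot_t prot);
#else
/* Constant stub: keeps callers compiling when the option is off. */
static inline bool toy_arch_vmap_pmd_supported(toy_pgprot_t prot)
{
	(void)prot;
	return false;
}
#endif

/* Caller: pick the mapping shift, querying the arch on every call. */
static unsigned int toy_vmap_shift(unsigned long size, toy_pgprot_t prot)
{
	if (size >= (1UL << 21) && toy_arch_vmap_pmd_supported(prot))
		return 21;  /* assumed PMD shift */
	return 12;      /* assumed PAGE shift */
}
```

If the arch definition were compiled unconditionally while the stub was also visible, the two definitions would clash, which is the kind of build break the comment above warns about.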

> -int __init arch_ioremap_pud_supported(void)
> +bool arch_vmap_pud_supported(pgprot_t prot)
>  {
>   /*
>* Only 4k granule supports level 1 block mappings.
> @@ -1330,9 +1330,9 @@ int __init arch_ioremap_pud_supported(void)
>  !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
>  }
>  
> -int __init arch_ioremap_pmd_supported(void)
> +bool arch_vmap_pmd_supported(pgprot_t prot)
>  {
> - /* See arch_ioremap_pud_supported() */
> + /* See arch_vmap_pud_supported() */
>   return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
>  }
>  
> diff --git a/arch/powerpc/include/asm/vmalloc.h b/arch/powerpc/include/asm/vmalloc.h
> index b992dfaaa161..105abb73f075 100644
> --- a/arch/powerpc/include/asm/vmalloc.h
> +++ b/arch/powerpc/include/asm/vmalloc.h
> @@ -1,4 +1,12 @@
>  #ifndef _ASM_POWERPC_VMALLOC_H
>  #define _ASM_POWERPC_VMALLOC_H
>  
> +#include 
> +
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +bool arch_vmap_p4d_supported(pgprot_t prot);
> +bool arch_vmap_pud_supported(pgprot_t prot);
> +bool arch_vmap_pmd_supported(pgprot_t prot);
> +#endif
> +
>  #endif /* _ASM_POWERPC_VMALLOC_H */
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 3adcf730f478..ab426fc0cd4b 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -1121,13 +1121,13 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
>   set_pte_at(mm, addr, ptep, pte);
>  }
>  
> -int __init arch_ioremap_pud_supported(void)
> +bool arch_vmap_pud_supported(pgprot_t prot)
>  {
>   /* HPT does not cope with large pages in the vmalloc area */
>   return radix_enabled();
>  }
>  
> -int __init arch_ioremap_pmd_supported(void)
> +bool arch_vmap_pmd_supported(pgprot_t prot)
>  {
>   return radix_enabled();
>  }
> @@ -1221,7 +1221,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
>   return 1;
>  }
>  
> -int __init arch_ioremap_p4d_supported(void)
> +bool arch_vmap_p4d_supported(pgprot_t prot)
>  {
> - return 0;
> + return false;
>  }
> diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
> index 29837740b520..094ea2b565f3 100644
> --- a/arch/x86/include/asm/vmalloc.h
> +++ b/arch/x86/include/asm/vmalloc.h
> @@ -1,6 +1,13 @@
>  #ifndef _ASM_X86_VMALLOC_H
>  #define 

Re: [PATCH v9 11/12] mm/vmalloc: Hugepage vmalloc mappings

2020-12-25 Thread Ding Tianhong


> +again:
> + size = PAGE_ALIGN(size);
> + area = __get_vm_area_node(size, align, VM_ALLOC | VM_UNINITIALIZED |
>   vm_flags, start, end, node, gfp_mask, caller);
>   if (!area)
>   goto fail;
>  
> - addr = __vmalloc_area_node(area, gfp_mask, prot, node);
> + addr = __vmalloc_area_node(area, gfp_mask, prot, shift, node);
>   if (!addr)
> - return NULL;
> + goto fail;
>  
>   /*
>* In this function, newly allocated vm_struct has VM_UNINITIALIZED
> @@ -2788,8 +2878,19 @@ void *__vmalloc_node_range(unsigned long size, 
> unsigned long align,
>   return addr;
>  
>  fail:
> - warn_alloc(gfp_mask, NULL,
> + if (shift > PAGE_SHIFT) {
> + free_vm_area(area);
> + shift = PAGE_SHIFT;
> + align = real_align;
> + size = real_size;
> + goto again;
> + }
> +
Hi, Nicholas:

I ran into the following problem:

[   67.103584] ------------[ cut here ]------------
[   67.103884] kernel BUG at vmalloc.c:2892!
[   67.104387] Internal error: Oops - BUG: 0 [#1] SMP
[   67.104942] Process insmod (pid: 1161, stack limit = 0x(ptrval))
[   67.105356] CPU: 2 PID: 1161 Comm: insmod Tainted: G   O  4.19.95+ #9
[   67.105702] Hardware name: linux,dummy-virt (DT)
[   67.106006] pstate: a005 (NzCv daif -PAN -UAO)
[   67.106285] pc : free_vm_area+0x78/0x80
[   67.106549] lr : free_vm_area+0x58/0x80

It looks like when __vmalloc_area_node() fails, the area has already been
released, so free_vm_area() releases the vm area a second time and triggers
the BUG:

ret = remove_vm_area(area->addr);
BUG_ON(ret != area);
kfree(area);


Ding
> + if (!area) {
> + /* Warn for area allocation, page allocations already warn */
> + warn_alloc(gfp_mask, NULL,
> "vmalloc: allocation failure: %lu bytes", real_size);
> + }
>   return NULL;
>  }
>  
> 
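
The failure mode described above can be sketched in user space. This is a
minimal sketch, not the kernel code: struct area, alloc_pages_for() and
alloc_with_fallback() are hypothetical stand-ins, and it assumes (as the
report suggests) that the page-allocation step already frees the area on its
own error path, so the retry path must not free it again.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vm_struct. */
struct area {
	size_t size;
	unsigned int shift;
};

static int live_areas;	/* number of areas currently allocated */

static struct area *area_get(size_t size, unsigned int shift)
{
	struct area *a = malloc(sizeof(*a));

	if (!a)
		return NULL;
	a->size = size;
	a->shift = shift;
	live_areas++;
	return a;
}

static void area_free(struct area *a)
{
	live_areas--;
	free(a);
}

/*
 * Stand-in for the page-allocation step. It releases the area itself when
 * it fails, so the caller no longer owns it afterwards. For illustration,
 * "huge" mappings (shift > 12) are assumed to always fail.
 */
static bool alloc_pages_for(struct area *a)
{
	if (a->shift > 12) {
		area_free(a);
		return false;
	}
	return true;
}

/* Retry with small pages; the area is NOT freed again on the retry path. */
static struct area *alloc_with_fallback(size_t size, unsigned int shift)
{
	for (;;) {
		struct area *a = area_get(size, shift);

		if (!a)
			return NULL;
		if (alloc_pages_for(a))
			return a;
		/* 'a' was already released by alloc_pages_for(). */
		if (shift <= 12)
			return NULL;
		shift = 12;	/* fall back to the base page size */
	}
}
```

The live_areas counter makes the ownership rule checkable: after a failed
huge attempt and a successful small-page retry, exactly one area is live.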



[PATCH] gpio: dwapb: add support for new hisilicon ascend soc

2020-08-21 Thread Ding Tianhong
The GPIO controller on the HiSilicon Ascend SoC is based on the Synopsys
DW GPIO and adds a new register, INTCOMB_MASK, which is used to
enable/disable the interrupt combine feature.

Both ACPI and Device Tree are supported.

Signed-off-by: Ding Tianhong 
---
 drivers/gpio/gpio-dwapb.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c
index 1d8d55b..923b381 100644
--- a/drivers/gpio/gpio-dwapb.c
+++ b/drivers/gpio/gpio-dwapb.c
@@ -49,6 +49,8 @@
 #define GPIO_EXT_PORTC 0x58
 #define GPIO_EXT_PORTD 0x5c
 
+#define GPIO_INTCOMB_MASK  0xffc
+
 #define DWAPB_DRIVER_NAME  "gpio-dwapb"
 #define DWAPB_MAX_PORTS    4
 
@@ -58,6 +60,10 @@
 
 #define GPIO_REG_OFFSET_V2 1
 
+#define GPIO_REG_INT_COMB  2
+#define ENABLE_INT_COMB    1
+#define DISABLE_INT_COMB   0
+
 #define GPIO_INTMASK_V2    0x44
 #define GPIO_INTTYPE_LEVEL_V2  0x34
 #define GPIO_INT_POLARITY_V2   0x38
@@ -354,6 +360,20 @@ static irqreturn_t dwapb_irq_handler_mfd(int irq, void *dev_id)
return IRQ_RETVAL(dwapb_do_irq(dev_id));
 }
 
+static void dwapb_enable_inq_combine(struct dwapb_gpio *gpio, unsigned int enable)
+{
+   u32 val;
+
+   if (gpio->flags & GPIO_REG_INT_COMB) {
+   val = dwapb_read(gpio, GPIO_INTCOMB_MASK);
+   if (enable)
+   val |= BIT(0);
+   else
+   val &= ~BIT(0);
+   dwapb_write(gpio, GPIO_INTCOMB_MASK, val);
+   }
+}
+
 static void dwapb_configure_irqs(struct dwapb_gpio *gpio,
 struct dwapb_gpio_port *port,
 struct dwapb_port_property *pp)
@@ -446,6 +466,8 @@ static void dwapb_configure_irqs(struct dwapb_gpio *gpio,
irq_create_mapping(gpio->domain, hwirq);
 
port->gc.to_irq = dwapb_gpio_to_irq;
+
+   dwapb_enable_inq_combine(gpio, ENABLE_INT_COMB);
 }
 
 static void dwapb_irq_teardown(struct dwapb_gpio *gpio)
@@ -618,6 +640,7 @@ static struct dwapb_platform_data *dwapb_gpio_get_pdata(struct device *dev)
 static const struct of_device_id dwapb_of_match[] = {
{ .compatible = "snps,dw-apb-gpio", .data = (void *)0},
{ .compatible = "apm,xgene-gpio-v2", .data = (void *)GPIO_REG_OFFSET_V2},
+   { .compatible = "hisi,ascend-gpio", .data = (void *)GPIO_REG_INT_COMB},
{ /* Sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, dwapb_of_match);
@@ -626,6 +649,7 @@ static struct dwapb_platform_data *dwapb_gpio_get_pdata(struct device *dev)
{"HISI0181", 0},
{"APMC0D07", 0},
{"APMC0D81", GPIO_REG_OFFSET_V2},
+   {"HISI19XX", GPIO_REG_INT_COMB},
{ }
 };
 MODULE_DEVICE_TABLE(acpi, dwapb_acpi_match);
@@ -713,6 +737,8 @@ static int dwapb_gpio_remove(struct platform_device *pdev)
reset_control_assert(gpio->rst);
clk_bulk_disable_unprepare(DWAPB_NR_CLOCKS, gpio->clks);
 
+   dwapb_enable_inq_combine(gpio, DISABLE_INT_COMB);
+
return 0;
 }
 
@@ -794,6 +820,8 @@ static int dwapb_gpio_resume(struct device *dev)
dwapb_write(gpio, GPIO_INTEN, ctx->int_en);
dwapb_write(gpio, GPIO_INTMASK, ctx->int_mask);
 
+   dwapb_enable_inq_combine(gpio, ENABLE_INT_COMB);
+
/* Clear out spurious interrupts */
dwapb_write(gpio, GPIO_PORTA_EOI, 0xffffffff);
}
-- 
1.8.3.1
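
As a sketch of the read-modify-write that such an enable/disable helper
performs, the following user-space model sets and clears bit 0 of a register
value while preserving all other bits. The BIT() macro mirrors the kernel's;
intcomb_update() and the plain uint32_t standing in for the register are
assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/*
 * Return the new register value: set bit 0 to enable, clear only bit 0
 * to disable. All other bits are left as they were.
 */
static uint32_t intcomb_update(uint32_t val, bool enable)
{
	if (enable)
		val |= BIT(0);
	else
		val &= ~BIT(0);	/* complement: 'val & BIT(0)' would keep bit 0 */
	return val;
}
```

Clearing a bit needs the complemented mask; and-ing with the uncomplemented
mask would instead keep bit 0 and clear every other bit.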



Re: [PATCH] fs: Fix signed integer overflow for vfs_setpos

2017-12-07 Thread Ding Tianhong


On 2017/12/7 23:27, Al Viro wrote:
> On Thu, Dec 07, 2017 at 09:19:10PM +0800, Ding Tianhong wrote:
>> The undefined behaviour sanitizer detected a signed integer overflow like
>> this:
>>
>> r0 = 
>> memfd_create(&(0x7f002000-0x12)="2e726571756573745f6b65795f6175746800",0x0)
>> lseek(r0, 0x4040, 0x1)
>> setsockopt$inet6_IPV6_FLOWLABEL_MGR(r0, 0x29, 0x20,
>> &(0x7f00b000-0xd)={@empty={[0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0]}, 0x9, 0x1, 0xff, 0x2, 0x6, 
>> 0x1,0xd27}, 0x20)
>> mmap(&(0x7f00e000/0x1000)=nil, 0x1000, 0x3, 0x32,0x, 0x0)
>> ioctl$sock_SIOCGSKNS(r0, 0x894c, &(0x7f00f000-0x4)=0x1)
>> -
>> UBSAN: Undefined behaviour in fs/read_write.c:107:12
>> signed integer overflow:
>> 4629700416936869888 + 4629700416936869888 cannot be represented in type
>> 'long long int'
>> CPU: 0 PID: 11653 Comm: syz-executor0 Not tainted 4.x.xx+ #2
>> Hardware name: linux,dummy-virt (DT)
>> Call trace:
>> [] dump_backtrace+0x0/0x2a0
>> [] show_stack+0x20/0x30
>> [] dump_stack+0x11c/0x16c
>> [] ubsan_epilogue+0x18/0x70
>> [] handle_overflow+0x14c/0x188
>> [] __ubsan_handle_add_overflow+0x34/0x44
>> [] generic_file_llseek_size+0x1f8/0x2a0
>> [] shmem_file_llseek+0x7c/0x1f8
>> [] SyS_lseek+0xc0/0x118
>> 
>>
>> The problem happened because the addition of two signed integers
>> overflowed, so perform the addition on an unsigned integer to avoid
>> undefined behaviour when it does overflow.
> 
> TBH, I don't like that solution - there's too much of "make UBSAN STFU" in
> it.  Besides, there are very similar places elsewhere.  Right next to this
> one there's default_llseek(), with its
> case SEEK_CUR:
> if (offset == 0) {
> retval = file->f_pos;
> goto out;
> }
> offset += file->f_pos;
> break;
> and offset is loff_t there.  Exact same issue, IOW.  Grepping around shows
> tons of similar places.  E.g. ceph_llseek() has
> if (offset == 0) {
> ret = file->f_pos;
> goto out;
> }
> offset += file->f_pos;
> break;
> with offset being loff_t and ocfs2_file_llseek() is the same.  memory_lseek()
> does something very similar, except that it doesn't use vfs_setpos(),
> ditto for xillybus_llseek(), wil_pmc_llseek(), hmcdrv_dev_seek(), etc.
> 
> That kind of whack-a-mole ("UBSAN has stepped on that one, let's plug it",
> while the other places like that keep breeding) is, IMO, the wrong approach 
> ;-/
> 
> BTW, a fun unrelated bogosity:
> static loff_t scom_llseek(struct file *file, loff_t offset, int whence)
> {
> switch (whence) {
> case SEEK_CUR:
> break;
> case SEEK_SET:
> file->f_pos = offset;
> break;
> default:
> return -EINVAL;
> }
> 
> return offset;
> }
> IOW, lseek(fd, SEEK_CUR, n) quietly returns n there.  Separate issue, 
> though...
> 

I totally agree with you. This problem confused me as well, but undefined
behaviour looks like a really critical issue, so should we reconsider this
problem and solve it completely?

Thanks
Ding

> .
> 



[PATCH] fs: Fix signed integer overflow for vfs_setpos

2017-12-07 Thread Ding Tianhong
The undefined behaviour sanitizer detected a signed integer overflow like this:

r0 = 
memfd_create(&(0x7f002000-0x12)="2e726571756573745f6b65795f6175746800",0x0)
lseek(r0, 0x4040, 0x1)
setsockopt$inet6_IPV6_FLOWLABEL_MGR(r0, 0x29, 0x20,
&(0x7f00b000-0xd)={@empty={[0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0]}, 0x9, 0x1, 0xff, 0x2, 0x6, 0x1,0xd27}, 
0x20)
mmap(&(0x7f00e000/0x1000)=nil, 0x1000, 0x3, 0x32,0x, 0x0)
ioctl$sock_SIOCGSKNS(r0, 0x894c, &(0x7f00f000-0x4)=0x1)
-
UBSAN: Undefined behaviour in fs/read_write.c:107:12
signed integer overflow:
4629700416936869888 + 4629700416936869888 cannot be represented in type
'long long int'
CPU: 0 PID: 11653 Comm: syz-executor0 Not tainted 4.x.xx+ #2
Hardware name: linux,dummy-virt (DT)
Call trace:
[] dump_backtrace+0x0/0x2a0
[] show_stack+0x20/0x30
[] dump_stack+0x11c/0x16c
[] ubsan_epilogue+0x18/0x70
[] handle_overflow+0x14c/0x188
[] __ubsan_handle_add_overflow+0x34/0x44
[] generic_file_llseek_size+0x1f8/0x2a0
[] shmem_file_llseek+0x7c/0x1f8
[] SyS_lseek+0xc0/0x118


The problem happened because the addition of two signed integers
overflowed, so perform the addition on an unsigned integer to avoid
undefined behaviour when it does overflow.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 fs/read_write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index f8547b8..2c377fc 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -105,7 +105,7 @@ loff_t vfs_setpos(struct file *file, loff_t offset, loff_t maxsize)
 * like SEEK_SET.
 */
spin_lock(&file->f_lock);
-   offset = vfs_setpos(file, file->f_pos + offset, maxsize);
+   offset = vfs_setpos(file, (u64)file->f_pos + offset, maxsize);
spin_unlock(&file->f_lock);
return offset;
case SEEK_DATA:
-- 
1.8.3.1
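
For reference, the overflow in the report can be detected in user space
before it happens. This sketch assumes a GCC or Clang compiler (for
__builtin_add_overflow); seek_cur() is a hypothetical helper, not the
kernel's vfs_setpos(), and returning -EOVERFLOW is a choice made here rather
than what the patch does.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Compute pos + offset for a SEEK_CUR-style seek without ever performing
 * a signed addition that can overflow.
 * Returns the new position, or a negative errno value.
 */
static int64_t seek_cur(int64_t pos, int64_t offset, int64_t maxsize)
{
	int64_t res;

	/* The builtin returns true when the mathematical sum does not fit. */
	if (__builtin_add_overflow(pos, offset, &res))
		return -EOVERFLOW;
	if (res < 0 || res > maxsize)
		return -EINVAL;
	return res;
}
```

Unlike a cast to an unsigned type, this reports the overflow to the caller
instead of letting the sum wrap.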




[PATCH v3] ubsan: don't handle misaligned address when support unaligned access

2017-12-06 Thread Ding Tianhong
UBSAN always reports a warning like:

UBSAN: Undefined behaviour in ../include/linux/etherdevice.h:386:9
load of misaligned address ffc069ba0482 for type 'long unsigned int'
which requires 8 byte alignment
CPU: 0 PID: 901 Comm: sshd Not tainted 4.xx+ #1
Hardware name: linux,dummy-virt (DT)
Call trace:
[] dump_backtrace+0x0/0x348
[] show_stack+0x20/0x30
[] dump_stack+0x144/0x1b4
[] ubsan_epilogue+0x18/0x74
[] __ubsan_handle_type_mismatch+0x1a0/0x25c
[] dev_gro_receive+0x17d8/0x1830
[] napi_gro_receive+0x30/0x158
[] virtnet_receive+0xad4/0x1fa8

The reason is that when CONFIG_UBSAN_ALIGNMENT is enabled, UBSAN reports
unaligned accesses even if the system supports them
(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y). This produces a lot of noise
in the log and causes confusion.

This patch disables the detection of unaligned accesses when the system
supports unaligned access.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 lib/ubsan.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/ubsan.c b/lib/ubsan.c
index fb0409d..0799678 100644
--- a/lib/ubsan.c
+++ b/lib/ubsan.c
@@ -321,9 +321,10 @@ void __ubsan_handle_type_mismatch(struct type_mismatch_data *data,

if (!ptr)
handle_null_ptr_deref(data);
-   else if (data->alignment && !IS_ALIGNED(ptr, data->alignment))
-   handle_missaligned_access(data, ptr);
-   else
+   else if (data->alignment && !IS_ALIGNED(ptr, data->alignment)) {
+   if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+   handle_missaligned_access(data, ptr);
+   } else
handle_object_size_mismatch(data, ptr);
 }
 EXPORT_SYMBOL(__ubsan_handle_type_mismatch);
-- 
1.8.3.1





Re: [PATCH v2] ubsan: don't handle misaligned address when support unaligned access

2017-12-06 Thread Ding Tianhong
Hi Andrew:

Sorry for getting Andrey's email address wrong.

After testing, I found that this version still has a problem: it turns the
alignment problem into a size mismatch report. I will send a new version to
fix it.

The correct way is like this:

diff --git a/lib/ubsan.c b/lib/ubsan.c
index fb0409d..0799678 100644
--- a/lib/ubsan.c
+++ b/lib/ubsan.c
@@ -321,9 +321,10 @@ void __ubsan_handle_type_mismatch(struct type_mismatch_data *data,

if (!ptr)
handle_null_ptr_deref(data);
-   else if (data->alignment && !IS_ALIGNED(ptr, data->alignment))
-   handle_missaligned_access(data, ptr);
-   else
+   else if (data->alignment && !IS_ALIGNED(ptr, data->alignment)) {
+   if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+   handle_missaligned_access(data, ptr);
+   } else
handle_object_size_mismatch(data, ptr);
 }
 EXPORT_SYMBOL(__ubsan_handle_type_mismatch);
--


Thanks
Ding

On 2017/12/7 8:49, Andrew Morton wrote:
> (correcting Andrey's email address)
> 
> 
> From: Ding Tianhong <dingtianh...@huawei.com>
> Subject: lib/ubsan.c: don't handle misaligned address when kernel supports 
> unaligned access
> 
> ubsan reports a warning like:
> 
> UBSAN: Undefined behaviour in ../include/linux/etherdevice.h:386:9
> load of misaligned address ffc069ba0482 for type 'long unsigned int'
> which requires 8 byte alignment
> CPU: 0 PID: 901 Comm: sshd Not tainted 4.xx+ #1
> Hardware name: linux,dummy-virt (DT)
> Call trace:
> [] dump_backtrace+0x0/0x348
> [] show_stack+0x20/0x30
> [] dump_stack+0x144/0x1b4
> [] ubsan_epilogue+0x18/0x74
> [] __ubsan_handle_type_mismatch+0x1a0/0x25c
> [] dev_gro_receive+0x17d8/0x1830
> [] napi_gro_receive+0x30/0x158
> [] virtnet_receive+0xad4/0x1fa8
> 
> The reason is that when enabling the CONFIG_UBSAN_ALIGNMENT, ubsan will
> report the unaligned access even if the system supports it
> (CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y).  This produces a lot of noise
> in the log and causes confusion.
> 
> Prevent the detection of unaligned access when the system support
> unaligned access.
> 
> Link: http://lkml.kernel.org/r/5b905d56-609e-3822-096a-3b93b3eb7...@huawei.com
> Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
> Cc: David Laight <david.lai...@aculab.com>
> Cc: Andrey Ryabinin <aryabi...@virtuozzo.com>
> Signed-off-by: Andrew Morton <a...@linux-foundation.org>
> ---
> 
>  lib/ubsan.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff -puN 
> lib/ubsan.c~ubsan-dont-handle-misaligned-address-when-support-unaligned-access
>  lib/ubsan.c
> --- 
> a/lib/ubsan.c~ubsan-dont-handle-misaligned-address-when-support-unaligned-access
> +++ a/lib/ubsan.c
> @@ -322,7 +322,8 @@ void __ubsan_handle_type_mismatch(struct
>   if (!ptr)
>   handle_null_ptr_deref(data);
>   else if (data->alignment && !IS_ALIGNED(ptr, data->alignment))
> - handle_missaligned_access(data, ptr);
> + if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
> + handle_missaligned_access(data, ptr);
>   else
>   handle_object_size_mismatch(data, ptr);
>  }
> _
> 
> 
> .
> 
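
The problem with the v2 hunk is C's dangling-else rule: without braces, the
trailing else binds to the nearest if, which here is the IS_ENABLED() check.
The user-space sketch below models both shapes. classify_v2(), classify_v3()
and the enum are hypothetical names, and unaligned_ok stands in for
IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS).

```c
#include <stdbool.h>

enum report { REPORT_NONE, REPORT_NULL, REPORT_MISALIGNED, REPORT_SIZE };

/* v2 shape, written with the braces the compiler effectively infers. */
static enum report classify_v2(bool null, bool misaligned, bool unaligned_ok)
{
	if (null)
		return REPORT_NULL;
	else if (misaligned) {
		if (!unaligned_ok)
			return REPORT_MISALIGNED;
		else
			return REPORT_SIZE;	/* the else landed here */
	}
	return REPORT_NONE;	/* aligned pointers are never size-checked */
}

/* v3 shape: explicit braces make the else pair with the alignment test. */
static enum report classify_v3(bool null, bool misaligned, bool unaligned_ok)
{
	if (null)
		return REPORT_NULL;
	else if (misaligned) {
		if (!unaligned_ok)
			return REPORT_MISALIGNED;
	} else
		return REPORT_SIZE;
	return REPORT_NONE;
}
```

With unaligned access supported, the v2 shape reports a size mismatch for a
misaligned pointer and nothing for an aligned one; the v3 shape does the
opposite, which is the intended behaviour.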



[PATCH] fs/sync: fix the signed integer overflow warning

2017-12-05 Thread Ding Tianhong
syzkaller reports the following warning when UBSAN is enabled:

UBSAN: Undefined behaviour in fs/sync.c:290:10
signed integer overflow:
-1 + -9223372036854775808 cannot be represented in type 'long long int'
CPU: 0 PID: 3149 Comm: syz-executor3 Not tainted 4.xx #2
Hardware name: linux,dummy-virt (DT)
Call trace:
[] dump_backtrace+0x0/0x2a0
[] show_stack+0x20/0x30
[] dump_stack+0x11c/0x16c
[] ubsan_epilogue+0x18/0x70
[] handle_overflow+0x14c/0x188
[] __ubsan_handle_add_overflow+0x34/0x44
[] SyS_sync_file_range+0x118/0x210

===

The problem is that an invalid input parameter causes the computation of
'endbyte' to overflow. This does not cause any serious problem, because
the function returns at the next check.

This patch only fixes the warning and does not change the logic.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 fs/sync.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 6e0a2cb..0f77586 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -293,10 +293,11 @@ static int do_fsync(unsigned int fd, int datasync)
if (flags & ~VALID_FLAGS)
goto out;

-   endbyte = offset + nbytes;
-
if ((s64)offset < 0)
goto out;
+
+   endbyte = offset + nbytes;
+
if ((s64)endbyte < 0)
goto out;
if (endbyte < offset)
-- 
1.8.3.1
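
One way to validate such a range without computing a sum that can overflow
is to compare against the remaining headroom first. This is only a sketch
under the assumption that both values are signed 64-bit; range_end() is a
hypothetical helper and not the approach taken in the patch, which only
reorders the existing checks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Store offset + nbytes in *end and return true when the range is valid.
 * Negative inputs and sums beyond INT64_MAX are rejected before the
 * addition is performed, so the signed add can never overflow.
 */
static bool range_end(int64_t offset, int64_t nbytes, int64_t *end)
{
	if (offset < 0 || nbytes < 0)
		return false;
	if (nbytes > INT64_MAX - offset)	/* safe: offset >= 0 here */
		return false;
	*end = offset + nbytes;
	return true;
}
```

The subtraction INT64_MAX - offset cannot overflow once offset is known to
be non-negative, which is why the sign check comes first.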



[PATCH v2] ubsan: don't handle misaligned address when support unaligned access

2017-12-01 Thread Ding Tianhong
UBSAN always reports a warning like:

UBSAN: Undefined behaviour in ../include/linux/etherdevice.h:386:9
load of misaligned address ffc069ba0482 for type 'long unsigned int'
which requires 8 byte alignment
CPU: 0 PID: 901 Comm: sshd Not tainted 4.xx+ #1
Hardware name: linux,dummy-virt (DT)
Call trace:
[] dump_backtrace+0x0/0x348
[] show_stack+0x20/0x30
[] dump_stack+0x144/0x1b4
[] ubsan_epilogue+0x18/0x74
[] __ubsan_handle_type_mismatch+0x1a0/0x25c
[] dev_gro_receive+0x17d8/0x1830
[] napi_gro_receive+0x30/0x158
[] virtnet_receive+0xad4/0x1fa8

The reason is that when CONFIG_UBSAN_ALIGNMENT is enabled, UBSAN reports
unaligned accesses even if the system supports them
(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y). This produces a lot of noise
in the log and causes confusion.

This patch disables the detection of unaligned accesses when the system
supports unaligned access.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 lib/ubsan.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/ubsan.c b/lib/ubsan.c
index fb0409d..9207e65 100644
--- a/lib/ubsan.c
+++ b/lib/ubsan.c
@@ -322,7 +322,8 @@ void __ubsan_handle_type_mismatch(struct type_mismatch_data *data,
if (!ptr)
handle_null_ptr_deref(data);
else if (data->alignment && !IS_ALIGNED(ptr, data->alignment))
-   handle_missaligned_access(data, ptr);
+   if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+   handle_missaligned_access(data, ptr);
else
handle_object_size_mismatch(data, ptr);
 }
-- 
1.8.3.1



Re: [PATCH] ubsan: don't handle misaligned address when support unaligned access

2017-12-01 Thread Ding Tianhong


On 2017/12/1 19:47, David Laight wrote:
>> of noise in the log and cause confusion.
>>
>> This patch will close the detection of unaligned access when
>> the system support unaligned access.
>>
>> Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
>> ---
>>  lib/ubsan.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/ubsan.c b/lib/ubsan.c
>> index fb0409d..278b4c3 100644
>> --- a/lib/ubsan.c
>> +++ b/lib/ubsan.c
>> @@ -321,7 +321,8 @@ void __ubsan_handle_type_mismatch(struct 
>> type_mismatch_data *data,
>>
>>  if (!ptr)
>>  handle_null_ptr_deref(data);
>> -else if (data->alignment && !IS_ALIGNED(ptr, data->alignment))
>> +else if (data->alignment && !IS_ALIGNED(ptr, data->alignment) &&
>> + !IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
>>  handle_missaligned_access(data, ptr);
>>  else
>>  handle_object_size_mismatch(data, ptr);
> 
> Won't that report an object size error instead of actually
> doing the required access?
>

Yes, I missed that.


> Surely it shouldn't get into this function at all?
> 
> I guess 'alignment' is set to 4 or 8.
> If it were set to 3 or 7 (or 0) then the tests on the pointer
> would be much simpler - maybe at a slight extra cost in setup.
> 

It looks like we need to fix it in handle_missaligned_access():

diff --git a/lib/ubsan.c b/lib/ubsan.c
index 278b4c3..040f8b2 100644
--- a/lib/ubsan.c
+++ b/lib/ubsan.c
@@ -289,6 +289,9 @@ static void handle_missaligned_access(struct type_mismatch_data *data,
if (suppress_report(&data->location))
return;

+   if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+   return;
+
ubsan_prologue(&data->location, &flags);

pr_err("%s misaligned address %p for type %s\n",


Thanks
Ding


>   David
> 
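
David's suggestion of storing a mask (3 or 7) instead of the alignment
(4 or 8) can be sketched as follows. align_to_mask() and is_misaligned() are
hypothetical helpers for illustration, and the sketch assumes the alignment
is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a power-of-two alignment (4, 8, ...) to its mask (3, 7, ...). */
static uintptr_t align_to_mask(uintptr_t alignment)
{
	return alignment ? alignment - 1 : 0;
}

/* With a precomputed mask the check is a single AND; mask 0 never fires. */
static bool is_misaligned(uintptr_t ptr, uintptr_t mask)
{
	return (ptr & mask) != 0;
}
```

A mask of 0 means "no alignment requirement", so the separate
data->alignment test would no longer be needed at the check site.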



[PATCH] ubsan: don't handle misaligned address when support unaligned access

2017-12-01 Thread Ding Tianhong
UBSAN always reports warnings like this:

UBSAN: Undefined behaviour in ../include/linux/etherdevice.h:386:9
load of misaligned address ffc069ba0482 for type 'long unsigned int'
which requires 8 byte alignment
CPU: 0 PID: 901 Comm: sshd Not tainted 4.xx+ #1
Hardware name: linux,dummy-virt (DT)
Call trace:
[] dump_backtrace+0x0/0x348
[] show_stack+0x20/0x30
[] dump_stack+0x144/0x1b4
[] ubsan_epilogue+0x18/0x74
[] __ubsan_handle_type_mismatch+0x1a0/0x25c
[] dev_gro_receive+0x17d8/0x1830
[] napi_gro_receive+0x30/0x158
[] virtnet_receive+0xad4/0x1fa8

The reason is that when CONFIG_UBSAN_ALIGNMENT is enabled, UBSAN
reports unaligned accesses even if the system supports them
(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y). This produces a lot
of noise in the log and causes confusion.

This patch disables the detection of unaligned accesses when
the system supports unaligned access.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 lib/ubsan.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/ubsan.c b/lib/ubsan.c
index fb0409d..278b4c3 100644
--- a/lib/ubsan.c
+++ b/lib/ubsan.c
@@ -321,7 +321,8 @@ void __ubsan_handle_type_mismatch(struct type_mismatch_data *data,
 
 	if (!ptr)
 		handle_null_ptr_deref(data);
-	else if (data->alignment && !IS_ALIGNED(ptr, data->alignment))
+	else if (data->alignment && !IS_ALIGNED(ptr, data->alignment) &&
+		 !IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
 		handle_missaligned_access(data, ptr);
 	else
 		handle_object_size_mismatch(data, ptr);
-- 
1.8.3.1
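
To make the report above concrete, here is a small userspace sketch of the arithmetic behind "misaligned address ... requires 8 byte alignment", together with a portable way to perform such a load. The function names are hypothetical and this is not how the kernel implements its checks; it only illustrates the idea.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Number of bytes by which 'addr' misses an 'align'-byte boundary.
 * 'align' must be a non-zero power of two. */
static unsigned long misalignment(uint64_t addr, unsigned long align)
{
	assert(align && (align & (align - 1)) == 0);
	return (unsigned long)(addr & (align - 1));
}

/* Portable unaligned 64-bit load: memcpy() makes no alignment assumption,
 * and compilers typically turn it into a single load on hardware that
 * handles unaligned access efficiently. */
static uint64_t load_u64_unaligned(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

The address in the log ends in 0x482, and `misalignment(0x482, 8)` is 2, which is why a type that requires 8-byte alignment triggers the report.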



[PATCH v3 net 0/2 RESEND] net: ixgbe: Use new flag to disable Relaxed Ordering

2017-08-20 Thread Ding Tianhong
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering (RO) Attributes should not
be used for Transaction Layer Packets (TLPs) targeted at the
affected Root Ports. In that case bit 4 in the PCIe Device
Control register is cleared, so PCIe device drivers can query
PCIe configuration space to determine whether they can send
TLPs to the Root Port with the Relaxed Ordering Attributes set.

The ixgbe driver can use this to determine whether it can send
TLPs to the Root Port with the Relaxed Ordering Attributes set.

v2: Simplify the original code according to Alex's suggestion:
remove the new ixgbe flag2 and only check bit 4 in the
PCIe Device Control register.

v3: Remove the code that clears the bits in DCA_T/RXCTRL; relaxed
ordering should be enabled by the HW when the bus allows it.

Ding Tianhong (2):
  Revert commit 1a8b6d76dc5b ("net:add one common config...")
  net: ixgbe: Use new IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING flag

 arch/Kconfig                                    |  3 --
 arch/sparc/Kconfig                              |  1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c  | 37 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 32 +++--
 4 files changed, 35 insertions(+), 38 deletions(-)

-- 
1.8.3.1
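
As an illustration of the "bit 4 in the PCIe Device Control register" test mentioned in this cover letter, the following userspace sketch operates on a plain 16-bit value. The macro and function names are local to the sketch; in the kernel the corresponding definition is PCI_EXP_DEVCTL_RELAX_EN, and drivers would read the register through the PCI core rather than receive it as an argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 4 of the PCIe Device Control register: Enable Relaxed Ordering.
 * Local name for this sketch only. */
#define SKETCH_DEVCTL_RELAX_EN (1u << 4)

/* True if a device with this Device Control value may set the RO attribute. */
static bool relaxed_ordering_enabled(uint16_t devctl)
{
	return (devctl & SKETCH_DEVCTL_RELAX_EN) != 0;
}

/* What "clearing bit 4" does to a Device Control value. */
static uint16_t clear_relaxed_ordering(uint16_t devctl)
{
	return (uint16_t)(devctl & ~SKETCH_DEVCTL_RELAX_EN);
}
```

A driver that honours this bit needs no architecture-specific compile-time switch: the decision is carried by the device's own configuration space.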




[PATCH v3 net 2/2 RESEND] net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-20 Thread Ding Tianhong
The ixgbe driver uses a compile-time check to determine whether it can
send TLPs to the Root Port with the Relaxed Ordering Attribute set,
which is too inflexible. Now that the new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING
has been added to the kernel, bit 4 in the PCIe Device Control register
can be checked to determine whether the Relaxed Ordering Attributes
should be used, so use this new approach in the ixgbe driver.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c  | 22 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 19 ---
 2 files changed, 41 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
index 523f9d0..8a32eb7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
@@ -175,31 +175,9 @@ static s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
  **/
 static s32 ixgbe_start_hw_82598(struct ixgbe_hw *hw)
 {
-#ifndef CONFIG_SPARC
-   u32 regval;
-   u32 i;
-#endif
s32 ret_val;
 
ret_val = ixgbe_start_hw_generic(hw);
-
-#ifndef CONFIG_SPARC
-   /* Disable relaxed ordering */
-   for (i = 0; ((i < hw->mac.max_tx_queues) &&
-(i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
-   regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
-   }
-
-   for (i = 0; ((i < hw->mac.max_rx_queues) &&
-(i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
-   regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
-   IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
-   }
-#endif
if (ret_val)
return ret_val;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index d4933d2..96c324f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,25 +350,6 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
}
IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_SPARC
-   /* Disable relaxed ordering */
-   for (i = 0; i < hw->mac.max_tx_queues; i++) {
-   u32 regval;
-
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(i));
-   regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(i), regval);
-   }
-
-   for (i = 0; i < hw->mac.max_rx_queues; i++) {
-   u32 regval;
-
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
-   regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
-   IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
-   }
-#endif
return 0;
 }
 
-- 
1.8.3.1
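
The loops removed by this patch are read-modify-write sequences over per-queue registers. The userspace sketch below models that pattern on an ordinary array instead of device registers. The bit positions are assumptions made for the sketch and should be checked against the driver's own IXGBE_DCA_* definitions; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions, assumed for this sketch only. */
#define SKETCH_TXCTRL_DESC_WRO_EN (1u << 11)
#define SKETCH_RXCTRL_DATA_WRO_EN (1u << 13)
#define SKETCH_RXCTRL_HEAD_WRO_EN (1u << 15)

/* Clear 'mask' in each of the first 'n' simulated per-queue registers,
 * mirroring the read / modify / write steps of the removed code. */
static void clear_bits_per_queue(uint32_t *regs, size_t n, uint32_t mask)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t regval = regs[i];	/* read */

		regval &= ~mask;		/* modify */
		regs[i] = regval;		/* write */
	}
}
```

Dropping these loops leaves the write-relaxed-ordering enable bits at whatever value the hardware or earlier configuration gave them, which is the behaviour change the patch intends.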




[PATCH v3 net 1/2 RESEND] Revert commit 1a8b6d76dc5b ("net:add one common config...")

2017-08-20 Thread Ding Tianhong
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering (RO) Attributes should not
be used for Transaction Layer Packets (TLPs) targeted at the
affected Root Ports. In that case bit 4 in the PCIe Device
Control register is cleared, so PCIe device drivers can query
PCIe configuration space to determine whether they can send
TLPs to the Root Port with the Relaxed Ordering Attributes set.

With this new flag we no longer need the config ARCH_WANT_RELAX_ORDER
to control the Relaxed Ordering Attributes for the ixgbe driver,
as commit 1a8b6d76dc5b ("net:add one common config...") did,
so revert that commit.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 arch/Kconfig                                    | 3 ---
 arch/sparc/Kconfig                              | 1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089..00cfc63 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -928,9 +928,6 @@ config STRICT_MODULE_RWX
  and non-text memory will be made non-executable. This provides
  protection against certain security exploits (e.g. writing to text)
 
-config ARCH_WANT_RELAX_ORDER
-   bool
-
 config REFCOUNT_FULL
bool "Perform full reference count validation at the expense of speed"
help
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index a4a6261..987a575 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -44,7 +44,6 @@ config SPARC
select ARCH_HAS_SG_CHAIN
select CPU_NO_EFFICIENT_FFS
select LOCKDEP_SMALL if LOCKDEP
-   select ARCH_WANT_RELAX_ORDER
 
 config SPARC32
def_bool !64BIT
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 4e35e70..d4933d2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
}
IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
+#ifndef CONFIG_SPARC
/* Disable relaxed ordering */
for (i = 0; i < hw->mac.max_tx_queues; i++) {
u32 regval;
-- 
1.8.3.1
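
The mechanism being reverted is a compile-time switch: an architecture selects a Kconfig symbol, the build defines the matching CONFIG_ macro, and the driver compiles a block in or out with the preprocessor. The sketch below imitates that in userspace; the function name is hypothetical, and the macro is simply left undefined unless it is passed on the compiler command line with -D.

```c
#include <stdbool.h>

/* In a kernel build, "select ARCH_WANT_RELAX_ORDER" in an architecture's
 * Kconfig results in CONFIG_ARCH_WANT_RELAX_ORDER being defined. Here the
 * macro is undefined by default. */
static bool driver_disables_relaxed_ordering(void)
{
#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
	return true;	/* block compiled in: the driver clears the WRO bits */
#else
	return false;	/* block compiled out: the bits are left alone */
#endif
}
```

Because the decision is fixed when the kernel is built, it cannot take into account which Root Port a given device actually sits behind, which is the limitation the flag-based approach addresses.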




[PATCH v3 net 0/2] net: ixgbe: Use new flag to disable Relaxed Ordering

2017-08-18 Thread Ding Tianhong
From: Mao Wenan <maowe...@huawei.com>

The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering (RO) Attributes should not
be used for Transaction Layer Packets (TLPs) targeted at the
affected Root Ports. In that case bit 4 in the PCIe Device
Control register is cleared, so PCIe device drivers can query
PCIe configuration space to determine whether they can send
TLPs to the Root Port with the Relaxed Ordering Attributes set.

The ixgbe driver can use this to determine whether it can send
TLPs to the Root Port with the Relaxed Ordering Attributes set.

v2: Simplify the original code according to Alex's suggestion:
remove the new ixgbe flag2 and only check bit 4 in the
PCIe Device Control register.

v3: Remove the code that clears the bits in DCA_T/RXCTRL; relaxed
ordering should be enabled by the HW when the bus allows it.

Ding Tianhong (2):
  Revert commit 1a8b6d76dc5b ("net:add one common config...")
  net: ixgbe: Use new IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING flag

 arch/Kconfig                                    |  3 --
 arch/sparc/Kconfig                              |  1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c  | 37 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 32 +++--
 4 files changed, 35 insertions(+), 38 deletions(-)

-- 
1.8.3.1




[PATCH v3 net 2/2] net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-18 Thread Ding Tianhong
The ixgbe driver uses a compile-time check to determine whether it can
send TLPs to the Root Port with the Relaxed Ordering Attribute set,
which is too inflexible. Now that the new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING
has been added to the kernel, bit 4 in the PCIe Device Control register
can be checked to determine whether the Relaxed Ordering Attributes
should be used, so use this new approach in the ixgbe driver.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c  | 22 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 19 ---
 2 files changed, 41 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
index 523f9d0..8a32eb7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
@@ -175,31 +175,9 @@ static s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
  **/
 static s32 ixgbe_start_hw_82598(struct ixgbe_hw *hw)
 {
-#ifndef CONFIG_SPARC
-   u32 regval;
-   u32 i;
-#endif
s32 ret_val;
 
ret_val = ixgbe_start_hw_generic(hw);
-
-#ifndef CONFIG_SPARC
-   /* Disable relaxed ordering */
-   for (i = 0; ((i < hw->mac.max_tx_queues) &&
-(i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
-   regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
-   }
-
-   for (i = 0; ((i < hw->mac.max_rx_queues) &&
-(i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
-   regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
-   IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
-   }
-#endif
if (ret_val)
return ret_val;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index d4933d2..96c324f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,25 +350,6 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
}
IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_SPARC
-   /* Disable relaxed ordering */
-   for (i = 0; i < hw->mac.max_tx_queues; i++) {
-   u32 regval;
-
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(i));
-   regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(i), regval);
-   }
-
-   for (i = 0; i < hw->mac.max_rx_queues; i++) {
-   u32 regval;
-
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
-   regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
-   IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
-   }
-#endif
return 0;
 }
 
-- 
1.8.3.1




[PATCH v3 net 1/2] Revert commit 1a8b6d76dc5b ("net:add one common config...")

2017-08-18 Thread Ding Tianhong
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering (RO) Attributes should not
be used for Transaction Layer Packets (TLPs) targeted at the
affected Root Ports. In that case bit 4 in the PCIe Device
Control register is cleared, so PCIe device drivers can query
PCIe configuration space to determine whether they can send
TLPs to the Root Port with the Relaxed Ordering Attributes set.

With this new flag we no longer need the config ARCH_WANT_RELAX_ORDER
to control the Relaxed Ordering Attributes for the ixgbe driver,
as commit 1a8b6d76dc5b ("net:add one common config...") did,
so revert that commit.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 arch/Kconfig                                    | 3 ---
 arch/sparc/Kconfig                              | 1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089..00cfc63 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -928,9 +928,6 @@ config STRICT_MODULE_RWX
  and non-text memory will be made non-executable. This provides
  protection against certain security exploits (e.g. writing to text)
 
-config ARCH_WANT_RELAX_ORDER
-   bool
-
 config REFCOUNT_FULL
bool "Perform full reference count validation at the expense of speed"
help
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index a4a6261..987a575 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -44,7 +44,6 @@ config SPARC
select ARCH_HAS_SG_CHAIN
select CPU_NO_EFFICIENT_FFS
select LOCKDEP_SMALL if LOCKDEP
-   select ARCH_WANT_RELAX_ORDER
 
 config SPARC32
def_bool !64BIT
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 4e35e70..d4933d2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
}
IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
+#ifndef CONFIG_SPARC
/* Disable relaxed ordering */
for (i = 0; i < hw->mac.max_tx_queues; i++) {
u32 regval;
-- 
1.8.3.1




Re: [PATCH net v2 2/2] net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-17 Thread Ding Tianhong


On 2017/8/18 13:04, Tantilov, Emil S wrote:
>> -----Original Message-----
>> From: Ding Tianhong [mailto:dingtianh...@huawei.com]
>> Sent: Thursday, August 17, 2017 5:39 PM
>> To: Tantilov, Emil S <emil.s.tanti...@intel.com>; da...@davemloft.net;
>> Kirsher, Jeffrey T <jeffrey.t.kirs...@intel.com>; keesc...@chromium.org;
>> linux-kernel@vger.kernel.org; sparcli...@vger.kernel.org; intel-wired-
>> l...@lists.osuosl.org; alexander.du...@gmail.com; net...@vger.kernel.org;
>> linux...@huawei.com
>> Subject: Re: [PATCH net v2 2/2] net: ixgbe: Use new
>> PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
>>
>>
>>
>> On 2017/8/17 22:17, Tantilov, Emil S wrote:
>>
>>>>ret_val = ixgbe_start_hw_generic(hw);
>>>>
>>>> -#ifndef CONFIG_SPARC
>>>> -  /* Disable relaxed ordering */
>>>> -  for (i = 0; ((i < hw->mac.max_tx_queues) &&
>>>> -   (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
>>>> -  regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
>>>> -  regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
>>>> -  IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
>>>> -  }
>>>> +  if (!pcie_relaxed_ordering_enabled(adapter->pdev)) {
>>>
>>> As Alex mentioned there is no need for this check in any form.
>>>
>>> The HW defaults to Relaxed Ordering enabled unless it is disabled in
>>> the PCIe Device Control Register. So the above logic is already done by
>>> HW.
>>>
>>> All you have to do is strip the code disabling relaxed ordering.
>>>
>>
>> Hi Tantilov:
>>
>> I misunderstood Alex's suggestion, but I still couldn't find the logic
>> where the HW disables Relaxed Ordering when the PCIe Device Control
>> Register disables it. Can you point it out?
> 
> If you look at the datasheet (82599) - the description of CTRL_EXT.RO_DIS 
> (bit 17, 0b):
> 
> Relaxed Ordering Disable. When set to 1b, the device does not request any
> relaxed ordering transactions. When this bit is cleared and the Enable
> Relaxed Ordering bit in the Device Control register is set, the device
> requests relaxed ordering transactions per queues as configured in the
> DCA_RXCTRL[n] and DCA_TXCTRL[n] registers.
>
> So if you remove the code that clears the bits in DCA_T/RXCTRL relaxed
> ordering should be enabled by HW when the bus allows it.
> 

Great, thanks for your explanation.

> Thanks,
> Emil
> 
> 
> .
> 
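
The gating Emil describes above can be modelled in a few lines of plain C. This is only a sketch for readers of the thread, not driver code: the function and macro names are made up for illustration, bit 17 comes from the datasheet text quoted above, and bit 4 is the Enable Relaxed Ordering bit in the PCIe Device Control register that the patch series refers to. The per-queue flag stands in for the WRO enable bits in DCA_TXCTRL[n] and DCA_RXCTRL[n].

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; bit positions are taken from the thread. */
#define CTRL_EXT_RO_DIS  (1u << 17) /* CTRL_EXT.RO_DIS, bit 17 */
#define DEVCTL_RELAX_EN  (1u << 4)  /* Device Control: Enable Relaxed Ordering */

/*
 * Hypothetical model of the hardware decision: a queue requests relaxed
 * ordering only if RO_DIS is clear, the Device Control enable bit is set,
 * and the queue's own enable bit (DCA_TXCTRL/DCA_RXCTRL) is set.
 */
bool queue_requests_relaxed_ordering(uint32_t ctrl_ext, uint16_t devctl,
				     bool queue_ro_enabled)
{
	if (ctrl_ext & CTRL_EXT_RO_DIS)
		return false;
	if (!(devctl & DEVCTL_RELAX_EN))
		return false;
	return queue_ro_enabled;
}
```

Under this model, clearing bit 4 in the Device Control register turns relaxed ordering off for every queue without the driver touching DCA_TXCTRL or DCA_RXCTRL, which is the point Emil and Alex make.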



Re: [PATCH net v2 2/2] net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-17 Thread Ding Tianhong


On 2017/8/17 22:17, Tantilov, Emil S wrote:

>>  ret_val = ixgbe_start_hw_generic(hw);
>>
>> -#ifndef CONFIG_SPARC
>> -/* Disable relaxed ordering */
>> -for (i = 0; ((i < hw->mac.max_tx_queues) &&
>> - (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
>> -regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
>> -regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
>> -IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
>> -}
>> +if (!pcie_relaxed_ordering_enabled(adapter->pdev)) {
> 
> As Alex mentioned there is no need for this check in any form.
> 
> The HW defaults to Relaxed Ordering enabled unless it is disabled in 
> the PCIe Device Control Register. So the above logic is already done by HW.
> 
> All you have to do is strip the code disabling relaxed ordering.
> 

Hi Tantilov:

I misunderstood Alex's suggestion, but I still couldn't find the logic where
the HW disables Relaxed Ordering when the PCIe Device Control Register
disables it. Can you point it out?

Thanks
Ding

> Thanks,
> Emil
> 
> 
> .
> 



Re: [PATCH net] PCI: fix the return value for the pci_find_pcie_root_port()

2017-08-17 Thread Ding Tianhong


On 2017/8/17 21:30, Thierry Reding wrote:
> On Thu, Aug 17, 2017 at 08:40:16PM +0800, Ding Tianhong wrote:
>>
>>
>> On 2017/8/17 18:51, Thierry Reding wrote:
>>> On Thu, Aug 17, 2017 at 10:25:30AM +0800, Ding Tianhong wrote:
>>>> The pci_find_pcie_root_port() would return NULL if the given
>>>> dev is already a Root Port, it looks like unfriendly to the
>>>> PCIe Root Port device, Thierry and Bjorn suggest to let this
>>>> function return the given dev under this circumstances.
>>>>
>>>> Fixes: 0e405232871d6 ("PCI: fix oops when try to find Root Port for a PCI 
>>>> device")
>>>> Suggested-by: Thierry Reding <thierry.red...@gmail.com>
>>>> Suggested-by: Bjorn Helgaas <helg...@kernel.org>
>>>> Signed-off-by: Thierry Reding <thierry.red...@gmail.com>
>>>> Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
>>>> ---
>>>>  drivers/pci/pci.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>> index 7e2022f..352bb53 100644
>>>> --- a/drivers/pci/pci.c
>>>> +++ b/drivers/pci/pci.c
>>>> @@ -514,7 +514,7 @@ struct resource *pci_find_resource(struct pci_dev 
>>>> *dev, struct resource *res)
>>>>   */
>>>>  struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>>>>  {
>>>> -  struct pci_dev *bridge, *highest_pcie_bridge = NULL;
>>>> +  struct pci_dev *bridge, *highest_pcie_bridge = dev;
>>>>  
>>>>bridge = pci_upstream_bridge(dev);
>>>>while (bridge && pci_is_pcie(bridge)) {
>>>
>>> I think this should actually be this change on top of a revert of commit
>>> 0e405232871d6 ("PCI: fix oops when try to find Root Port for a PCI
>>> device"). After the above change, the previous fix will have a redundant
>>> check because highest_pcie_bridge will never be NULL.
>>>
>>> Let me send out that version to clarify what I mean.
>>>
>>
>> Hi Thierry:
>>
>> The patch ("PCI: fix oops when try to find Root Port for a PCI device")
>> has been merged into Linus's mainline tree before you found this
>> deficiency.
> 
> I understand that. I'm just saying that there's no point keeping that
> change around because it no longer makes sense after we initialize the
> highest_pcie_bridge variable to dev.
> 

OK, no problem. :)

> Thierry
> 



Re: [PATCH net] PCI: fix the return value for the pci_find_pcie_root_port()

2017-08-17 Thread Ding Tianhong


On 2017/8/17 18:51, Thierry Reding wrote:
> On Thu, Aug 17, 2017 at 10:25:30AM +0800, Ding Tianhong wrote:
>> The pci_find_pcie_root_port() would return NULL if the given
>> dev is already a Root Port, it looks like unfriendly to the
>> PCIe Root Port device, Thierry and Bjorn suggest to let this
>> function return the given dev under this circumstances.
>>
>> Fixes: 0e405232871d6 ("PCI: fix oops when try to find Root Port for a PCI 
>> device")
>> Suggested-by: Thierry Reding <thierry.red...@gmail.com>
>> Suggested-by: Bjorn Helgaas <helg...@kernel.org>
>> Signed-off-by: Thierry Reding <thierry.red...@gmail.com>
>> Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
>> ---
>>  drivers/pci/pci.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 7e2022f..352bb53 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -514,7 +514,7 @@ struct resource *pci_find_resource(struct pci_dev *dev, 
>> struct resource *res)
>>   */
>>  struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>>  {
>> -struct pci_dev *bridge, *highest_pcie_bridge = NULL;
>> +struct pci_dev *bridge, *highest_pcie_bridge = dev;
>>  
>>  bridge = pci_upstream_bridge(dev);
>>  while (bridge && pci_is_pcie(bridge)) {
> 
> I think this should actually be this change on top of a revert of commit
> 0e405232871d6 ("PCI: fix oops when try to find Root Port for a PCI
> device"). After the above change, the previous fix will have a redundant
> check because highest_pcie_bridge will never be NULL.
> 
> Let me send out that version to clarify what I mean.
> 

Hi Thierry:

The patch ("PCI: fix oops when try to find Root Port for a PCI device")
has been merged into Linus's mainline tree before you found this deficiency.

Regards
Tianhong

> Thierry
> 



[PATCH net v2 1/2] Revert commit 1a8b6d76dc5b ("net:add one common config...")

2017-08-16 Thread Ding Tianhong
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLPs) targeted toward
the affected Root Ports. When it is set, bit 4 in the PCIe
Device Control register is cleared, so PCIe device drivers can
query PCIe configuration space to determine whether they can send
TLPs to the Root Port with the Relaxed Ordering Attribute set.

With this new flag we no longer need the config ARCH_WANT_RELAX_ORDER
to control the Relaxed Ordering Attributes for the ixgbe driver,
as commit 1a8b6d76dc5b ("net:add one common config...") did,
so revert that commit.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 arch/Kconfig                                    | 3 ---
 arch/sparc/Kconfig                              | 1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089..00cfc63 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -928,9 +928,6 @@ config STRICT_MODULE_RWX
 	  and non-text memory will be made non-executable. This provides
 	  protection against certain security exploits (e.g. writing to text)
 
-config ARCH_WANT_RELAX_ORDER
-	bool
-
 config REFCOUNT_FULL
 	bool "Perform full reference count validation at the expense of speed"
 	help
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index a4a6261..987a575 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -44,7 +44,6 @@ config SPARC
 	select ARCH_HAS_SG_CHAIN
 	select CPU_NO_EFFICIENT_FFS
 	select LOCKDEP_SMALL if LOCKDEP
-	select ARCH_WANT_RELAX_ORDER
 
 config SPARC32
 	def_bool !64BIT
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 4e35e70..d4933d2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
 	}
 	IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
+#ifndef CONFIG_SPARC
 	/* Disable relaxed ordering */
 	for (i = 0; i < hw->mac.max_tx_queues; i++) {
 		u32 regval;
-- 
1.8.3.1




[PATCH net v2 0/2] net: ixgbe: Use new flag to disable Relaxed Ordering

2017-08-16 Thread Ding Tianhong
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLPs) targeted toward
the affected Root Ports. When it is set, bit 4 in the PCIe
Device Control register is cleared, so PCIe device drivers can
query PCIe configuration space to determine whether they can send
TLPs to the Root Port with the Relaxed Ordering Attribute set.

The ixgbe driver can use this flag to determine whether it can
send TLPs to the Root Port with the Relaxed Ordering Attribute set.

v2: Simplify the original implementation according to Alex's suggestion:
remove the new ixgbe flag2 and only check bit 4 in the
PCIe Device Control register.

Ding Tianhong (2):
  Revert commit 1a8b6d76dc5b ("net:add one common config...")
  net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

 arch/Kconfig                                    |  3 --
 arch/sparc/Kconfig                              |  1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c  | 37 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 32 +++--
 4 files changed, 35 insertions(+), 38 deletions(-)

-- 
1.8.3.1
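
A minimal sketch of the bit 4 check the cover letter refers to, assuming the caller has already read the 16-bit Device Control value from configuration space. In the kernel this is the test that pcie_relaxed_ordering_enabled() performs against PCI_EXP_DEVCTL_RELAX_EN (0x0010). The helper names below are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 4 of the PCIe Device Control register: Enable Relaxed Ordering. */
#define DEVCTL_RELAX_EN 0x0010u

/* Hypothetical helper: true if the Device Control value allows RO. */
bool devctl_relaxed_ordering_enabled(uint16_t devctl)
{
	return (devctl & DEVCTL_RELAX_EN) != 0;
}

/* Hypothetical helper: the value left behind once bit 4 is cleared. */
uint16_t devctl_clear_relaxed_ordering(uint16_t devctl)
{
	return (uint16_t)(devctl & ~DEVCTL_RELAX_EN);
}
```

A driver that only needs a yes/no answer can call the first helper on the value it read. The second helper shows what the Device Control value looks like after the PCI core has cleared the bit for a device below an affected Root Port.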




[PATCH net v2 2/2] net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-16 Thread Ding Tianhong
The ixgbe driver uses a compile-time check to determine whether it can
send TLPs to the Root Port with the Relaxed Ordering Attribute set,
which is inconvenient. Now that the new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING
has been added to the kernel, we can check bit 4 in the PCIe
Device Control register to determine whether to use the Relaxed
Ordering Attributes, so use this new approach in the ixgbe driver.
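
The change from a compile-time to a run-time check can be sketched outside the kernel as follows. This is an illustration only: the register file is a plain array, the function name is hypothetical, and the bit position used for the descriptor write-back enable is an assumption made for the example rather than a value taken from this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position for the per-queue Tx write-back RO enable. */
#define TXCTRL_DESC_WRO_EN (1u << 11)

/*
 * Hypothetical stand-in for the loop in the patch: when the bus does not
 * allow relaxed ordering, clear the enable bit in every Tx queue control
 * register; otherwise leave the registers untouched.
 */
void disable_tx_relaxed_ordering(uint32_t *txctrl, size_t nqueues,
				 bool ro_enabled_on_bus)
{
	size_t i;

	if (ro_enabled_on_bus)
		return;

	for (i = 0; i < nqueues; i++)
		txctrl[i] &= ~TXCTRL_DESC_WRO_EN;
}
```

With the old #ifndef CONFIG_SPARC form the decision is fixed when the kernel is built. Here it depends on a value read at run time, so the same binary behaves correctly on platforms with and without the restriction. The replies in this thread argue that even this loop is unnecessary because the hardware already honours the Device Control bit.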

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c  | 37 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 32 +++--
 2 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
index 523f9d0..d1571e3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
@@ -175,31 +175,30 @@ static s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
  **/
 static s32 ixgbe_start_hw_82598(struct ixgbe_hw *hw)
 {
-#ifndef CONFIG_SPARC
-	u32 regval;
-	u32 i;
-#endif
+	u32 regval, i;
 	s32 ret_val;
+	struct ixgbe_adapter *adapter = hw->back;
 
 	ret_val = ixgbe_start_hw_generic(hw);
 
-#ifndef CONFIG_SPARC
-	/* Disable relaxed ordering */
-	for (i = 0; ((i < hw->mac.max_tx_queues) &&
-	     (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
-		regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
-		regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
-		IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
-	}
+	if (!pcie_relaxed_ordering_enabled(adapter->pdev)) {
+		/* Disable relaxed ordering */
+		for (i = 0; ((i < hw->mac.max_tx_queues) &&
+		     (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
+			regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
+			regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
+			IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
+		}
 
-	for (i = 0; ((i < hw->mac.max_rx_queues) &&
-	     (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
-		regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
-		regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
-			    IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
-		IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
+		for (i = 0; ((i < hw->mac.max_rx_queues) &&
+		     (i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
+			regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
+			regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
+				    IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
+			IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
+		}
 	}
-#endif
+
 	if (ret_val)
 		return ret_val;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index d4933d2..d1052ee 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -342,6 +342,7 @@ s32 ixgbe_start_hw_generic(struct ixgbe_hw *hw)
 s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
 {
 	u32 i;
+	struct ixgbe_adapter *adapter = hw->back;
 
 	/* Clear the rate limiters */
 	for (i = 0; i < hw->mac.max_tx_queues; i++) {
@@ -350,25 +351,26 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
 	}
 	IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_SPARC
-	/* Disable relaxed ordering */
-	for (i = 0; i < hw->mac.max_tx_queues; i++) {
-		u32 regval;
+	if (!pcie_relaxed_ordering_enabled(adapter->pdev)) {
+		/* Disable relaxed ordering */
+		for (i = 0; i < hw->mac.max_tx_queues; i++) {
+			u32 regval;
 
-		regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(i));
-		regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
-		IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(i), regval);
-	}
+			regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(i));
+			regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
+			IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(i), regval);
+		}
 
-	for (i = 0; i < hw->mac.max_rx_queues; i++) {
-		u32 regval;
+		for (i = 0; i < hw->mac.max_rx_queues; i++) {
+			u32 regval;
 
-		regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
-		regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
-			    IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
-		IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);

[PATCH net] PCI: fix the return value for the pci_find_pcie_root_port()

2017-08-16 Thread Ding Tianhong
pci_find_pcie_root_port() returns NULL if the given dev is
already a Root Port, which is unfriendly to PCIe Root Port
devices. Thierry and Bjorn suggested letting this function
return the given dev in this circumstance.

Fixes: 0e405232871d6 ("PCI: fix oops when try to find Root Port for a PCI device")
Suggested-by: Thierry Reding <thierry.red...@gmail.com>
Suggested-by: Bjorn Helgaas <helg...@kernel.org>
Signed-off-by: Thierry Reding <thierry.red...@gmail.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7e2022f..352bb53 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -514,7 +514,7 @@ struct resource *pci_find_resource(struct pci_dev *dev, struct resource *res)
  */
 struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
 {
-	struct pci_dev *bridge, *highest_pcie_bridge = NULL;
+	struct pci_dev *bridge, *highest_pcie_bridge = dev;
 
 	bridge = pci_upstream_bridge(dev);
 	while (bridge && pci_is_pcie(bridge)) {
-- 
1.8.3.1
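
The effect of initialising highest_pcie_bridge to dev can be seen in a small user-space model of the walk. The struct and function names here are hypothetical stand-ins for struct pci_dev and the kernel helpers, and the final Root Port test is an assumption about the part of the function that the diff above does not show.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev: only what the walk needs. */
struct mock_dev {
	struct mock_dev *upstream; /* parent bridge, NULL at the top */
	bool is_pcie;
	bool is_root_port;
};

/*
 * Walk upstream while the parent is a PCIe bridge, remembering the highest
 * one seen. Starting from dev (rather than NULL) means a Root Port passed
 * in directly is returned as itself.
 */
struct mock_dev *find_pcie_root_port(struct mock_dev *dev)
{
	struct mock_dev *bridge, *highest = dev;

	bridge = dev->upstream;
	while (bridge && bridge->is_pcie) {
		highest = bridge;
		bridge = bridge->upstream;
	}

	/* Assumed tail check: only report a device that is a Root Port. */
	if (!highest->is_root_port)
		return NULL;

	return highest;
}
```

Because highest can no longer be NULL after the loop, a separate NULL check before the Root Port test becomes redundant, which is the observation Thierry makes in his reply.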




Re: [PATCH net 0/2] net: ixgbe: Use new flag to disable Relaxed Ordering

2017-08-16 Thread Ding Tianhong


On 2017/8/17 1:56, David Miller wrote:
> From: Ding Tianhong <dingtianh...@huawei.com>
> Date: Wed, 16 Aug 2017 17:41:45 +0800
> 
>> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
>> to indicate that Relaxed Ordering Attributes (RO) should not
>> be used for Transaction Layer Packets (TLP) targeted toward
>> these affected Root Port, it will clear the bit4 in the PCIe
>> Device Control register, so the PCIe device drivers could
>> query PCIe configuration space to determine if it can send
>> TLPs to Root Port with the Relaxed Ordering Attributes set.
>>
>> The ixgbe driver could use this flag to determine if it can
>> send TLPs to Root Port with the Relaxed Ordering Attributes set.
> 
> I'll let the Intel guys pick this up.
> 
Thanks David, but I am not sure when the Intel guys will pick it up.
Only Alex has replied so far, so I will send a new version according to
Alex's suggestion.

> .
> 



Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device

2017-08-16 Thread Ding Tianhong


On 2017/8/17 4:59, David Miller wrote:
> From: Bjorn Helgaas 
> Date: Wed, 16 Aug 2017 15:02:37 -0500
> 
>> Your fix looks right to me.
> 
> Someone please submit this fix formally because this change is now in
> Linus's tree.
> 

I will send it.

> Thank you.
> 
> .
> 



[PATCH net 1/2] Revert commit 1a8b6d76dc5b ("net:add one common config...")

2017-08-16 Thread Ding Tianhong
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLPs) targeted toward
the affected Root Ports. When the flag is set, bit 4 in the PCIe
Device Control register is cleared, so PCIe device drivers can
query PCIe configuration space to determine whether they can send
TLPs to the Root Port with the Relaxed Ordering Attributes set.

With this new flag we no longer need the config ARCH_WANT_RELAX_ORDER
to control the Relaxed Ordering Attributes for the ixgbe driver,
as commit 1a8b6d76dc5b ("net:add one common config...") did,
so revert that commit.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 arch/Kconfig                                    | 3 ---
 arch/sparc/Kconfig                              | 1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089..00cfc63 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -928,9 +928,6 @@ config STRICT_MODULE_RWX
  and non-text memory will be made non-executable. This provides
  protection against certain security exploits (e.g. writing to text)
 
-config ARCH_WANT_RELAX_ORDER
-   bool
-
 config REFCOUNT_FULL
bool "Perform full reference count validation at the expense of speed"
help
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index a4a6261..987a575 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -44,7 +44,6 @@ config SPARC
select ARCH_HAS_SG_CHAIN
select CPU_NO_EFFICIENT_FFS
select LOCKDEP_SMALL if LOCKDEP
-   select ARCH_WANT_RELAX_ORDER
 
 config SPARC32
def_bool !64BIT
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 4e35e70..d4933d2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
}
IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
+#ifndef CONFIG_SPARC
/* Disable relaxed ordering */
for (i = 0; i < hw->mac.max_tx_queues; i++) {
u32 regval;
-- 
1.8.3.1




[PATCH net 0/2] net: ixgbe: Use new flag to disable Relaxed Ordering

2017-08-16 Thread Ding Tianhong
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLPs) targeted toward
the affected Root Ports. When the flag is set, bit 4 in the PCIe
Device Control register is cleared, so PCIe device drivers can
query PCIe configuration space to determine whether they can send
TLPs to the Root Port with the Relaxed Ordering Attributes set.

The ixgbe driver can use this flag to determine whether it can
send TLPs to the Root Port with the Relaxed Ordering Attributes set.

Ding Tianhong (2):
  Revert commit 1a8b6d76dc5b ("net:add one common config...")
  net: ixgbe: Use new IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING flag

 arch/Kconfig                                    |  3 --
 arch/sparc/Kconfig                              |  1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe.h        |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c  | 37 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 32 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 17 
 6 files changed, 53 insertions(+), 38 deletions(-)

-- 
1.8.3.1
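
As a rough illustration of the "bit 4" test described in the cover letter, the check can be modelled on a plain 16-bit value. This is only a sketch: MODEL_DEVCTL_RELAX_EN and both helper names are made up for this example, and the constant is assumed to mirror the kernel's PCI_EXP_DEVCTL_RELAX_EN (0x0010). In the kernel the value would come from the device's PCIe capability, for example via pcie_capability_read_word() on PCI_EXP_DEVCTL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror PCI_EXP_DEVCTL_RELAX_EN: bit 4 of PCIe Device Control. */
#define MODEL_DEVCTL_RELAX_EN 0x0010u

/* True if the Enable Relaxed Ordering bit is set in a Device Control value. */
bool model_relaxed_ordering_enabled(uint16_t devctl)
{
	return (devctl & MODEL_DEVCTL_RELAX_EN) != 0;
}

/* Return devctl with the Enable Relaxed Ordering bit cleared. */
uint16_t model_clear_relaxed_ordering(uint16_t devctl)
{
	return (uint16_t)(devctl & ~MODEL_DEVCTL_RELAX_EN);
}
```

A driver following the scheme above would only keep its own relaxed-ordering settings when the first helper reports the bit as set.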




[PATCH net 2/2] net: ixgbe: Use new IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING flag

2017-08-16 Thread Ding Tianhong
The ixgbe driver uses a compile-time check to determine whether it can
send TLPs to the Root Port with the Relaxed Ordering Attribute set,
which is inconvenient. Now that the new flag
PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added to the kernel, we can
check bit 4 in the PCIe Device Control register to determine whether we
should use the Relaxed Ordering Attributes. So add a new flag,
IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING, to the ixgbe driver. It is set
if the Root Port cannot handle upstream TLPs with the Relaxed
Ordering Attribute, so the driver knows what to do next.

Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h        |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c  | 37 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 32 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 17 
 4 files changed, 53 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index dd55787..50e0553 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -621,6 +621,7 @@ struct ixgbe_adapter {
 #define IXGBE_FLAG2_EEE_CAPABLE                 BIT(14)
 #define IXGBE_FLAG2_EEE_ENABLED                 BIT(15)
 #define IXGBE_FLAG2_RX_LEGACY                   BIT(16)
+#define IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING   BIT(17)
 
/* Tx fast path data */
int num_tx_queues;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
index 523f9d0..0727a30 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
@@ -175,31 +175,30 @@ static s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
  **/
 static s32 ixgbe_start_hw_82598(struct ixgbe_hw *hw)
 {
-#ifndef CONFIG_SPARC
-   u32 regval;
-   u32 i;
-#endif
+   u32 regval, i;
s32 ret_val;
+   struct ixgbe_adapter *adapter = hw->back;
 
ret_val = ixgbe_start_hw_generic(hw);
 
-#ifndef CONFIG_SPARC
-   /* Disable relaxed ordering */
-   for (i = 0; ((i < hw->mac.max_tx_queues) &&
-(i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
-   regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
-   }
+   if (adapter->flags2 & IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING) {
+   /* Disable relaxed ordering */
+   for (i = 0; ((i < hw->mac.max_tx_queues) &&
+(i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
+   regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
+   regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
+   IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
+   }
 
-   for (i = 0; ((i < hw->mac.max_rx_queues) &&
-(i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
-   regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
-   IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
+   for (i = 0; ((i < hw->mac.max_rx_queues) &&
+(i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
+   regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
+   regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
+   IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
+   IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
+   }
}
-#endif
+
if (ret_val)
return ret_val;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index d4933d2..2473c0b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -342,6 +342,7 @@ s32 ixgbe_start_hw_generic(struct ixgbe_hw *hw)
 s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
 {
u32 i;
+   struct ixgbe_adapter *adapter = hw->back;
 
/* Clear the rate limiters */
for (i = 0; i < hw->mac.max_tx_queues; i++) {
@@ -350,25 +351,26 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
}
IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_SPARC
-   /* Disable relaxed ordering */
-   for (i = 0; i < hw->mac.max_tx_queues; i++) {
-   u32 regval;
+   if (adapter->flags2 & IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING) {
+   /* Disable relaxed ordering */
+   for (i = 0; i < hw->mac.max_tx_queues; i++) {
+   u32 regval;
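
The change from a compile-time #ifndef to a runtime flag test can be sketched outside the driver as follows. This is a simplified model, not the ixgbe code: the register array, model_start_hw(), and the value of MODEL_TXCTRL_DESC_WRO_EN are assumptions made for the example. Only the flag's bit position, BIT(17), is taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit 17 as in the patch; the name is local to this sketch. */
#define MODEL_FLAG2_ROOT_NO_RELAXED_ORDERING (1u << 17)
/* Hypothetical position of the Tx descriptor write relaxed-ordering enable bit. */
#define MODEL_TXCTRL_DESC_WRO_EN             (1u << 11)

/*
 * Clear the relaxed-ordering enable bit on every Tx queue, but only when the
 * adapter flag says the Root Port cannot handle Relaxed Ordering.
 */
void model_start_hw(uint32_t flags2, uint32_t *txctrl, size_t nqueues)
{
	size_t i;

	if (!(flags2 & MODEL_FLAG2_ROOT_NO_RELAXED_ORDERING))
		return;

	for (i = 0; i < nqueues; i++)
		txctrl[i] &= ~MODEL_TXCTRL_DESC_WRO_EN;
}
```

The point of the design is that the same binary behaves correctly on any platform, because the decision is made from the flag at start-up rather than from the architecture the kernel was built for.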
 

[PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device

2017-08-15 Thread Ding Tianhong
Eric reported an oops when booting the system after applying
commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):

[4.241029] BUG: unable to handle kernel NULL pointer dereference at 0050
[4.247001] IP: pci_find_pcie_root_port+0x62/0x80
[4.253011] PGD 0
[4.253011] P4D 0
[4.253011]
[4.258013] Oops:  [#1] SMP DEBUG_PAGEALLOC
[4.262015] Modules linked in:
[4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
[4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
[4.279002] task: a2ee38cfa040 task.stack: a51ec0004000
[4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
[4.290012] RSP: :a51ec0007ab8 EFLAGS: 00010246
[4.295003] RAX:  RBX: a2ee36bae000 RCX: 0006
[4.303002] RDX: 081c RSI: a2ee38cfa8c8 RDI: a2ee36bae000
[4.310013] RBP: a51ec0007b58 R08: 0001 R09: 
[4.317001] R10:  R11:  R12: a51ec0007ad0
[4.324005] R13: a2ee36bae098 R14: 0002 R15: a2ee37204818
[4.331002] FS:  () GS:a2ee3fcc() knlGS:
[4.339002] CS:  0010 DS:  ES:  CR0: 80050033
[4.345001] CR2: 0050 CR3: 00401000f000 CR4: 001406e0
[4.351002] Call Trace:
[4.354012]  ? pci_configure_device+0x19f/0x570
[4.359002]  ? pci_conf1_read+0xb8/0xf0
[4.363002]  ? raw_pci_read+0x23/0x40
[4.366011]  ? pci_read+0x2c/0x30
[4.370014]  ? pci_read_config_word+0x67/0x70
[4.374012]  pci_device_add+0x28/0x230
[4.378012]  ? pci_vpd_f0_read+0x50/0x80
[4.382014]  pci_scan_single_device+0x96/0xc0
[4.386012]  pci_scan_slot+0x79/0xf0
[4.389001]  pci_scan_child_bus+0x31/0x180
[4.394014]  acpi_pci_root_create+0x1c6/0x240
[4.398013]  pci_acpi_scan_root+0x15f/0x1b0
[4.402012]  acpi_pci_root_add+0x2e6/0x400
[4.406012]  ? acpi_evaluate_integer+0x37/0x60
[4.411002]  acpi_bus_attach+0xdf/0x200
[4.415002]  acpi_bus_attach+0x6a/0x200
[4.418014]  acpi_bus_attach+0x6a/0x200
[4.422013]  acpi_bus_scan+0x38/0x70
[4.426011]  acpi_scan_init+0x10c/0x271
[4.429001]  acpi_init+0x2fa/0x348
[4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
[4.437001]  do_one_initcall+0x43/0x169
[4.441001]  kernel_init_freeable+0x1d0/0x258
[4.445003]  ? rest_init+0xe0/0xe0
[4.449001]  kernel_init+0xe/0x150

== cut here =

It looks like pci_find_pcie_root_port() was trying to find the
Root Port for a PCI device that is already a Root Port. In that
case highest_pcie_bridge stays NULL and is then passed to
pci_pcie_type(), which triggers the oops, so check
highest_pcie_bridge for NULL to fix this problem.

Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
Reported-by: Eric Dumazet <eric.duma...@gmail.com>
Signed-off-by: Eric Dumazet <eric.duma...@gmail.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 drivers/pci/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc34..7e2022f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
bridge = pci_upstream_bridge(bridge);
}
 
-   if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
+   if (highest_pcie_bridge &&
+   pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
return NULL;
 
return highest_pcie_bridge;
-- 
1.8.3.1
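
The NULL check added by this patch can be illustrated with a small user-space model. It is a sketch under stated assumptions, not the kernel function: struct sketch_dev, its fields, and sketch_find_root_port() are hypothetical stand-ins for struct pci_dev and the PCI helpers used in drivers/pci/pci.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct pci_dev; not the kernel type. */
enum sketch_type { SKETCH_ENDPOINT, SKETCH_ROOT_PORT };

struct sketch_dev {
	struct sketch_dev *upstream;	/* NULL when there is no upstream bridge */
	bool is_pcie;
	enum sketch_type type;
};

struct sketch_dev *sketch_find_root_port(struct sketch_dev *dev)
{
	struct sketch_dev *bridge, *highest = NULL;

	bridge = dev->upstream;
	while (bridge && bridge->is_pcie) {
		highest = bridge;
		bridge = bridge->upstream;
	}

	/* Without the "highest &&" test, highest->type is read through NULL. */
	if (highest && highest->type != SKETCH_ROOT_PORT)
		return NULL;

	return highest;
}
```

With the guard in place, a device that has no upstream bridge simply gets NULL back instead of faulting, which matches the crash site reported in the oops above.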




[PATCH net] pci: fix oops when try to find Root Port for a PCI device

2017-08-15 Thread Ding Tianhong
Eric reported an oops when booting the system after applying
commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):

[4.241029] BUG: unable to handle kernel NULL pointer dereference at 0050
[4.247001] IP: pci_find_pcie_root_port+0x62/0x80
[4.253011] PGD 0
[4.253011] P4D 0
[4.253011]
[4.258013] Oops:  [#1] SMP DEBUG_PAGEALLOC
[4.262015] Modules linked in:
[4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
[4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
[4.279002] task: a2ee38cfa040 task.stack: a51ec0004000
[4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
[4.290012] RSP: :a51ec0007ab8 EFLAGS: 00010246
[4.295003] RAX:  RBX: a2ee36bae000 RCX: 0006
[4.303002] RDX: 081c RSI: a2ee38cfa8c8 RDI: a2ee36bae000
[4.310013] RBP: a51ec0007b58 R08: 0001 R09: 
[4.317001] R10:  R11:  R12: a51ec0007ad0
[4.324005] R13: a2ee36bae098 R14: 0002 R15: a2ee37204818
[4.331002] FS:  () GS:a2ee3fcc() knlGS:
[4.339002] CS:  0010 DS:  ES:  CR0: 80050033
[4.345001] CR2: 0050 CR3: 00401000f000 CR4: 001406e0
[4.351002] Call Trace:
[4.354012]  ? pci_configure_device+0x19f/0x570
[4.359002]  ? pci_conf1_read+0xb8/0xf0
[4.363002]  ? raw_pci_read+0x23/0x40
[4.366011]  ? pci_read+0x2c/0x30
[4.370014]  ? pci_read_config_word+0x67/0x70
[4.374012]  pci_device_add+0x28/0x230
[4.378012]  ? pci_vpd_f0_read+0x50/0x80
[4.382014]  pci_scan_single_device+0x96/0xc0
[4.386012]  pci_scan_slot+0x79/0xf0
[4.389001]  pci_scan_child_bus+0x31/0x180
[4.394014]  acpi_pci_root_create+0x1c6/0x240
[4.398013]  pci_acpi_scan_root+0x15f/0x1b0
[4.402012]  acpi_pci_root_add+0x2e6/0x400
[4.406012]  ? acpi_evaluate_integer+0x37/0x60
[4.411002]  acpi_bus_attach+0xdf/0x200
[4.415002]  acpi_bus_attach+0x6a/0x200
[4.418014]  acpi_bus_attach+0x6a/0x200
[4.422013]  acpi_bus_scan+0x38/0x70
[4.426011]  acpi_scan_init+0x10c/0x271
[4.429001]  acpi_init+0x2fa/0x348
[4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
[4.437001]  do_one_initcall+0x43/0x169
[4.441001]  kernel_init_freeable+0x1d0/0x258
[4.445003]  ? rest_init+0xe0/0xe0
[4.449001]  kernel_init+0xe/0x150

== cut here =

It looks like pci_find_pcie_root_port() was trying to find the
Root Port for a PCI device that is already a Root Port. In that
case highest_pcie_bridge stays NULL and is then passed to
pci_pcie_type(), which triggers the oops, so check
highest_pcie_bridge for NULL to fix this problem.

Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
Reported-by: Eric Dumazet <eric.duma...@gmail.com>
Signed-off-by: Eric Dumazet <eric.duma...@gmail.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
---
 drivers/pci/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc34..7e2022f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
bridge = pci_upstream_bridge(bridge);
}
 
-   if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
+   if (highest_pcie_bridge &&
+   pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
return NULL;
 
return highest_pcie_bridge;
-- 
1.8.3.1




Re: [PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-15 Thread Ding Tianhong


On 2017/8/15 22:03, Eric Dumazet wrote:
> On Tue, 2017-08-15 at 06:58 -0700, Eric Dumazet wrote:
>> On Mon, 2017-08-14 at 22:15 -0700, David Miller wrote:
>>> From: Ding Tianhong <dingtianh...@huawei.com>
>>> Date: Tue, 15 Aug 2017 11:23:22 +0800
>>>
>>>> Some devices have problems with Transaction Layer Packets with the Relaxed
>>>> Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
>>>> PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
>>>> devices with Relaxed Ordering issues, and a use of this new flag by the
>>>> cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
>>>> Ports.
>>>  ...
>>>
>>> Series applied, thanks.
>>
>> I got a NULL deref in pci_find_pcie_root_port()
>>
> 
> This was :
> 
> [4.241029] BUG: unable to handle kernel NULL pointer dereference at 0050
> [4.247001] IP: pci_find_pcie_root_port+0x62/0x80
> [4.253011] PGD 0 
> [4.253011] P4D 0 
> [4.253011] 
> [4.258013] Oops:  [#1] SMP DEBUG_PAGEALLOC
> [4.262015] Modules linked in:
> [4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
> [4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
> [4.279002] task: a2ee38cfa040 task.stack: a51ec0004000
> [4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
> [4.290012] RSP: :a51ec0007ab8 EFLAGS: 00010246
> [4.295003] RAX:  RBX: a2ee36bae000 RCX: 0006
> [4.303002] RDX: 081c RSI: a2ee38cfa8c8 RDI: a2ee36bae000
> [4.310013] RBP: a51ec0007b58 R08: 0001 R09: 
> [4.317001] R10:  R11:  R12: a51ec0007ad0
> [4.324005] R13: a2ee36bae098 R14: 0002 R15: a2ee37204818
> [4.331002] FS:  () GS:a2ee3fcc() knlGS:
> [4.339002] CS:  0010 DS:  ES:  CR0: 80050033
> [4.345001] CR2: 0050 CR3: 00401000f000 CR4: 001406e0
> [4.351002] Call Trace:
> [4.354012]  ? pci_configure_device+0x19f/0x570
> [4.359002]  ? pci_conf1_read+0xb8/0xf0
> [4.363002]  ? raw_pci_read+0x23/0x40
> [4.366011]  ? pci_read+0x2c/0x30
> [4.370014]  ? pci_read_config_word+0x67/0x70
> [4.374012]  pci_device_add+0x28/0x230
> [4.378012]  ? pci_vpd_f0_read+0x50/0x80
> [4.382014]  pci_scan_single_device+0x96/0xc0
> [4.386012]  pci_scan_slot+0x79/0xf0
> [4.389001]  pci_scan_child_bus+0x31/0x180
> [4.394014]  acpi_pci_root_create+0x1c6/0x240
> [4.398013]  pci_acpi_scan_root+0x15f/0x1b0
> [4.402012]  acpi_pci_root_add+0x2e6/0x400
> [4.406012]  ? acpi_evaluate_integer+0x37/0x60
> [4.411002]  acpi_bus_attach+0xdf/0x200
> [4.415002]  acpi_bus_attach+0x6a/0x200
> [4.418014]  acpi_bus_attach+0x6a/0x200
> [4.422013]  acpi_bus_scan+0x38/0x70
> [4.426011]  acpi_scan_init+0x10c/0x271
> [4.429001]  acpi_init+0x2fa/0x348
> [4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
> [4.437001]  do_one_initcall+0x43/0x169
> [4.441001]  kernel_init_freeable+0x1d0/0x258
> [4.445003]  ? rest_init+0xe0/0xe0
> [4.449001]  kernel_init+0xe/0x150
> [4.451002]  ret_from_fork+0x27/0x40
> [4.457004] Code: 85 d2 74 27 80 7a 4a 00 74 21 48 89 d0 48 89 c2 f6 80 1b 
> 09 00 00 10 74 07 48 8b 90 a0 0a 00 00 48 8b 52 10 48 83 7a 10 00 75 d0 <0f> 
> b7 50 50 5d 81 e2 f0 00 00 00 83 fa 40 ba 00 00 00 00 48 0f 
> [4.474012] RIP: pci_find_pcie_root_port+0x62/0x80 RSP: a51ec0007ab8
> [4.481004] CR2: 0050
> [4.484001] ---[ end trace 6f9be6a057581199 ]---
> [4.488001] Kernel panic - not syncing: Fatal exception
> [4.494013] Rebooting in 10 seconds..
> [4.494013] ACPI MEMORY or I/O RESET_REG.
> 
>>
>> This local hack seems to fix the issue.
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 
>> af0cc3456dc1b48b1325c06c5edd2ca8cc22a640..cfd8eb5a3d0ba8347d44952ffab28d9c761044d3
>>  100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -522,7 +522,7 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev 
>> *dev)
>> bridge = pci_upstream_bridge(bridge);
>> }
>>  
>> -   if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
>> +   if (highest_pcie_bridge && pci_pcie_type(highest_pcie_bridge) != 
>> PCI_EXP_TYPE_ROOT_PORT)
>> return NULL;
>>  
>> return highest_pcie_bridge;
> 

It is strange that I could not reproduce this problem on my server, which is a
Xeon 2690v3, but it is clearly a bug when the dev cannot find an upstream
bridge in pci_find_pcie_root_port(), so the right fix is what you did in this
patch. Thanks.
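
For readers following the thread, the failure mode can be modelled in user-space C. This is only a sketch under stated assumptions: `struct fake_dev` and its fields `parent`, `is_pcie`, and `pcie_type` are hypothetical stand-ins for `struct pci_dev`, `pci_upstream_bridge()`, `pci_is_pcie()`, and `pci_pcie_type()`. The guard is written slightly differently from the quoted diff but returns NULL in the same cases.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel structure. */
struct fake_dev {
	struct fake_dev *parent;	/* models pci_upstream_bridge(); NULL on a root bus */
	int is_pcie;			/* models pci_is_pcie() */
	int pcie_type;			/* models pci_pcie_type() */
};

#define FAKE_TYPE_ROOT_PORT	0x4	/* same value as PCI_EXP_TYPE_ROOT_PORT */

/* Walk upstream and return the topmost PCIe bridge if it is a Root Port. */
static struct fake_dev *find_root_port(struct fake_dev *dev)
{
	struct fake_dev *bridge = dev->parent;
	struct fake_dev *highest_pcie_bridge = NULL;

	while (bridge && bridge->is_pcie) {
		highest_pcie_bridge = bridge;
		bridge = bridge->parent;
	}

	/*
	 * A device with no PCIe upstream bridge leaves highest_pcie_bridge
	 * NULL; without this test it would be dereferenced, as in the oops.
	 */
	if (!highest_pcie_bridge ||
	    highest_pcie_bridge->pcie_type != FAKE_TYPE_ROOT_PORT)
		return NULL;

	return highest_pcie_bridge;
}
```

A device sitting directly on a root bus never enters the loop, which matches the call trace above where the crash happens while the first devices are being scanned.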

Regards
Tianhong




[PATCH v11 5/5] net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-14 Thread Ding Tianhong
From: Casey Leedom <lee...@chelsio.com>

The cxgb4vf Ethernet driver now queries its PCIe configuration space to
determine whether it can send TLPs with the Relaxed Ordering
Attribute set, just as the PF driver does.

Signed-off-by: Casey Leedom <lee...@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Reviewed-by: Casey Leedom <lee...@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h  |  1 +
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 18 ++
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c  |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h 
b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
index 109bc63..08c6ddb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
@@ -408,6 +408,7 @@ enum { /* adapter flags */
USING_MSI  = (1UL << 1),
USING_MSIX = (1UL << 2),
QUEUES_BOUND   = (1UL << 3),
+   ROOT_NO_RELAXED_ORDERING = (1UL << 4),
 };
 
 /*
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c 
b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index ac7a150..2b85b87 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -2888,6 +2888,24 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 */
adapter->name = pci_name(pdev);
adapter->msg_enable = DFLT_MSG_ENABLE;
+
+   /* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+* Ingress Packet Data to Free List Buffers in order to allow for
+* chipset performance optimizations between the Root Complex and
+* Memory Controllers.  (Messages to the associated Ingress Queue
+* notifying new Packet Placement in the Free Lists Buffers will be
+* sent without the Relaxed Ordering Attribute thus guaranteeing that
+* all preceding PCIe Transaction Layer Packets will be processed
+* first.)  But some Root Complexes have various issues with Upstream
+* Transaction Layer Packets with the Relaxed Ordering Attribute set.
+* PCIe devices under such Root Complexes will have the Relaxed
+* Ordering bit cleared in their configuration space, so we check our
+* PCIe configuration space to see if it's flagged with advice against
+* using Relaxed Ordering.
+*/
+   if (!pcie_relaxed_ordering_enabled(pdev))
+   adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
err = adap_init0(adapter);
if (err)
goto err_unmap_bar;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c 
b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index e37dde2..05498e7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -2205,6 +2205,7 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct 
sge_rspq *rspq,
struct port_info *pi = netdev_priv(dev);
struct fw_iq_cmd cmd, rpl;
int ret, iqandst, flsz = 0;
+   int relaxed = !(adapter->flags & ROOT_NO_RELAXED_ORDERING);
 
/*
 * If we're using MSI interrupts and we're not initializing the
@@ -2300,6 +2301,8 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct 
sge_rspq *rspq,
cpu_to_be32(
FW_IQ_CMD_FL0HOSTFCMODE_V(SGE_HOSTFCMODE_NONE) |
FW_IQ_CMD_FL0PACKEN_F |
+   FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+   FW_IQ_CMD_FL0DATARO_V(relaxed) |
FW_IQ_CMD_FL0PADEN_F);
 
/* In T6, for egress queue type FL there is internal overhead
-- 
1.8.3.1
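
The effect of replacing the fixed flag bits with `_V(relaxed)` in the patch above can be illustrated with a small stand-alone sketch. The `DEMO_*` shift positions and the helper `fl0_ro_bits()` are assumptions made up for this example; the real `FW_IQ_CMD_*` definitions live in the driver's firmware API header and are not reproduced here. Only `ROOT_NO_RELAXED_ORDERING` and the `relaxed` expression are taken from the patch.

```c
#include <stdint.h>

/* Adapter flag bit as added to cxgb4vf/adapter.h in the patch. */
#define ROOT_NO_RELAXED_ORDERING	(1UL << 4)

/* Hypothetical shift positions, for illustration only. */
#define DEMO_FL0FETCHRO_S	6
#define DEMO_FL0FETCHRO_V(x)	((uint32_t)(x) << DEMO_FL0FETCHRO_S)
#define DEMO_FL0DATARO_S	3
#define DEMO_FL0DATARO_V(x)	((uint32_t)(x) << DEMO_FL0DATARO_S)

/* Return the two Relaxed Ordering bits for a Free List command word. */
static uint32_t fl0_ro_bits(unsigned long adapter_flags)
{
	/* 1 when Relaxed Ordering may be used, 0 when it was advised against. */
	int relaxed = !(adapter_flags & ROOT_NO_RELAXED_ORDERING);

	return DEMO_FL0FETCHRO_V(relaxed) | DEMO_FL0DATARO_V(relaxed);
}
```

Because `relaxed` is either 0 or 1, both bits are set or cleared together, so a single adapter flag controls both the fetch and the data attributes.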




[PATCH v11 2/5] PCI: Disable Relaxed Ordering for some Intel processors

2017-08-14 Thread Ding Tianhong
Section 3.9.1 of the Intel spec says:

3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
  and Toward MMIO Regions (P2P)

In order to maximize performance for PCIe devices in the processors
listed in Table 3-6 below, the software should determine whether the
accesses are toward coherent memory (system memory) or toward MMIO
regions (P2P access to other devices). If the access is toward MMIO
region, then software can command HW to set the RO bit in the TLP
header, as this would allow hardware to achieve maximum throughput for
these types of accesses. For accesses toward coherent memory, software
can command HW to clear the RO bit in the TLP header (no RO), as this
would allow hardware to achieve maximum throughput for these types of
accesses.

Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
   PCIe Performance

Processor                        CPU RP Device IDs

Intel Xeon processors based on   6F01H-6F0EH
Broadwell microarchitecture

Intel Xeon processors based on   2F01H-2F0EH
Haswell microarchitecture

This means some Intel processors have a performance issue when the Relaxed
Ordering Attribute is used, so disable Relaxed Ordering for these root ports.

Signed-off-by: Casey Leedom <lee...@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Acked-by: Alexander Duyck <alexander.h.du...@intel.com>
Acked-by: Ashok Raj <ashok@intel.com>
---
 drivers/pci/quirks.c | 62 
 1 file changed, 62 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 61b59bf..1272f7e 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4027,6 +4027,68 @@ static void quirk_relaxedordering_disable(struct pci_dev 
*dev)
 }
 
 /*
+ * Intel Xeon processors based on Broadwell/Haswell microarchitecture Root
+ * Complex has a Flow Control Credit issue which can cause performance
+ * problems with Upstream Transaction Layer Packets with Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f01, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f03, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f05, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f06, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f07, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f09, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0a, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0b, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0c, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0d, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0e, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f01, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f02, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f03, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f04, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f05, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f06, 
PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable
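
The two device ID ranges from Table 3-6 can be expressed as a single predicate, which may make the long list of fixup declarations easier to read. This is a stand-alone sketch, not how the patch works: the patch registers one fixup per device ID, and `intel_rp_needs_no_ro()` and `DEMO_VENDOR_ID_INTEL` are hypothetical names introduced for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VENDOR_ID_INTEL	0x8086	/* PCI vendor ID assigned to Intel */

/* True when the ID falls in one of the two root port ranges of Table 3-6. */
static bool intel_rp_needs_no_ro(uint16_t vendor, uint16_t device)
{
	if (vendor != DEMO_VENDOR_ID_INTEL)
		return false;

	/* Broadwell root ports: 6F01H-6F0EH; Haswell root ports: 2F01H-2F0EH. */
	return (device >= 0x6f01 && device <= 0x6f0e) ||
	       (device >= 0x2f01 && device <= 0x2f0e);
}
```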

[PATCH v11 4/5] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-14 Thread Ding Tianhong
From: Casey Leedom <lee...@chelsio.com>

The cxgb4 Ethernet driver now queries PCIe configuration space to determine
whether it can send TLPs with the Relaxed Ordering Attribute set.

Remove enable_pcie_relaxed_ordering() so that the probe routine no longer
sets PCIe Capability Device Control[Relaxed Ordering Enable]. This ensures
the driver will not send Relaxed Ordering TLPs to a Root Complex that
cannot handle them.

Signed-off-by: Casey Leedom <lee...@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Reviewed-by: Casey Leedom <lee...@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 23 +--
 drivers/net/ethernet/chelsio/cxgb4/sge.c|  5 +++--
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index ef4be78..09ea62e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -529,6 +529,7 @@ enum { /* adapter flags */
USING_SOFT_PARAMS  = (1 << 6),
MASTER_PF  = (1 << 7),
FW_OFLD_CONN   = (1 << 9),
+   ROOT_NO_RELAXED_ORDERING = (1 << 10),
 };
 
 enum {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index e403fa1..33bb867 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4654,11 +4654,6 @@ static void print_port_info(const struct net_device *dev)
dev->name, adap->params.vpd.id, adap->name, buf);
 }
 
-static void enable_pcie_relaxed_ordering(struct pci_dev *dev)
-{
-   pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
-}
-
 /*
  * Free the following resources:
  * - memory used for tables
@@ -4908,7 +4903,6 @@ static int init_one(struct pci_dev *pdev, const struct 
pci_device_id *ent)
}
 
pci_enable_pcie_error_reporting(pdev);
-   enable_pcie_relaxed_ordering(pdev);
pci_set_master(pdev);
pci_save_state(pdev);
 
@@ -4947,6 +4941,23 @@ static int init_one(struct pci_dev *pdev, const struct 
pci_device_id *ent)
adapter->msg_enable = DFLT_MSG_ENABLE;
memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
 
+   /* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+* Ingress Packet Data to Free List Buffers in order to allow for
+* chipset performance optimizations between the Root Complex and
+* Memory Controllers.  (Messages to the associated Ingress Queue
+* notifying new Packet Placement in the Free Lists Buffers will be
+* sent without the Relaxed Ordering Attribute thus guaranteeing that
+* all preceding PCIe Transaction Layer Packets will be processed
+* first.)  But some Root Complexes have various issues with Upstream
+* Transaction Layer Packets with the Relaxed Ordering Attribute set.
+* PCIe devices under such Root Complexes will have the Relaxed
+* Ordering bit cleared in their configuration space, so we check our
+* PCIe configuration space to see if it's flagged with advice against
+* using Relaxed Ordering.
+*/
+   if (!pcie_relaxed_ordering_enabled(pdev))
+   adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
spin_lock_init(&adapter->stats_lock);
spin_lock_init(&adapter->tid_release_lock);
spin_lock_init(&adapter->win0_lock);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c 
b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index ede1220..4ef68f6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2719,6 +2719,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct 
sge_rspq *iq, bool fwevtq,
struct fw_iq_cmd c;
struct sge *s = &adap->sge;
struct port_info *pi = netdev_priv(dev);
+   int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);
 
/* Size needs to be multiple of 16, including status entry. */
iq->size = roundup(iq->size, 16);
@@ -2772,8 +2773,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct 
sge_rspq *iq, bool fwevtq,
 
flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
-FW_IQ_CMD_FL0FETCHRO_F |
-FW_IQ_CMD_FL0DATARO_F |
+FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+FW_IQ_CMD_FL0DATARO_V(relaxed) |
 FW_IQ_CMD_FL0PADEN_F);
if (cong >= 0)
c.iqns_to_fl0congen |=
-- 
1.8.3.1
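
The probe-time decision made by the patch above can be sketched in isolation. The sketch assumes that `pcie_relaxed_ordering_enabled()` reports the Relaxed Ordering Enable bit of the Device Control register, which is how the commit message describes it; `relaxed_ordering_enabled()` and `probe_flags()` are hypothetical helpers that take the register value as a plain argument instead of reading configuration space.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_DEVCTL_RELAX_EN		0x0010		/* same bit as PCI_EXP_DEVCTL_RELAX_EN */
#define ROOT_NO_RELAXED_ORDERING	(1 << 10)	/* cxgb4 adapter flag from the patch */

/* Model of the check: is Relaxed Ordering Enable set in Device Control? */
static bool relaxed_ordering_enabled(uint16_t devctl)
{
	return (devctl & DEMO_DEVCTL_RELAX_EN) != 0;
}

/* Return the adapter flags after the probe-time check. */
static unsigned int probe_flags(unsigned int flags, uint16_t devctl)
{
	if (!relaxed_ordering_enabled(devctl))
		flags |= ROOT_NO_RELAXED_ORDERING;
	return flags;
}
```

The driver no longer sets the enable bit itself; it only observes it, so whatever cleared the bit earlier decides whether Relaxed Ordering is used.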




[PATCH v11 3/5] PCI: Disable Relaxed Ordering Attributes for AMD A1100

2017-08-14 Thread Ding Tianhong
Casey reported that the AMD ARM A1100 SoC has a bug in its PCIe
Root Port where Upstream Transaction Layer Packets with the Relaxed
Ordering Attribute clear are allowed to bypass earlier TLPs with
Relaxed Ordering set. This can cause data corruption, so we need
to disable the Relaxed Ordering Attribute for Upstream TLPs sent to
the Root Port.

Reported-and-suggested-by: Casey Leedom <lee...@chelsio.com>
Signed-off-by: Casey Leedom <lee...@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Acked-by: Casey Leedom <lee...@chelsio.com>
---
 drivers/pci/quirks.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 1272f7e..1407604 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4089,6 +4089,22 @@ static void quirk_relaxedordering_disable(struct pci_dev 
*dev)
  quirk_relaxedordering_disable);
 
 /*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
-- 
1.8.3.1




[PATCH v11 1/5] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-14 Thread Ding Tianhong
Bit 4 of the PCIe Device Control register indicates whether the device
is permitted to use Relaxed Ordering. On some platforms, using Relaxed
Ordering can cause performance problems or, because of an erratum, data
corruption. In such cases devices must avoid using Relaxed Ordering.

This patch adds a new flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING, to indicate
that the Relaxed Ordering (RO) attribute should not be used for Transaction
Layer Packets (TLPs) targeted at the affected Root Complexes.

The patch also checks whether the Root Port above a device indicates that
using Relaxed Ordering is not safe. In that case it turns off Relaxed
Ordering by clearing the Relaxed Ordering Enable bit for the device.
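
The decision described above can be modelled in a few lines of userspace C. This is only a sketch of the logic, not the kernel code: the struct and all demo_* / DEMO_* names are hypothetical, the Device Control register is modelled as a plain field, and the Root Port lookup is reduced to a pointer. The bit values (0x0010 for Relaxed Ordering Enable, 1 << 12 for the new flag) match the ones used in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_DEVCTL_RELAX_EN		0x0010u		/* bit 4 of Device Control */
#define DEMO_FLAG_NO_RELAXED_ORDERING	(1u << 12)

/* Hypothetical, heavily reduced model of a PCI device. */
struct demo_pci_dev {
	bool is_virtfn;			/* true for an SR-IOV virtual function */
	uint16_t devctl;		/* modelled Device Control register */
	uint32_t dev_flags;
	struct demo_pci_dev *root;	/* modelled Root Port, NULL if none */
};

static bool demo_relaxed_ordering_enabled(const struct demo_pci_dev *dev)
{
	assert(dev != NULL);
	return (dev->devctl & DEMO_DEVCTL_RELAX_EN) != 0;
}

static void demo_configure_relaxed_ordering(struct demo_pci_dev *dev)
{
	assert(dev != NULL);

	/* The enable bit is reserved in VFs, so leave them alone. */
	if (dev->is_virtfn)
		return;

	/* Nothing to do if the device never enabled Relaxed Ordering. */
	if (!demo_relaxed_ordering_enabled(dev))
		return;

	/* No Root Port found (for example in a guest): skip. */
	if (dev->root == NULL)
		return;

	/* Clear the enable bit when the Root Port is flagged as unsafe. */
	if (dev->root->dev_flags & DEMO_FLAG_NO_RELAXED_ORDERING)
		dev->devctl &= (uint16_t)~DEMO_DEVCTL_RELAX_EN;
}
```

Note that the sketch only ever clears the bit; it never sets it, which mirrors the intent that a device's own default is respected unless the Root Port is known to be problematic.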

Signed-off-by: Casey Leedom <lee...@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Acked-by: Ashok Raj <ashok@intel.com>
Acked-by: Alexander Duyck <alexander.h.du...@intel.com>
Acked-by: Casey Leedom <lee...@chelsio.com>
---
 drivers/pci/probe.c  | 43 +++
 drivers/pci/quirks.c | 11 +++
 include/linux/pci.h  |  3 +++
 3 files changed, 57 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..779e646 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1762,6 +1762,48 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
 PCI_EXP_DEVCTL_EXT_TAG);
 }
 
+/**
+ * pcie_relaxed_ordering_enabled - Probe for PCIe relaxed ordering enable
+ * @dev: PCI device to query
+ *
+ * Returns true if the device has enabled relaxed ordering attribute.
+ */
+bool pcie_relaxed_ordering_enabled(struct pci_dev *dev)
+{
+   u16 v;
+
+   pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
+
+   return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_relaxed_ordering_enabled);
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+   struct pci_dev *root;
+
+   /* PCI_EXP_DEVICE_RELAX_EN is RsvdP in VFs */
+   if (dev->is_virtfn)
+   return;
+
+   if (!pcie_relaxed_ordering_enabled(dev))
+   return;
+
+   /*
+* For now, we only deal with Relaxed Ordering issues with Root
+* Ports. Peer-to-Peer DMA is another can of worms.
+*/
+   root = pci_find_pcie_root_port(dev);
+   if (!root)
+   return;
+
+   if (root->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) {
+   pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+  PCI_EXP_DEVCTL_RELAX_EN);
+   dev_info(&dev->dev, "Disable Relaxed Ordering because the Root Port didn't support it\n");
+   }
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
struct hotplug_params hpp;
@@ -1769,6 +1811,7 @@ static void pci_configure_device(struct pci_dev *dev)
 
pci_configure_mps(dev);
pci_configure_extended_tags(dev);
+   pci_configure_relaxed_ordering(dev);
 
memset(&hpp, 0, sizeof(hpp));
ret = pci_get_hp_params(dev, &hpp);
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6967c6b..61b59bf 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4016,6 +4016,17 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
  quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+   dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+   dev_info(&dev->dev, "Disable Relaxed Ordering Attributes to avoid PCIe Completion erratum\n");
+}
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..29606fb 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -188,6 +188,8 @@ enum pci_dev_flags {
 * the direct_complete optimization.
 */
PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
+   /* Don't use Relaxed Ordering for TLPs directed at this device */
+   PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {
@@ -1125,6 +1127,7 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 void pci_pme_wakeup_bus(struct pci_bus *bus);
 void pci_d3cold_enable(struct pci_dev *dev);
 void pci_d3cold_disable(struct pci_dev *dev);
+bool pcie_relaxed_ordering_enabled(struct pci_dev *dev);
 
 /* PCI Virtual Channel */
 int pci_save_vc_state(struct pci_dev *dev);
-- 
1.8.3.1




[PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-14 Thread Ding Tianhong
Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches; I apologise for the
almost certain submission errors.

v2: Alexander pointed out that v1 was only part of the whole solution.
A platform with Relaxed Ordering issues can use the new flag to indicate
that it is not safe to enable the Relaxed Ordering attribute, but we then
also need to clear the Relaxed Ordering Enable bit in PCI configuration
space when initializing the device. So add a new second patch that
modifies the PCI initialization code to clear the Relaxed Ordering Enable
bit when the Root Complex doesn't want Relaxed Ordering enabled.

The third patch is based on v1's second patch and is changed only to
query the Relaxed Ordering Enable bit in PCI configuration space before
allowing the Chelsio NIC to send TLPs with the Relaxed Ordering attribute
set.

This version does not yet drop the defines in the Intel drivers in favour
of the new check, because that is not the most pressing issue at the
moment; we can fix it in a later patch set once these patches are
accepted.

v3: Redesigned the logic of pci_configure_relaxed_ordering(). If a PCIe
device did not enable the Relaxed Ordering attribute by default, we do
nothing to its PCIe configuration. Otherwise we check whether any of the
devices above us is marked with the PCI_DEV_FLAGS_NO_RELAXED_ORDERING
flag; if so, Relaxed Ordering is not supported and we update our device
to disable Relaxed Ordering in configuration space. If the device above
us doesn't exist or isn't a PCIe device, we skip updating Relaxed
Ordering, because we are probably running in a guest.

v4: Renamed the functions pcie_get_relaxed_ordering and
pcie_disable_relaxed_ordering according to John's suggestion, updated the
description, and used true/false as the return value.

We shouldn't enable the Relaxed Ordering attribute for a PCIe device based
on the setting in the Root Complex configuration space, so fix that for
cxgb4.

Fixed some formatting issues.

v5: Removed unnecessary code from functions that only return a bool
value, and added the check for VF devices.

Rebased this patch set on 4.12-rc5.

v6: Fixed the logic error in deciding whether to enable the Relaxed
Ordering attribute for cxgb4.

v7: The cxgb4 driver enables PCIe Capability Device Control[Relaxed
Ordering Enable] in its PCI probe() routine, which breaks our current
solution on platforms that have problems when the Relaxed Ordering
attribute is enabled. Following the latest recommendations, remove
enable_pcie_relaxed_ordering(). This does not cover the Peer-to-Peer
case, but we agreed to leave that problem until we actually hit it.

Rebased this patch set on the 4.12 release.

v8: Changed the second patch's title and description to make them
clearer, and added the Acked-by tags from Alex and Ashok.

Added a new patch to enable the Relaxed Ordering Attribute for the cxgb4vf
driver.

Rebased this patch set on 4.13-rc2.

v9: The document
(https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf)
indicates that Xeon processors based on the Broadwell/Haswell
microarchitecture have problems with the Relaxed Ordering Attribute
enabled, so add the whole list of Intel Device IDs to the patch.

v10: Significant rework based on Bjorn's feedback: reorganized the first
2 patches, so the Intel and AMD errata are now split into separate
patches; renamed pcie_relaxed_ordering_supported() to
pcie_relaxed_ordering_enabled(); no longer check every intervening
switch, only the Root Port; updated some commits.

v11: An Intel engineer shouldn't ack the AMD erratum patch; fixed this
funny mistake.

Casey Leedom (2):
  net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Ding Tianhong (3):
  PCI: Disable PCIe Relaxed Ordering if unsupported
  PCI: Disable Relaxed Ordering for some Intel processors
  PCI: Disable Relaxed Ordering Attributes for AMD A1100

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 23 --
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |  5 +-
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h |  1 +
 .../net/ethernet/chelsio

Re: [PATCH v10 3/5] PCI: Disable Relaxed Ordering Attributes for AMD A1100

2017-08-14 Thread Ding Tianhong


On 2017/8/15 1:19, Raj, Ashok wrote:
> On Mon, Aug 14, 2017 at 11:44:57PM +0800, Ding Tianhong wrote:
>> Casey reported that the AMD ARM A1100 SoC has a bug in its PCIe
>> Root Port where Upstream Transaction Layer Packets with the Relaxed
>> Ordering Attribute clear are allowed to bypass earlier TLPs with
>> Relaxed Ordering set, it would cause Data Corruption, so we need
>> to disable Relaxed Ordering Attribute when Upstream TLPs to the
>> Root Port.
>>
>> Signed-off-by: Casey Leedom <lee...@chelsio.com>
>> Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
>> Acked-by: Alexander Duyck <alexander.h.du...@intel.com>
>> Acked-by: Ashok Raj <ashok@intel.com>
> 
> I can't ack this patch :-).. must be someone from AMD. Please remove my
> signature from this.
> 

Sorry for the funny mistake :)  I will fix it.

Ding

>> ---
>>  drivers/pci/quirks.c | 16 
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 1272f7e..1407604 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -4089,6 +4089,22 @@ static void quirk_relaxedordering_disable(struct 
>> pci_dev *dev)
>>quirk_relaxedordering_disable);
>>  
>>  /*
>> + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
>> + * where Upstream Transaction Layer Packets with the Relaxed Ordering
>> + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
>> + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
>> + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
>> + * November 10, 2010).  As a result, on this platform we can't use Relaxed
>> + * Ordering for Upstream TLPs.
>> + */
>> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, 
>> PCI_CLASS_NOT_DEFINED, 8,
>> +  quirk_relaxedordering_disable);
>> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, 
>> PCI_CLASS_NOT_DEFINED, 8,
>> +  quirk_relaxedordering_disable);
>> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, 
>> PCI_CLASS_NOT_DEFINED, 8,
>> +  quirk_relaxedordering_disable);
>> +
>> +/*
>>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>>   * values for the Attribute as were supplied in the header of the
>>   * corresponding Request, except as explicitly allowed when IDO is used."
>> -- 
>> 1.8.3.1
>>
>>
> 
> .
> 



[PATCH v10 3/5] PCI: Disable Relaxed Ordering Attributes for AMD A1100

2017-08-14 Thread Ding Tianhong
Casey reported that the AMD ARM A1100 SoC has a bug in its PCIe
Root Port where Upstream Transaction Layer Packets with the Relaxed
Ordering Attribute clear are allowed to bypass earlier TLPs with
Relaxed Ordering set. This can cause data corruption, so we need to
disable the Relaxed Ordering Attribute for Upstream TLPs sent to the
Root Port.

Signed-off-by: Casey Leedom <lee...@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Acked-by: Alexander Duyck <alexander.h.du...@intel.com>
Acked-by: Ashok Raj <ashok@intel.com>
---
 drivers/pci/quirks.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 1272f7e..1407604 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4089,6 +4089,22 @@ static void quirk_relaxedordering_disable(struct pci_dev *dev)
  quirk_relaxedordering_disable);
 
 /*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
-- 
1.8.3.1




[PATCH v10 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-14 Thread Ding Tianhong
Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches, I appolgise for the
almost certain submission errors.

v2: Alexander point out that the v1 was only a part of the whole solution,
some platform which has some issues could use the new flag to indicate
that it is not safe to enable relaxed ordering attribute, then we need
to clear the relaxed ordering enable bits in the PCI configuration when
initializing the device. So add a new second patch to modify the PCI
initialization code to clear the relaxed ordering enable bit in the
event that the root complex doesn't want relaxed ordering enabled.

The third patch was base on the v1's second patch and only be changed
to query the relaxed ordering enable bit in the PCI configuration space
to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes
set.

This version didn't plan to drop the defines for Intel Drivers to use the
new checking way to enable relaxed ordering because it is not the hardest
part of the moment, we could fix it in next patchset when this patches
reach the goal.

v3: Redesigned the logic of pci_configure_relaxed_ordering. If a PCIe
device does not have the relaxed ordering attribute enabled by default,
we do nothing in the PCIe configuration. Otherwise we check whether any
of the devices above us do not support relaxed ordering, as indicated by
the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag. If the result indicates that
relaxed ordering is not supported, we update our device to disable
relaxed ordering in configuration space. If the device above us doesn't
exist or isn't a PCIe device, we do nothing and skip updating relaxed
ordering, because we are probably running in a guest.

v4: Rename the functions pcie_get_relaxed_ordering and
pcie_disable_relaxed_ordering according to John's suggestion, modify the
description, and use true/false as the return value.

We shouldn't enable the relaxed ordering attribute for a PCIe device
based on the setting in the root complex configuration space, so fix
that for cxgb4.

Fix some formatting issues.

v5: Removed unnecessary code from some functions that only return a bool
value, and added a check for VF devices.

Base this patch set on 4.12-rc5.

v6: Fix the logic error in deciding whether to enable the relaxed
ordering attribute for cxgb4.

v7: The cxgb4 driver enables the PCIe Capability Device Control[Relaxed
Ordering Enable] bit in its PCI probe() routine, which breaks our current
solution on platforms that have problems when the relaxed ordering
attribute is enabled. According to the latest recommendations, remove
enable_pcie_relaxed_ordering(). This does not cover the Peer-to-Peer
case, but we agreed to leave that problem until we actually trigger it.

Base this patch set on the 4.12 release.

v8: Change the second patch's title and description to make them more
reasonable, and add the Acked-by tags from Alex and Ashok.

Add a new patch to enable the Relaxed Ordering Attribute for the cxgb4vf
driver.

Base this patch set on 4.13-rc2.

v9: The document
(https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf)
indicates that Xeon processors based on the Broadwell/Haswell
microarchitectures have a problem when the Relaxed Ordering Attribute is
enabled, so add the whole list of Intel Device IDs to the patch.

v10: Significant rework based on Bjorn's feedback: reorganize the first 2
patches so that the Intel and AMD erratum SoCs are now split into
separate patches, rename pcie_relaxed_ordering_supported() to
pcie_relaxed_ordering_enabled(), check only the root port rather than
every intervening switch, and update some commits.
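
The decision flow described in the v3 and v10 notes above can be sketched
roughly as follows. This is only an illustration in plain user-space C, not
the code from the patches: struct demo_dev, its fields and both demo_*
functions are hypothetical stand-ins for struct pci_dev and the kernel
helpers, and the only value taken from the PCIe definition is the Relaxed
Ordering Enable bit (bit 4 of Device Control).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_DEVCTL_RELAX_EN 0x0010u	/* bit 4 of Device Control */

/* Hypothetical stand-in for struct pci_dev; not the kernel's layout. */
struct demo_dev {
	struct demo_dev *parent;	/* upstream bridge, NULL at the top */
	bool is_pcie;
	bool is_root_port;
	bool no_relaxed_ordering;	/* models PCI_DEV_FLAGS_NO_RELAXED_ORDERING */
	uint16_t devctl;		/* models the Device Control register */
};

/* Walk upstream and return the Root Port, or NULL if none is visible. */
static struct demo_dev *demo_find_root_port(struct demo_dev *dev)
{
	struct demo_dev *p;

	assert(dev != NULL);
	for (p = dev->parent; p != NULL; p = p->parent)
		if (p->is_pcie && p->is_root_port)
			return p;
	return NULL;
}

/* Returns true if Relaxed Ordering Enable was cleared on dev. */
static bool demo_configure_relaxed_ordering(struct demo_dev *dev)
{
	struct demo_dev *root;

	assert(dev != NULL);

	/* Relaxed ordering is not enabled: nothing to do. */
	if (!(dev->devctl & DEMO_DEVCTL_RELAX_EN))
		return false;

	/* No Root Port above us, for example when running in a guest. */
	root = demo_find_root_port(dev);
	if (root == NULL)
		return false;

	/* Only the Root Port is consulted, not intervening switches. */
	if (!root->no_relaxed_ordering)
		return false;

	dev->devctl &= (uint16_t)~DEMO_DEVCTL_RELAX_EN;
	return true;
}
```

A device behind a flagged Root Port ends up with bit 4 cleared, while a
device with no visible Root Port is left untouched.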

Casey Leedom (2):
  net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Ding Tianhong (3):
  PCI: Disable PCIe Relaxed Ordering if unsupported
  PCI: Disable Relaxed Ordering for some Intel processors
  PCI: Disable Relaxed Ordering Attributes for AMD A1100

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 23 --
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |  5 +-
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h |  1 +
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c| 18 +
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c |  3 +
 drivers

[PATCH v10 2/5] PCI: Disable Relaxed Ordering for some Intel processors

2017-08-14 Thread Ding Tianhong
Section 3.9.1 of the Intel spec says:

3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
  and Toward MMIO Regions (P2P)

In order to maximize performance for PCIe devices in the processors
listed in Table 3-6 below, the software should determine whether the
accesses are toward coherent memory (system memory) or toward MMIO
regions (P2P access to other devices). If the access is toward MMIO
region, then software can command HW to set the RO bit in the TLP
header, as this would allow hardware to achieve maximum throughput for
these types of accesses. For accesses toward coherent memory, software
can command HW to clear the RO bit in the TLP header (no RO), as this
would allow hardware to achieve maximum throughput for these types of
accesses.

Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
   PCIe Performance

Processor                            CPU RP Device IDs

Intel Xeon processors based on       6F01H-6F0EH
Broadwell microarchitecture

Intel Xeon processors based on       2F01H-2F0EH
Haswell microarchitecture

This means some Intel processors have a performance issue when the Relaxed
Ordering Attribute is used, so disable Relaxed Ordering for these root ports.
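
As a rough illustration of which Root Port device IDs Table 3-6 covers, the
two ranges can be written as a single predicate. This is a sketch only: the
patch itself registers one DECLARE_PCI_FIXUP_CLASS_EARLY entry per device ID
rather than a range check, and demo_is_affected_intel_rp() is a hypothetical
name that does not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Root Port device ID ranges as quoted from Table 3-6. */
static bool demo_is_affected_intel_rp(uint16_t device_id)
{
	return (device_id >= 0x6f01 && device_id <= 0x6f0e) ||	/* Broadwell */
	       (device_id >= 0x2f01 && device_id <= 0x2f0e);	/* Haswell */
}
```

IDs just outside either range, such as 0x6f0f or 0x2f00, are not matched.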

Signed-off-by: Casey Leedom <lee...@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Acked-by: Alexander Duyck <alexander.h.du...@intel.com>
Acked-by: Ashok Raj <ashok@intel.com>
---
 drivers/pci/quirks.c | 62 
 1 file changed, 62 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 61b59bf..1272f7e 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4027,6 +4027,68 @@ static void quirk_relaxedordering_disable(struct pci_dev *dev)
 }
 
 /*
+ * Intel Xeon processors based on Broadwell/Haswell microarchitecture Root
+ * Complex has a Flow Control Credit issue which can cause performance
+ * problems with Upstream Transaction Layer Packets with Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f01, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f03, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f05, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f06, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f07, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f09, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0a, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0b, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0c, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0d, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0e, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f01, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f02, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f03, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f04, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f05, PCI_CLASS_NOT_DEFINED, 8,
+                              quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_E

[PATCH v10 4/5] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-14 Thread Ding Tianhong
From: Casey Leedom <lee...@chelsio.com>

The cxgb4 Ethernet driver now queries the PCIe configuration space to
determine whether it can send TLPs with the Relaxed Ordering Attribute set.

Remove enable_pcie_relaxed_ordering() so that the probe routine no longer
enables PCIe Capability Device Control[Relaxed Ordering Enable]. This makes
sure the driver will not send Relaxed Ordering TLPs to a Root Complex that
cannot handle them.
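
The diff below replaces the unconditional _F flag macros with _V(relaxed),
so both Relaxed Ordering fields follow a single adapter flag. A minimal
sketch of that pattern, assuming made-up bit positions: the DEMO_* shift
values and demo_fl_ro_bits() are hypothetical and are not the driver's
firmware API definitions. Only the (1 << 10) flag value is taken from the
cxgb4.h hunk in this patch.

```c
#include <stdint.h>

/* Flag bit as added to the adapter flags enum in this patch. */
#define DEMO_ROOT_NO_RELAXED_ORDERING	(1u << 10)

/* Hypothetical field positions, chosen only for this illustration. */
#define DEMO_FETCHRO_S		6
#define DEMO_DATARO_S		5
#define DEMO_FETCHRO_V(x)	((uint32_t)(x) << DEMO_FETCHRO_S)
#define DEMO_DATARO_V(x)	((uint32_t)(x) << DEMO_DATARO_S)

/* Build the two Relaxed Ordering field bits from the adapter flags. */
static uint32_t demo_fl_ro_bits(unsigned int adapter_flags)
{
	/* 1 when the root complex has not been flagged, otherwise 0. */
	unsigned int relaxed = !(adapter_flags & DEMO_ROOT_NO_RELAXED_ORDERING);

	return DEMO_FETCHRO_V(relaxed) | DEMO_DATARO_V(relaxed);
}
```

With the flag clear both fields are set, and with the flag set both fields
are zero, which is the effect of passing relaxed to the _V() macros instead
of using the _F variants.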

Signed-off-by: Casey Leedom <lee...@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Reviewed-by: Casey Leedom <lee...@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 23 +--
 drivers/net/ethernet/chelsio/cxgb4/sge.c|  5 +++--
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index ef4be78..09ea62e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -529,6 +529,7 @@ enum { /* adapter flags */
USING_SOFT_PARAMS  = (1 << 6),
MASTER_PF  = (1 << 7),
FW_OFLD_CONN   = (1 << 9),
+   ROOT_NO_RELAXED_ORDERING = (1 << 10),
 };
 
 enum {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index e403fa1..33bb867 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4654,11 +4654,6 @@ static void print_port_info(const struct net_device *dev)
dev->name, adap->params.vpd.id, adap->name, buf);
 }
 
-static void enable_pcie_relaxed_ordering(struct pci_dev *dev)
-{
-   pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
-}
-
 /*
  * Free the following resources:
  * - memory used for tables
@@ -4908,7 +4903,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
}
 
pci_enable_pcie_error_reporting(pdev);
-   enable_pcie_relaxed_ordering(pdev);
pci_set_master(pdev);
pci_save_state(pdev);
 
@@ -4947,6 +4941,23 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
adapter->msg_enable = DFLT_MSG_ENABLE;
memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
 
+   /* If possible, we use the PCIe Relaxed Ordering Attribute to deliver
+    * Ingress Packet Data to Free List Buffers in order to allow for
+    * chipset performance optimizations between the Root Complex and
+    * Memory Controllers.  (Messages to the associated Ingress Queue
+    * notifying new Packet Placement in the Free List Buffers will be
+    * sent without the Relaxed Ordering Attribute, thus guaranteeing that
+    * all preceding PCIe Transaction Layer Packets will be processed
+    * first.)  But some Root Complexes have various issues with Upstream
+    * Transaction Layer Packets with the Relaxed Ordering Attribute set.
+    * PCIe devices below such Root Complexes will have the Relaxed
+    * Ordering bit cleared in their configuration space, so we check our
+    * PCIe configuration space to see if it's flagged with advice against
+    * using Relaxed Ordering.
+    */
+   if (!pcie_relaxed_ordering_enabled(pdev))
+   adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
spin_lock_init(&adapter->stats_lock);
spin_lock_init(&adapter->tid_release_lock);
spin_lock_init(&adapter->win0_lock);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index ede1220..4ef68f6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2719,6 +2719,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
struct fw_iq_cmd c;
struct sge *s = &adap->sge;
struct port_info *pi = netdev_priv(dev);
+   int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);
 
/* Size needs to be multiple of 16, including status entry. */
iq->size = roundup(iq->size, 16);
@@ -2772,8 +2773,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 
flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
-FW_IQ_CMD_FL0FETCHRO_F |
-FW_IQ_CMD_FL0DATARO_F |
+FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+FW_IQ_CMD_FL0DATARO_V(relaxed) |
 FW_IQ_CMD_FL0PADEN_F);
if (cong >= 0)
c.iqns_to_fl0congen |=
-- 
1.8.3.1




[PATCH v10 5/5] net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-14 Thread Ding Tianhong
From: Casey Leedom <lee...@chelsio.com>

The cxgb4vf Ethernet driver now queries the PCIe configuration space to
determine whether it can send TLPs with the Relaxed Ordering Attribute
set, just as the PF driver does.

Signed-off-by: Casey Leedom <lee...@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Reviewed-by: Casey Leedom <lee...@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h  |  1 +
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 18 ++
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c  |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
index 109bc63..08c6ddb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
@@ -408,6 +408,7 @@ enum { /* adapter flags */
USING_MSI  = (1UL << 1),
USING_MSIX = (1UL << 2),
QUEUES_BOUND   = (1UL << 3),
+   ROOT_NO_RELAXED_ORDERING = (1UL << 4),
 };
 
 /*
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index ac7a150..2b85b87 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -2888,6 +2888,24 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 */
adapter->name = pci_name(pdev);
adapter->msg_enable = DFLT_MSG_ENABLE;
+
+   /* If possible, we use the PCIe Relaxed Ordering Attribute to deliver
+    * Ingress Packet Data to Free List Buffers in order to allow for
+    * chipset performance optimizations between the Root Complex and
+    * Memory Controllers.  (Messages to the associated Ingress Queue
+    * notifying new Packet Placement in the Free List Buffers will be
+    * sent without the Relaxed Ordering Attribute, thus guaranteeing that
+    * all preceding PCIe Transaction Layer Packets will be processed
+    * first.)  But some Root Complexes have various issues with Upstream
+    * Transaction Layer Packets with the Relaxed Ordering Attribute set.
+    * PCIe devices below such Root Complexes will have the Relaxed
+    * Ordering bit cleared in their configuration space, so we check our
+    * PCIe configuration space to see if it's flagged with advice against
+    * using Relaxed Ordering.
+    */
+   if (!pcie_relaxed_ordering_enabled(pdev))
+   adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
err = adap_init0(adapter);
if (err)
goto err_unmap_bar;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index e37dde2..05498e7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -2205,6 +2205,7 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
struct port_info *pi = netdev_priv(dev);
struct fw_iq_cmd cmd, rpl;
int ret, iqandst, flsz = 0;
+   int relaxed = !(adapter->flags & ROOT_NO_RELAXED_ORDERING);
 
/*
 * If we're using MSI interrupts and we're not initializing the
@@ -2300,6 +2301,8 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
cpu_to_be32(
FW_IQ_CMD_FL0HOSTFCMODE_V(SGE_HOSTFCMODE_NONE) |
FW_IQ_CMD_FL0PACKEN_F |
+   FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+   FW_IQ_CMD_FL0DATARO_V(relaxed) |
FW_IQ_CMD_FL0PADEN_F);
 
/* In T6, for egress queue type FL there is internal overhead
-- 
1.8.3.1




[PATCH v10 5/5] net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-14 Thread Ding Tianhong
From: Casey Leedom 

cxgb4vf Ethernet driver now queries PCIe configuration space to
determine if it can send TLPs to it with the Relaxed Ordering
Attribute set, just like the pf did.

Signed-off-by: Casey Leedom 
Signed-off-by: Ding Tianhong 
Reviewed-by: Casey Leedom 
---
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h  |  1 +
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 18 ++
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c  |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h 
b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
index 109bc63..08c6ddb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
@@ -408,6 +408,7 @@ enum { /* adapter flags */
USING_MSI  = (1UL << 1),
USING_MSIX = (1UL << 2),
QUEUES_BOUND   = (1UL << 3),
+   ROOT_NO_RELAXED_ORDERING = (1UL << 4),
 };
 
 /*
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c 
b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index ac7a150..2b85b87 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -2888,6 +2888,24 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 */
adapter->name = pci_name(pdev);
adapter->msg_enable = DFLT_MSG_ENABLE;
+
+   /* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+* Ingress Packet Data to Free List Buffers in order to allow for
+* chipset performance optimizations between the Root Complex and
+* Memory Controllers.  (Messages to the associated Ingress Queue
+* notifying new Packet Placement in the Free Lists Buffers will be
+* send without the Relaxed Ordering Attribute thus guaranteeing that
+* all preceding PCIe Transaction Layer Packets will be processed
+* first.)  But some Root Complexes have various issues with Upstream
+* Transaction Layer Packets with the Relaxed Ordering Attribute set.
+* The PCIe devices which under the Root Complexes will be cleared the
+* Relaxed Ordering bit in the configuration space, So we check our
+* PCIe configuration space to see if it's flagged with advice against
+* using Relaxed Ordering.
+*/
+   if (!pcie_relaxed_ordering_enabled(pdev))
+   adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
err = adap_init0(adapter);
if (err)
goto err_unmap_bar;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c 
b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index e37dde2..05498e7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -2205,6 +2205,7 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct 
sge_rspq *rspq,
struct port_info *pi = netdev_priv(dev);
struct fw_iq_cmd cmd, rpl;
int ret, iqandst, flsz = 0;
+   int relaxed = !(adapter->flags & ROOT_NO_RELAXED_ORDERING);
 
/*
 * If we're using MSI interrupts and we're not initializing the
@@ -2300,6 +2301,8 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct 
sge_rspq *rspq,
cpu_to_be32(
FW_IQ_CMD_FL0HOSTFCMODE_V(SGE_HOSTFCMODE_NONE) |
FW_IQ_CMD_FL0PACKEN_F |
+   FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+   FW_IQ_CMD_FL0DATARO_V(relaxed) |
FW_IQ_CMD_FL0PADEN_F);
 
/* In T6, for egress queue type FL there is internal overhead
-- 
1.8.3.1
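
The sge.c hunks above derive one `relaxed` value from the adapter flags and feed it into two field macros of the firmware command word. As a rough userspace sketch of that pattern (the flag value, the bit positions and every `TOY_`/`toy_` name are assumptions made for illustration, not the real cxgb4 definitions):

```c
#include <stdint.h>

/* Hypothetical flag value and bit positions, chosen for illustration only;
 * the real definitions live in the cxgb4 headers. */
#define TOY_ROOT_NO_RELAXED_ORDERING (1u << 0)
#define TOY_FL0FETCHRO_S 6
#define TOY_FL0DATARO_S  5
#define TOY_FL0FETCHRO_V(x) ((uint32_t)(x) << TOY_FL0FETCHRO_S)
#define TOY_FL0DATARO_V(x)  ((uint32_t)(x) << TOY_FL0DATARO_S)

/* Return the relaxed-ordering bits to OR into the free-list command word. */
static uint32_t toy_fl_ro_bits(unsigned int adapter_flags)
{
	/* 1 when the root complex tolerates relaxed ordering, else 0 */
	int relaxed = !(adapter_flags & TOY_ROOT_NO_RELAXED_ORDERING);

	return TOY_FL0FETCHRO_V(relaxed) | TOY_FL0DATARO_V(relaxed);
}
```

Because `relaxed` is normalised to 0 or 1 with `!`, both fields are either set together or cleared together, which seems to be the intent of the hunk.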




[PATCH v10 1/5] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-14 Thread Ding Tianhong
Bit 4 of the PCIe Device Control register (Enable Relaxed Ordering)
indicates whether the device is permitted to use relaxed ordering.
On some platforms, relaxed ordering can cause performance problems or,
due to errata, data corruption. In such cases devices must avoid
using relaxed ordering.

The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
the Relaxed Ordering (RO) attribute should not be used for Transaction Layer
Packets (TLP) targeted towards these affected root complexes.

This patch checks if there is any node in the hierarchy that indicates that
using relaxed ordering is not safe. In such cases the patch turns off
relaxed ordering by clearing the enable bit for this device.

Signed-off-by: Casey Leedom <lee...@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Acked-by: Ashok Raj <ashok@intel.com>
Acked-by: Alexander Duyck <alexander.h.du...@intel.com>
Acked-by: Casey Leedom <lee...@chelsio.com>
---
 drivers/pci/probe.c  | 43 +++
 drivers/pci/quirks.c | 11 +++
 include/linux/pci.h  |  3 +++
 3 files changed, 57 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..779e646 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1762,6 +1762,48 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
 PCI_EXP_DEVCTL_EXT_TAG);
 }
 
+/**
+ * pcie_relaxed_ordering_enabled - Probe for PCIe relaxed ordering enable
+ * @dev: PCI device to query
+ *
+ * Returns true if the device has enabled relaxed ordering attribute.
+ */
+bool pcie_relaxed_ordering_enabled(struct pci_dev *dev)
+{
+   u16 v;
+
+   pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
+
+   return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_relaxed_ordering_enabled);
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+   struct pci_dev *root;
+
+   /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */
+   if (dev->is_virtfn)
+       return;
+
+   if (!pcie_relaxed_ordering_enabled(dev))
+       return;
+
+   /*
+    * For now, we only deal with Relaxed Ordering issues with Root
+    * Ports. Peer-to-Peer DMA is another can of worms.
+    */
+   root = pci_find_pcie_root_port(dev);
+   if (!root)
+       return;
+
+   if (root->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) {
+       pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+                                  PCI_EXP_DEVCTL_RELAX_EN);
+       dev_info(&dev->dev, "Disable Relaxed Ordering because the Root Port didn't support it\n");
+   }
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
struct hotplug_params hpp;
@@ -1769,6 +1811,7 @@ static void pci_configure_device(struct pci_dev *dev)
 
pci_configure_mps(dev);
pci_configure_extended_tags(dev);
+   pci_configure_relaxed_ordering(dev);
 
memset(&hpp, 0, sizeof(hpp));
ret = pci_get_hp_params(dev, &hpp);
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6967c6b..61b59bf 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4016,6 +4016,17 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
  quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+   dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+   dev_info(&dev->dev, "Disable Relaxed Ordering Attributes to avoid PCIe Completion erratum\n");
+}
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..29606fb 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -188,6 +188,8 @@ enum pci_dev_flags {
 * the direct_complete optimization.
 */
PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
+   /* Don't use Relaxed Ordering for TLPs directed at this device */
+   PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {
@@ -1125,6 +1127,7 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 void pci_pme_wakeup_bus(struct pci_bus *bus);
 void pci_d3cold_enable(struct pci_dev *dev);
 void pci_d3cold_disable(struct pci_dev *dev);
+bool pcie_relaxed_ordering_enabled(struct pci_dev *dev);
 
 /* PCI Virtual Channel */
 int pci_save_vc_state(struct pci_dev *dev);
-- 
1.8.3.1
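
For readers who want to follow the control flow of the patch above outside the kernel, here is a small userspace model of it. It is only a sketch: `struct toy_dev` and every `toy_`/`FLAG_`/`DEVCTL_` name are hypothetical stand-ins rather than the kernel's `struct pci_dev` API, and the root-port lookup is reduced to walking parent pointers. The only value taken from the specification is bit 4 (0x0010) of the Device Control register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 4 of the PCIe Device Control register: Enable Relaxed Ordering. */
#define DEVCTL_RELAX_EN 0x0010u
/* Hypothetical stand-in for PCI_DEV_FLAGS_NO_RELAXED_ORDERING. */
#define FLAG_NO_RELAXED_ORDERING (1u << 12)

/* Toy model of a PCI device; not the kernel's struct pci_dev. */
struct toy_dev {
	struct toy_dev *parent;  /* upstream device, NULL at the root port */
	uint16_t devctl;         /* modelled Device Control register */
	unsigned int flags;      /* quirk flags */
	bool is_virtfn;          /* true for an SR-IOV virtual function */
};

static bool toy_relaxed_ordering_enabled(const struct toy_dev *dev)
{
	assert(dev != NULL);
	return (dev->devctl & DEVCTL_RELAX_EN) != 0;
}

/* Walk upstream to the topmost device, modelling the root port lookup. */
static struct toy_dev *toy_find_root_port(struct toy_dev *dev)
{
	while (dev->parent)
		dev = dev->parent;
	return dev;
}

static void toy_configure_relaxed_ordering(struct toy_dev *dev)
{
	struct toy_dev *root;

	if (dev->is_virtfn)  /* the enable bit is reserved in VFs */
		return;
	if (!toy_relaxed_ordering_enabled(dev))
		return;

	root = toy_find_root_port(dev);
	if (root->flags & FLAG_NO_RELAXED_ORDERING)
		dev->devctl &= (uint16_t)~DEVCTL_RELAX_EN;
}
```

A device whose root port carries the flag ends up with only bit 4 cleared; its other Device Control bits are left alone, and a device under an unflagged root port is not modified at all.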




Re: [PATCH v2] arm64: arch_timer: avoid infinite recursion when ftrace is enabled

2017-08-09 Thread Ding Tianhong
Adding Daniel and Thomas.

On 2017/8/10 10:52, Ding Tianhong wrote:
> On platforms with an arch timer erratum workaround, it's possible for
> arch_timer_reg_read_stable() to recurse into itself when certain
> tracing options are enabled, leading to stack overflows and related
> problems.
> 
> For example, when PREEMPT_TRACER and FUNCTION_GRAPH_TRACER are
> selected, it's possible to trigger this with:
> 
> $ mount -t debugfs nodev /sys/kernel/debug/
> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
> 
> The problem is that in such cases, preempt_disable() instrumentation
> attempts to acquire a timestamp via trace_clock(), resulting in a call
> back to arch_timer_reg_read_stable(), and hence recursion.
> 
> This patch changes arch_timer_reg_read_stable() to use
> preempt_{disable,enable}_notrace(), which avoids this.
> 
> This problem is similar to the one fixed by upstream commit 96b3d28bf4
> ("sched/clock: Prevent tracing recursion in sched_clock_cpu()").
> 
> Fixes: 6acc71ccac71 ("arm64: arch_timer: Allows a CPU-specific erratum to only affect a subset of CPUs")
> Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
> Acked-by: Mark Rutland <mark.rutl...@arm.com>
> Acked-by: Marc Zyngier <marc.zyng...@arm.com>
> ---
>  arch/arm64/include/asm/arch_timer.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index 74d08e4..67bb7a4 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -65,13 +65,13 @@ struct arch_timer_erratum_workaround {
>   u64 _val;   \
>   if (needs_unstable_timer_counter_workaround()) {\
>   const struct arch_timer_erratum_workaround *wa; \
> - preempt_disable();  \
> + preempt_disable_notrace();  \
>   wa = __this_cpu_read(timer_unstable_counter_workaround); \
>   if (wa && wa->read_##reg)   \
>   _val = wa->read_##reg();\
>   else\
>   _val = read_sysreg(reg);\
> - preempt_enable();   \
> + preempt_enable_notrace();   \
>   } else {\
>   _val = read_sysreg(reg);\
>   }   \
> 



[PATCH v2] arm64: arch_timer: avoid infinite recursion when ftrace is enabled

2017-08-09 Thread Ding Tianhong
On platforms with an arch timer erratum workaround, it's possible for
arch_timer_reg_read_stable() to recurse into itself when certain
tracing options are enabled, leading to stack overflows and related
problems.

For example, when PREEMPT_TRACER and FUNCTION_GRAPH_TRACER are
selected, it's possible to trigger this with:

$ mount -t debugfs nodev /sys/kernel/debug/
$ echo function_graph > /sys/kernel/debug/tracing/current_tracer

The problem is that in such cases, preempt_disable() instrumentation
attempts to acquire a timestamp via trace_clock(), resulting in a call
back to arch_timer_reg_read_stable(), and hence recursion.

This patch changes arch_timer_reg_read_stable() to use
preempt_{disable,enable}_notrace(), which avoids this.

This problem is similar to the one fixed by upstream commit 96b3d28bf4
("sched/clock: Prevent tracing recursion in sched_clock_cpu()").

Fixes: 6acc71ccac71 ("arm64: arch_timer: Allows a CPU-specific erratum to only affect a subset of CPUs")
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Acked-by: Mark Rutland <mark.rutl...@arm.com>
Acked-by: Marc Zyngier <marc.zyng...@arm.com>
---
 arch/arm64/include/asm/arch_timer.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index 74d08e4..67bb7a4 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -65,13 +65,13 @@ struct arch_timer_erratum_workaround {
u64 _val;   \
if (needs_unstable_timer_counter_workaround()) {\
const struct arch_timer_erratum_workaround *wa; \
-   preempt_disable();  \
+   preempt_disable_notrace();  \
wa = __this_cpu_read(timer_unstable_counter_workaround); \
if (wa && wa->read_##reg)   \
_val = wa->read_##reg();\
else\
_val = read_sysreg(reg);\
-   preempt_enable();   \
+   preempt_enable_notrace();   \
} else {\
_val = read_sysreg(reg);\
}   \
-- 
1.9.0
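
The recursion described in the commit message can be imitated in plain userspace C. The sketch below is an assumption-laden toy, not kernel code: all `toy_` names are invented, the tracer is reduced to a hook that reads the counter again, and a depth cap (`TOY_MAX_DEPTH`) stands in for the stack overflow so that the demonstration terminates.

```c
#include <stdbool.h>

#define TOY_MAX_DEPTH 8  /* stop the demo instead of overflowing the stack */

static int toy_depth;     /* current nesting of toy_read_counter() */
static int toy_max_seen;  /* deepest nesting observed */

static unsigned long toy_read_counter(bool notrace);

/* Models the tracer taking a timestamp whenever preemption is disabled. */
static void toy_trace_hook(void)
{
	if (toy_depth < TOY_MAX_DEPTH)
		(void)toy_read_counter(false);
}

/* The traced variant calls into the tracer; the notrace variant does not. */
static void toy_preempt_disable(void)         { toy_trace_hook(); }
static void toy_preempt_disable_notrace(void) { /* no tracer callback */ }

static unsigned long toy_read_counter(bool notrace)
{
	toy_depth++;
	if (toy_depth > toy_max_seen)
		toy_max_seen = toy_depth;

	if (notrace)
		toy_preempt_disable_notrace();
	else
		toy_preempt_disable();

	toy_depth--;
	return 42;  /* placeholder counter value */
}

/* Returns the deepest nesting reached by a single counter read. */
static int toy_max_nesting(bool notrace)
{
	toy_depth = 0;
	toy_max_seen = 0;
	(void)toy_read_counter(notrace);
	return toy_max_seen;
}
```

With the traced variant the read re-enters itself until the cap is hit, while the notrace variant stays at a nesting depth of one, which mirrors why switching to preempt_{disable,enable}_notrace() breaks the cycle.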



