Re: [PATCH kernel 6/6] powerpc/powernv/ioda: Allocate indirect TCE levels on demand

2018-06-11 Thread David Gibson
On Fri, Jun 08, 2018 at 03:46:33PM +1000, Alexey Kardashevskiy wrote:
> At the moment we allocate the entire TCE table, twice (hardware part and
> userspace translation cache). This normally works as we normally have
> contiguous memory and the guest will map entire RAM for 64bit DMA.
> 
> However if we have sparse RAM (one example is a memory device), then
> we will allocate TCEs which will never be used as the guest only maps
> actual memory for DMA. If it is a single level TCE table, there is nothing
> we can really do but if it is a multilevel table, we can skip allocating
> TCEs we know we won't need.
> 
> This adds ability to allocate only first level, saving memory.
> 
> This changes iommu_table::free() to avoid allocating of an extra level;
> iommu_table::set() will do this when needed.
> 
> This adds @alloc parameter to iommu_table::exchange() to tell the callback
> if it can allocate an extra level; the flag is set to "false" for
> the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
> H_TOO_HARD.
> 
> This still requires the entire table to be counted in mm::locked_vm.
> 
> To be conservative, this only does on-demand allocation when
> the userspace cache table is requested which is the case of VFIO.
> 
> The example math for a system replicating a powernv setup with NVLink2
> in a guest:
> 16GB RAM mapped at 0x0
> 128GB GPU RAM window (16GB of actual RAM) mapped at 0x2440
> 
> the table to cover that all with 64K pages takes:
> (((0x2440 + 0x20) >> 16)*8)>>20 = 4556MB
> 
> If we allocate only necessary TCE levels, we will only need:
> (((0x4 + 0x4) >> 16)*8)>>20 = 4MB (plus some for indirect
> levels).
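
The arithmetic in the commit message can be checked with a small standalone sketch. It assumes 8-byte TCEs and 64K IOMMU pages, as the message states. The helper names `tce_table_bytes()` and `tce_table_mb()` are hypothetical, and only the 16GB + 16GB case is worked because the window address is truncated in this archive copy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel API: bytes of last-level TCEs needed
 * to map `span` bytes of DMA space with pages of size 1 << page_shift,
 * assuming one 8-byte TCE per page. */
static uint64_t tce_table_bytes(uint64_t span, unsigned int page_shift)
{
	assert(page_shift < 64);
	return (span >> page_shift) * 8;
}

/* Same figure in whole megabytes, the unit used in the commit message. */
static uint64_t tce_table_mb(uint64_t span, unsigned int page_shift)
{
	return tce_table_bytes(span, page_shift) >> 20;
}
```

With two 16GB regions (0x400000000 bytes each) and a page shift of 16, `tce_table_mb()` gives 4, matching the 4MB figure quoted for the on-demand case.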
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/include/asm/iommu.h  |  7 ++-
>  arch/powerpc/platforms/powernv/pci.h  |  6 ++-
>  arch/powerpc/kvm/book3s_64_vio_hv.c   |  4 +-
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 69 
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c |  8 ++--
>  drivers/vfio/vfio_iommu_spapr_tce.c   |  2 +-
>  6 files changed, 69 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h 
> b/arch/powerpc/include/asm/iommu.h
> index 4bdcf22..daa3ee5 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -70,7 +70,7 @@ struct iommu_table_ops {
>   unsigned long *hpa,
>   enum dma_data_direction *direction);
>  
> - __be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
> + __be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc);
>  #endif
>   void (*clear)(struct iommu_table *tbl,
>   long index, long npages);
> @@ -122,10 +122,13 @@ struct iommu_table {
>   __be64 *it_userspace; /* userspace view of the table */
>   struct iommu_table_ops *it_ops;
>   struct kref it_kref;
> + int it_nid;
>  };
>  
> +#define IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry) \
> + ((tbl)->it_ops->useraddrptr((tbl), (entry), false))

Is real mode really the only case where you want to inhibit new
allocations?  I would have thought some paths would be read-only and
you wouldn't want to allocate, even in virtual mode.
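
To make the question concrete, here is a minimal userspace sketch of a two-level table whose lookup takes an `alloc` flag. It is not the powernv code: `struct two_level_table`, `entry_ptr()`, `entry_get()` and `entry_set()` are hypothetical names, and the 16-entry levels are chosen only to keep the example small. It shows a read path passing `false` so that a lookup never populates a missing level, which is the behaviour being asked about for virtual mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define LEVEL_BITS 4                      /* illustrative: 16 entries per level */
#define LEVEL_SIZE (1u << LEVEL_BITS)

/* Hypothetical two-level table: the first level is always present,
 * second-level arrays are created only when a caller may allocate. */
struct two_level_table {
	uint64_t *level2[LEVEL_SIZE];
};

/* Return a pointer to the entry for `index`, or NULL when the covering
 * second-level array is missing and `alloc` is false (or calloc fails). */
static uint64_t *entry_ptr(struct two_level_table *tbl, unsigned int index,
			   bool alloc)
{
	unsigned int hi = index >> LEVEL_BITS;
	unsigned int lo = index & (LEVEL_SIZE - 1);

	if (hi >= LEVEL_SIZE)
		return NULL;              /* index outside the table */
	if (!tbl->level2[hi]) {
		if (!alloc)
			return NULL;      /* caller must not allocate */
		tbl->level2[hi] = calloc(LEVEL_SIZE, sizeof(uint64_t));
		if (!tbl->level2[hi])
			return NULL;
	}
	return &tbl->level2[hi][lo];
}

/* Read path: a missing level simply reads as an empty (zero) entry. */
static uint64_t entry_get(struct two_level_table *tbl, unsigned int index)
{
	uint64_t *p = entry_ptr(tbl, index, false);

	return p ? *p : 0;
}

/* Write path: may populate the missing level; returns false on failure. */
static bool entry_set(struct two_level_table *tbl, unsigned int index,
		      uint64_t val)
{
	uint64_t *p = entry_ptr(tbl, index, true);

	if (!p)
		return false;
	*p = val;
	return true;
}
```

In this sketch only `entry_set()` can grow the table; a caller that cannot allocate would get NULL back and have to defer, much as the real-mode handlers return H_TOO_HARD.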

>  #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
> - ((tbl)->it_ops->useraddrptr((tbl), (entry)))
> + ((tbl)->it_ops->useraddrptr((tbl), (entry), true))
>  
>  /* Pure 2^n version of get_order */
>  static inline __attribute_const__
> diff --git a/arch/powerpc/platforms/powernv/pci.h 
> b/arch/powerpc/platforms/powernv/pci.h
> index 5e02408..1fa5590 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -267,8 +267,10 @@ extern int pnv_tce_build(struct iommu_table *tbl, long 
> index, long npages,
>   unsigned long attrs);
>  extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
>  extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
> - unsigned long *hpa, enum dma_data_direction *direction);
> -extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index);
> + unsigned long *hpa, enum dma_data_direction *direction,
> + bool alloc);
> +extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index,
> + bool alloc);
>  extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
>  
>  extern long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
> b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index db0490c..05b4865 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -200,7 +200,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm 
> *kvm,
>  {
>   struct mm_iommu_table_group_mem_t *mem = NULL;
>   const unsigned long pgsize = 1ULL << tbl->it_page_shift;
> - 

Re: [PATCH v5 2/4] resource: Use list_head to link sibling resource

2018-06-11 Thread kbuild test robot
Hi Baoquan,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.17 next-20180608]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Baoquan-He/resource-Use-list_head-to-link-sibling-resource/20180612-113600
config: x86_64-randconfig-x011-201823 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   kernel/resource.c: In function 'reparent_resources':
   kernel/resource.c:1005:26: error: passing argument 2 of 'list_add' from 
incompatible pointer type [-Werror=incompatible-pointer-types]
 list_add(>sibling, >sibling.prev);
 ^
   In file included from include/linux/ioport.h:15:0,
from kernel/resource.c:14:
   include/linux/list.h:77:20: note: expected 'struct list_head *' but argument 
is of type 'struct list_head **'
static inline void list_add(struct list_head *new, struct list_head *head)
   ^~~~
   In file included from include/linux/list.h:9:0,
from include/linux/ioport.h:15,
from kernel/resource.c:14:
   kernel/resource.c:1013:26: error: 'new' undeclared (first use in this 
function); did you mean 'net'?
 list_for_each_entry(p, >child, sibling) {
 ^
   include/linux/kernel.h:963:26: note: in definition of macro 'container_of'
 void *__mptr = (void *)(ptr); \
 ^~~
   include/linux/list.h:377:2: note: in expansion of macro 'list_entry'
 list_entry((ptr)->next, type, member)
 ^~
   include/linux/list.h:464:13: note: in expansion of macro 'list_first_entry'
 for (pos = list_first_entry(head, typeof(*pos), member); \
^~~~
>> kernel/resource.c:1013:2: note: in expansion of macro 'list_for_each_entry'
 list_for_each_entry(p, >child, sibling) {
 ^~~
   kernel/resource.c:1013:26: note: each undeclared identifier is reported only 
once for each function it appears in
 list_for_each_entry(p, >child, sibling) {
 ^
   include/linux/kernel.h:963:26: note: in definition of macro 'container_of'
 void *__mptr = (void *)(ptr); \
 ^~~
   include/linux/list.h:377:2: note: in expansion of macro 'list_entry'
 list_entry((ptr)->next, type, member)
 ^~
   include/linux/list.h:464:13: note: in expansion of macro 'list_first_entry'
 for (pos = list_first_entry(head, typeof(*pos), member); \
^~~~
>> kernel/resource.c:1013:2: note: in expansion of macro 'list_for_each_entry'
 list_for_each_entry(p, >child, sibling) {
 ^~~
   cc1: some warnings being treated as errors

vim +/list_for_each_entry +1013 kernel/resource.c

   983  
   984  /*
   985   * Reparent resource children of pr that conflict with res
   986   * under res, and make res replace those children.
   987   */
   988  int reparent_resources(struct resource *parent, struct resource *res)
   989  {
   990  struct resource *p, *first = NULL;
   991  
   992  list_for_each_entry(p, >child, sibling) {
   993  if (p->end < res->start)
   994  continue;
   995  if (res->end < p->start)
   996  break;
   997  if (p->start < res->start || p->end > res->end)
   998  return -1;  /* not completely contained */
   999  if (first == NULL)
  1000  first = p;
  1001  }
  1002  if (first == NULL)
  1003  return -1;  /* didn't find any conflicting entries? 
*/
  1004  res->parent = parent;
  1005  list_add(>sibling, >sibling.prev);
  1006  INIT_LIST_HEAD(>child);
  1007  
  1008  /*
  1009   * From first to p's previous sibling, they all fall into
  1010   * res's region, change them as res's children.
  1011   */
  1012  list_cut_position(>child, first->sibling.prev, 
res->sibling.prev);
> 1013  list_for_each_entry(p, >child, sibling) {
  1014  p->parent = new;
  1015  pr_debug("PCI: Reparented %s %pR under %s\n",
  1016   p->name, p, res->name);
  1017  }
  1018  return 0;
  1019  }
  1020  EXPORT_SYMBOL(reparent_resources);
  1021  
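
The first diagnostic is a plain type mismatch, which a userspace re-creation of the list idiom can illustrate. The sketch below is not the kernel header and not necessarily the fix the author chose: `struct item`, `init_list_head()` and `insert_before()` are hypothetical, and `list_add()` is re-implemented locally with the same argument order as the kernel helper.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace re-creation of the circular doubly linked list idiom;
 * names mirror the kernel's but this is not the kernel header. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert `new_node` right after `head`. Both arguments are node pointers. */
static void list_add(struct list_head *new_node, struct list_head *head)
{
	new_node->next = head->next;
	new_node->prev = head;
	head->next->prev = new_node;
	head->next = new_node;
}

struct item {
	int id;
	struct list_head sibling;
};

/* Insert `it` directly before `pos` by adding it after pos's predecessor.
 * `pos->sibling.prev` already has type `struct list_head *`; writing
 * `&pos->sibling.prev` would yield `struct list_head **`, which is the
 * mismatch the compiler reports. */
static void insert_before(struct item *it, struct item *pos)
{
	list_add(&it->sibling, pos->sibling.prev);
}
```

The point is only the pointer level: the second argument of `list_add()` must be a node address, so the address-of operator belongs on the embedded `struct list_head`, not on its `prev` member.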

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] misc: ocxl: Change return type for fault handler

2018-06-11 Thread Andrew Donnellan

On 12/06/18 06:29, Souptick Joarder wrote:

Use new return type vm_fault_t for fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.

Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")

There is an existing bug where vm_insert_pfn() can return
ENOMEM, which was ignored, and VM_FAULT_NOPAGE was returned
by default. The new inline vmf_insert_pfn() fixes this by
returning the correct vm_fault_t value.
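
The difference between the two patterns can be sketched in userspace. Everything here is a stand-in: the `vm_fault_t` typedef and the `VM_FAULT_*` values are defined locally for illustration, and the errno mapping in `errno_to_fault()` is an assumption modelled on how the inline wrapper is understood to behave, not a copy of kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in type and values for illustration only; the kernel's real
 * definitions live in its mm headers. */
typedef unsigned int vm_fault_t;
#define VM_FAULT_OOM    0x0001u
#define VM_FAULT_SIGBUS 0x0002u
#define VM_FAULT_NOPAGE 0x0100u

/* Translate an errno-style result (0 or a negative errno) from an insert
 * helper into a fault code instead of discarding it. Assumed mapping:
 * -ENOMEM is out of memory, -EBUSY means the entry was already present,
 * any other error is a bus error. */
static vm_fault_t errno_to_fault(int err)
{
	if (err == -ENOMEM)
		return VM_FAULT_OOM;
	if (err < 0 && err != -EBUSY)
		return VM_FAULT_SIGBUS;
	return VM_FAULT_NOPAGE;
}

/* Old pattern, as in the removed lines: the insert result is dropped. */
static vm_fault_t fault_old(int insert_result)
{
	(void)insert_result;
	return VM_FAULT_NOPAGE;
}

/* New pattern: the insert result decides the fault code. */
static vm_fault_t fault_new(int insert_result)
{
	return errno_to_fault(insert_result);
}
```

With an insert result of -ENOMEM, `fault_old()` still reports VM_FAULT_NOPAGE while `fault_new()` reports VM_FAULT_OOM, which is the behavioural change the description refers to.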

Signed-off-by: Souptick Joarder 


Looks okay to me

Acked-by: Andrew Donnellan 


---
  drivers/misc/ocxl/context.c | 22 +++---
  drivers/misc/ocxl/sysfs.c   |  5 ++---
  2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 909e880..98daf91 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -83,7 +83,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
return rc;
  }
  
-static int map_afu_irq(struct vm_area_struct *vma, unsigned long address,

+static vm_fault_t map_afu_irq(struct vm_area_struct *vma, unsigned long 
address,
u64 offset, struct ocxl_context *ctx)
  {
u64 trigger_addr;
@@ -92,15 +92,15 @@ static int map_afu_irq(struct vm_area_struct *vma, unsigned 
long address,
if (!trigger_addr)
return VM_FAULT_SIGBUS;
  
-	vm_insert_pfn(vma, address, trigger_addr >> PAGE_SHIFT);

-   return VM_FAULT_NOPAGE;
+   return vmf_insert_pfn(vma, address, trigger_addr >> PAGE_SHIFT);
  }
  
-static int map_pp_mmio(struct vm_area_struct *vma, unsigned long address,

+static vm_fault_t map_pp_mmio(struct vm_area_struct *vma, unsigned long 
address,
u64 offset, struct ocxl_context *ctx)
  {
u64 pp_mmio_addr;
int pasid_off;
+   vm_fault_t ret;
  
  	if (offset >= ctx->afu->config.pp_mmio_stride)

return VM_FAULT_SIGBUS;
@@ -118,27 +118,27 @@ static int map_pp_mmio(struct vm_area_struct *vma, 
unsigned long address,
pasid_off * ctx->afu->config.pp_mmio_stride +
offset;
  
-	vm_insert_pfn(vma, address, pp_mmio_addr >> PAGE_SHIFT);

+   ret = vmf_insert_pfn(vma, address, pp_mmio_addr >> PAGE_SHIFT);
    mutex_unlock(&ctx->status_mutex);
-   return VM_FAULT_NOPAGE;
+   return ret;
  }
  
-static int ocxl_mmap_fault(struct vm_fault *vmf)

+static vm_fault_t ocxl_mmap_fault(struct vm_fault *vmf)
  {
struct vm_area_struct *vma = vmf->vma;
struct ocxl_context *ctx = vma->vm_file->private_data;
u64 offset;
-   int rc;
+   vm_fault_t ret;
  
  	offset = vmf->pgoff << PAGE_SHIFT;

pr_debug("%s: pasid %d address 0x%lx offset 0x%llx\n", __func__,
ctx->pasid, vmf->address, offset);
  
  	if (offset < ctx->afu->irq_base_offset)

-   rc = map_pp_mmio(vma, vmf->address, offset, ctx);
+   ret = map_pp_mmio(vma, vmf->address, offset, ctx);
else
-   rc = map_afu_irq(vma, vmf->address, offset, ctx);
-   return rc;
+   ret = map_afu_irq(vma, vmf->address, offset, ctx);
+   return ret;
  }
  
  static const struct vm_operations_struct ocxl_vmops = {

diff --git a/drivers/misc/ocxl/sysfs.c b/drivers/misc/ocxl/sysfs.c
index d9753a1..0ab1fd1 100644
--- a/drivers/misc/ocxl/sysfs.c
+++ b/drivers/misc/ocxl/sysfs.c
@@ -64,7 +64,7 @@ static ssize_t global_mmio_read(struct file *filp, struct 
kobject *kobj,
return count;
  }
  
-static int global_mmio_fault(struct vm_fault *vmf)

+static vm_fault_t global_mmio_fault(struct vm_fault *vmf)
  {
struct vm_area_struct *vma = vmf->vma;
struct ocxl_afu *afu = vma->vm_private_data;
@@ -75,8 +75,7 @@ static int global_mmio_fault(struct vm_fault *vmf)
  
  	offset = vmf->pgoff;

offset += (afu->global_mmio_start >> PAGE_SHIFT);
-   vm_insert_pfn(vma, vmf->address, offset);
-   return VM_FAULT_NOPAGE;
+   return vmf_insert_pfn(vma, vmf->address, offset);
  }
  
  static const struct vm_operations_struct global_mmio_vmops = {




--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH v5 2/4] resource: Use list_head to link sibling resource

2018-06-11 Thread kbuild test robot
Hi Baoquan,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180608]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Baoquan-He/resource-Use-list_head-to-link-sibling-resource/20180612-113600
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/resource.c: In function 'reparent_resources':
>> kernel/resource.c:1005:26: error: passing argument 2 of 'list_add' from 
>> incompatible pointer type [-Werror=incompatible-pointer-types]
 list_add(>sibling, >sibling.prev);
 ^
   In file included from include/linux/ioport.h:15:0,
from kernel/resource.c:14:
   include/linux/list.h:77:20: note: expected 'struct list_head *' but argument 
is of type 'struct list_head **'
static inline void list_add(struct list_head *new, struct list_head *head)
   ^~~~
   In file included from include/linux/list.h:9:0,
from include/linux/ioport.h:15,
from kernel/resource.c:14:
>> kernel/resource.c:1013:26: error: 'new' undeclared (first use in this 
>> function); did you mean 'net'?
 list_for_each_entry(p, >child, sibling) {
 ^
   include/linux/kernel.h:963:26: note: in definition of macro 'container_of'
 void *__mptr = (void *)(ptr); \
 ^~~
   include/linux/list.h:377:2: note: in expansion of macro 'list_entry'
 list_entry((ptr)->next, type, member)
 ^~
   include/linux/list.h:464:13: note: in expansion of macro 'list_first_entry'
 for (pos = list_first_entry(head, typeof(*pos), member); \
^~~~
   kernel/resource.c:1013:2: note: in expansion of macro 'list_for_each_entry'
 list_for_each_entry(p, >child, sibling) {
 ^~~
   kernel/resource.c:1013:26: note: each undeclared identifier is reported only 
once for each function it appears in
 list_for_each_entry(p, >child, sibling) {
 ^
   include/linux/kernel.h:963:26: note: in definition of macro 'container_of'
 void *__mptr = (void *)(ptr); \
 ^~~
   include/linux/list.h:377:2: note: in expansion of macro 'list_entry'
 list_entry((ptr)->next, type, member)
 ^~
   include/linux/list.h:464:13: note: in expansion of macro 'list_first_entry'
 for (pos = list_first_entry(head, typeof(*pos), member); \
^~~~
   kernel/resource.c:1013:2: note: in expansion of macro 'list_for_each_entry'
 list_for_each_entry(p, >child, sibling) {
 ^~~
   cc1: some warnings being treated as errors

vim +/list_add +1005 kernel/resource.c

   983  
   984  /*
   985   * Reparent resource children of pr that conflict with res
   986   * under res, and make res replace those children.
   987   */
   988  int reparent_resources(struct resource *parent, struct resource *res)
   989  {
   990  struct resource *p, *first = NULL;
   991  
   992  list_for_each_entry(p, >child, sibling) {
   993  if (p->end < res->start)
   994  continue;
   995  if (res->end < p->start)
   996  break;
   997  if (p->start < res->start || p->end > res->end)
   998  return -1;  /* not completely contained */
   999  if (first == NULL)
  1000  first = p;
  1001  }
  1002  if (first == NULL)
  1003  return -1;  /* didn't find any conflicting entries? 
*/
  1004  res->parent = parent;
> 1005  list_add(>sibling, >sibling.prev);
  1006  INIT_LIST_HEAD(>child);
  1007  
  1008  /*
  1009   * From first to p's previous sibling, they all fall into
  1010   * res's region, change them as res's children.
  1011   */
  1012  list_cut_position(>child, first->sibling.prev, 
res->sibling.prev);
> 1013  list_for_each_entry(p, >child, sibling) {
  1014  p->parent = new;
  1015  pr_debug("PCI: Reparented %s %pR under %s\n",
  1016   p->name, p, res->name);
  1017  }
  1018  return 0;
  1019  }
  1020  EXPORT_SYMBOL(reparent_resources);
  1021  



Re: [PATCH v5 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public

2018-06-11 Thread kbuild test robot
Hi Baoquan,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180608]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Baoquan-He/resource-Use-list_head-to-link-sibling-resource/20180612-113600
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

Note: the 
linux-review/Baoquan-He/resource-Use-list_head-to-link-sibling-resource/20180612-113600
 HEAD 5545e79eef6387857faf41cdffa7be6b1f5d4efe builds fine.
  It only hurts bisectibility.

All errors (new ones prefixed by >>):

>> kernel/resource.c:990:12: error: static declaration of 'reparent_resources' 
>> follows non-static declaration
static int reparent_resources(struct resource *parent,
   ^~
   In file included from kernel/resource.c:14:0:
   include/linux/ioport.h:195:5: note: previous declaration of 
'reparent_resources' was here
int reparent_resources(struct resource *parent, struct resource *res);
^~
   kernel/resource.c:990:12: warning: 'reparent_resources' defined but not used 
[-Wunused-function]
static int reparent_resources(struct resource *parent,
   ^~

vim +/reparent_resources +990 kernel/resource.c

   985  
   986  /*
   987   * Reparent resource children of pr that conflict with res
   988   * under res, and make res replace those children.
   989   */
 > 990  static int reparent_resources(struct resource *parent,
   991   struct resource *res)
   992  {
   993  struct resource *p, **pp;
   994  struct resource **firstpp = NULL;
   995  
   996  for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
   997  if (p->end < res->start)
   998  continue;
   999  if (res->end < p->start)
  1000  break;
  1001  if (p->start < res->start || p->end > res->end)
  1002  return -1;  /* not completely contained */
  1003  if (firstpp == NULL)
  1004  firstpp = pp;
  1005  }
  1006  if (firstpp == NULL)
  1007  return -1;  /* didn't find any conflicting entries? 
*/
  1008  res->parent = parent;
  1009  res->child = *firstpp;
  1010  res->sibling = *pp;
  1011  *firstpp = res;
  1012  *pp = NULL;
  1013  for (p = res->child; p != NULL; p = p->sibling) {
  1014  p->parent = res;
  1015  pr_debug("PCI: Reparented %s %pR under %s\n",
  1016   p->name, p, res->name);
  1017  }
  1018  return 0;
  1019  }
  1020  EXPORT_SYMBOL(reparent_resources);
  1021  
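
For reference, the singly linked routine can be exercised in userspace. In the sketch below `struct resource` is a cut-down stand-in, the `pr_debug()` call is dropped, and the loop header is taken to be `for (pp = &parent->child; ...; pp = &p->sibling)`. The definition is deliberately non-static and preceded by a matching prototype, since a `static` definition after a non-static declaration is what the compiler rejects in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct resource (pre-list_head). */
struct resource {
	unsigned long start, end;
	struct resource *parent, *sibling, *child;
};

/* Prototype as a shared header would declare it (non-static). */
int reparent_resources(struct resource *parent, struct resource *res);

/* Move the children of `parent` that lie fully inside `res` under `res`,
 * and link `res` into the sibling chain in their place.
 * Returns 0 on success, -1 if nothing conflicts or a child only partly
 * overlaps. Children are assumed to be sorted by address. */
int reparent_resources(struct resource *parent, struct resource *res)
{
	struct resource *p, **pp;
	struct resource **firstpp = NULL;

	/* pp always points at the link that leads to p, so the chain can be
	 * re-threaded later without tracking a "previous" node. */
	for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
		if (p->end < res->start)
			continue;
		if (res->end < p->start)
			break;
		if (p->start < res->start || p->end > res->end)
			return -1;	/* not completely contained */
		if (firstpp == NULL)
			firstpp = pp;
	}
	if (firstpp == NULL)
		return -1;	/* no conflicting entries */
	res->parent = parent;
	res->child = *firstpp;	/* first contained child */
	res->sibling = *pp;	/* first child past res, or NULL */
	*firstpp = res;		/* res takes the contained run's place */
	*pp = NULL;		/* terminate the run now owned by res */
	for (p = res->child; p != NULL; p = p->sibling)
		p->parent = res;
	return 0;
}
```

Given children covering 0-99, 100-199, 200-299 and 300-399, reparenting a resource spanning 100-299 leaves the parent with three children, the middle one being the new resource with the two contained ranges beneath it.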



Re: [PATCH v5 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public

2018-06-11 Thread Baoquan He
On 06/12/18 at 11:28am, Baoquan He wrote:
> reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
> and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
> so that it's shared. Later its code also needs to be updated to use
> list_head instead of the singly linked list.
> 
> Signed-off-by: Baoquan He 
> Cc: Michal Simek 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> ---
> v4->v5:
>   Fix several code bugs reported by test robot on ARCH powerpc and
>   microblaze.

Oops, I mistakenly added the patch change log of the current patch 0002
here. This patch is a newly added one.

> 
> v3->v4:
>   Fix several bugs test robot reported. And change patch log.
> 
> v2->v3:
>   Rename resource functions first_child() and sibling() to
>   resource_first_child() and resource_sibling(). Dan suggested this.
> 
>   Move resource_first_child() and resource_sibling() to linux/ioport.h
>   and make them inline functions. Rob suggested this. Accordingly,
>   include linux/list.h in linux/ioport.h; please help review whether
>   this brings efficiency degradation or code redundancy.
> 
>   The change to struct resource {} increases its size by two pointers;
>   mention this in the git log to be more specific. Rob suggested
>   this.
> 
>  arch/microblaze/pci/pci-common.c | 37 -
>  arch/powerpc/kernel/pci-common.c | 35 ---
>  include/linux/ioport.h   |  1 +
>  kernel/resource.c| 36 
>  4 files changed, 37 insertions(+), 72 deletions(-)
> 
> diff --git a/arch/microblaze/pci/pci-common.c 
> b/arch/microblaze/pci/pci-common.c
> index f34346d56095..7899bafab064 100644
> --- a/arch/microblaze/pci/pci-common.c
> +++ b/arch/microblaze/pci/pci-common.c
> @@ -619,43 +619,6 @@ int pcibios_add_device(struct pci_dev *dev)
>  EXPORT_SYMBOL(pcibios_add_device);
>  
>  /*
> - * Reparent resource children of pr that conflict with res
> - * under res, and make res replace those children.
> - */
> -static int __init reparent_resources(struct resource *parent,
> -  struct resource *res)
> -{
> - struct resource *p, **pp;
> - struct resource **firstpp = NULL;
> -
> - for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
> - if (p->end < res->start)
> - continue;
> - if (res->end < p->start)
> - break;
> - if (p->start < res->start || p->end > res->end)
> - return -1;  /* not completely contained */
> - if (firstpp == NULL)
> - firstpp = pp;
> - }
> - if (firstpp == NULL)
> - return -1;  /* didn't find any conflicting entries? */
> - res->parent = parent;
> - res->child = *firstpp;
> - res->sibling = *pp;
> - *firstpp = res;
> - *pp = NULL;
> - for (p = res->child; p != NULL; p = p->sibling) {
> - p->parent = res;
> - pr_debug("PCI: Reparented %s [%llx..%llx] under %s\n",
> -  p->name,
> -  (unsigned long long)p->start,
> -  (unsigned long long)p->end, res->name);
> - }
> - return 0;
> -}
> -
> -/*
>   *  Handle resources of PCI devices.  If the world were perfect, we could
>   *  just allocate all the resource regions and do nothing more.  It isn't.
>   *  On the other hand, we cannot just re-allocate all devices, as it would
> diff --git a/arch/powerpc/kernel/pci-common.c 
> b/arch/powerpc/kernel/pci-common.c
> index fe9733aa..926035bb378d 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -1088,41 +1088,6 @@ resource_size_t pcibios_align_resource(void *data, 
> const struct resource *res,
>  EXPORT_SYMBOL(pcibios_align_resource);
>  
>  /*
> - * Reparent resource children of pr that conflict with res
> - * under res, and make res replace those children.
> - */
> -static int reparent_resources(struct resource *parent,
> -  struct resource *res)
> -{
> - struct resource *p, **pp;
> - struct resource **firstpp = NULL;
> -
> - for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
> - if (p->end < res->start)
> - continue;
> - if (res->end < p->start)
> - break;
> - if (p->start < res->start || p->end > res->end)
> - return -1;  /* not completely contained */
> - if (firstpp == NULL)
> - firstpp = pp;
> - }
> - if (firstpp == NULL)
> - return -1;  /* didn't find any conflicting entries? */
> - res->parent = parent;
> - res->child = *firstpp;
> - res->sibling = *pp;
> - *firstpp = res;
> - *pp = NULL;
> - for (p = res->child; p != NULL; p = p->sibling) {
> - p->parent = res;
> 

[PATCH v5 4/4] kexec_file: Load kernel at top of system RAM if required

2018-06-11 Thread Baoquan He
For kexec_file loading, if kexec_buf.top_down is 'true', the memory used
to load kernel/initrd/purgatory is supposed to be allocated from top to
down. This is what the old kexec loading interface has always done, and
that interface is still the default in some distributions. However, the
current kexec_file loading interface doesn't behave like this. The
function arch_kexec_walk_mem() it calls ignores kexec_buf.top_down and
calls walk_system_ram_res() directly to go through all System RAM
resources from bottom to up, looking for a memory region which can contain
the specific kexec buffer; it then calls locate_mem_hole_callback() to
allocate memory in that region from top to down. This is confusing,
especially now that KASLR is widely supported: users have to work out why
the kexec/kdump kernel loading position differs between the two interfaces
in order to rule out unnecessary noise. Hence the two interfaces need to
be unified in behaviour.

Add a check of kexec_buf.top_down to arch_kexec_walk_mem(); if it is
'true', call the newly added walk_system_ram_res_rev() to find a memory
region from top to down to load the kernel.
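
The effect of the walk direction on the chosen load address can be sketched in userspace. All names below (`struct ram_range`, `struct buf_request`, `walk_ram()`, `walk_ram_rev()`, `locate_hole()`, `find_load_addr()`) are hypothetical; the ranges live in a sorted array rather than the kernel's resource tree, and the callback always places the buffer at the top of the first range that fits, as the description says the real callback does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ram_range {
	uint64_t start, end;	/* inclusive bounds */
};

struct buf_request {
	uint64_t size;		/* bytes needed */
	bool top_down;		/* search from the highest range first */
	uint64_t load_addr;	/* filled in on success */
};

typedef int (*range_fn)(const struct ram_range *, void *);

/* Visit ranges lowest-first; stop at the first nonzero callback result. */
static int walk_ram(const struct ram_range *r, size_t n, void *arg,
		    range_fn fn)
{
	int ret = -1;

	for (size_t i = 0; i < n; i++) {
		ret = fn(&r[i], arg);
		if (ret)
			break;
	}
	return ret;
}

/* Visit ranges highest-first; stop at the first nonzero callback result. */
static int walk_ram_rev(const struct ram_range *r, size_t n, void *arg,
			range_fn fn)
{
	int ret = -1;

	for (size_t i = n; i-- > 0; ) {
		ret = fn(&r[i], arg);
		if (ret)
			break;
	}
	return ret;
}

/* Callback: accept the range if the buffer fits, placing the buffer at
 * the top of that range. Returns 1 to stop the walk, 0 to keep going. */
static int locate_hole(const struct ram_range *r, void *arg)
{
	struct buf_request *req = arg;

	if (req->size == 0 || r->end - r->start < req->size - 1)
		return 0;
	req->load_addr = r->end - req->size + 1;
	return 1;
}

/* Pick the walk direction from the request, as the patch does. */
static int find_load_addr(const struct ram_range *r, size_t n,
			  struct buf_request *req)
{
	if (req->top_down)
		return walk_ram_rev(r, n, req, locate_hole);
	return walk_ram(r, n, req, locate_hole);
}
```

With the same ranges and buffer size, the bottom-up walk settles on the lowest range that fits while the top-down walk settles on the highest one, which is the discrepancy between the two loading interfaces that the patch removes.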

Signed-off-by: Baoquan He 
Cc: Eric Biederman 
Cc: Vivek Goyal 
Cc: Dave Young 
Cc: Andrew Morton 
Cc: Yinghai Lu 
Cc: ke...@lists.infradead.org
---
 kernel/kexec_file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 75d8e7cf040e..7a66d9d5a534 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -518,6 +518,8 @@ int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
   IORESOURCE_SYSTEM_RAM | 
IORESOURCE_BUSY,
   crashk_res.start, crashk_res.end,
   kbuf, func);
+   else if (kbuf->top_down)
+   return walk_system_ram_res_rev(0, ULONG_MAX, kbuf, func);
else
return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
 }
-- 
2.13.6



[PATCH v5 3/4] resource: add walk_system_ram_res_rev()

2018-06-11 Thread Baoquan He
This function, being a variant of walk_system_ram_res() introduced in
commit 8c86e70acead ("resource: provide new functions to walk through
resources"), walks through a list of all the resources of System RAM
in reversed order, i.e., from higher to lower.

It will be used in kexec_file code.
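
A simplified userspace sketch of such a reverse walk follows. It is not the patch's implementation: the resources sit in an ascending array instead of a list, the flag values `RES_SYSTEM_RAM` and `RES_BUSY` are illustrative, the search window is not shrunk as the walk proceeds, and `struct res`, `walk_ram_rev()`, `struct visit_log` and `log_visit()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RES_SYSTEM_RAM 0x1u	/* illustrative flag bits, not kernel values */
#define RES_BUSY       0x2u

struct res {
	uint64_t start, end;	/* inclusive bounds */
	unsigned int flags;
};

typedef int (*res_fn)(const struct res *, void *);

/* Call fn on every entry of the ascending array r[0..n) that carries both
 * flags and overlaps [start, end], from the highest address downward.
 * Stops early on a nonzero fn result; returns -1 if fn was never called. */
static int walk_ram_rev(const struct res *r, size_t n, uint64_t start,
			uint64_t end, void *arg, res_fn fn)
{
	const unsigned int want = RES_SYSTEM_RAM | RES_BUSY;
	int ret = -1;

	for (size_t i = n; i-- > 0; ) {
		if ((r[i].flags & want) != want)
			continue;	/* not busy System RAM */
		if (r[i].end < start)
			break;		/* everything further down is below the window */
		if (r[i].start > end)
			continue;	/* entirely above the window */
		ret = fn(&r[i], arg);
		if (ret)
			break;
	}
	return ret;
}

/* Example callback: record the start address of each visited entry. */
struct visit_log {
	uint64_t starts[8];
	size_t count;
};

static int log_visit(const struct res *r, void *arg)
{
	struct visit_log *log = arg;

	if (log->count < 8)
		log->starts[log->count++] = r->start;
	return 0;
}
```

Running `walk_ram_rev()` with `log_visit()` over a window records matching entries from the highest start address to the lowest, skipping entries that lack either flag.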

Signed-off-by: Baoquan He 
Cc: Andrew Morton 
Cc: Thomas Gleixner 
Cc: Brijesh Singh 
Cc: "Jérôme Glisse" 
Cc: Borislav Petkov 
Cc: Tom Lendacky 
Cc: Wei Yang 
---
 include/linux/ioport.h |  3 +++
 kernel/resource.c  | 40 
 2 files changed, 43 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index b7456ae889dd..066cc263e2cc 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -279,6 +279,9 @@ extern int
 walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(struct resource *, void *));
 extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+   int (*func)(struct resource *, void *));
+extern int
 walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 
end,
void *arg, int (*func)(struct resource *, void *));
 
diff --git a/kernel/resource.c b/kernel/resource.c
index ef9a20b75234..3128ac938f38 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -23,6 +23,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 
@@ -443,6 +445,44 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
 }
 
 /*
+ * This function, being a variant of walk_system_ram_res(), calls the @func
+ * callback against all memory ranges of type System RAM which are marked as
+ * IORESOURCE_SYSTEM_RAM and IORESOURCE_BUSY in reversed order, i.e., from
+ * higher to lower.
+ */
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+   int (*func)(struct resource *, void *))
+{
+   unsigned long flags;
+   struct resource *res;
+   int ret = -1;
+
+   flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+   read_lock(&resource_lock);
+   list_for_each_entry_reverse(res, &iomem_resource.child, sibling) {
+   if (start >= end)
+   break;
+   if ((res->flags & flags) != flags)
+   continue;
+   if (res->desc != IORES_DESC_NONE)
+   continue;
+   if (res->end < start)
+   break;
+
+   if ((res->end >= start) && (res->start < end)) {
+   ret = (*func)(res, arg);
+   if (ret)
+   break;
+   }
+   end = res->start - 1;
+
+   }
+   read_unlock(&resource_lock);
+   return ret;
+}
+
+/*
  * This function calls the @func callback against all memory ranges, which
  * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
  */
-- 
2.13.6



[PATCH v5 2/4] resource: Use list_head to link sibling resource

2018-06-11 Thread Baoquan He
The struct resource uses a singly linked list to link siblings, implemented
by pointer operations. Replace it with list_head for better code readability.

With this list_head replacement, it becomes very easy to do reverse
iteration over iomem_resource's sibling list in a later patch.

In addition, the types of the struct resource members sibling and child are
changed from 'struct resource *' to 'struct list_head', which increases the
size of the structure by two pointers.
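
The size claim can be checked with a small sketch. The two structs below are cut-down stand-ins that keep only the linkage members; `struct res_old`, `struct res_new` and `linkage_growth()` are hypothetical names, and `struct list_head` is re-declared locally with the usual two-pointer layout.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the usual two-pointer list node layout. */
struct list_head {
	struct list_head *next, *prev;
};

/* Linkage before the change: three plain pointers. */
struct res_old {
	struct res_old *parent, *sibling, *child;
};

/* Linkage after the change: one pointer plus two list nodes. */
struct res_new {
	struct res_new *parent;
	struct list_head sibling, child;
};

/* Extra bytes per resource caused by the conversion. */
static size_t linkage_growth(void)
{
	return sizeof(struct res_new) - sizeof(struct res_old);
}
```

On common ABIs `linkage_growth()` equals `2 * sizeof(void *)`, that is 16 bytes per resource on a 64-bit build.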

Suggested-by: Andrew Morton 
Signed-off-by: Baoquan He 
Cc: Patrik Jakobsson 
Cc: David Airlie 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Dmitry Torokhov 
Cc: Dan Williams 
Cc: Rob Herring 
Cc: Frank Rowand 
Cc: Keith Busch 
Cc: Jonathan Derrick 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: Thomas Gleixner 
Cc: Brijesh Singh 
Cc: "Jérôme Glisse" 
Cc: Borislav Petkov 
Cc: Tom Lendacky 
Cc: Greg Kroah-Hartman 
Cc: Yaowei Bai 
Cc: Wei Yang 
Cc: de...@linuxdriverproject.org
Cc: linux-in...@vger.kernel.org
Cc: linux-nvd...@lists.01.org
Cc: devicet...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 arch/arm/plat-samsung/pm-check.c|   6 +-
 arch/microblaze/pci/pci-common.c|   4 +-
 arch/powerpc/kernel/pci-common.c|   4 +-
 arch/sparc/kernel/ioport.c  |   2 +-
 arch/xtensa/include/asm/pci-bridge.h|   4 +-
 drivers/eisa/eisa-bus.c |   2 +
 drivers/gpu/drm/drm_memory.c|   3 +-
 drivers/gpu/drm/gma500/gtt.c|   5 +-
 drivers/hv/vmbus_drv.c  |  52 +++
 drivers/input/joystick/iforce/iforce-main.c |   4 +-
 drivers/nvdimm/namespace_devs.c |   6 +-
 drivers/nvdimm/nd.h |   5 +-
 drivers/of/address.c|   4 +-
 drivers/parisc/lba_pci.c|   4 +-
 drivers/pci/host/vmd.c  |   8 +-
 drivers/pci/probe.c |   2 +
 drivers/pci/setup-bus.c |   2 +-
 include/linux/ioport.h  |  17 ++-
 kernel/resource.c   | 211 ++--
 19 files changed, 176 insertions(+), 169 deletions(-)

diff --git a/arch/arm/plat-samsung/pm-check.c b/arch/arm/plat-samsung/pm-check.c
index cd2c02c68bc3..5494355b1c49 100644
--- a/arch/arm/plat-samsung/pm-check.c
+++ b/arch/arm/plat-samsung/pm-check.c
@@ -46,8 +46,8 @@ typedef u32 *(run_fn_t)(struct resource *ptr, u32 *arg);
 static void s3c_pm_run_res(struct resource *ptr, run_fn_t fn, u32 *arg)
 {
while (ptr != NULL) {
-   if (ptr->child != NULL)
-   s3c_pm_run_res(ptr->child, fn, arg);
+   if (!list_empty(&ptr->child))
+   s3c_pm_run_res(resource_first_child(&ptr->child), fn, arg);
 
if ((ptr->flags & IORESOURCE_SYSTEM_RAM)
== IORESOURCE_SYSTEM_RAM) {
@@ -57,7 +57,7 @@ static void s3c_pm_run_res(struct resource *ptr, run_fn_t fn, 
u32 *arg)
arg = (fn)(ptr, arg);
}
 
-   ptr = ptr->sibling;
+   ptr = resource_sibling(ptr);
}
 }
 
diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index 7899bafab064..2bf73e27e231 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -533,7 +533,9 @@ void pci_process_bridge_OF_ranges(struct pci_controller 
*hose,
res->flags = range.flags;
res->start = range.cpu_addr;
res->end = range.cpu_addr + range.size - 1;
-   res->parent = res->child = res->sibling = NULL;
+   res->parent = NULL;
+   INIT_LIST_HEAD(&res->child);
+   INIT_LIST_HEAD(&res->sibling);
}
}
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 926035bb378d..28fbe83c9daf 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -761,7 +761,9 @@ void pci_process_bridge_OF_ranges(struct pci_controller 
*hose,
res->flags = range.flags;
res->start = range.cpu_addr;
res->end = range.cpu_addr + range.size - 1;
-   res->parent = res->child = res->sibling = NULL;
+   res->parent = NULL;
+   INIT_LIST_HEAD(&res->child);
+   INIT_LIST_HEAD(&res->sibling);
}
}
 }
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index cca9134cfa7d..99efe4e98b16 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -669,7 +669,7 @@ static int sparc_io_proc_show(struct seq_file *m, void *v)
struct resource *root = m->private, *r;
const char *nm;
 
-   for (r = root->child; r != NULL; r = r->sibling) {
+   list_for_each_entry(r, 

[PATCH v5 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public

2018-06-11 Thread Baoquan He
reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
so that it's shared. Later its code also needs to be updated to use
list_head in place of the singly linked list.

Signed-off-by: Baoquan He 
Cc: Michal Simek 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
---
v4->v5:
  Fix several code bugs reported by test robot on ARCH powerpc and
  microblaze.

v3->v4:
  Fix several bugs test robot reported. And change patch log.

v2->v3:
  Rename resource functions first_child() and sibling() to
  resource_first_child() and resource_sibling(). Dan suggested this.

  Move resource_first_child() and resource_sibling() to linux/ioport.h
  and make them inline functions. Rob suggested this. Accordingly, include
  linux/list.h in linux/ioport.h; please help review whether this
  brings efficiency degradation or code redundancy.

  The change to struct resource {} increases its size by two pointers;
  mention this in the git log to be more specific. Rob suggested
  this.

 arch/microblaze/pci/pci-common.c | 37 -
 arch/powerpc/kernel/pci-common.c | 35 ---
 include/linux/ioport.h   |  1 +
 kernel/resource.c| 36 
 4 files changed, 37 insertions(+), 72 deletions(-)

diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c
index f34346d56095..7899bafab064 100644
--- a/arch/microblaze/pci/pci-common.c
+++ b/arch/microblaze/pci/pci-common.c
@@ -619,43 +619,6 @@ int pcibios_add_device(struct pci_dev *dev)
 EXPORT_SYMBOL(pcibios_add_device);
 
 /*
- * Reparent resource children of pr that conflict with res
- * under res, and make res replace those children.
- */
-static int __init reparent_resources(struct resource *parent,
-struct resource *res)
-{
-   struct resource *p, **pp;
-   struct resource **firstpp = NULL;
-
-   for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
-   if (p->end < res->start)
-   continue;
-   if (res->end < p->start)
-   break;
-   if (p->start < res->start || p->end > res->end)
-   return -1;  /* not completely contained */
-   if (firstpp == NULL)
-   firstpp = pp;
-   }
-   if (firstpp == NULL)
-   return -1;  /* didn't find any conflicting entries? */
-   res->parent = parent;
-   res->child = *firstpp;
-   res->sibling = *pp;
-   *firstpp = res;
-   *pp = NULL;
-   for (p = res->child; p != NULL; p = p->sibling) {
-   p->parent = res;
-   pr_debug("PCI: Reparented %s [%llx..%llx] under %s\n",
-p->name,
-(unsigned long long)p->start,
-(unsigned long long)p->end, res->name);
-   }
-   return 0;
-}
-
-/*
  *  Handle resources of PCI devices.  If the world were perfect, we could
  *  just allocate all the resource regions and do nothing more.  It isn't.
  *  On the other hand, we cannot just re-allocate all devices, as it would
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index fe9733aa..926035bb378d 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1088,41 +1088,6 @@ resource_size_t pcibios_align_resource(void *data, const 
struct resource *res,
 EXPORT_SYMBOL(pcibios_align_resource);
 
 /*
- * Reparent resource children of pr that conflict with res
- * under res, and make res replace those children.
- */
-static int reparent_resources(struct resource *parent,
-struct resource *res)
-{
-   struct resource *p, **pp;
-   struct resource **firstpp = NULL;
-
-   for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
-   if (p->end < res->start)
-   continue;
-   if (res->end < p->start)
-   break;
-   if (p->start < res->start || p->end > res->end)
-   return -1;  /* not completely contained */
-   if (firstpp == NULL)
-   firstpp = pp;
-   }
-   if (firstpp == NULL)
-   return -1;  /* didn't find any conflicting entries? */
-   res->parent = parent;
-   res->child = *firstpp;
-   res->sibling = *pp;
-   *firstpp = res;
-   *pp = NULL;
-   for (p = res->child; p != NULL; p = p->sibling) {
-   p->parent = res;
-   pr_debug("PCI: Reparented %s %pR under %s\n",
-p->name, p, res->name);
-   }
-   return 0;
-}
-
-/*
  *  Handle resources of PCI devices.  If the world were perfect, we could
  *  just allocate all the resource regions and do nothing more.  It isn't.
  *  On 

[PATCH v5 0/4] resource: Use list_head to link sibling resource

2018-06-11 Thread Baoquan He
This patchset does the following:
1) Replace struct resource's sibling list, currently a singly linked list,
with list_head. This clears out the pointer operations needed for the
singly linked list and improves code readability.
2) Based on the list_head replacement, add a new function
walk_system_ram_res_rev() which iterates over iomem_resource's
siblings in reverse order.
3) Change kexec_file loading to search system RAM top down for kernel
loading, using walk_system_ram_res_rev().

Note:
This patchset passed testing on my KVM guest (x86_64, with networking
enabled). The thing we need to pay attention to is that a root resource's
child member needs to be initialized explicitly, with LIST_HEAD_INIT() if
statically defined or INIT_LIST_HEAD() if dynamically defined, just
like we do for iomem_resource/ioport_resource, or the change in
get_pci_domain_busn_res().
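
As an illustration of the initialization note above, here is a minimal
userspace sketch in C. It assumes a simplified stand-in for the kernel's
struct list_head; struct demo_resource and every demo_* name are
hypothetical and are not part of this patchset. It shows static
initialization with LIST_HEAD_INIT(), dynamic initialization with
INIT_LIST_HEAD(), and the kind of reverse sibling walk that list_head
makes straightforward.

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static inline void list_add_tail(struct list_head *entry,
				 struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified resource: only the fields this sketch needs. */
struct demo_resource {
	unsigned long start, end;
	struct demo_resource *parent;
	struct list_head child;		/* head of this resource's children */
	struct list_head sibling;	/* link in the parent's child list */
};

/* Statically defined root: both heads are initialized at definition time. */
static struct demo_resource demo_root = {
	.start = 0,
	.end = ~0UL,
	.child = LIST_HEAD_INIT(demo_root.child),
	.sibling = LIST_HEAD_INIT(demo_root.sibling),
};

/* Dynamically set up resource: initialize both heads before first use. */
static void demo_resource_init(struct demo_resource *res,
			       unsigned long start, unsigned long end)
{
	res->start = start;
	res->end = end;
	res->parent = NULL;
	INIT_LIST_HEAD(&res->child);
	INIT_LIST_HEAD(&res->sibling);
}

static void demo_add_child(struct demo_resource *parent,
			   struct demo_resource *res)
{
	res->parent = parent;
	list_add_tail(&res->sibling, &parent->child);
}

/* Walk the children from last to first, recording each start address. */
static int demo_walk_children_rev(struct demo_resource *parent,
				  unsigned long *starts, int max)
{
	struct list_head *pos;
	int n = 0;

	for (pos = parent->child.prev; pos != &parent->child && n < max;
	     pos = pos->prev)
		starts[n++] = container_of(pos, struct demo_resource,
					   sibling)->start;
	return n;
}
```

If the child head of a root were left zeroed instead, list_empty() would
compare a NULL next pointer against the head and report "not empty",
and the first walk would dereference NULL. That is why the explicit
initialization matters.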


Links of the old post (Boris pointed out that we should use
https://lkml.kernel.org/r/Message-ID, while it can't be opened from
my side, so paste all of them here.):
v4:
https://lkml.kernel.org/r/20180507063224.24229-1-...@redhat.com
https://lkml.org/lkml/2018/5/7/36

v3:
https://lkml.kernel.org/r/20180419001848.3041-1-...@redhat.com
https://lkml.org/lkml/2018/4/18/767

v2:
https://lkml.kernel.org/r/20180408024724.16812-1-...@redhat.com
https://lkml.org/lkml/2018/4/7/169

v1:
https://lkml.kernel.org/r/20180322033722.9279-1-...@redhat.com
https://lkml.org/lkml/2018/3/21/952

Changelog:
v4->v5:
  Add new patch 0001 to move the duplicated reparent_resources() to
  kernel/resource.c so that it is shared by different ARCHes.

  Fix several code bugs reported by test robot on ARCH powerpc and
  microblaze.
v3->v4:
  Fix several bugs test robot reported. Rewrite cover letter and patch
  log according to reviewer's comment.

v2->v3:
  Rename resource functions first_child() and sibling() to
  resource_first_child() and resource_sibling(). Dan suggested this.

  Move resource_first_child() and resource_sibling() to linux/ioport.h
  and make them inline functions. Rob suggested this. Accordingly, include
  linux/list.h in linux/ioport.h; please help review whether this
  brings efficiency degradation or code redundancy.

  The change to struct resource {} increases its size by two pointers;
  mention this in the git log to be more specific. Rob suggested
  this.

v1->v2:
  Use list_head instead to link resource siblings. This was suggested by
  Andrew.

  Rewrite walk_system_ram_res_rev() after list_head is used to link
  resource siblings.

Baoquan He (4):
  resource: Move reparent_resources() to kernel/resource.c and make it
public
  resource: Use list_head to link sibling resource
  resource: add walk_system_ram_res_rev()
  kexec_file: Load kernel at top of system RAM if required

 arch/arm/plat-samsung/pm-check.c|   6 +-
 arch/microblaze/pci/pci-common.c|  41 +
 arch/powerpc/kernel/pci-common.c|  39 +
 arch/sparc/kernel/ioport.c  |   2 +-
 arch/xtensa/include/asm/pci-bridge.h|   4 +-
 drivers/eisa/eisa-bus.c |   2 +
 drivers/gpu/drm/drm_memory.c|   3 +-
 drivers/gpu/drm/gma500/gtt.c|   5 +-
 drivers/hv/vmbus_drv.c  |  52 +++---
 drivers/input/joystick/iforce/iforce-main.c |   4 +-
 drivers/nvdimm/namespace_devs.c |   6 +-
 drivers/nvdimm/nd.h |   5 +-
 drivers/of/address.c|   4 +-
 drivers/parisc/lba_pci.c|   4 +-
 drivers/pci/host/vmd.c  |   8 +-
 drivers/pci/probe.c |   2 +
 drivers/pci/setup-bus.c |   2 +-
 include/linux/ioport.h  |  21 ++-
 kernel/kexec_file.c |   2 +
 kernel/resource.c   | 259 ++--
 20 files changed, 244 insertions(+), 227 deletions(-)

-- 
2.13.6



RE: [PATCH v2 3/3] powerpc/fsl: Implement cpu_show_spectre_v1/v2 for NXP PowerPC Book3E

2018-06-11 Thread Bharat Bhushan
Hi Diana,

> -Original Message-
> From: Diana Craciun [mailto:diana.crac...@nxp.com]
> Sent: Monday, June 11, 2018 6:23 PM
> To: linuxppc-dev@lists.ozlabs.org
> Cc: m...@ellerman.id.au; o...@buserror.net; Leo Li ;
> Bharat Bhushan ; Diana Madalina Craciun
> 
> Subject: [PATCH v2 3/3] powerpc/fsl: Implement cpu_show_spectre_v1/v2 for
> NXP PowerPC Book3E

Please add some description

> 
> Signed-off-by: Diana Craciun 
> ---
>  arch/powerpc/Kconfig   |  2 +-
>  arch/powerpc/kernel/security.c | 15 +++
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index
> 940c955..a781d60 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -170,7 +170,7 @@ config PPC
>   select GENERIC_CLOCKEVENTS_BROADCASTif SMP
>   select GENERIC_CMOS_UPDATE
>   select GENERIC_CPU_AUTOPROBE
> - select GENERIC_CPU_VULNERABILITIES  if PPC_BOOK3S_64
> + select GENERIC_CPU_VULNERABILITIES  if PPC_BOOK3S_64 ||
> PPC_FSL_BOOK3E
>   select GENERIC_IRQ_SHOW
>   select GENERIC_IRQ_SHOW_LEVEL
>   select GENERIC_SMP_IDLE_THREAD
> diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
> index 797c975..aceaadc 100644
> --- a/arch/powerpc/kernel/security.c
> +++ b/arch/powerpc/kernel/security.c
> @@ -183,3 +183,18 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
> struct device_attribute *attr, c  }  #endif /* CONFIG_PPC_BOOK3S_64 */
> 
> +#ifdef CONFIG_PPC_FSL_BOOK3E
> +ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute
> +*attr, char *buf) {
> + if (barrier_nospec_enabled)
> + return sprintf(buf, "Mitigation: __user pointer 
> sanitization\n");
> +
> + return sprintf(buf, "Vulnerable\n");
> +}
> +
> +ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute
> +*attr, char *buf) {
> + return sprintf(buf, "Vulnerable\n");
> +}
> +#endif /* CONFIG_PPC_FSL_BOOK3E */
> +
> --
> 2.5.5



Re: [PATCH kernel 5/6] powerpc/powernv: Rework TCE level allocation

2018-06-11 Thread David Gibson
On Fri, Jun 08, 2018 at 03:46:32PM +1000, Alexey Kardashevskiy wrote:
> This moves actual pages allocation to a separate function which is going
> to be reused later in on-demand TCE allocation.
> 
> While we are at it, remove unnecessary level size round up as the caller
> does this already.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 30 
> +--
>  1 file changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c 
> b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> index f14b282..36c2eb0 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> @@ -31,6 +31,23 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>   tbl->it_type = TCE_PCI;
>  }
>  
> +static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
> +{
> + struct page *tce_mem = NULL;
> + __be64 *addr;
> +
> + tce_mem = alloc_pages_node(nid, GFP_KERNEL, shift - PAGE_SHIFT);
> + if (!tce_mem) {
> + pr_err("Failed to allocate a TCE memory, level shift=%d\n",
> + shift);
> + return NULL;
> + }
> + addr = page_address(tce_mem);
> + memset(addr, 0, 1UL << shift);
> +
> + return addr;
> +}
> +
>  static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx)
>  {
>   __be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base;
> @@ -165,21 +182,12 @@ static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int 
> nid, unsigned int shift,
>   unsigned int levels, unsigned long limit,
>   unsigned long *current_offset, unsigned long *total_allocated)
>  {
> - struct page *tce_mem = NULL;
>   __be64 *addr, *tmp;
> - unsigned int order = max_t(unsigned int, shift, PAGE_SHIFT) -
> - PAGE_SHIFT;
> - unsigned long allocated = 1UL << (order + PAGE_SHIFT);
> + unsigned long allocated = 1UL << shift;
>   unsigned int entries = 1UL << (shift - 3);
>   long i;
>  
> - tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
> - if (!tce_mem) {
> - pr_err("Failed to allocate a TCE memory, order=%d\n", order);
> - return NULL;
> - }
> - addr = page_address(tce_mem);
> - memset(addr, 0, allocated);
> + addr = pnv_alloc_tce_level(nid, shift);
>   *total_allocated += allocated;
>  
>   --levels;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH kernel 4/6] powerpc/powernv: Add indirect levels to it_userspace

2018-06-11 Thread David Gibson
On Fri, Jun 08, 2018 at 03:46:31PM +1000, Alexey Kardashevskiy wrote:
> We want to support sparse memory and therefore huge chunks of DMA windows
> do not need to be mapped. If a DMA window is big enough to require 2 or more
> indirect levels, and a DMA window is used to map all RAM (which is
> the default case for a 64bit window), we can actually save some memory by
> not allocating TCEs for regions which we are not going to map anyway.
> 
> The hardware tables already support indirect levels but we also keep
> a host-physical-to-userspace translation array which is allocated by
> vmalloc() and is a flat array which might use quite some memory.
> 
> This converts it_userspace from a vmalloc'ed array to a multi level table.
> 
> As the format becomes platform dependent, this replaces the direct access
> to it_userspace with an iommu_table_ops::useraddrptr hook which returns
> a pointer to the userspace copy of a TCE; a future extension will return
> NULL if the level was not allocated.
> 
> This should not change non-KVM handling of TCE tables and it_userspace
> will not be allocated for non-KVM tables.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 
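
To make the quoted description concrete, here is a rough userspace
sketch in C of a two-level table whose lookup returns a pointer to an
entry, or NULL when the covering level has not been allocated. It is
not the code from this patch: struct demo_table, the demo_* names and
the 16-entry level size are all hypothetical, and the alloc flag only
mirrors the idea described for the on-demand allocation patch in this
series.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_LEVEL_BITS 4			/* hypothetical level size */
#define DEMO_LEVEL_SIZE (1u << DEMO_LEVEL_BITS)	/* 16 entries per level */

/* First level: pointers to leaf arrays, NULL while a leaf is unallocated. */
struct demo_table {
	uint64_t *levels[DEMO_LEVEL_SIZE];
};

/*
 * Return a pointer to the entry for idx. If the leaf covering idx does
 * not exist yet, allocate it only when the caller allows it; otherwise
 * return NULL so the caller can fall back to a slower path.
 */
static uint64_t *demo_entry_ptr(struct demo_table *tbl, unsigned long idx,
				bool alloc)
{
	unsigned long top = idx >> DEMO_LEVEL_BITS;
	unsigned long leaf = idx & (DEMO_LEVEL_SIZE - 1);

	if (top >= DEMO_LEVEL_SIZE)
		return NULL;	/* index is outside the table */

	if (!tbl->levels[top]) {
		if (!alloc)
			return NULL;
		tbl->levels[top] = calloc(DEMO_LEVEL_SIZE, sizeof(uint64_t));
		if (!tbl->levels[top])
			return NULL;
	}
	return &tbl->levels[top][leaf];
}

/* Release every leaf that was allocated on demand. */
static void demo_table_free(struct demo_table *tbl)
{
	unsigned int i;

	for (i = 0; i < DEMO_LEVEL_SIZE; i++) {
		free(tbl->levels[i]);
		tbl->levels[i] = NULL;
	}
}
```

With a flat array every index costs memory up front; here a sparse
window only pays for the leaves it actually touches, at the price of a
NULL check on each lookup.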

> ---
>  arch/powerpc/include/asm/iommu.h  |  6 +--
>  arch/powerpc/platforms/powernv/pci.h  |  3 +-
>  arch/powerpc/kvm/book3s_64_vio_hv.c   |  8 
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 65 
> +--
>  arch/powerpc/platforms/powernv/pci-ioda.c | 31 ++---
>  drivers/vfio/vfio_iommu_spapr_tce.c   | 46 ---
>  6 files changed, 81 insertions(+), 78 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h 
> b/arch/powerpc/include/asm/iommu.h
> index 803ac70..4bdcf22 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -69,6 +69,8 @@ struct iommu_table_ops {
>   long index,
>   unsigned long *hpa,
>   enum dma_data_direction *direction);
> +
> + __be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
>  #endif
>   void (*clear)(struct iommu_table *tbl,
>   long index, long npages);
> @@ -123,9 +125,7 @@ struct iommu_table {
>  };
>  
>  #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
> - ((tbl)->it_userspace ? \
> - &((tbl)->it_userspace[(entry) - (tbl)->it_offset]) : \
> - NULL)
> + ((tbl)->it_ops->useraddrptr((tbl), (entry)))
>  
>  /* Pure 2^n version of get_order */
>  static inline __attribute_const__
> diff --git a/arch/powerpc/platforms/powernv/pci.h 
> b/arch/powerpc/platforms/powernv/pci.h
> index f507baf..5e02408 100644
> --- a/arch/powerpc/platforms/powernv/pci.h
> +++ b/arch/powerpc/platforms/powernv/pci.h
> @@ -268,11 +268,12 @@ extern int pnv_tce_build(struct iommu_table *tbl, long 
> index, long npages,
>  extern void pnv_tce_free(struct iommu_table *tbl, long index, long npages);
>  extern int pnv_tce_xchg(struct iommu_table *tbl, long index,
>   unsigned long *hpa, enum dma_data_direction *direction);
> +extern __be64 *pnv_tce_useraddrptr(struct iommu_table *tbl, long index);
>  extern unsigned long pnv_tce_get(struct iommu_table *tbl, long index);
>  
>  extern long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>   __u32 page_shift, __u64 window_size, __u32 levels,
> - struct iommu_table *tbl);
> + bool alloc_userspace_copy, struct iommu_table *tbl);
>  extern void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
>  
>  extern long pnv_pci_link_table_and_group(int node, int num,
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
> b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 18109f3..db0490c 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -206,10 +206,6 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm 
> *kvm,
>   /* it_userspace allocation might be delayed */
>   return H_TOO_HARD;
>  
> - pua = (void *) vmalloc_to_phys(pua);
> - if (WARN_ON_ONCE_RM(!pua))
> - return H_HARDWARE;
> -
>   mem = mm_iommu_lookup_rm(kvm->mm, be64_to_cpu(*pua), pgsize);
>   if (!mem)
>   return H_TOO_HARD;
> @@ -282,10 +278,6 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, 
> struct iommu_table *tbl,
>   if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, )))
>   return H_HARDWARE;
>  
> - pua = (void *) vmalloc_to_phys(pua);
> - if (WARN_ON_ONCE_RM(!pua))
> - return H_HARDWARE;
> -
>   if (WARN_ON_ONCE_RM(mm_iommu_mapped_inc(mem)))
>   return H_CLOSED;
>  
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c 
> b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> index 700ceb1..f14b282 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> 

linux-next: build failure in Linus' tree

2018-06-11 Thread Stephen Rothwell
Hi all,

Building Linus' tree, today's linux-next build (powerpc allyesconfig)
failed like this:

ld: net/bpfilter/bpfilter_umh.o: compiled for a little endian system and target 
is big endian
ld: failed to merge target specific data of file net/bpfilter/bpfilter_umh.o

This has come to light since I started using a native compiler (i.e. one
that can build executables, not just the kernel) for my PowerPC builds
on a powerpcle host.

I have switched back to my limited compiler.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v2 08/12] macintosh/via-pmu68k: Don't load driver on unsupported hardware

2018-06-11 Thread Finn Thain
On Sun, 10 Jun 2018, Benjamin Herrenschmidt wrote:

> Pre-PCI is basically "NUBUS" based even in absence of an actual NuBus 
> slot :-) It has to do with the internal HW architecture. The only ones 
> that aren't are the even older designs (the 68000 based ones).
> 

There is already some disagreement in the comments in the nubus-pmac code 
about the suitability of "PMU_NUBUS_BASED" as opposed to e.g. 
"PMU_WHITNEY_BASED".

Point is, the PMU driver doesn't care about the expansion slots or 
architecture (Whitney-based PMU appears on m68k and powerpc). So NuBus vs. 
PCI is a red herring here. The pmu_kind relates to backlight, buttons and 
battery.

(Leaving aside the PMU driver, if a pre-OpenFirmware Mac has a "slot zero" 
ROM, one can argue that it is actually a NuBus machine, regardless of any 
actual expansion slots.)

> What's the situation with those NuBus things ? What do they use as a 
> bootloader ? The old Apple one or BootX ? We should merge that port if 
> it's maintained.
> 

I agree that this code should not languish out-of-tree. But it would need 
more work before it could reasonably be submitted to reviewers.

I do have some nubus-pmac hardware but I also have more mac/68k driver 
work to do before I can tackle another architecture.

I don't know what the bootloader situation is, but it looks messy...
http://nubus-pmac.sourceforge.net/#booters

Laurent, does Emile work on these machines?

-- 

> Cheers,
> Ben.
>  


linux-next: build warnings from Linus' tree

2018-06-11 Thread Stephen Rothwell
Hi all,

Building Linus' tree, today's linux-next build (powerpc ppc64_defconfig)
produced these warnings:

ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in 
section `.gnu.hash'.
ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in 
section `.gnu.hash'.
ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in 
section `.gnu.hash'.

This may just be because I have started building using the native Debian
gcc for the powerpc builds ...

-- 
Cheers,
Stephen Rothwell




[PATCH] misc: ocxl: Change return type for fault handler

2018-06-11 Thread Souptick Joarder
Use new return type vm_fault_t for fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.

Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")

There is an existing bug where vm_insert_pfn() can return
-ENOMEM, which was ignored, and VM_FAULT_NOPAGE was returned
by default. The new inline vmf_insert_pfn() removes
this problem by returning the correct vm_fault_t value.

Signed-off-by: Souptick Joarder 
---
 drivers/misc/ocxl/context.c | 22 +++---
 drivers/misc/ocxl/sysfs.c   |  5 ++---
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 909e880..98daf91 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -83,7 +83,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
return rc;
 }
 
-static int map_afu_irq(struct vm_area_struct *vma, unsigned long address,
+static vm_fault_t map_afu_irq(struct vm_area_struct *vma, unsigned long 
address,
u64 offset, struct ocxl_context *ctx)
 {
u64 trigger_addr;
@@ -92,15 +92,15 @@ static int map_afu_irq(struct vm_area_struct *vma, unsigned 
long address,
if (!trigger_addr)
return VM_FAULT_SIGBUS;
 
-   vm_insert_pfn(vma, address, trigger_addr >> PAGE_SHIFT);
-   return VM_FAULT_NOPAGE;
+   return vmf_insert_pfn(vma, address, trigger_addr >> PAGE_SHIFT);
 }
 
-static int map_pp_mmio(struct vm_area_struct *vma, unsigned long address,
+static vm_fault_t map_pp_mmio(struct vm_area_struct *vma, unsigned long 
address,
u64 offset, struct ocxl_context *ctx)
 {
u64 pp_mmio_addr;
int pasid_off;
+   vm_fault_t ret;
 
if (offset >= ctx->afu->config.pp_mmio_stride)
return VM_FAULT_SIGBUS;
@@ -118,27 +118,27 @@ static int map_pp_mmio(struct vm_area_struct *vma, 
unsigned long address,
pasid_off * ctx->afu->config.pp_mmio_stride +
offset;
 
-   vm_insert_pfn(vma, address, pp_mmio_addr >> PAGE_SHIFT);
+   ret = vmf_insert_pfn(vma, address, pp_mmio_addr >> PAGE_SHIFT);
    mutex_unlock(&ctx->status_mutex);
-   return VM_FAULT_NOPAGE;
+   return ret;
 }
 
-static int ocxl_mmap_fault(struct vm_fault *vmf)
+static vm_fault_t ocxl_mmap_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
struct ocxl_context *ctx = vma->vm_file->private_data;
u64 offset;
-   int rc;
+   vm_fault_t ret;
 
offset = vmf->pgoff << PAGE_SHIFT;
pr_debug("%s: pasid %d address 0x%lx offset 0x%llx\n", __func__,
ctx->pasid, vmf->address, offset);
 
if (offset < ctx->afu->irq_base_offset)
-   rc = map_pp_mmio(vma, vmf->address, offset, ctx);
+   ret = map_pp_mmio(vma, vmf->address, offset, ctx);
else
-   rc = map_afu_irq(vma, vmf->address, offset, ctx);
-   return rc;
+   ret = map_afu_irq(vma, vmf->address, offset, ctx);
+   return ret;
 }
 
 static const struct vm_operations_struct ocxl_vmops = {
diff --git a/drivers/misc/ocxl/sysfs.c b/drivers/misc/ocxl/sysfs.c
index d9753a1..0ab1fd1 100644
--- a/drivers/misc/ocxl/sysfs.c
+++ b/drivers/misc/ocxl/sysfs.c
@@ -64,7 +64,7 @@ static ssize_t global_mmio_read(struct file *filp, struct 
kobject *kobj,
return count;
 }
 
-static int global_mmio_fault(struct vm_fault *vmf)
+static vm_fault_t global_mmio_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
struct ocxl_afu *afu = vma->vm_private_data;
@@ -75,8 +75,7 @@ static int global_mmio_fault(struct vm_fault *vmf)
 
offset = vmf->pgoff;
offset += (afu->global_mmio_start >> PAGE_SHIFT);
-   vm_insert_pfn(vma, vmf->address, offset);
-   return VM_FAULT_NOPAGE;
+   return vmf_insert_pfn(vma, vmf->address, offset);
 }
 
 static const struct vm_operations_struct global_mmio_vmops = {
-- 
1.9.1
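
The difference between the old and new return handling described in the
patch above can be sketched in userspace C. The type and constants
below are hypothetical stand-ins (the kernel's real vm_fault_t and
VM_FAULT_* values differ), and the errno mapping is an assumption
modelled loosely on what an inline wrapper such as vmf_insert_pfn() is
described as doing; treat it as an illustration, not as the kernel
implementation.

```c
#include <errno.h>

/* Hypothetical stand-ins; the kernel's real type and values differ. */
typedef unsigned int demo_vm_fault_t;
#define DEMO_VM_FAULT_OOM	0x1u
#define DEMO_VM_FAULT_SIGBUS	0x2u
#define DEMO_VM_FAULT_NOPAGE	0x100u

/* Old pattern: the errno from the insert helper is silently dropped. */
static demo_vm_fault_t demo_fault_old(int insert_err)
{
	(void)insert_err;
	return DEMO_VM_FAULT_NOPAGE;
}

/*
 * New pattern (assumed mapping): translate the errno into a fault code
 * so that an allocation failure is reported instead of being hidden.
 */
static demo_vm_fault_t demo_fault_new(int insert_err)
{
	if (insert_err == -ENOMEM)
		return DEMO_VM_FAULT_OOM;
	if (insert_err < 0 && insert_err != -EBUSY)
		return DEMO_VM_FAULT_SIGBUS;
	return DEMO_VM_FAULT_NOPAGE;
}
```

With the old pattern a -ENOMEM from the insert is indistinguishable
from success; with the new one the fault handler's caller sees an
out-of-memory fault code.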



Re: 4.17.0-10146-gf0dc7f9c6dd9: hw csum failure on powerpc+sungem

2018-06-11 Thread Mathieu Malaterre
Hi Meelis,

On Mon, Jun 11, 2018 at 1:21 PM Meelis Roos  wrote:
>
> I am seeing this on PowerMac G4 with sungem ethernet driver. 4.17 was
> OK, 4.17.0-10146-gf0dc7f9c6dd9 is problematic.

Same here.

> [  140.518664] eth0: hw csum failure
> [  140.518699] CPU: 0 PID: 1237 Comm: postconf Not tainted 
> 4.17.0-10146-gf0dc7f9c6dd9 #83
> [  140.518707] Call Trace:
> [  140.518734] [effefd90] [c03d6db8] __skb_checksum_complete+0xd8/0xdc 
> (unreliable)
> [  140.518759] [effefdb0] [c04c1284] icmpv6_rcv+0x248/0x4ec
> [  140.518775] [effefdd0] [c049a448] ip6_input_finish.constprop.0+0x11c/0x5f4
> [  140.518786] [effefe10] [c049b1c0] ip6_mc_input+0xcc/0x100
> [  140.518807] [effefe20] [c03e110c] __netif_receive_skb_core+0x310/0x944
> [  140.518820] [effefe70] [c03e76ec] napi_gro_receive+0xd0/0xe8
> [  140.518845] [effefe80] [f3e1f66c] gem_poll+0x618/0x1274 [sungem]
> [  140.518856] [effeff30] [c03e6f0c] net_rx_action+0x198/0x374
> [  140.518872] [effeff90] [c0501a88] __do_softirq+0x120/0x278
> [  140.518890] [effeffe0] [c0036188] irq_exit+0xd8/0xdc
> [  140.518908] [effefff0] [c000f478] call_do_irq+0x24/0x3c
> [  140.518925] [d05a5d30] [c0007120] do_IRQ+0x74/0xf0
> [  140.518941] [d05a5d50] [c0012474] ret_from_except+0x0/0x14
> [  140.518960] --- interrupt: 501 at copy_page+0x40/0x90
>LR = copy_user_page+0x18/0x30
> [  140.518973] [d05a5e10] [d058cd80] 0xd058cd80 (unreliable)
> [  140.518989] [d05a5e20] [c00fa2bc] wp_page_copy+0xec/0x654
> [  140.519002] [d05a5e60] [c00fd3a4] do_wp_page+0xa8/0x5b4
> [  140.519013] [d05a5e90] [c00fe934] handle_mm_fault+0x564/0xa84
> [  140.519025] [d05a5f00] [c0016230] do_page_fault+0x1bc/0x7e8
> [  140.519037] [d05a5f40] [c0012300] handle_page_fault+0x14/0x40
> [  140.519048] --- interrupt: 301 at 0xb78b6864
>LR = 0xb78b6c54
>

For some reason if I do a git bisect it returns that:

$ git bisect good
3036bc45364f98515a2c446d7fac2c34dcfbeff4 is the first bad commit

Could you also check on your side please.

> --
> Meelis Roos (mr...@linux.ee)


Re: pkeys on POWER: Access rights not reset on execve

2018-06-11 Thread Ram Pai
On Mon, Jun 11, 2018 at 07:29:33PM +0200, Florian Weimer wrote:
> On 06/11/2018 07:23 PM, Ram Pai wrote:
> >On Fri, Jun 08, 2018 at 07:53:51AM +0200, Florian Weimer wrote:
> >>On 06/08/2018 04:34 AM, Ram Pai wrote:
> 
> So the remaining question at this point is whether the Intel
> behavior (default-deny instead of default-allow) is preferable.
> >>>
> >>>Florian, remind me what behavior needs to fixed?
> >>
> >>See the other thread.  The Intel register equivalent to the AMR by
> >>default disallows access to yet-unallocated keys, so that threads
> >>which are created before key allocation do not magically gain access
> >>to a key allocated by another thread.
> >
> >Are you referring to the thread
> >'[PATCH] pkeys: Introduce PKEY_ALLOC_SIGNALINHERIT and change signal 
> >semantics'
> 
> >Otherwise please point me to the URL of that thread. Sorry and thankx. :)
> 
> No, it's this issue:
> 
>   ...

Ok. try this patch. This patch is on top of the 5 patches that I had
sent last week i.e  "[PATCH  0/5] powerpc/pkeys: fixes to pkeys"

The following is a draft patch though to check if it meets your
expectations.

commit fe53b5fe2dcb3139ea27ade3ae7cbbe43c4af3be
Author: Ram Pai 
Date:   Mon Jun 11 14:57:34 2018 -0500

powerpc/pkeys: Deny read/write/execute by default

Deny everything for all keys, with some exceptions. Do not do this for
pkey-0, or else everything will come to a screeching halt.  Also by
default, do not deny execute for the execute-only key.

This is a draft-patch for now.

Signed-off-by: Ram Pai 

diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 8225263..289aafd 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -128,13 +128,13 @@ int pkey_initialize(void)
 
/* register mask is in BE format */
pkey_amr_mask = ~0x0ul;
-   pkey_iamr_mask = ~0x0ul;
+   pkey_amr_mask &= ~(0x3ul << pkeyshift(PKEY_0));
+   pkey_amr_mask &= ~(0x3ul << pkeyshift(1));
 
-   for (i = 0; i < (pkeys_total - os_reserved); i++) {
-   pkey_amr_mask &= ~(0x3ul << pkeyshift(i));
-   pkey_iamr_mask &= ~(0x1ul << pkeyshift(i));
-   }
-   pkey_amr_mask |= (AMR_RD_BIT|AMR_WR_BIT) << pkeyshift(EXECUTE_ONLY_KEY);
+   pkey_iamr_mask = ~0x0ul;
+   pkey_iamr_mask &= ~(0x3ul << pkeyshift(PKEY_0));
+   pkey_iamr_mask &= ~(0x3ul << pkeyshift(1));
+   pkey_iamr_mask &= ~(0x3ul << pkeyshift(EXECUTE_ONLY_KEY));
 
pkey_uamor_mask = ~0x0ul;
pkey_uamor_mask &= ~(0x3ul << pkeyshift(PKEY_0));
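
As a rough illustration of the mask arithmetic in the draft patch
above, the following userspace C sketch builds a default-deny mask and
then clears the two bits of each key that should stay accessible. It
assumes 2 bits per key in a 64-bit register with key 0 in the most
significant bits (the "BE format" mentioned in the code comment); the
demo_* names are hypothetical and the shift formula is an assumption
about how pkeyshift() behaves, not a copy of the kernel macro.

```c
#include <stdint.h>

#define DEMO_AMR_BITS_PER_PKEY	2	/* assumed: two bits per key */
#define DEMO_PKEY_REG_BITS	64	/* assumed: 64-bit register */

/* Bit position of a key's field, counting key 0 from the top of the register. */
static unsigned int demo_pkeyshift(unsigned int pkey)
{
	return DEMO_PKEY_REG_BITS - (pkey + 1) * DEMO_AMR_BITS_PER_PKEY;
}

/*
 * Start from "deny everything" (all bits set) and clear both bits for
 * each key listed in allowed[].
 */
static uint64_t demo_default_deny_mask(const unsigned int *allowed, int n)
{
	uint64_t mask = ~UINT64_C(0);
	int i;

	for (i = 0; i < n; i++)
		mask &= ~(UINT64_C(3) << demo_pkeyshift(allowed[i]));
	return mask;
}
```

Under these assumptions, allowing only keys 0 and 1 clears the top four
bits of the register and leaves every other key denied.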

-- 
Ram Pai



Re: [v3, 03/10] dt-binding: ptp_qoriq: add DPAA FMan support

2018-06-11 Thread Rob Herring
On Thu, Jun 07, 2018 at 05:20:43PM +0800, Yangbo Lu wrote:
> This patch is to add bindings description for DPAA
> FMan 1588 timer, and also remove its description in
> fsl-fman dt-bindings document.
> 
> Signed-off-by: Yangbo Lu 
> ---
> Changes for v2:
>   - None.
> Changes for v3:
>   - None.
> ---
>  Documentation/devicetree/bindings/net/fsl-fman.txt |   25 
> +---
>  .../devicetree/bindings/ptp/ptp-qoriq.txt  |   15 +--
>  2 files changed, 13 insertions(+), 27 deletions(-)

Reviewed-by: Rob Herring 


Re: pkeys on POWER: Access rights not reset on execve

2018-06-11 Thread Florian Weimer

On 06/11/2018 07:23 PM, Ram Pai wrote:
> On Fri, Jun 08, 2018 at 07:53:51AM +0200, Florian Weimer wrote:
>> On 06/08/2018 04:34 AM, Ram Pai wrote:
>>>>
>>>> So the remaining question at this point is whether the Intel
>>>> behavior (default-deny instead of default-allow) is preferable.
>>>
>>> Florian, remind me what behavior needs to fixed?
>>
>> See the other thread.  The Intel register equivalent to the AMR by
>> default disallows access to yet-unallocated keys, so that threads
>> which are created before key allocation do not magically gain access
>> to a key allocated by another thread.
>
> Are you referring to the thread
> '[PATCH] pkeys: Introduce PKEY_ALLOC_SIGNALINHERIT and change signal semantics'

> Otherwise please point me to the URL of that thread. Sorry and thankx. :)


No, it's this issue:

  

The UAMOR part has been fixed (thanks), but I think processes still 
start out with default-allow AMR.


Thanks,
Florian


Re: pkeys on POWER: Access rights not reset on execve

2018-06-11 Thread Ram Pai
On Fri, Jun 08, 2018 at 07:53:51AM +0200, Florian Weimer wrote:
> On 06/08/2018 04:34 AM, Ram Pai wrote:
> >>
> >>So the remaining question at this point is whether the Intel
> >>behavior (default-deny instead of default-allow) is preferable.
> >
> >Florian, remind me what behavior needs to fixed?
> 
> See the other thread.  The Intel register equivalent to the AMR by
> default disallows access to yet-unallocated keys, so that threads
> which are created before key allocation do not magically gain access
> to a key allocated by another thread.

Are you referring to the thread
'[PATCH] pkeys: Introduce PKEY_ALLOC_SIGNALINHERIT and change signal semantics'

If yes, I will wait for your next version of the patch.

Otherwise please point me to the URL of that thread. Sorry and thankx. :)
RP
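
For readers following the default-deny versus default-allow question in this
thread, the difference can be modeled in a few lines of user-space C. This is
only a sketch: the register layout (two bits per key, bit 0 of each pair
meaning access-disable), the 16-key size, and the names key_reg_t,
reg_default_allow, reg_default_deny and key_access_allowed are illustrative
assumptions, not the real AMR or PKRU encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model, not a hardware layout: 16 keys, 2 bits per key,
 * bit 0 of each pair set means "access disabled" for that key.
 */
#define MODEL_NR_KEYS      16
#define MODEL_BITS_PER_KEY 2
#define MODEL_ACCESS_DIS   0x1u

typedef uint32_t key_reg_t;

/* Default-allow: every key, allocated or not, starts out accessible. */
static key_reg_t reg_default_allow(void)
{
    return 0;
}

/* Default-deny: only key 0 is accessible; keys 1..15 start out disabled. */
static key_reg_t reg_default_deny(void)
{
    key_reg_t reg = 0;

    for (int key = 1; key < MODEL_NR_KEYS; key++)
        reg |= (key_reg_t)MODEL_ACCESS_DIS << (key * MODEL_BITS_PER_KEY);
    return reg;
}

/* True if the register value permits access through the given key. */
static bool key_access_allowed(key_reg_t reg, int key)
{
    return !((reg >> (key * MODEL_BITS_PER_KEY)) & MODEL_ACCESS_DIS);
}
```

In this model a thread created before another thread allocates, say, key 5
keeps its initial register value. With the deny-style initial value,
key_access_allowed(reg, 5) stays false until that thread changes its own
register; with the allow-style value it is true from the start, which is the
"magically gain access" case described above.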



Re: [PATCH v11 00/26] Speculative page faults

2018-06-11 Thread Laurent Dufour
Hi Haiyan,

I don't have access to the same hardware you ran the tests on, but I gave
those tests a try on a Power8 system (2 sockets, 5 cores/s, 8 threads/c,
80 CPUs, 32G).
I ran each will-it-scale test 10 times and computed the average.

test THP enabled             4.17.0-rc4-mm1   spf          delta
page_fault3_threads          2697.7           2683.5       -0.53%
page_fault2_threads          170660.6         169574.1     -0.64%
context_switch1_threads      6915269.2        6877507.3    -0.55%
context_switch1_processes    6478076.2        6529493.5    0.79%
brk1                         243391.2         238527.5     -2.00%

Tests were launched with the arguments '-t 80 -s 5'; only the average report is
taken into account. Note that page size is 64K by default on ppc64.
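
The delta column in the table above is consistent with the relative change of
the spf kernel against the 4.17.0-rc4-mm1 baseline. A minimal sketch of that
arithmetic, where the function name percent_delta and its parameter names are
chosen here for illustration only:

```c
/*
 * Relative change of `patched` versus `base`, in percent.
 * Negative values mean the patched kernel scored lower than the baseline.
 */
static double percent_delta(double base, double patched)
{
    return (patched - base) / base * 100.0;
}
```

For example, percent_delta(2697.7, 2683.5) is roughly -0.53, matching the
page_fault3_threads row.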

It would be nice if you could capture some perf data to figure out why the
page_fault2/3 are showing such a performance regression.

Thanks,
Laurent.

On 11/06/2018 09:49, Song, HaiyanX wrote:
> Hi Laurent,
> 
> Regression test for v11 patch serials have been run, some regression is found 
> by LKP-tools (linux kernel performance)
> tested on Intel 4s skylake platform. This time only test the cases which have 
> been run and found regressions on
> V9 patch serials.
> 
> The regression result is sorted by the metric will-it-scale.per_thread_ops.
> branch: Laurent-Dufour/Speculative-page-faults/20180520-045126
> commit id:
>   head commit : a7a8993bfe3ccb54ad468b9f1799649e4ad1ff12
>   base commit : ba98a1cdad71d259a194461b3a61471b49b14df1
> Benchmark: will-it-scale
> Download link: https://github.com/antonblanchard/will-it-scale/tree/master
> 
> Metrics:
>   will-it-scale.per_process_ops=processes/nr_cpu
>   will-it-scale.per_thread_ops=threads/nr_cpu
>   test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
> THP: enable / disable
> nr_task:100%
> 
> 1. Regressions:
> 
> a). Enable THP
> testcase                     base     change   head     metric
> page_fault3/enable THP       10519    -20.5%   836      will-it-scale.per_thread_ops
> page_fault2/enable THP       8281     -18.8%   6728     will-it-scale.per_thread_ops
> brk1/enable THP              998475   -2.2%    976893   will-it-scale.per_process_ops
> context_switch1/enable THP   223910   -1.3%    220930   will-it-scale.per_process_ops
> context_switch1/enable THP   233722   -1.0%    231288   will-it-scale.per_thread_ops
> 
> b). Disable THP
> page_fault3/disable THP      10856    -23.1%   8344     will-it-scale.per_thread_ops
> page_fault2/disable THP      8147     -18.8%   6613     will-it-scale.per_thread_ops
> brk1/disable THP             957      -7.9%    881      will-it-scale.per_thread_ops
> context_switch1/disable THP  237006   -2.2%    231907   will-it-scale.per_thread_ops
> brk1/disable THP             997317   -2.0%    98       will-it-scale.per_process_ops
> page_fault3/disable THP      467454   -1.8%    459251   will-it-scale.per_process_ops
> context_switch1/disable THP  224431   -1.3%    221567   will-it-scale.per_process_ops
> 
> Notes: for the above  values of test result, the higher is better.
> 
> 2. Improvement: not found improvement based on the selected test cases.
> 
> 
> Best regards
> Haiyan Song
> 
> From: owner-linux...@kvack.org [owner-linux...@kvack.org] on behalf of 
> Laurent Dufour [lduf...@linux.vnet.ibm.com]
> Sent: Monday, May 28, 2018 4:54 PM
> To: Song, HaiyanX
> Cc: a...@linux-foundation.org; mho...@kernel.org; pet...@infradead.org; 
> kir...@shutemov.name; a...@linux.intel.com; d...@stgolabs.net; j...@suse.cz; 
> Matthew Wilcox; khand...@linux.vnet.ibm.com; aneesh.ku...@linux.vnet.ibm.com; 
> b...@kernel.crashing.org; m...@ellerman.id.au; pau...@samba.org; Thomas 
> Gleixner; Ingo Molnar; h...@zytor.com; Will Deacon; Sergey Senozhatsky; 
> sergey.senozhatsky.w...@gmail.com; Andrea Arcangeli; Alexei Starovoitov; 
> Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; 
> Minchan Kim; Punit Agrawal; vinayak menon; Yang Shi; 
> linux-ker...@vger.kernel.org; linux...@kvack.org; ha...@linux.vnet.ibm.com; 
> npig...@gmail.com; bsinghar...@gmail.com; paul...@linux.vnet.ibm.com; Tim 
> Chen; linuxppc-dev@lists.ozlabs.org; x...@kernel.org
> Subject: Re: [PATCH v11 00/26] Speculative page faults
> 
> On 28/05/2018 10:22, Haiyan Song wrote:
>> Hi Laurent,
>>
>> Yes, these tests are done on V9 patch.
> 
> Do you plan to give this V11 a run ?
> 
>>
>>
>> Best regards,
>> Haiyan Song
>>
>> On Mon, May 28, 2018 at 09:51:34AM +0200, Laurent Dufour wrote:
>>> On 28/05/2018 07:23, Song, HaiyanX wrote:

>>>> Some regression and improvements is found by LKP-tools(linux kernel
>>>> performance) on V9 patch series
>>>> tested on Intel 4s

[PATCH v2 3/3] powerpc/fsl: Implement cpu_show_spectre_v1/v2 for NXP PowerPC Book3E

2018-06-11 Thread Diana Craciun
Signed-off-by: Diana Craciun 
---
 arch/powerpc/Kconfig   |  2 +-
 arch/powerpc/kernel/security.c | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 940c955..a781d60 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -170,7 +170,7 @@ config PPC
 	select GENERIC_CLOCKEVENTS_BROADCAST	if SMP
 	select GENERIC_CMOS_UPDATE
 	select GENERIC_CPU_AUTOPROBE
-	select GENERIC_CPU_VULNERABILITIES	if PPC_BOOK3S_64
+	select GENERIC_CPU_VULNERABILITIES	if PPC_BOOK3S_64 || PPC_FSL_BOOK3E
 	select GENERIC_IRQ_SHOW
 	select GENERIC_IRQ_SHOW_LEVEL
 	select GENERIC_SMP_IDLE_THREAD
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 797c975..aceaadc 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -183,3 +183,18 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, c
 }
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
+#ifdef CONFIG_PPC_FSL_BOOK3E
+ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	if (barrier_nospec_enabled)
+		return sprintf(buf, "Mitigation: __user pointer sanitization\n");
+
+	return sprintf(buf, "Vulnerable\n");
+}
+
+ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "Vulnerable\n");
+}
+#endif /* CONFIG_PPC_FSL_BOOK3E */
+
-- 
2.5.5



[PATCH v2 2/3] powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E

2018-06-11 Thread Diana Craciun
Implement barrier_nospec as an isync;sync instruction sequence.
The implementation uses the infrastructure built for BOOK3S 64.

Signed-off-by: Diana Craciun 
---
 arch/powerpc/include/asm/barrier.h | 10 ++
 arch/powerpc/include/asm/setup.h   |  2 +-
 arch/powerpc/kernel/Makefile   |  2 +-
 arch/powerpc/kernel/module.c   |  5 +++--
 arch/powerpc/kernel/security.c | 15 +++
 arch/powerpc/kernel/setup_32.c |  5 +
 arch/powerpc/kernel/setup_64.c |  6 ++
 arch/powerpc/kernel/vmlinux.lds.S  |  4 +++-
 arch/powerpc/lib/feature-fixups.c  | 35 ++-
 9 files changed, 78 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index f67b3f6..405d572 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -86,6 +86,16 @@ do { \
 // This also acts as a compiler barrier due to the memory clobber.
 #define barrier_nospec() asm (stringify_in_c(barrier_nospec_asm) ::: "memory")
 
+#elif defined(CONFIG_PPC_FSL_BOOK3E)
+/*
+ * Prevent the execution of subsequent instructions speculatively using a
+ * isync;sync instruction sequence.
+ */
+#define barrier_nospec_asm NOSPEC_BARRIER_FIXUP_SECTION; nop; nop
+
+// This also acts as a compiler barrier due to the memory clobber.
+#define barrier_nospec() asm (stringify_in_c(barrier_nospec_asm) ::: "memory")
+
 #else /* !CONFIG_PPC_BOOK3S_64 */
 #define barrier_nospec_asm
 #define barrier_nospec()
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 8721fd0..67a2810 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -56,7 +56,7 @@ void setup_barrier_nospec(void);
 void do_barrier_nospec_fixups(bool enable);
 extern bool barrier_nospec_enabled;
 
-#ifdef CONFIG_PPC_BOOK3S_64
+#if defined(CONFIG_PPC_BOOK3S_64) || defined(CONFIG_PPC_FSL_BOOK3E)
 void do_barrier_nospec_fixups_range(bool enable, void *start, void *end);
 #else
 static inline void do_barrier_nospec_fixups_range(bool enable, void *start, void *end) { };
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2b4c40b2..d9dee43 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -76,7 +76,7 @@ endif
 obj64-$(CONFIG_HIBERNATION)+= swsusp_asm64.o
 obj-$(CONFIG_MODULES)  += module.o module_$(BITS).o
 obj-$(CONFIG_44x)  += cpu_setup_44x.o
-obj-$(CONFIG_PPC_FSL_BOOK3E)   += cpu_setup_fsl_booke.o
+obj-$(CONFIG_PPC_FSL_BOOK3E)   += cpu_setup_fsl_booke.o security.o
 obj-$(CONFIG_PPC_DOORBELL) += dbell.o
 obj-$(CONFIG_JUMP_LABEL)   += jump_label.o
 
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 1b3c683..96a9821 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -72,13 +72,14 @@ int module_finalize(const Elf_Ehdr *hdr,
 		do_feature_fixups(powerpc_firmware_features,
 				  (void *)sect->sh_addr,
 				  (void *)sect->sh_addr + sect->sh_size);
-
+#endif /* CONFIG_PPC64 */
+#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_FSL_BOOK3E)
 	sect = find_section(hdr, sechdrs, "__spec_barrier_fixup");
 	if (sect != NULL)
 		do_barrier_nospec_fixups_range(barrier_nospec_enabled,
 				  (void *)sect->sh_addr,
 				  (void *)sect->sh_addr + sect->sh_size);
-#endif
+#endif /* CONFIG_PPC64 || CONFIG_PPC_FSL_BOOK3E */
 
sect = find_section(hdr, sechdrs, "__lwsync_fixup");
if (sect != NULL)
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index c55e102..797c975 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -13,7 +13,9 @@
 #include 
 
 
+#ifdef CONFIG_PPC_BOOK3S_64
 unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
+#endif /* CONFIG_PPC_BOOK3S_64 */
 
 bool barrier_nospec_enabled;
 static bool no_nospec;
@@ -24,6 +26,7 @@ static void enable_barrier_nospec(bool enable)
do_barrier_nospec_fixups(enable);
 }
 
+#ifdef CONFIG_PPC_BOOK3S_64
 void setup_barrier_nospec(void)
 {
bool enable;
@@ -46,6 +49,15 @@ void setup_barrier_nospec(void)
if (!no_nospec)
enable_barrier_nospec(enable);
 }
+#endif /* CONFIG_PPC_BOOK3S_64 */
+
+#ifdef CONFIG_PPC_FSL_BOOK3E
+void setup_barrier_nospec(void)
+{
+   if (!no_nospec)
+   enable_barrier_nospec(true);
+}
+#endif /* CONFIG_PPC_FSL_BOOK3E */
 
 static int __init handle_nospectre_v1(char *p)
 {
@@ -92,6 +104,7 @@ static __init int barrier_nospec_debugfs_init(void)
 device_initcall(barrier_nospec_debugfs_init);
 #endif /* CONFIG_DEBUG_FS */
 
+#ifdef CONFIG_PPC_BOOK3S_64
 ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)
 {
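
The feature-fixups.c hunk of this patch is not reproduced above, so the
following is only a user-space sketch of the general idea stated in the commit
message: each barrier site reserves two nop slots, which are rewritten to
isync;sync when the barrier is enabled and back to nops when it is disabled.
The function name model_nospec_fixups, the `sites` index table and the flat
`code` array are assumptions made for illustration; the kernel works on a
fixup section of relative offsets and patches live instructions instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PowerPC instruction encodings. */
#define INST_NOP   0x60000000u  /* ori 0,0,0 */
#define INST_ISYNC 0x4c00012cu
#define INST_SYNC  0x7c0004acu

/*
 * Hypothetical model: `sites` holds indices into `code`, each naming the
 * first of two consecutive instruction slots reserved for a barrier.
 */
static void model_nospec_fixups(bool enable, uint32_t *code,
                                const size_t *sites, size_t nr_sites)
{
    uint32_t first = enable ? INST_ISYNC : INST_NOP;
    uint32_t second = enable ? INST_SYNC : INST_NOP;

    for (size_t i = 0; i < nr_sites; i++) {
        code[sites[i]] = first;
        code[sites[i] + 1] = second;
    }
}
```

Calling the function again with enable set to false restores the nop pair, so
the same table can serve both the boot-time decision and a later toggle.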

[PATCH v2 0/3] powerpc/fsl: Speculation barrier for NXP PowerPC Book3E

2018-06-11 Thread Diana Craciun
Implement barrier_nospec for NXP PowerPC Book3E processors. 

Diana Craciun (3):
  Disable the speculation barrier from the command line
  Add barrier_nospec implementation for NXP PowerPC Book3E
  Implement cpu_show_spectre_v1/v2 for NXP PowerPC Book3E

 arch/powerpc/Kconfig   |  2 +-
 arch/powerpc/include/asm/barrier.h | 10 +
 arch/powerpc/include/asm/setup.h   |  2 +-
 arch/powerpc/kernel/Makefile   |  2 +-
 arch/powerpc/kernel/module.c   |  5 +++--
 arch/powerpc/kernel/security.c | 42 +-
 arch/powerpc/kernel/setup_32.c |  5 +
 arch/powerpc/kernel/setup_64.c |  6 ++
 arch/powerpc/kernel/vmlinux.lds.S  |  4 +++-
 arch/powerpc/lib/feature-fixups.c  | 35 ++-
 10 files changed, 105 insertions(+), 8 deletions(-)

--
History:

v1 --> v2
- added implementation for cpu_show_spectre_x functions
- the mitigation is no longer enabled through device tree options
2.5.5



[PATCH v2 1/3] powerpc/fsl: Disable the speculation barrier from the command line

2018-06-11 Thread Diana Craciun
The speculation barrier can be disabled from the command line
with the parameter: "nospectre_v1".

Signed-off-by: Diana Craciun 
---
 arch/powerpc/kernel/security.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 3eb9c45..c55e102 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -16,6 +16,7 @@
 unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
 
 bool barrier_nospec_enabled;
+static bool no_nospec;
 
 static void enable_barrier_nospec(bool enable)
 {
@@ -42,9 +43,18 @@ void setup_barrier_nospec(void)
enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) &&
 security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR);
 
-   enable_barrier_nospec(enable);
+   if (!no_nospec)
+   enable_barrier_nospec(enable);
 }
 
+static int __init handle_nospectre_v1(char *p)
+{
+   no_nospec = true;
+
+   return 0;
+}
+early_param("nospectre_v1", handle_nospectre_v1);
+
 #ifdef CONFIG_DEBUG_FS
 static int barrier_nospec_set(void *data, u64 val)
 {
-- 
2.5.5
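
The control flow this patch adds can be modeled outside the kernel as follows.
This is a sketch under stated assumptions: struct nospec_model and the two
model_* functions are hypothetical stand-ins, and the token scan is a
simplification of what early_param() does (it only recognizes the bare
"nospectre_v1" token in a space-separated string).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the two flags in security.c. */
struct nospec_model {
    bool no_nospec;        /* set when "nospectre_v1" is on the command line */
    bool barrier_enabled;  /* stands in for barrier_nospec_enabled */
};

/* Scan a space-separated command line for the exact token "nospectre_v1". */
static void model_parse_cmdline(struct nospec_model *m, const char *cmdline)
{
    static const char opt[] = "nospectre_v1";
    const size_t len = sizeof(opt) - 1;
    const char *p = cmdline;

    while ((p = strstr(p, opt)) != NULL) {
        bool starts = (p == cmdline) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');

        if (starts && ends) {
            m->no_nospec = true;
            return;
        }
        p += len;
    }
}

/* Mirrors the patched setup path: the decision is applied only if not overridden. */
static void model_setup_barrier_nospec(struct nospec_model *m, bool wants_barrier)
{
    if (!m->no_nospec)
        m->barrier_enabled = wants_barrier;
}
```

With "nospectre_v1" present, model_setup_barrier_nospec() leaves the barrier
in its initial disabled state regardless of wants_barrier, which is the
behaviour the commit message describes.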



4.17.0-10146-gf0dc7f9c6dd9: hw csum failure on powerpc+sungem

2018-06-11 Thread Meelis Roos
I am seeing this on PowerMac G4 with sungem ethernet driver. 4.17 was 
OK, 4.17.0-10146-gf0dc7f9c6dd9 is problematic.

[  140.518664] eth0: hw csum failure
[  140.518699] CPU: 0 PID: 1237 Comm: postconf Not tainted 4.17.0-10146-gf0dc7f9c6dd9 #83
[  140.518707] Call Trace:
[  140.518734] [effefd90] [c03d6db8] __skb_checksum_complete+0xd8/0xdc (unreliable)
[  140.518759] [effefdb0] [c04c1284] icmpv6_rcv+0x248/0x4ec
[  140.518775] [effefdd0] [c049a448] ip6_input_finish.constprop.0+0x11c/0x5f4
[  140.518786] [effefe10] [c049b1c0] ip6_mc_input+0xcc/0x100
[  140.518807] [effefe20] [c03e110c] __netif_receive_skb_core+0x310/0x944
[  140.518820] [effefe70] [c03e76ec] napi_gro_receive+0xd0/0xe8
[  140.518845] [effefe80] [f3e1f66c] gem_poll+0x618/0x1274 [sungem]
[  140.518856] [effeff30] [c03e6f0c] net_rx_action+0x198/0x374
[  140.518872] [effeff90] [c0501a88] __do_softirq+0x120/0x278
[  140.518890] [effeffe0] [c0036188] irq_exit+0xd8/0xdc
[  140.518908] [effefff0] [c000f478] call_do_irq+0x24/0x3c
[  140.518925] [d05a5d30] [c0007120] do_IRQ+0x74/0xf0
[  140.518941] [d05a5d50] [c0012474] ret_from_except+0x0/0x14
[  140.518960] --- interrupt: 501 at copy_page+0x40/0x90
   LR = copy_user_page+0x18/0x30
[  140.518973] [d05a5e10] [d058cd80] 0xd058cd80 (unreliable)
[  140.518989] [d05a5e20] [c00fa2bc] wp_page_copy+0xec/0x654
[  140.519002] [d05a5e60] [c00fd3a4] do_wp_page+0xa8/0x5b4
[  140.519013] [d05a5e90] [c00fe934] handle_mm_fault+0x564/0xa84
[  140.519025] [d05a5f00] [c0016230] do_page_fault+0x1bc/0x7e8
[  140.519037] [d05a5f40] [c0012300] handle_page_fault+0x14/0x40
[  140.519048] --- interrupt: 301 at 0xb78b6864
   LR = 0xb78b6c54


-- 
Meelis Roos (mr...@linux.ee)


RE: [PATCH v11 00/26] Speculative page faults

2018-06-11 Thread Song, HaiyanX
Hi Laurent,

Regression tests for the v11 patch series have been run; some regressions were
found by LKP-tools (Linux kernel performance), tested on an Intel 4s Skylake
platform. This time we only tested the cases that had been run and showed
regressions on the V9 patch series.

The regression result is sorted by the metric will-it-scale.per_thread_ops.
branch: Laurent-Dufour/Speculative-page-faults/20180520-045126
commit id:
  head commit : a7a8993bfe3ccb54ad468b9f1799649e4ad1ff12
  base commit : ba98a1cdad71d259a194461b3a61471b49b14df1
Benchmark: will-it-scale
Download link: https://github.com/antonblanchard/will-it-scale/tree/master

Metrics:
  will-it-scale.per_process_ops=processes/nr_cpu
  will-it-scale.per_thread_ops=threads/nr_cpu
  test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
THP: enable / disable
nr_task:100%

1. Regressions:

a). Enable THP
testcase                     base     change   head     metric
page_fault3/enable THP       10519    -20.5%   836      will-it-scale.per_thread_ops
page_fault2/enable THP       8281     -18.8%   6728     will-it-scale.per_thread_ops
brk1/enable THP              998475   -2.2%    976893   will-it-scale.per_process_ops
context_switch1/enable THP   223910   -1.3%    220930   will-it-scale.per_process_ops
context_switch1/enable THP   233722   -1.0%    231288   will-it-scale.per_thread_ops

b). Disable THP
page_fault3/disable THP      10856    -23.1%   8344     will-it-scale.per_thread_ops
page_fault2/disable THP      8147     -18.8%   6613     will-it-scale.per_thread_ops
brk1/disable THP             957      -7.9%    881      will-it-scale.per_thread_ops
context_switch1/disable THP  237006   -2.2%    231907   will-it-scale.per_thread_ops
brk1/disable THP             997317   -2.0%    98       will-it-scale.per_process_ops
page_fault3/disable THP      467454   -1.8%    459251   will-it-scale.per_process_ops
context_switch1/disable THP  224431   -1.3%    221567   will-it-scale.per_process_ops

Note: for the test results above, higher is better.

2. Improvements: none found in the selected test cases.


Best regards
Haiyan Song

From: owner-linux...@kvack.org [owner-linux...@kvack.org] on behalf of Laurent 
Dufour [lduf...@linux.vnet.ibm.com]
Sent: Monday, May 28, 2018 4:54 PM
To: Song, HaiyanX
Cc: a...@linux-foundation.org; mho...@kernel.org; pet...@infradead.org; 
kir...@shutemov.name; a...@linux.intel.com; d...@stgolabs.net; j...@suse.cz; 
Matthew Wilcox; khand...@linux.vnet.ibm.com; aneesh.ku...@linux.vnet.ibm.com; 
b...@kernel.crashing.org; m...@ellerman.id.au; pau...@samba.org; Thomas 
Gleixner; Ingo Molnar; h...@zytor.com; Will Deacon; Sergey Senozhatsky; 
sergey.senozhatsky.w...@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, 
Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; Minchan 
Kim; Punit Agrawal; vinayak menon; Yang Shi; linux-ker...@vger.kernel.org; 
linux...@kvack.org; ha...@linux.vnet.ibm.com; npig...@gmail.com; 
bsinghar...@gmail.com; paul...@linux.vnet.ibm.com; Tim Chen; 
linuxppc-dev@lists.ozlabs.org; x...@kernel.org
Subject: Re: [PATCH v11 00/26] Speculative page faults

On 28/05/2018 10:22, Haiyan Song wrote:
> Hi Laurent,
>
> Yes, these tests are done on V9 patch.

Do you plan to give this V11 a run ?

>
>
> Best regards,
> Haiyan Song
>
> On Mon, May 28, 2018 at 09:51:34AM +0200, Laurent Dufour wrote:
>> On 28/05/2018 07:23, Song, HaiyanX wrote:
>>>
>>> Some regression and improvements is found by LKP-tools(linux kernel 
>>> performance) on V9 patch series
>>> tested on Intel 4s Skylake platform.
>>
>> Hi,
>>
>> Thanks for reporting this benchmark results, but you mentioned the "V9 patch
>> series" while responding to the v11 header series...
>> Were these tests done on v9 or v11 ?
>>
>> Cheers,
>> Laurent.
>>
>>>
>>> The regression result is sorted by the metric will-it-scale.per_thread_ops.
>>> Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch 
>>> series)
>>> Commit id:
>>> base commit: d55f34411b1b126429a823d06c3124c16283231f
>>> head commit: 0355322b3577eeab7669066df42c550a56801110
>>> Benchmark suite: will-it-scale
>>> Download link:
>>> https://github.com/antonblanchard/will-it-scale/tree/master/tests
>>> Metrics:
>>> will-it-scale.per_process_ops=processes/nr_cpu
>>> will-it-scale.per_thread_ops=threads/nr_cpu
>>> test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
>>> THP: enable / disable
>>> nr_task: 100%
>>>
>>> 1. Regressions:
>>> a) THP enabled:
>>> testcasebasechange  head   
>>> metric
>>> page_fault3/ enable THP 10092   -17.5%  8323   
>>> will-it-scale.per_thread_ops
>>> page_fault2/ enable THP  8300