Re: [PATCH kernel 4/5] powerpc/powernv/npu: Factor out OPAL calls from context manipulation

2018-11-12 Thread Alexey Kardashevskiy



On 08/11/2018 16:05, David Gibson wrote:
> On Mon, Oct 15, 2018 at 08:33:00PM +1100, Alexey Kardashevskiy wrote:
>> At the moment the NPU context init/destroy code calls OPAL. The init
>> handler in OPAL configures the NPU to pass ATS requests to the nested MMU;
>> the destroy handler does nothing besides sanity checks.
>>
>> Since the init handler programs the NPU with a wildcard for LPID/PID,
>> this can be done at the point where a GPU is mapped to an LPAR; it also
>> makes calling opal_npu_destroy_context() unnecessary in this context
>> (this will change with VFIO later though).
> 
> I think this wants a bit more explanation.  It certainly makes me
> nervous removing the destroy_context calls with nothing replacing
> them.
> 
>> Also, the pnv_npu2_init() helper does not really need to call OPAL
>> either, as it initializes an NPU structure and does not interact with the
>> GPU or NPU at that moment.
>>
>> This moves OPAL calls to a separate helper. With this change, the API
>> for GPUs does not do any OPAL calls and therefore can be used by both
>> pseries and powernv platforms. The new pnv_npu2_map_lpar_phb() helper
>> should be called on powernv only as it does OPAL calls and it takes
>> an MSR mask which the NPU adds to ATS requests so the nested MMU knows
>> what translations are permitted; VFIO/KVM will not set MSR_HV.
>>
>> This removes the check for FW_FEATURE_OPAL as pnv_npu2_init_context/
>> pnv_npu2_release_context/pnv_npu2_init do not call OPAL anymore.
>>
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  arch/powerpc/include/asm/pci.h|   1 +
>>  arch/powerpc/platforms/powernv/pci.h  |   2 +-
>>  arch/powerpc/platforms/powernv/npu-dma.c  | 100 
>> +++---
>>  arch/powerpc/platforms/powernv/pci-ioda.c |   7 ++-
>>  4 files changed, 57 insertions(+), 53 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
>> index 1a96075..f196df6 100644
>> --- a/arch/powerpc/include/asm/pci.h
>> +++ b/arch/powerpc/include/asm/pci.h
>> @@ -130,5 +130,6 @@ extern void pcibios_scan_phb(struct pci_controller 
>> *hose);
>>  extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
>>  extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int 
>> index);
>>  extern void pnv_npu2_devices_init(void);
>> +extern int pnv_npu2_init(struct pci_controller *hose);
>>  
>>  #endif /* __ASM_POWERPC_PCI_H */
>> diff --git a/arch/powerpc/platforms/powernv/pci.h 
>> b/arch/powerpc/platforms/powernv/pci.h
>> index 3b7617d..ca2ce4b 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -224,7 +224,7 @@ extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, 
>> int num,
>>  extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
>>  extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
>>  extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
>> -extern int pnv_npu2_init(struct pnv_phb *phb);
>> +extern void pnv_npu2_map_lpar_phb(struct pnv_phb *nphb, unsigned long msr);
>>  
>>  /* pci-ioda-tce.c */
>>  #define POWERNV_IOMMU_DEFAULT_LEVELS1
>> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
>> b/arch/powerpc/platforms/powernv/npu-dma.c
>> index cb2b4f9..677f30a 100644
>> --- a/arch/powerpc/platforms/powernv/npu-dma.c
>> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
>> @@ -762,7 +762,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev 
>> *gpdev,
>>  u32 nvlink_index;
>>  struct device_node *nvlink_dn;
>>  struct mm_struct *mm = current->mm;
>> -struct pnv_phb *nphb;
>>  struct npu *npu;
>>  struct npu_context *npu_context;
>>  
>> @@ -772,9 +771,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev 
>> *gpdev,
>>   */
>>  struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
>>  
>> -if (!firmware_has_feature(FW_FEATURE_OPAL))
>> -return ERR_PTR(-ENODEV);
>> -
>>  if (!npdev)
>>  /* No nvlink associated with this GPU device */
>>  return ERR_PTR(-ENODEV);
>> @@ -792,22 +788,9 @@ struct npu_context *pnv_npu2_init_context(struct 
>> pci_dev *gpdev,
>>  return ERR_PTR(-EINVAL);
>>  }
>>  
>> -nphb = pci_bus_to_host(npdev->bus)->private_data;
>>  npu = npdev_to_npu(npdev);
>>  
>>  /*
>> - * Setup the NPU context table for a particular GPU. These need to be
>> - * per-GPU as we need the tables to filter ATSDs when there are no
>> - * active contexts on a particular GPU. It is safe for these to be
>> - * called concurrently with destroy as the OPAL call takes appropriate
>> - * locks and refcounts on init/destroy.
>> - */
>> -rc = opal_npu_init_context(nphb->opal_id, mm->context.id,
>> flags,
> 
> AFAICT this was the only use of 'flags' in this function, so it should
> be removed from the signature, no?


It should; however, this is ABI and the vendor driver is calling this
function with the flags
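
Roughly speaking (a simplified prototype, not the exact upstream
signature), the situation is:

    /* 'flags' is kept only so the existing caller keeps building and
     * linking; the OPAL call that consumed it has moved elsewhere. */
    struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
                                              unsigned long flags,
                                              void *priv);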

[PATCH AUTOSEL 4.9 03/17] mm/memory_hotplug: make add_memory() take the device_hotplug_lock

2018-11-12 Thread Sasha Levin
From: David Hildenbrand 

[ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ]

add_memory() currently does not take the device_hotplug_lock, however
it is already called under the lock from
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline request (e.g.  from user space) - which
already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock").  add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
drivers/xen/balloon.c
arch/powerpc/platforms/powernv/memtrace.c
drivers/s390/char/sclp_cmd.c
drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.
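
The resulting split (a simplified sketch of the pattern this commit
introduces, not the literal hunk) is an unlocked __add_memory() for
callers that already hold device_hotplug_lock, plus a locked
add_memory() wrapper for everyone else:

    /* caller must hold device_hotplug_lock */
    int __add_memory(int nid, u64 start, u64 size);

    int add_memory(int nid, u64 start, u64 size)
    {
            int rc;

            lock_device_hotplug();
            rc = __add_memory(nid, start, size);
            unlock_device_hotplug();

            return rc;
    }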

Don't export add_memory_resource(), as it once was exported to be used by
XEN, which is never built as a module.  If somebody requires it, we also
have to export a locked variant (as device_hotplug_lock is never
exported).

Link: http://lkml.kernel.org/r/20180925091457.28651-3-da...@redhat.com
Signed-off-by: David Hildenbrand 
Reviewed-by: Pavel Tatashin 
Reviewed-by: Rafael J. Wysocki 
Reviewed-by: Rashmica Gupta 
Reviewed-by: Oscar Salvador 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Greg Kroah-Hartman 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Nathan Fontenot 
Cc: John Allen 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Joonsoo Kim 
Cc: Vlastimil Babka 
Cc: Mathieu Malaterre 
Cc: Pavel Tatashin 
Cc: YASUAKI ISHIMATSU 
Cc: Balbir Singh 
Cc: Haiyang Zhang 
Cc: Heiko Carstens 
Cc: Jonathan Corbet 
Cc: Kate Stewart 
Cc: "K. Y. Srinivasan" 
Cc: Martin Schwidefsky 
Cc: Michael Neuling 
Cc: Philippe Ombredanne 
Cc: Stephen Hemminger 
Cc: Thomas Gleixner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin 
---
 .../platforms/pseries/hotplug-memory.c|  2 +-
 drivers/acpi/acpi_memhotplug.c|  2 +-
 drivers/base/memory.c |  9 ++--
 drivers/xen/balloon.c |  3 +++
 include/linux/memory_hotplug.h|  1 +
 mm/memory_hotplug.c   | 22 ---
 6 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c0a0947f43bb..656bbbd731d0 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -616,7 +616,7 @@ static int dlpar_add_lmb(struct of_drconf_cell *lmb)
nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
/* Add the memory */
-   rc = add_memory(nid, lmb->base_addr, block_sz);
+   rc = __add_memory(nid, lmb->base_addr, block_sz);
if (rc) {
dlpar_remove_device_tree_lmb(lmb);
dlpar_release_drc(lmb->drc_index);
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..2ccfbb61ca89 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct 
acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
 
-   result = add_memory(node, info->start_addr, info->length);
+   result = __add_memory(node, info->start_addr, info->length);
 
/*
 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c5cdd190b781..9f96f1b43c15 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -500,15 +500,20 @@ memory_probe_store(struct device *dev, struct 
device_attribute *attr,
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;
 
+   ret = lock_device_hotplug_sysfs();
+   if (ret)
+   goto out;
+
nid = memory_add_physaddr_to_nid(phys_addr);
-   ret = add_memory(nid, phys_addr,
-MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+   ret = __add_memory(nid, phys_addr,
+  MIN_MEMORY_BLOCK_SIZE * sections_per_block);
 
if (ret)
goto out;
 
ret = count;
 out:
+   unlock_device_hotplug();
return ret;
 }
 
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index e4db19e88ab1..a10c64fee9ac 100644
--- 

[PATCH AUTOSEL 4.14 08/26] mm/memory_hotplug: make add_memory() take the device_hotplug_lock

2018-11-12 Thread Sasha Levin
From: David Hildenbrand 

[ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ]

add_memory() currently does not take the device_hotplug_lock, however
it is already called under the lock from
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline request (e.g.  from user space) - which
already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock").  add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
drivers/xen/balloon.c
arch/powerpc/platforms/powernv/memtrace.c
drivers/s390/char/sclp_cmd.c
drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(), as it once was exported to be used by
XEN, which is never built as a module.  If somebody requires it, we also
have to export a locked variant (as device_hotplug_lock is never
exported).

Link: http://lkml.kernel.org/r/20180925091457.28651-3-da...@redhat.com
Signed-off-by: David Hildenbrand 
Reviewed-by: Pavel Tatashin 
Reviewed-by: Rafael J. Wysocki 
Reviewed-by: Rashmica Gupta 
Reviewed-by: Oscar Salvador 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Greg Kroah-Hartman 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Nathan Fontenot 
Cc: John Allen 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Joonsoo Kim 
Cc: Vlastimil Babka 
Cc: Mathieu Malaterre 
Cc: Pavel Tatashin 
Cc: YASUAKI ISHIMATSU 
Cc: Balbir Singh 
Cc: Haiyang Zhang 
Cc: Heiko Carstens 
Cc: Jonathan Corbet 
Cc: Kate Stewart 
Cc: "K. Y. Srinivasan" 
Cc: Martin Schwidefsky 
Cc: Michael Neuling 
Cc: Philippe Ombredanne 
Cc: Stephen Hemminger 
Cc: Thomas Gleixner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin 
---
 .../platforms/pseries/hotplug-memory.c|  2 +-
 drivers/acpi/acpi_memhotplug.c|  2 +-
 drivers/base/memory.c |  9 ++--
 drivers/xen/balloon.c |  3 +++
 include/linux/memory_hotplug.h|  1 +
 mm/memory_hotplug.c   | 22 ---
 6 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 1d48ab424bd9..c197ea8dde01 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -784,7 +784,7 @@ static int dlpar_add_lmb(struct of_drconf_cell *lmb)
nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
/* Add the memory */
-   rc = add_memory(nid, lmb->base_addr, block_sz);
+   rc = __add_memory(nid, lmb->base_addr, block_sz);
if (rc) {
dlpar_remove_device_tree_lmb(lmb);
return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..2ccfbb61ca89 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct 
acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
 
-   result = add_memory(node, info->start_addr, info->length);
+   result = __add_memory(node, info->start_addr, info->length);
 
/*
 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 1d60b58a8c19..9a9aebb4bd19 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -517,15 +517,20 @@ memory_probe_store(struct device *dev, struct 
device_attribute *attr,
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;
 
+   ret = lock_device_hotplug_sysfs();
+   if (ret)
+   goto out;
+
nid = memory_add_physaddr_to_nid(phys_addr);
-   ret = add_memory(nid, phys_addr,
-MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+   ret = __add_memory(nid, phys_addr,
+  MIN_MEMORY_BLOCK_SIZE * sections_per_block);
 
if (ret)
goto out;
 
ret = count;
 out:
+   unlock_device_hotplug();
return ret;
 }
 
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 065f0b607373..6e39a0c10f93 100644
--- a/drivers/xen/balloon.c
+++ 

[PATCH AUTOSEL 4.18 12/39] mm/memory_hotplug: make add_memory() take the device_hotplug_lock

2018-11-12 Thread Sasha Levin
From: David Hildenbrand 

[ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ]

add_memory() currently does not take the device_hotplug_lock, however
it is already called under the lock from
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline request (e.g.  from user space) - which
already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock").  add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
drivers/xen/balloon.c
arch/powerpc/platforms/powernv/memtrace.c
drivers/s390/char/sclp_cmd.c
drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(), as it once was exported to be used by
XEN, which is never built as a module.  If somebody requires it, we also
have to export a locked variant (as device_hotplug_lock is never
exported).

Link: http://lkml.kernel.org/r/20180925091457.28651-3-da...@redhat.com
Signed-off-by: David Hildenbrand 
Reviewed-by: Pavel Tatashin 
Reviewed-by: Rafael J. Wysocki 
Reviewed-by: Rashmica Gupta 
Reviewed-by: Oscar Salvador 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Greg Kroah-Hartman 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Nathan Fontenot 
Cc: John Allen 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Joonsoo Kim 
Cc: Vlastimil Babka 
Cc: Mathieu Malaterre 
Cc: Pavel Tatashin 
Cc: YASUAKI ISHIMATSU 
Cc: Balbir Singh 
Cc: Haiyang Zhang 
Cc: Heiko Carstens 
Cc: Jonathan Corbet 
Cc: Kate Stewart 
Cc: "K. Y. Srinivasan" 
Cc: Martin Schwidefsky 
Cc: Michael Neuling 
Cc: Philippe Ombredanne 
Cc: Stephen Hemminger 
Cc: Thomas Gleixner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin 
---
 .../platforms/pseries/hotplug-memory.c|  2 +-
 drivers/acpi/acpi_memhotplug.c|  2 +-
 drivers/base/memory.c |  9 ++--
 drivers/xen/balloon.c |  3 +++
 include/linux/memory_hotplug.h|  1 +
 mm/memory_hotplug.c   | 22 ---
 6 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..79e074eac486 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
/* Add the memory */
-   rc = add_memory(nid, lmb->base_addr, block_sz);
+   rc = __add_memory(nid, lmb->base_addr, block_sz);
if (rc) {
dlpar_remove_device_tree_lmb(lmb);
return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..2ccfbb61ca89 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct 
acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
 
-   result = add_memory(node, info->start_addr, info->length);
+   result = __add_memory(node, info->start_addr, info->length);
 
/*
 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 622ab8edc035..7307a173dca8 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -519,15 +519,20 @@ memory_probe_store(struct device *dev, struct 
device_attribute *attr,
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;
 
+   ret = lock_device_hotplug_sysfs();
+   if (ret)
+   goto out;
+
nid = memory_add_physaddr_to_nid(phys_addr);
-   ret = add_memory(nid, phys_addr,
-MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+   ret = __add_memory(nid, phys_addr,
+  MIN_MEMORY_BLOCK_SIZE * sections_per_block);
 
if (ret)
goto out;
 
ret = count;
 out:
+   unlock_device_hotplug();
return ret;
 }
 
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 065f0b607373..6e39a0c10f93 100644
--- a/drivers/xen/balloon.c
+++ 

[PATCH AUTOSEL 4.19 14/44] mm/memory_hotplug: make add_memory() take the device_hotplug_lock

2018-11-12 Thread Sasha Levin
From: David Hildenbrand 

[ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ]

add_memory() currently does not take the device_hotplug_lock, however
it is already called under the lock from
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline request (e.g.  from user space) - which
already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock").  add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
drivers/xen/balloon.c
arch/powerpc/platforms/powernv/memtrace.c
drivers/s390/char/sclp_cmd.c
drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(), as it once was exported to be used by
XEN, which is never built as a module.  If somebody requires it, we also
have to export a locked variant (as device_hotplug_lock is never
exported).

Link: http://lkml.kernel.org/r/20180925091457.28651-3-da...@redhat.com
Signed-off-by: David Hildenbrand 
Reviewed-by: Pavel Tatashin 
Reviewed-by: Rafael J. Wysocki 
Reviewed-by: Rashmica Gupta 
Reviewed-by: Oscar Salvador 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Greg Kroah-Hartman 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Nathan Fontenot 
Cc: John Allen 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Joonsoo Kim 
Cc: Vlastimil Babka 
Cc: Mathieu Malaterre 
Cc: Pavel Tatashin 
Cc: YASUAKI ISHIMATSU 
Cc: Balbir Singh 
Cc: Haiyang Zhang 
Cc: Heiko Carstens 
Cc: Jonathan Corbet 
Cc: Kate Stewart 
Cc: "K. Y. Srinivasan" 
Cc: Martin Schwidefsky 
Cc: Michael Neuling 
Cc: Philippe Ombredanne 
Cc: Stephen Hemminger 
Cc: Thomas Gleixner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin 
---
 .../platforms/pseries/hotplug-memory.c|  2 +-
 drivers/acpi/acpi_memhotplug.c|  2 +-
 drivers/base/memory.c |  9 ++--
 drivers/xen/balloon.c |  3 +++
 include/linux/memory_hotplug.h|  1 +
 mm/memory_hotplug.c   | 22 ---
 6 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..79e074eac486 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
/* Add the memory */
-   rc = add_memory(nid, lmb->base_addr, block_sz);
+   rc = __add_memory(nid, lmb->base_addr, block_sz);
if (rc) {
dlpar_remove_device_tree_lmb(lmb);
return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..2ccfbb61ca89 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct 
acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
 
-   result = add_memory(node, info->start_addr, info->length);
+   result = __add_memory(node, info->start_addr, info->length);
 
/*
 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 817320c7c4c1..40cac122ec73 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -519,15 +519,20 @@ memory_probe_store(struct device *dev, struct 
device_attribute *attr,
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;
 
+   ret = lock_device_hotplug_sysfs();
+   if (ret)
+   goto out;
+
nid = memory_add_physaddr_to_nid(phys_addr);
-   ret = add_memory(nid, phys_addr,
-MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+   ret = __add_memory(nid, phys_addr,
+  MIN_MEMORY_BLOCK_SIZE * sections_per_block);
 
if (ret)
goto out;
 
ret = count;
 out:
+   unlock_device_hotplug();
return ret;
 }
 
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index e12bb256036f..6bab019a82b1 100644
--- a/drivers/xen/balloon.c
+++ 

Re: [PATCH v2 1/2] Makefile: Export clang toolchain variables

2018-11-12 Thread Masahiro Yamada
On Tue, Nov 13, 2018 at 3:56 AM Nick Desaulniers
 wrote:
>
> On Sun, Nov 11, 2018 at 8:21 PM Joel Stanley  wrote:
> >
> > The powerpc Makefile will use these in its boot wrapper.
> >
> > Signed-off-by: Joel Stanley 
> > ---
> >  Makefile | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/Makefile b/Makefile
> > index 09278330282d..840efe6eb54c 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -495,6 +495,7 @@ endif
> >  ifneq ($(GCC_TOOLCHAIN),)
> >  CLANG_FLAGS+= --gcc-toolchain=$(GCC_TOOLCHAIN)
> >  endif
> > +export CLANG_FLAGS
> >  CLANG_FLAGS+= -no-integrated-as
>
> Does this export CLANG_FLAGS before `-no-integrated-as` is set?

This does not matter.

For example, KBUILD_CFLAGS accumulates several compiler flags
after it is marked as 'export'.
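
A trivial out-of-tree illustration of that point (not from the kernel
Makefile, just plain GNU make):

    export FOO
    FOO += -no-integrated-as

    all:
            @echo "$$FOO"         # prints "-no-integrated-as"
            $(MAKE) -f other.mk   # other.mk sees FOO in its environment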


> Either way, I think it would be clearer to export this after all the
> relevant flags are set.

OK. It is just a matter of preference,
but I will move the export line below
when I pick up this patch set.



> >  KBUILD_CFLAGS  += $(CLANG_FLAGS)
> >  KBUILD_AFLAGS  += $(CLANG_FLAGS)
> > --
> > 2.19.1
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers



-- 
Best Regards
Masahiro Yamada


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-12 Thread Bjorn Helgaas
[+cc Jon, for related VMD firmware-first error enable issue]

On Mon, Nov 12, 2018 at 08:05:41PM +, alex_gagn...@dellteam.com wrote:
> On 11/11/2018 11:50 PM, Oliver O'Halloran wrote:
> > On Thu, 2018-11-08 at 23:06 +, alex_gagn...@dellteam.com wrote:

> >> But it's not the firmware that crashes. It's linux as a result of a
> >> fatal error message from the firmware. And we can't fix that because FFS
> >> handling requires that the system reboots [1].
> > 
> > Do we know the exact circumstances that result in firmware requesting a
> > reboot? If it happens on any PCIe error I don't see what we can do to
> > prevent that beyond masking UEs entirely (are we even allowed to do
> > that on FFS systems?).
> 
> Pull a drive out at an angle, push two drives in at the same time, pull
> out a drive really slowly. Whether an error is even reported to the OS
> depends on PD state, and proprietary mechanisms and logic in the HW and
> FW. The OS is not supposed to mask errors (touch AER bits) on FFS.

PD?

Do you think Linux observes the rule about not touching AER bits on
FFS?  I'm not sure it does.  I'm not even sure what section of the
spec is relevant.

The whole issue of firmware-first, the mechanism by which firmware
gets control, the System Error enables in Root Port Root Control
registers, etc., is very murky to me.  Jon has a sort of similar issue
with VMD where he needs to leave System Errors enabled instead of
disabling them as we currently do.

Bjorn

[1] 
https://lore.kernel.org/linux-pci/20181029210651.gb13...@bhelgaas-glaptop.roam.corp.google.com


Re: [PATCH] powerpc: Add KVM guest defconfig

2018-11-12 Thread Satheesh Rajendran
On Mon, Nov 12, 2018 at 11:24:08PM +1100, Michael Ellerman wrote:
> Satheesh Rajendran  writes:
> 
> > On Thu, Nov 08, 2018 at 04:23:07PM -0200, Breno Leitao wrote:
> >> hi Satheesh,
> >> 
> >> On 11/08/2018 03:08 AM, sathn...@linux.vnet.ibm.com wrote:
> >> > --- /dev/null
> >> > +++ b/arch/powerpc/configs/guest.config
> >> > @@ -0,0 +1,14 @@
> >> > +CONFIG_VIRTIO_BLK=y
> >> > +CONFIG_VIRTIO_BLK_SCSI=y
> >> > +CONFIG_SCSI_VIRTIO=y
> >> > +CONFIG_VIRTIO_NET=y
> >> > +CONFIG_NET_FAILOVER=y
> >> > +CONFIG_VIRTIO_CONSOLE=y
> >> > +CONFIG_VIRTIO=y
> >> > +CONFIG_VIRTIO_PCI=y
> >> > +CONFIG_KVM_GUEST=y
> >> > +CONFIG_EPAPR_PARAVIRT=y
> >> > +CONFIG_XFS_FS=y
> >> 
> >> Why a guest kernel needs to have XFS integrated in the core image? I am
> >> wondering if it is a requirement from another CONFIG_ option.
> >
> > Idea is to have a working config which would boot guest without initramfs,
> > other FS(like EXT4) is already integrated in the core image, 
> > thought this would be helpful for distributions, which default XFS as root 
> > disk.
> 
> Maybe we should switch XFS_FS to Y in ppc64_defconfig ?

Sure, makes sense, will send it for ppc64_defconfig instead.
In addition, I have a few more symbols to be enabled for cgroups,
memory hotplug, and NUMA balancing.
I guess these symbols can also go into ppc64_defconfig itself?

i.e.,

CONFIG_CGROUP_SCHED=y
CONFIG_MEMCG=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_NUMA_BALANCING=y

Thanks!
-Satheesh.
> 
> cheers
> 



Re: [PATCH] Documentation: fix spelling mistake, EACCESS -> EACCES

2018-11-12 Thread Geoff Levand
Hi Everyone,

On 11/12/2018 04:57 AM, Michael Ellerman wrote:
> But maybe Geoff has a better feel for how many people (other than him)
> are still running upstream on PS3s.

There are still PS3 users out there.  They are hobbyists who
generally don't post to Linux kernel mailing lists.  I usually
get one or two mail or IRC inquiries every month, many asking
how to get a recent kernel running.

I try to keep my dev repo at k.org up to the latest -rc release
and tested on the retail PS3.  For more info see:

  https://git.kernel.org/pub/scm/linux/kernel/git/geoff/ps3-linux.git
  https://www.kernel.org/pub/linux/kernel/people/geoff/cell/README

-Geoff


Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-12 Thread Alex_Gagniuc
On 11/11/2018 11:50 PM, Oliver O'Halloran wrote:
> 
> [EXTERNAL EMAIL]
> Please report any suspicious attachments, links, or requests for sensitive 
> information.
> 
> 
> On Thu, 2018-11-08 at 23:06 +, alex_gagn...@dellteam.com wrote:
>> On 11/08/2018 04:51 PM, Greg KH wrote:
>>> On Thu, Nov 08, 2018 at 10:49:08PM +, alex_gagn...@dellteam.com wrote:
 In the case that we're trying to fix, this code executing is a result of
 the device being gone, so we can guarantee race-free operation. I agree
 that there is a race, in the general case. As far as checking the result
 for all F's, that's not an option when firmware crashes the system as a
 result of the mmio read/write. It's never pretty when firmware gets
 involved.
>>>
>>> If you have firmware that crashes the system when you try to read from a
>>> PCI device that was hot-removed, that is broken firmware and needs to be
>>> fixed.  The kernel can not work around that as again, you will never win
>>> that race.
>>
>> But it's not the firmware that crashes. It's linux as a result of a
>> fatal error message from the firmware. And we can't fix that because FFS
>> handling requires that the system reboots [1].
> 
> Do we know the exact circumstances that result in firmware requesting a
> reboot? If it happens on any PCIe error I don't see what we can do to
> prevent that beyond masking UEs entirely (are we even allowed to do
> that on FFS systems?).

Pull a drive out at an angle, push two drives in at the same time, pull
out a drive really slowly. Whether an error is even reported to the OS
depends on PD state, and proprietary mechanisms and logic in the HW and
FW. The OS is not supposed to mask errors (touch AER bits) on FFS.

Sadly, with FFS, behavior can and does change from BIOS version to BIOS 
version. On one product, for example, we eliminated a lot of crashes by 
simply not reporting some classes of PCIe errors to the OS.

Alex

>> If we're going to say that we don't want to support FFS because it's a
>> separate code path, and different flow, that's fine. I am myself, not a
>> fan of FFS. But if we're going to continue supporting it, I think we'll
>> continue to have to resolve these sort of unintended consequences.
>>
>> Alex
>>
>> [1] ACPI 6.2, 18.1 - Hardware Errors and Error Sources
> 
> 



Re: [PATCH] powerpc/math-emu: Fix building with clang

2018-11-12 Thread Nick Desaulniers
On Mon, Nov 12, 2018 at 3:33 AM Joel Stanley  wrote:
> > On Thu, Nov 1, 2018 at 8:37 PM Joel Stanley  wrote:
> > >
> > >   arch/powerpc/math-emu/fnmsub.c:46:2: error: invalid use of a cast in a
> > >   inline asm context requiring an l-value: remove the cast or build with
> > >   -fheinous-gnu-extensions

> ./arch/powerpc/include/asm/sfp-machine.h:283:27: note: expanded from
> macro 'sub_ddmmss'
>: "=r" ((USItype)(sh)),  \
>~~^~~

Eek, I kind of think that add_ss(), sub_ddmmss(), and umul_ppmm()
should be rewritten from the form:

asm("..." : "=r" (USItype)(arg) : ...);

to the form:

USItype temp = (USItype) arg;
asm("..." : "=r" (temp) : ...);
arg = (typeof(arg)) temp;

Rather than the flag being added.
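
Something along these lines - a made-up single-word example rather than
the real sfp-machine.h macros, just to show the shape of the change:

    /* clang rejects "(USItype)(d)" as an asm output operand, so write
     * into a typed temporary and assign back afterwards. */
    #define sub_single(d, a, b)                                     \
            do {                                                    \
                    USItype __d;                                    \
                    __asm__ ("subf %0,%2,%1"                        \
                             : "=r" (__d)                           \
                             : "r" ((USItype)(a)),                  \
                               "r" ((USItype)(b)));                 \
                    (d) = __d;                                      \
            } while (0)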
-- 
Thanks,
~Nick Desaulniers


Re: [PATCH v2] powerpc/32: Avoid unsupported flags with clang

2018-11-12 Thread Nick Desaulniers
On Sun, Nov 11, 2018 at 9:28 PM Joel Stanley  wrote:
>
> When building for ppc32 with clang these flags are unsupported:
>
>   -ffixed-r2 and -mmultiple
>
> llvm's lib/Target/PowerPC/PPCRegisterInfo.cpp marks r2 as reserved
> when building for SVR4ABI and !ppc64:
>
>   // The SVR4 ABI reserves r2 and r13
>   if (Subtarget.isSVR4ABI()) {
> // We only reserve r2 if we need to use the TOC pointer. If we have no
> // explicit uses of the TOC pointer (meaning we're a leaf function with
> // no constant-pool loads, etc.) and we have no potential uses inside an
> // inline asm block, then we can treat r2 has an ordinary callee-saved
> // register.
> const PPCFunctionInfo *FuncInfo = MF.getInfo<PPCFunctionInfo>();
> if (!TM.isPPC64() || FuncInfo->usesTOCBasePtr() || MF.hasInlineAsm())
>   markSuperRegs(Reserved, PPC::R2);  // System-reserved register
> markSuperRegs(Reserved, PPC::R13); // Small Data Area pointer register
>   }
>
> This means we can safely omit -ffixed-r2 when building for 32-bit
> targets.
>
> The -mmultiple/-mno-multiple flags are not supported by clang, so
> platforms that might support multiple miss out on using multiple word
> instructions.
>
> We wrap these flags in cc-option so that when Clang gains support the
> kernel will be able to use these flags.
>
> Clang 8 can then build a ppc44x_defconfig which boots in Qemu:
>
>   make CC=clang-8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-  
> ppc44x_defconfig
>   ./scripts/config -e CONFIG_DEVTMPFS -d DEVTMPFS_MOUNT
>   make CC=clang-8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-
>
>   qemu-system-ppc -M bamboo \
>-kernel arch/powerpc/boot/zImage \
>-dtb arch/powerpc/boot/dts/bamboo.dtb \
>-initrd ~/ppc32-440-rootfs.cpio \
>-nographic -serial stdio -monitor pty -append "console=ttyS0"
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/261
> Link: https://bugs.llvm.org/show_bug.cgi?id=39556
> Link: https://bugs.llvm.org/show_bug.cgi?id=39555
> Signed-off-by: Joel Stanley 

Hopefully we can get these fixed up soon for you, thanks Joel!
Reviewed-by: Nick Desaulniers 

> ---
>  arch/powerpc/Makefile | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 8a2ce14d68d0..4685671dfb4f 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -152,7 +152,14 @@ endif
>  CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call 
> cc-option,-mminimal-toc))
>  CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mno-pointers-to-nested-functions)
>
> -CFLAGS-$(CONFIG_PPC32) := -ffixed-r2 $(MULTIPLEWORD)
> +# Clang unconditionally reserves r2 on ppc32 and does not support the flag
> +# https://bugs.llvm.org/show_bug.cgi?id=39555
> +CFLAGS-$(CONFIG_PPC32) := $(call cc-option, -ffixed-r2)
> +
> +# Clang doesn't support -mmultiple / -mno-multiple
> +# https://bugs.llvm.org/show_bug.cgi?id=39556
> +CFLAGS-$(CONFIG_PPC32) += $(call cc-option, $(MULTIPLEWORD))
> +
>  CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata)
>
>  ifdef CONFIG_PPC_BOOK3S_64
> --
> 2.19.1
>


-- 
Thanks,
~Nick Desaulniers


Re: [PATCH v2 1/2] Makefile: Export clang toolchain variables

2018-11-12 Thread Nick Desaulniers
On Sun, Nov 11, 2018 at 8:21 PM Joel Stanley  wrote:
>
> The powerpc Makefile will use these in its boot wrapper.
>
> Signed-off-by: Joel Stanley 
> ---
>  Makefile | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/Makefile b/Makefile
> index 09278330282d..840efe6eb54c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -495,6 +495,7 @@ endif
>  ifneq ($(GCC_TOOLCHAIN),)
>  CLANG_FLAGS+= --gcc-toolchain=$(GCC_TOOLCHAIN)
>  endif
> +export CLANG_FLAGS
>  CLANG_FLAGS+= -no-integrated-as

Does this export CLANG_FLAGS before `-no-integrated-as` is set?
Either way, I think it would be clearer to export this after all the
relevant flags are set.

>  KBUILD_CFLAGS  += $(CLANG_FLAGS)
>  KBUILD_AFLAGS  += $(CLANG_FLAGS)
> --
> 2.19.1
>


-- 
Thanks,
~Nick Desaulniers


[PATCH] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update

2018-11-12 Thread Michael Bringmann
On pseries systems, performing changes to a partition's affinity
can result in altering the nodes to which a CPU is assigned on the
current system.  For example, some systems are subject to resource
balancing operations by the operator or control software.  In such
environments, system CPUs may be in nodes 1 and 3 at boot, and be
moved to nodes 2, 3, and 5 for better performance.

The current implementation attempts to recognize such changes within
the powerpc-specific version of arch_update_cpu_topology to modify a
range of system data structures directly.  However, some scheduler
data structures may be inaccessible, or the timing of a node change
may still lead to corruption or error in other modules (e.g. user
space) which do not receive notification of these changes.

This patch modifies the PRRN/VPHN topology update worker function to
recognize an affinity change for a CPU, and to perform a full DLPAR
remove and add of the CPU instead of dynamically changing its node
to resolve this issue.
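
In outline (the helper name here is illustrative, not the exact code in
the patch), the topology update worker ends up doing something like:

    static void topology_readd_cpu(u32 drc_index)
    {
            lock_device_hotplug();
            /* Hot-remove the CPU, then add it back so it picks up its
             * new node assignment via the normal DLPAR path. */
            dlpar_cpu_remove_by_index(drc_index);
            dlpar_cpu_add(drc_index);
            unlock_device_hotplug();
    }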

[Based upon patch submission:
Subject: [PATCH] powerpc/pseries: Perform full re-add of CPU for topology 
update post-migration
From: Nathan Fontenot 
Date: Tue Oct 30 05:43:36 AEDT 2018
]

[Replace patch submission:
Subject: [PATCH] powerpc/topology: Update numa mask when cpu node mapping 
changes
From: Srikar Dronamraju 
Date: Wed Oct 10 15:24:46 AEDT 2018
]

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/topology.h |6 -
 arch/powerpc/kernel/rtasd.c |1 
 arch/powerpc/mm/numa.c  |  184 +--
 3 files changed, 27 insertions(+), 164 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index f85e2b0..9f85246 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -42,7 +42,6 @@ static inline int pcibus_to_node(struct pci_bus *bus)
 
 extern int sysfs_add_device_to_node(struct device *dev, int nid);
 extern void sysfs_remove_device_from_node(struct device *dev, int nid);
-extern int numa_update_cpu_topology(bool cpus_locked);
 
 static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node)
 {
@@ -77,11 +76,6 @@ static inline void sysfs_remove_device_from_node(struct 
device *dev,
 {
 }
 
-static inline int numa_update_cpu_topology(bool cpus_locked)
-{
-   return 0;
-}
-
 static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node) {}
 
 #endif /* CONFIG_NUMA */
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 38cadae..c161d74 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -285,7 +285,6 @@ static void handle_prrn_event(s32 scope)
 * the RTAS event.
 */
pseries_devicetree_update(-scope);
-   numa_update_cpu_topology(false);
 }
 
 static void handle_rtas_event(const struct rtas_error_log *log)
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index be6216e..f79b65f 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1236,96 +1236,25 @@ int find_and_online_cpu_nid(int cpu)
return new_nid;
 }
 
-/*
- * Update the CPU maps and sysfs entries for a single CPU when its NUMA
- * characteristics change. This function doesn't perform any locking and is
- * only safe to call from stop_machine().
- */
-static int update_cpu_topology(void *data)
-{
-   struct topology_update_data *update;
-   unsigned long cpu;
-
-   if (!data)
-   return -EINVAL;
-
-   cpu = smp_processor_id();
-
-   for (update = data; update; update = update->next) {
-   int new_nid = update->new_nid;
-   if (cpu != update->cpu)
-   continue;
-
-   unmap_cpu_from_node(cpu);
-   map_cpu_to_node(cpu, new_nid);
-   set_cpu_numa_node(cpu, new_nid);
-   set_cpu_numa_mem(cpu, local_memory_node(new_nid));
-   vdso_getcpu_init();
-   }
-
-   return 0;
-}
-
-static int update_lookup_table(void *data)
-{
-   struct topology_update_data *update;
-
-   if (!data)
-   return -EINVAL;
-
-   /*
-* Upon topology update, the numa-cpu lookup table needs to be updated
-* for all threads in the core, including offline CPUs, to ensure that
-* future hotplug operations respect the cpu-to-node associativity
-* properly.
-*/
-   for (update = data; update; update = update->next) {
-   int nid, base, j;
-
-   nid = update->new_nid;
-   base = cpu_first_thread_sibling(update->cpu);
-
-   for (j = 0; j < threads_per_core; j++) {
-   update_numa_cpu_lookup_table(base + j, nid);
-   }
-   }
-
-   return 0;
-}
-
-/*
- * Update the node maps and sysfs entries for each cpu whose home node
- * has changed. Returns 1 when the topology has changed, and 0 otherwise.
- *
- * cpus_locked says whether we 

[PATCH v4 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-12 Thread Douglas Anderson
When I had lockdep turned on and dropped into kgdb I got a nice splat
on my system.  Specifically it hit:
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)

Specifically it looked like this:
  sysrq: SysRq : DEBUG
  [ cut here ]
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)
  WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 
lockdep_hardirqs_on+0xf0/0x160
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
  pstate: 604003c9 (nZCv DAIF +PAN -UAO)
  pc : lockdep_hardirqs_on+0xf0/0x160
  ...
  Call trace:
   lockdep_hardirqs_on+0xf0/0x160
   trace_hardirqs_on+0x188/0x1ac
   kgdb_roundup_cpus+0x14/0x3c
   kgdb_cpu_enter+0x53c/0x5cc
   kgdb_handle_exception+0x180/0x1d4
   kgdb_compiled_brk_fn+0x30/0x3c
   brk_handler+0x134/0x178
   do_debug_exception+0xfc/0x178
   el1_dbg+0x18/0x78
   kgdb_breakpoint+0x34/0x58
   sysrq_handle_dbg+0x54/0x5c
   __handle_sysrq+0x114/0x21c
   handle_sysrq+0x30/0x3c
   qcom_geni_serial_isr+0x2dc/0x30c
  ...
  ...
  irq event stamp: ...45
  hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
  hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
  softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
  softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
  ---[ end trace adf21f830c46e638 ]---

Looking closely at it, it seems like a really bad idea to be calling
local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
like it could violate spinlock semantics and cause a deadlock.

Instead, let's use a private csd alongside
smp_call_function_single_async() to round up the other CPUs.  Using
smp_call_function_single_async() doesn't require interrupts to be
enabled so we can remove the offending bit of code.

In order to avoid duplicating this across all the architectures that
use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
to debug_core.c.
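
The weak default added to debug_core.c then looks roughly like this
(a sketch of the approach, not the literal hunk):

    #ifdef CONFIG_SMP
    static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);

    void __weak kgdb_call_nmi_hook(void *ignored)
    {
            kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
    }

    void __weak kgdb_roundup_cpus(void)
    {
            int this_cpu = raw_smp_processor_id();
            int cpu;

            for_each_online_cpu(cpu) {
                    /* No need to round up ourselves */
                    if (cpu == this_cpu)
                            continue;

                    per_cpu(kgdb_roundup_csd, cpu).func = kgdb_call_nmi_hook;
                    smp_call_function_single_async(cpu,
                                    &per_cpu(kgdb_roundup_csd, cpu));
            }
    }
    #endif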

Looking at all the people who previously had copies of this code,
there were a few variants.  I've attempted to keep the variants
working like they used to.  Specifically:
* For arch/arc we passed NULL to kgdb_nmicallback() instead of
  get_irq_regs().
* For arch/mips there was a bit of extra code around
  kgdb_nmicallback()

NOTE: In this patch we will still get into trouble if we try to round
up a CPU that failed to round up before.  We'll try to round it up
again and potentially hang when we try to grab the csd lock.  That's
not new behavior but we'll still try to do better in a future patch.

Suggested-by: Daniel Thompson 
Signed-off-by: Douglas Anderson 
---

Changes in v4: None
Changes in v3:
- No separate init call.
- Don't round up the CPU that is doing the rounding up.
- Add "#ifdef CONFIG_SMP" to match the rest of the file.
- Updated desc saying we don't solve the "failed to roundup" case.
- Document the ignored parameter.

Changes in v2:
- Removing irq flags separated from fixing lockdep splat.
- Don't use smp_call_function (Daniel).

 arch/arc/kernel/kgdb.c | 10 ++
 arch/arm/kernel/kgdb.c | 12 
 arch/arm64/kernel/kgdb.c   | 12 
 arch/hexagon/kernel/kgdb.c | 27 ---
 arch/mips/kernel/kgdb.c|  9 +
 arch/powerpc/kernel/kgdb.c |  4 ++--
 arch/sh/kernel/kgdb.c  | 12 
 include/linux/kgdb.h   | 15 +--
 kernel/debug/debug_core.c  | 35 +++
 9 files changed, 53 insertions(+), 83 deletions(-)

diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 0932851028e0..68d9fe4b5aa7 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -192,18 +192,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long 
ip)
instruction_pointer(regs) = ip;
 }
 
-static void kgdb_call_nmi_hook(void *ignored)
+void kgdb_call_nmi_hook(void *ignored)
 {
+   /* Default implementation passes get_irq_regs() but we don't */
kgdb_nmicallback(raw_smp_processor_id(), NULL);
 }
 
-void kgdb_roundup_cpus(void)
-{
-   local_irq_enable();
-   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-   local_irq_disable();
-}
-
 struct kgdb_arch arch_kgdb_ops = {
/* breakpoint instruction: TRAP_S 0x3 */
 #ifdef CONFIG_CPU_BIG_ENDIAN
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index f21077b077be..d9a69e941463 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -170,18 +170,6 @@ static struct undef_hook kgdb_compiled_brkpt_hook = {
.fn = kgdb_compiled_brk_fn
 };
 
-static void kgdb_call_nmi_hook(void *ignored)
-{
-   kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
-}
-
-void kgdb_roundup_cpus(void)
-{
-   local_irq_enable();
-   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-   local_irq_disable();
-}
-
 static int __kgdb_notify(struct die_args *args, unsigned long cmd)
 {
struct pt_regs *regs = args->regs;
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 

[PATCH v4 1/4] kgdb: Remove irq flags from roundup

2018-11-12 Thread Douglas Anderson
The function kgdb_roundup_cpus() was passed a parameter that was
documented as:

> the flags that will be used when restoring the interrupts. There is
> local_irq_save() call before kgdb_roundup_cpus().

Nobody used those flags.  Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them.  So we can definitely remove the flags.

Signed-off-by: Douglas Anderson 
---

Changes in v4: None
Changes in v3: None
Changes in v2:
- Removing irq flags separated from fixing lockdep splat.

 arch/arc/kernel/kgdb.c | 2 +-
 arch/arm/kernel/kgdb.c | 2 +-
 arch/arm64/kernel/kgdb.c   | 2 +-
 arch/hexagon/kernel/kgdb.c | 9 ++---
 arch/mips/kernel/kgdb.c| 2 +-
 arch/powerpc/kernel/kgdb.c | 2 +-
 arch/sh/kernel/kgdb.c  | 2 +-
 arch/sparc/kernel/smp_64.c | 2 +-
 arch/x86/kernel/kgdb.c | 9 ++---
 include/linux/kgdb.h   | 9 ++---
 kernel/debug/debug_core.c  | 2 +-
 11 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 9a3c34af2ae8..0932851028e0 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -197,7 +197,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), NULL);
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index caa0dbe3dc61..f21077b077be 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -175,7 +175,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a20de58061a8..12c339ff6e75 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -289,7 +289,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
index 16c24b22d0b2..012e0e230ac2 100644
--- a/arch/hexagon/kernel/kgdb.c
+++ b/arch/hexagon/kernel/kgdb.c
@@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long 
pc)
 
 /**
  * kgdb_roundup_cpus - Get other CPUs into a holding pattern
- * @flags: Current IRQ state
  *
  * On SMP systems, we need to get the attention of the other CPUs
  * and get them be in a known state.  This should do what is needed
  * to get the other CPUs to call kgdb_wait(). Note that on some arches,
  * the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs. In
- * this case, we have to make sure that interrupts are enabled before
- * calling smp_call_function(). The argument to this function is
- * the flags that will be used when restoring the interrupts. There is
- * local_irq_save() call before kgdb_roundup_cpus().
+ * in case of MIPS, smp_call_function() is used to roundup CPUs.
  *
  * On non-SMP systems, this is not called.
  */
@@ -139,7 +134,7 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index eb6c0d582626..2b05effc17b4 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -219,7 +219,7 @@ static void kgdb_call_nmi_hook(void *ignored)
set_fs(old_fs);
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 59c578f865aa..b0e804844be0 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -124,7 +124,7 @@ static int kgdb_call_nmi_hook(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_SMP
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
smp_send_debugger_break();
 }
diff --git a/arch/sh/kernel/kgdb.c b/arch/sh/kernel/kgdb.c
index 4f04c6638a4d..cc57630f6bf2 100644
--- a/arch/sh/kernel/kgdb.c
+++ b/arch/sh/kernel/kgdb.c
@@ -319,7 +319,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
local_irq_enable();
  

[PATCH v4 0/4] kgdb: Fix kgdb_roundup_cpus()

2018-11-12 Thread Douglas Anderson
This series was originally part of the series ("serial: Finish kgdb on
qcom_geni; fix many lockdep splats w/ kgdb") but it made sense to
split it up.

It's believed that dropping into kgdb should be more robust once these
patches are applied.

Changes in v4:
- Removed smp_mb() calls.
- Also clear out .debuggerinfo.
- Also clear out .debuggerinfo and .task for the master.
- Remove clearing out in kdb_stub for offline CPUs; it's now redundant.

Changes in v3:
- No separate init call.
- Don't round up the CPU that is doing the rounding up.
- Add "#ifdef CONFIG_SMP" to match the rest of the file.
- Updated desc saying we don't solve the "failed to roundup" case.
- Document the ignored parameter.
- Don't round up a CPU that failed rounding up before new for v3.
- Don't back trace on a cpu that didn't round up new for v3.

Changes in v2:
- Removing irq flags separated from fixing lockdep splat.
- Don't use smp_call_function (Daniel).

Douglas Anderson (4):
  kgdb: Remove irq flags from roundup
  kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
  kgdb: Don't round up a CPU that failed rounding up before
  kdb: Don't back trace on a cpu that didn't round up

 arch/arc/kernel/kgdb.c  | 10 ++
 arch/arm/kernel/kgdb.c  | 12 ---
 arch/arm64/kernel/kgdb.c| 12 ---
 arch/hexagon/kernel/kgdb.c  | 32 ---
 arch/mips/kernel/kgdb.c |  9 +-
 arch/powerpc/kernel/kgdb.c  |  6 ++--
 arch/sh/kernel/kgdb.c   | 12 ---
 arch/sparc/kernel/smp_64.c  |  2 +-
 arch/x86/kernel/kgdb.c  |  9 ++
 include/linux/kgdb.h| 22 -
 kernel/debug/debug_core.c   | 56 -
 kernel/debug/debug_core.h   |  1 +
 kernel/debug/kdb/kdb_bt.c   | 11 ++-
 kernel/debug/kdb/kdb_debugger.c |  7 -
 14 files changed, 89 insertions(+), 112 deletions(-)

-- 
2.19.1.930.g4563a0d9d0-goog



Re: [PATCH] Documentation: fix spelling mistake, EACCESS -> EACCES

2018-11-12 Thread Michael Ellerman
Jeremy Kerr  writes:
> Hi Jon,
>
>>> Signed-off-by: Colin Ian King 
>>> ---
>>>   Documentation/filesystems/spufs.txt | 2 +-
>>>   Documentation/gpu/drm-uapi.rst  | 4 ++--
>>>   2 files changed, 3 insertions(+), 3 deletions(-)
>> 
>> Applied, thanks.
>> 
>> This is the first patch to spufs.txt since 2006...I wonder if that stuff
>> is being used by anybody...
>
> Anyone who is using the vector processors on the Cell BE processors will
> be using it. However, that hardware is becoming pretty rare now.
>
> We'll want to remove the spufs bits if/when we drop support for the cell
> platforms (IBM QSxx, PS3, celleb). mpe: any ides on that? Do you have a 
> policy for dropping platform support?

I don't have a policy. We discussed it a bit at the maintainer summit;
that basically boiled down to: stuff should get removed when it is not
used much and/or is causing undue maintenance burden - both of which are
fairly arbitrary criteria.

My feeling is spufs is not causing anyone much work, it does get hit by
some VFS updates but it's just one of many many filesystems, so the
incremental overhead is pretty small I think - though VFS people may
disagree :)

I still have a working IBM QS22, and Geoff is still maintaining PS3,
celleb is gone.

Basically we're keeping it around in case people are still using it on
their PS3s. I don't know how we gauge how many people that is without
removing support and seeing who is annoyed. But even if we did that a
lot of folks will probably not notice for months to years, and when they
do notice they'll just bin their PS3s rather than telling us.

But maybe Geoff has a better feel for how many people (other than him)
are still running upstream on PS3s.

cheers


Re: [PATCH] powerpc/math-emu: Fix building with clang

2018-11-12 Thread Segher Boessenkool
On Mon, Nov 12, 2018 at 10:03:34PM +1030, Joel Stanley wrote:
> arch/powerpc/math-emu/fnmsub.c:46:2: error: invalid use of a cast in a
> inline asm context requiring an l-value: remove
>   the cast or build with -fheinous-gnu-extensions

...

> ./arch/powerpc/include/asm/sfp-machine.h:283:27: note: expanded from
> macro 'sub_ddmmss'
>: "=r" ((USItype)(sh)),  \
>~~^~~

This was fixed in GCC over 16 years ago ( https://gcc.gnu.org/r56600 ),
and in GMP (where it comes from) presumably before that.

There are many projects around which still have old GMP snippets in
them.  Like, the Linux kernel for powerpc, or GCC for x86!


Segher


Re: [PATCH] powerpc: Add KVM guest defconfig

2018-11-12 Thread Michael Ellerman
Satheesh Rajendran  writes:

> On Thu, Nov 08, 2018 at 04:23:07PM -0200, Breno Leitao wrote:
>> hi Satheesh,
>> 
>> On 11/08/2018 03:08 AM, sathn...@linux.vnet.ibm.com wrote:
>> > --- /dev/null
>> > +++ b/arch/powerpc/configs/guest.config
>> > @@ -0,0 +1,14 @@
>> > +CONFIG_VIRTIO_BLK=y
>> > +CONFIG_VIRTIO_BLK_SCSI=y
>> > +CONFIG_SCSI_VIRTIO=y
>> > +CONFIG_VIRTIO_NET=y
>> > +CONFIG_NET_FAILOVER=y
>> > +CONFIG_VIRTIO_CONSOLE=y
>> > +CONFIG_VIRTIO=y
>> > +CONFIG_VIRTIO_PCI=y
>> > +CONFIG_KVM_GUEST=y
>> > +CONFIG_EPAPR_PARAVIRT=y
>> > +CONFIG_XFS_FS=y
>> 
>> Why a guest kernel needs to have XFS integrated in the core image? I am
>> wondering if it is a requirement from another CONFIG_ option.
>
> Idea is to have a working config which would boot guest without initramfs,
> other FS(like EXT4) is already integrated in the core image, 
> thought this would be helpful for distributions, which default XFS as root 
> disk.

Maybe we should switch XFS_FS to Y in ppc64_defconfig ?

cheers


Re: pkeys: Reserve PKEY_DISABLE_READ

2018-11-12 Thread Florian Weimer
* Ram Pai:

> On Thu, Nov 08, 2018 at 09:23:35PM +0100, Florian Weimer wrote:
>> * Ram Pai:
>> 
>> > Florian,
>> >
>> >I can. But I am struggling to understand the requirement. Why is
>> >this needed?  Are we proposing an enhancement to sys_pkey_alloc(),
>> >to be able to allocate keys that are initialized to disable-read
>> >only?
>> 
>> Yes, I think that would be a natural consequence.
>> 
>> However, my immediate need comes from the fact that the AMR register can
>> contain a flag combination that is not possible to represent with the
>> existing PKEY_DISABLE_WRITE and PKEY_DISABLE_ACCESS flags.  User code
>> could write to AMR directly, so I cannot rule out that certain flag
>> combinations exist there.
>> 
>> So I came up with this:
>> 
>> int
>> pkey_get (int key)
>> {
>>   if (key < 0 || key > PKEY_MAX)
>> {
>>   __set_errno (EINVAL);
>>   return -1;
>> }
>>   unsigned int index = pkey_index (key);
>>   unsigned long int amr = pkey_read ();
>>   unsigned int bits = (amr >> index) & 3;
>> 
>>   /* Translate from AMR values.  PKEY_AMR_READ standing alone is not
>>  currently representable.  */
>>   if (bits & PKEY_AMR_READ)
>
> this should be
>if (bits & (PKEY_AMR_READ|PKEY_AMR_WRITE))

This would return zero for PKEY_AMR_READ alone.

>> return PKEY_DISABLE_ACCESS;
>
>
>>   else if (bits == PKEY_AMR_WRITE)
>> return PKEY_DISABLE_WRITE;
>>   return 0;
>> }

It's hard to tell whether PKEY_DISABLE_ACCESS is better in this case,
which is why I want PKEY_DISABLE_READ.

>> And this is not ideal.  I would prefer something like this instead:
>> 
>>   switch (bits)
>> {
>>   case PKEY_AMR_READ | PKEY_AMR_WRITE:
>> return PKEY_DISABLE_ACCESS;
>>   case PKEY_AMR_READ:
>> return PKEY_DISABLE_READ;
>>   case PKEY_AMR_WRITE:
>> return PKEY_DISABLE_WRITE;
>>   case 0:
>> return 0;
>> }
>
> yes.
>  and on x86 it will be something like:
>switch (bits)
>  {
>case PKEY_PKRU_ACCESS :
>  return PKEY_DISABLE_ACCESS;
>case PKEY_AMR_WRITE:
>  return PKEY_DISABLE_WRITE;
>case 0:
>  return 0;
>  }

x86 returns the PKRU bits directly, including the nonsensical case
(PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE).

> But for this to work, why do you need to enhance the sys_pkey_alloc()
> interface?  Not that I am against it. Trying to understand if the
> enhancement is really needed.

sys_pkey_alloc performs an implicit pkey_set for the newly allocated key
(that is, it updates the PKRU/AMR register).  It makes sense to match
the behavior of the userspace implementation.
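
A minimal userspace sketch of how that would look at allocation time;
PKEY_DISABLE_READ and the 0x8 value below are assumptions for illustration
only, since the current UAPI defines just PKEY_DISABLE_ACCESS and
PKEY_DISABLE_WRITE:

  #define _GNU_SOURCE
  #include <sys/mman.h>
  #include <stdio.h>

  #ifndef PKEY_DISABLE_READ
  #define PKEY_DISABLE_READ 0x8  /* placeholder value, illustration only */
  #endif

  int main(void)
  {
          /* ask for a key that starts out read-disabled */
          int key = pkey_alloc(0, PKEY_DISABLE_READ);

          if (key < 0)
                  perror("pkey_alloc");  /* EINVAL on kernels without the flag */
          else
                  pkey_free(key);
          return 0;
  }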

Thanks,
Florian


Re: [RFC] mm: Replace all open encodings for NUMA_NO_NODE

2018-11-12 Thread Anshuman Khandual



On 11/12/2018 02:13 PM, Hans Verkuil wrote:
> On 11/12/2018 03:41 AM, Anshuman Khandual wrote:
>> At present there are multiple places where invalid node number is encoded
>> as -1. Even though implicitly understood it is always better to have macros
>> in there. Replace these open encodings for an invalid node number with the
>> global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like
>> 'invalid node' from various places redirecting them to a common definition.
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>> Build tested this with multiple cross compiler options like alpha, sparc,
>> arm64, x86, powerpc64le etc with their default config which might not have
>> compiled tested all driver related changes. I will appreciate folks giving
>> this a test in their respective build environment.
>>
>> All these places for replacement were found by running the following grep
>> patterns on the entire kernel code. Please let me know if this might have
>> missed some instances. This might also have replaced some false positives.
>> I will appreciate suggestions, inputs and review.
> The 'node' in the drivers/media and the drivers/video sources has nothing to
> do with numa. It's an index for a framebuffer instead (i.e. the X in 
> /dev/fbX).

Thanks for the input. Will drop the changes there.
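
For the call sites that do relate to NUMA, the replacement is purely
mechanical; a hypothetical before/after, not taken from the patch
(NUMA_NO_NODE is defined as -1 in include/linux/numa.h):

  #include <linux/numa.h>

  /* before */
  if (nid == -1)
          nid = numa_node_id();

  /* after */
  if (nid == NUMA_NO_NODE)
          nid = numa_node_id();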


Re: [PATCH] powerpc/math-emu: Fix building with clang

2018-11-12 Thread Joel Stanley
On Sat, 3 Nov 2018 at 03:24, Nick Desaulniers  wrote:
>
> On Thu, Nov 1, 2018 at 8:37 PM Joel Stanley  wrote:
> >
> >   make CC=clang-8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-  
> > ppc44x_defconfig
> >   make CC=clang-8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-
> >
> >   ...
> >
> >   arch/powerpc/math-emu/fnmsub.c:46:2: error: invalid use of a cast in a
> >   inline asm context requiring an l-value: remove the cast or build with
> >   -fheinous-gnu-extensions
> >   FP_ADD_D(R, T, B);
> >   ^
> >   ./include/math-emu/double.h:110:27: note: expanded from macro 'FP_ADD_D'
> >   #define FP_ADD_D(R,X,Y) _FP_ADD(D,2,R,X,Y)
> >   ^~
> >   ./include/math-emu/op-common.h:367:34: note: expanded from macro '_FP_ADD'
> >   #define _FP_ADD(fs, wc, R, X, Y) _FP_ADD_INTERNAL(fs, wc, R, X, Y, '+')
> >^~
> >   ./include/math-emu/op-common.h:264:4: note: expanded from macro 
> > '_FP_ADD_INTERNAL'
> > _FP_FRAC_ADD_##wc(R, X, Y); 
> >\
> > ^~
> >   note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 
> > to see all)
>
> ^^^ Hi Joel, would you mind recompiling with
> `-fmacro-backtrace-limit=0` hacked in and including the full
> backtrace?  I'm curious if there's a more appropriate fix, but can't
> tell where the inline asm is that clang is complaining about.

Sure. Here's the full backtrace:

arch/powerpc/math-emu/fnmsub.c:46:2: error: invalid use of a cast in a
inline asm context requiring an l-value: remove
  the cast or build with -fheinous-gnu-extensions
FP_ADD_D(R, T, B);
^
./include/math-emu/double.h:110:27: note: expanded from macro 'FP_ADD_D'
#define FP_ADD_D(R,X,Y) _FP_ADD(D,2,R,X,Y)
^~
./include/math-emu/op-common.h:367:34: note: expanded from macro '_FP_ADD'
#define _FP_ADD(fs, wc, R, X, Y) _FP_ADD_INTERNAL(fs, wc, R, X, Y, '+')
 ^~
./include/math-emu/op-common.h:274:4: note: expanded from macro
'_FP_ADD_INTERNAL'
  _FP_FRAC_SUB_##wc(R, X, Y);\
  ^~
<scratch space>:268:1: note: expanded from here
_FP_FRAC_SUB_2
^
./include/math-emu/op-2.h:97:19: note: expanded from macro '_FP_FRAC_SUB_2'
  __FP_FRAC_SUB_2(R##_f1, R##_f0, X##_f1, X##_f0, Y##_f1, Y##_f0)
  ^~~
<scratch space>:269:1: note: expanded from here
R_f1
^
./arch/powerpc/include/asm/sfp-machine.h:283:27: note: expanded from
macro 'sub_ddmmss'
   : "=r" ((USItype)(sh)),  \
   ~~^~~
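
The complaint is about the (USItype) cast used as an asm output operand; as
the diagnostic itself suggests, the cast can simply be dropped so the lvalue
is passed directly. A sketch of that shape of change to sfp-machine.h, not
the exact committed patch:

  /* before: clang rejects the cast as an output lvalue */
     : "=r" ((USItype)(sh)),  \

  /* after: pass the lvalue directly */
     : "=r" (sh),  \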


Re: [PATCH 2/2] x86, powerpc: remove -funit-at-a-time compiler option entirely

2018-11-12 Thread Michael Ellerman
Masahiro Yamada  writes:

> GCC 4.6 manual says:
>
> -funit-at-a-time
>   This option is left for compatibility reasons. -funit-at-a-time has
>   no effect, while -fno-unit-at-a-time implies -fno-toplevel-reorder
>   and -fno-section-anchors.
>   Enabled by default.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/powerpc/Makefile | 4 
>  arch/x86/Makefile | 4 
>  arch/x86/Makefile.um  | 5 -
>  3 files changed, 13 deletions(-)
>
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 8a2ce14..854199c 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -228,10 +228,6 @@ KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
>  KBUILD_CFLAGS += $(call cc-option,-mno-spe)
>  KBUILD_CFLAGS += $(call cc-option,-mspe=no)
>  
> -# Enable unit-at-a-time mode when possible. It shrinks the
> -# kernel considerably.
> -KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
> -

Thanks for cleaning it up.

Acked-by: Michael Ellerman 

cheers


Re: [PATCH v2 2/2] powerpc/boot: Set target when cross-compiling for clang

2018-11-12 Thread Michael Ellerman
Joel Stanley  writes:

> Clang needs to be told which target it is building for when cross
> compiling.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/259
> Signed-off-by: Joel Stanley 
> ---
>  arch/powerpc/boot/Makefile | 5 +
>  1 file changed, 5 insertions(+)

Acked-by: Michael Ellerman 

cheers

> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index 39354365f54a..111f97b1ccec 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -55,6 +55,11 @@ BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional 
> -nostdinc
>  
>  BOOTARFLAGS  := -cr$(KBUILD_ARFLAGS)
>  
> +ifdef CONFIG_CC_IS_CLANG
> +BOOTCFLAGS += $(CLANG_FLAGS)
> +BOOTAFLAGS += $(CLANG_FLAGS)
> +endif
> +
>  ifdef CONFIG_DEBUG_INFO
>  BOOTCFLAGS   += -g
>  endif
> -- 
> 2.19.1
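
For context, CLANG_FLAGS is set up by the top-level Makefile when cross
compiling with clang; roughly the following, though the exact contents for
this kernel version are an assumption:

  ifneq ($(CROSS_COMPILE),)
  CLANG_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%))
  endif

so with CROSS_COMPILE=powerpc-linux-gnu- the boot wrapper now gets
--target=powerpc-linux-gnu as well.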


Re: pkeys: Reserve PKEY_DISABLE_READ

2018-11-12 Thread Florian Weimer
* Ram Pai:

> On Thu, Nov 08, 2018 at 01:05:09PM +0100, Florian Weimer wrote:
>> Would it be possible to reserve a bit for PKEY_DISABLE_READ?
>> 
>> I think the POWER implementation can disable read access at the hardware
>> level, but not write access, and that cannot be expressed with the
>> current PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE bits.
>
> POWER hardware can disable-read and can **also disable-write**
> at the hardware level. It can disable-execute as well at the
> hardware level.   For example, if the key bits for a given key in the AMR
> register are
>   0b01  it is read-disable
>   0b10  it is write-disable
>
> To support access-disable, we make the key value 0b11.
>
> So if you want to know whether the key is read-disable, 'bitwise-and' it
> against 0x1, i.e. (x & 0x1)

Not sure if we covered that already, but my problem is that I cannot
translate a 0b01 mask to a PKEY_DISABLE_* flag combination with the
current flags.  0b10 and 0b11 are fine.

POWER also loses the distinction between PKEY_DISABLE_ACCESS and
PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE, but that's fine.  This breaks
the current glibc test case, but I have a patch for that.  Arguably, the
test is wrong or at least overly strict in what it accepts.

Thanks,
Florian


Re: [RFC PATCH v1 3/4] kvmppc: H_SVM_INIT_START and H_SVM_INIT_DONE hcalls

2018-11-12 Thread Bharata B Rao
On Thu, Nov 01, 2018 at 09:49:26PM +1100, Balbir Singh wrote:
> On Mon, Oct 22, 2018 at 10:48:36AM +0530, Bharata B Rao wrote:
> > H_SVM_INIT_START: Initiate securing a VM
> > H_SVM_INIT_DONE: Conclude securing a VM
> > 
> > During early guest init, these hcalls will be issued by UV.
> > As part of these hcalls, [un]register memslots with UV.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  arch/powerpc/include/asm/hvcall.h|  4 ++-
> >  arch/powerpc/include/asm/kvm_host.h  |  1 +
> >  arch/powerpc/include/asm/ucall-api.h |  6 
> >  arch/powerpc/kvm/book3s_hv.c | 54 
> >  4 files changed, 64 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/include/asm/hvcall.h 
> > b/arch/powerpc/include/asm/hvcall.h
> > index 89e6b70c1857..6091276fef07 100644
> > --- a/arch/powerpc/include/asm/hvcall.h
> > +++ b/arch/powerpc/include/asm/hvcall.h
> > @@ -300,7 +300,9 @@
> >  #define H_INT_RESET 0x3D0
> >  #define H_SVM_PAGE_IN  0x3D4
> >  #define H_SVM_PAGE_OUT 0x3D8
> > -#define MAX_HCALL_OPCODE   H_SVM_PAGE_OUT
> > +#define H_SVM_INIT_START   0x3DC
> > +#define H_SVM_INIT_DONE0x3E0
> > +#define MAX_HCALL_OPCODE   H_SVM_INIT_DONE
> >  
> >  /* H_VIOCTL functions */
> >  #define H_GET_VIOA_DUMP_SIZE   0x01
> > diff --git a/arch/powerpc/include/asm/kvm_host.h 
> > b/arch/powerpc/include/asm/kvm_host.h
> > index 194e6e0ff239..267f8c568bc3 100644
> > --- a/arch/powerpc/include/asm/kvm_host.h
> > +++ b/arch/powerpc/include/asm/kvm_host.h
> > @@ -292,6 +292,7 @@ struct kvm_arch {
> > struct dentry *debugfs_dir;
> > struct dentry *htab_dentry;
> > struct kvm_resize_hpt *resize_hpt; /* protected by kvm->lock */
> > +   bool svm_init_start; /* Indicates H_SVM_INIT_START has been called */
> >  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
> >  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
> > struct mutex hpt_mutex;
> > diff --git a/arch/powerpc/include/asm/ucall-api.h 
> > b/arch/powerpc/include/asm/ucall-api.h
> > index 2c12f514f8ab..9ddfcf541211 100644
> > --- a/arch/powerpc/include/asm/ucall-api.h
> > +++ b/arch/powerpc/include/asm/ucall-api.h
> > @@ -17,4 +17,10 @@ static inline int uv_page_out(u64 lpid, u64 dw0, u64 
> > dw1, u64 dw2, u64 dw3)
> > return U_SUCCESS;
> >  }
> >  
> > +static inline int uv_register_mem_slot(u64 lpid, u64 dw0, u64 dw1, u64 dw2,
> > +  u64 dw3)
> > +{
> > +   return 0;
> > +}
> > +
> >  #endif /* _ASM_POWERPC_UCALL_API_H */
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 05084eb8aadd..47f366f634fd 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -819,6 +819,50 @@ static int kvmppc_get_yield_count(struct kvm_vcpu 
> > *vcpu)
> > return yield_count;
> >  }
> >  
> > +#ifdef CONFIG_PPC_SVM
> > +#include 
> > +/*
> > + * TODO: Check if memslots related calls here need to be called
> > + * under any lock.
> > + */
> > +static unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
> > +{
> > +   struct kvm_memslots *slots;
> > +   struct kvm_memory_slot *memslot;
> > +   int ret;
> > +
> > +   slots = kvm_memslots(kvm);
> > +   kvm_for_each_memslot(memslot, slots) {
> > +   ret = uv_register_mem_slot(kvm->arch.lpid,
> > +  memslot->base_gfn << PAGE_SHIFT,
> > +  memslot->npages * PAGE_SIZE,
> > +  0, memslot->id);
> 
> For every memslot there is a corresponding registration in the ultravisor?
> Is there a corresponding teardown?

Yes, uv_unregister_mem_slot(), called during memory unplug time.
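
A hedged sketch of what the counterpart stub could look like next to
uv_register_mem_slot() in ucall-api.h; the parameter list is an assumption,
not taken from the posted series:

  static inline int uv_unregister_mem_slot(u64 lpid, u64 slotid)
  {
          return 0;
  }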

Regards,
Bharata.



Re: [RFC PATCH v1 2/4] kvmppc: Add support for shared pages in HMM driver

2018-11-12 Thread Bharata B Rao
On Thu, Nov 01, 2018 at 09:45:52PM +1100, Balbir Singh wrote:
> On Mon, Oct 22, 2018 at 10:48:35AM +0530, Bharata B Rao wrote:
> > A secure guest will share some of its pages with hypervisor (Eg. virtio
> > bounce buffers etc). Support shared pages in HMM driver.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  arch/powerpc/kvm/book3s_hv_hmm.c | 69 ++--
> >  1 file changed, 65 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv_hmm.c 
> > b/arch/powerpc/kvm/book3s_hv_hmm.c
> > index a2ee3163a312..09b8e19b7605 100644
> > --- a/arch/powerpc/kvm/book3s_hv_hmm.c
> > +++ b/arch/powerpc/kvm/book3s_hv_hmm.c
> > @@ -50,6 +50,7 @@ struct kvmppc_hmm_page_pvt {
> > struct hlist_head *hmm_hash;
> > unsigned int lpid;
> > unsigned long gpa;
> > +   bool skip_page_out;
> >  };
> >  
> >  struct kvmppc_hmm_migrate_args {
> > @@ -278,6 +279,65 @@ static unsigned long kvmppc_gpa_to_hva(struct kvm 
> > *kvm, unsigned long gpa,
> > return hva;
> >  }
> >  
> > +/*
> > + * Shares the page with HV, thus making it a normal page.
> > + *
> > + * - If the page is already secure, then provision a new page and share
> > + * - If the page is a normal page, share the existing page
> > + *
> > + * In the former case, uses the HMM fault handler to release the HMM page.
> > + */
> > +static unsigned long
> > +kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
> > + unsigned long addr, unsigned long page_shift)
> > +{
> > +
> 
> So this is a special flag passed via the hypercall to say
> this page can be skipped from page_out from secure memory?
> Who has the master copy of the page at this point?
> 
> In which case the question is
> 
> Why did we get a fault on the page which resulted in the
> fault migration ops being called?
> What category of pages are considered shared?

When UV/guest asks for sharing a page, there can be two cases:

- If the page is already secure, then provision a new page and share
- If the page is a normal page, share the existing page

In the former case, we touch the page via get_user_pages() and re-use the
HMM fault handler to release the HMM page. We use skip_page_out to mark
that this page is meant to be released w/o doing a page-out which otherwise
would be done if HV touches a secure page.

When a page is shared, both HV and UV have mappings to the same physical
page that resides in the non-secure memory.

Regards,
Bharata.



Re: [RFC PATCH v1 1/4] kvmppc: HMM backend driver to manage pages of secure guest

2018-11-12 Thread Bharata B Rao
On Thu, Nov 01, 2018 at 05:43:39PM +1100, Balbir Singh wrote:
> On Mon, Oct 22, 2018 at 10:48:34AM +0530, Bharata B Rao wrote:
> > HMM driver for KVM PPC to manage page transitions of
> > secure guest via H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls.
> > 
> > H_SVM_PAGE_IN: Move the content of a normal page to secure page
> > H_SVM_PAGE_OUT: Move the content of a secure page to normal page
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  arch/powerpc/include/asm/hvcall.h|   7 +-
> >  arch/powerpc/include/asm/kvm_host.h  |  15 +
> >  arch/powerpc/include/asm/kvm_ppc.h   |  28 ++
> >  arch/powerpc/include/asm/ucall-api.h |  20 ++
> >  arch/powerpc/kvm/Makefile|   3 +
> >  arch/powerpc/kvm/book3s_hv.c |  38 ++
> >  arch/powerpc/kvm/book3s_hv_hmm.c | 514 +++
> >  7 files changed, 624 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/powerpc/include/asm/ucall-api.h
> >  create mode 100644 arch/powerpc/kvm/book3s_hv_hmm.c
> > 
> > diff --git a/arch/powerpc/include/asm/hvcall.h 
> > b/arch/powerpc/include/asm/hvcall.h
> > index a0b17f9f1ea4..89e6b70c1857 100644
> > --- a/arch/powerpc/include/asm/hvcall.h
> > +++ b/arch/powerpc/include/asm/hvcall.h
> > @@ -158,6 +158,9 @@
> >  /* Each control block has to be on a 4K boundary */
> >  #define H_CB_ALIGNMENT  4096
> >  
> > +/* Flags for H_SVM_PAGE_IN */
> > +#define H_PAGE_IN_SHARED   0x1
> > +
> >  /* pSeries hypervisor opcodes */
> >  #define H_REMOVE   0x04
> >  #define H_ENTER0x08
> > @@ -295,7 +298,9 @@
> >  #define H_INT_ESB   0x3C8
> >  #define H_INT_SYNC  0x3CC
> >  #define H_INT_RESET 0x3D0
> > -#define MAX_HCALL_OPCODE   H_INT_RESET
> > +#define H_SVM_PAGE_IN  0x3D4
> > +#define H_SVM_PAGE_OUT 0x3D8
> > +#define MAX_HCALL_OPCODE   H_SVM_PAGE_OUT
> >  
> >  /* H_VIOCTL functions */
> >  #define H_GET_VIOA_DUMP_SIZE   0x01
> > diff --git a/arch/powerpc/include/asm/kvm_host.h 
> > b/arch/powerpc/include/asm/kvm_host.h
> > index 906bcbdfd2a1..194e6e0ff239 100644
> > --- a/arch/powerpc/include/asm/kvm_host.h
> > +++ b/arch/powerpc/include/asm/kvm_host.h
> > @@ -310,6 +310,9 @@ struct kvm_arch {
> > struct kvmppc_passthru_irqmap *pimap;
> >  #endif
> > struct kvmppc_ops *kvm_ops;
> > +#ifdef CONFIG_PPC_SVM
> > +   struct hlist_head *hmm_hash;
> > +#endif
> >  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> > /* This array can grow quite large, keep it at the end */
> > struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
> > @@ -830,4 +833,16 @@ static inline void kvm_arch_vcpu_blocking(struct 
> > kvm_vcpu *vcpu) {}
> >  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> >  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
> >  
> > +#ifdef CONFIG_PPC_SVM
> > +struct kvmppc_hmm_device {
> > +   struct hmm_device *device;
> > +   struct hmm_devmem *devmem;
> > +   unsigned long *pfn_bitmap;
> > +};
> > +
> > +extern int kvmppc_hmm_init(void);
> > +extern void kvmppc_hmm_free(void);
> > +extern int kvmppc_hmm_hash_create(struct kvm *kvm);
> > +extern void kvmppc_hmm_hash_destroy(struct kvm *kvm);
> > +#endif
> >  #endif /* __POWERPC_KVM_HOST_H__ */
> > diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> > b/arch/powerpc/include/asm/kvm_ppc.h
> > index e991821dd7fa..ba81a07e2bdf 100644
> > --- a/arch/powerpc/include/asm/kvm_ppc.h
> > +++ b/arch/powerpc/include/asm/kvm_ppc.h
> > @@ -906,4 +906,32 @@ static inline ulong kvmppc_get_ea_indexed(struct 
> > kvm_vcpu *vcpu, int ra, int rb)
> >  
> >  extern void xics_wake_cpu(int cpu);
> >  
> > +#ifdef CONFIG_PPC_SVM
> > +extern unsigned long kvmppc_h_svm_page_in(struct kvm *kvm,
> > + unsigned int lpid,
> > + unsigned long gra,
> > + unsigned long flags,
> > + unsigned long page_shift);
> > +extern unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
> > + unsigned int lpid,
> > + unsigned long gra,
> > + unsigned long flags,
> > + unsigned long page_shift);
> > +#else
> > +static inline unsigned long
> > +kvmppc_h_svm_page_in(struct kvm *kvm, unsigned int lpid,
> > +unsigned long gra, unsigned long flags,
> > +unsigned long page_shift)
> > +{
> > +   return H_UNSUPPORTED;
> > +}
> > +
> > +static inline unsigned long
> > +kvmppc_h_svm_page_out(struct kvm *kvm, unsigned int lpid,
> > + unsigned long gra, unsigned long flags,
> > + unsigned long page_shift)
> > +{
> > +   return H_UNSUPPORTED;
> > +}
> > +#endif
> >  #endif /* __POWERPC_KVM_PPC_H__ */
> > diff --git a/arch/powerpc/include/asm/ucall-api.h 
> > b/arch/powerpc/include/asm/ucall-api.h
> > new file mode 100644
> > 

Re: [RFC PATCH v1 3/4] kvmppc: H_SVM_INIT_START and H_SVM_INIT_DONE hcalls

2018-11-12 Thread Bharata B Rao
On Tue, Oct 30, 2018 at 04:29:57PM +1100, Paul Mackerras wrote:
> On Mon, Oct 22, 2018 at 10:48:36AM +0530, Bharata B Rao wrote:
> > H_SVM_INIT_START: Initiate securing a VM
> > H_SVM_INIT_DONE: Conclude securing a VM
> > 
> > During early guest init, these hcalls will be issued by UV.
> > As part of these hcalls, [un]register memslots with UV.
> > 
> > Signed-off-by: Bharata B Rao 
> 
> Comments below...

Will address all your comments in my next post.

Regards,
Bharata.



Re: [RFC PATCH v1 2/4] kvmppc: Add support for shared pages in HMM driver

2018-11-12 Thread Bharata B Rao
On Tue, Oct 30, 2018 at 04:26:46PM +1100, Paul Mackerras wrote:
> On Mon, Oct 22, 2018 at 10:48:35AM +0530, Bharata B Rao wrote:
> > A secure guest will share some of its pages with hypervisor (Eg. virtio
> > bounce buffers etc). Support shared pages in HMM driver.
> > 
> > Signed-off-by: Bharata B Rao 
> 
> Comments below...
> 
> > ---
> >  arch/powerpc/kvm/book3s_hv_hmm.c | 69 ++--
> >  1 file changed, 65 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv_hmm.c 
> > b/arch/powerpc/kvm/book3s_hv_hmm.c
> > index a2ee3163a312..09b8e19b7605 100644
> > --- a/arch/powerpc/kvm/book3s_hv_hmm.c
> > +++ b/arch/powerpc/kvm/book3s_hv_hmm.c
> > @@ -50,6 +50,7 @@ struct kvmppc_hmm_page_pvt {
> > struct hlist_head *hmm_hash;
> > unsigned int lpid;
> > unsigned long gpa;
> > +   bool skip_page_out;
> >  };
> >  
> >  struct kvmppc_hmm_migrate_args {
> > @@ -278,6 +279,65 @@ static unsigned long kvmppc_gpa_to_hva(struct kvm 
> > *kvm, unsigned long gpa,
> > return hva;
> >  }
> >  
> > +/*
> > + * Shares the page with HV, thus making it a normal page.
> > + *
> > + * - If the page is already secure, then provision a new page and share
> > + * - If the page is a normal page, share the existing page
> > + *
> > + * In the former case, uses the HMM fault handler to release the HMM page.
> > + */
> > +static unsigned long
> > +kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
> > + unsigned long addr, unsigned long page_shift)
> > +{
> > +
> > +   int ret;
> > +   struct hlist_head *list, *hmm_hash;
> > +   unsigned int lpid = kvm->arch.lpid;
> > +   unsigned long flags;
> > +   struct kvmppc_hmm_pfn_entry *p;
> > +   struct page *hmm_page, *page;
> > +   struct kvmppc_hmm_page_pvt *pvt;
> > +   unsigned long pfn;
> > +
> > +   /*
> > +* First check if the requested page has already been given to
> > +* UV as a secure page. If so, ensure that we don't issue a
> > +* UV_PAGE_OUT but instead directly send the page
> > +*/
> > +   spin_lock_irqsave(&kvmppc_hmm_lock, flags);
> > +   hmm_hash = kvm->arch.hmm_hash;
> > +   list = &hmm_hash[kvmppc_hmm_pfn_hash_fn(gpa)];
> > +   hlist_for_each_entry(p, list, hlist) {
> > +   if (p->addr == gpa) {
> > +   hmm_page = pfn_to_page(p->hmm_pfn);
> > +   get_page(hmm_page); /* TODO: Necessary ? */
> > +   pvt = (struct kvmppc_hmm_page_pvt *)
> > +   hmm_devmem_page_get_drvdata(hmm_page);
> > +   pvt->skip_page_out = true;
> > +   put_page(hmm_page);
> > +   break;
> > +   }
> > +   }
> > +   spin_unlock_irqrestore(&kvmppc_hmm_lock, flags);
> > +
> > +   ret = get_user_pages_fast(addr, 1, 0, &page);
> 
> Why are we calling this with write==0?  Surely in general the secure
> guest will expect to be able to write to the shared page?
> 
> Also, in general get_user_pages_fast isn't sufficient to translate a
> host virtual address (derived from a guest real address) into a pfn.
> See for example hva_to_pfn() in virt/kvm/kvm_main.c and the things it
> does to cope with the various cases that one can hit.  I can imagine
> in future that the secure guest might want to establish a shared
> mapping to a PCI device, for instance.

I switched to using gfn_to_pfn() which should cover all the cases.
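
A minimal sketch of that direction, reusing the gpa and page_shift arguments
of kvmppc_share_page(); gfn_to_pfn() and is_error_noslot_pfn() are the
generic KVM helpers:

  kvm_pfn_t pfn = gfn_to_pfn(kvm, gpa >> page_shift);

  if (is_error_noslot_pfn(pfn))
          return H_PARAMETER;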

> 
> > +   if (ret != 1)
> > +   return H_PARAMETER;
> > +
> > +   pfn = page_to_pfn(page);
> > +   if (is_zero_pfn(pfn)) {
> > +   put_page(page);
> > +   return H_SUCCESS;
> > +   }
> 
> The ultravisor still needs a page to map into the guest in this case,
> doesn't it?  What's the point of returning without giving the
> ultravisor a page to use?

Yes, missed it.

Regards,
Bharata.



Re: [RFC PATCH v1 1/4] kvmppc: HMM backend driver to manage pages of secure guest

2018-11-12 Thread Bharata B Rao
On Tue, Oct 30, 2018 at 04:03:00PM +1100, Paul Mackerras wrote:
> On Mon, Oct 22, 2018 at 10:48:34AM +0530, Bharata B Rao wrote:
> > HMM driver for KVM PPC to manage page transitions of
> > secure guest via H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls.
> > 
> > H_SVM_PAGE_IN: Move the content of a normal page to secure page
> > H_SVM_PAGE_OUT: Move the content of a secure page to normal page
> 
> Comments below...

Thanks for your comments; I have addressed all of them in the next
version, which I am about to post soon. I will reply only to the
questions here:

> > diff --git a/arch/powerpc/kvm/book3s_hv_hmm.c 
> > b/arch/powerpc/kvm/book3s_hv_hmm.c
> > new file mode 100644
> > index ..a2ee3163a312
> > --- /dev/null
> > +++ b/arch/powerpc/kvm/book3s_hv_hmm.c
> > @@ -0,0 +1,514 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * HMM driver to manage page migration between normal and secure
> > + * memory.
> > + *
> > + * Based on Jérôme Glisse's HMM dummy driver.
> > + *
> > + * Copyright 2018 Bharata B Rao, IBM Corp. 
> > + */
> > +
> > +/*
> > + * A pseries guest can be run as a secure guest on Ultravisor-enabled
> > + * POWER platforms. On such platforms, this driver will be used to manage
> > + * the movement of guest pages between the normal memory managed by
> > + * hypervisor (HV) and secure memory managed by Ultravisor (UV).
> > + *
> > + * Private ZONE_DEVICE memory equal to the amount of secure memory
> > + * available in the platform for running secure guests is created
> > + * via a HMM device. The movement of pages between normal and secure
> > + * memory is done by ->alloc_and_copy() callback routine of migrate_vma().
> > + *
> > + * The page-in or page-out requests from UV will come to HV as hcalls and
> > + * HV will call back into UV via uvcalls to satisfy these page requests.
> > + *
> > + * For each page that gets moved into secure memory, a HMM PFN is used
> > + * on the HV side and HMM migration PTE corresponding to that PFN would be
> > + * populated in the QEMU page tables. A per-guest hash table is created to
> > + * manage the pool of HMM PFNs. Guest real address is used as key to index
> > + * into the hash table and choose a free HMM PFN.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +static struct kvmppc_hmm_device *kvmppc_hmm;
> > +spinlock_t kvmppc_hmm_lock;
> > +
> > +#define KVMPPC_HMM_HASH_BITS   10
> > +#define KVMPPC_HMM_HASH_SIZE   (1 << KVMPPC_HMM_HASH_BITS)
> > +
> > +struct kvmppc_hmm_pfn_entry {
> > +   struct hlist_node hlist;
> > +   unsigned long addr;
> > +   unsigned long hmm_pfn;
> > +};
> > +
> > +struct kvmppc_hmm_page_pvt {
> > +   struct hlist_head *hmm_hash;
> > +   unsigned int lpid;
> > +   unsigned long gpa;
> > +};
> > +
> > +struct kvmppc_hmm_migrate_args {
> > +   struct hlist_head *hmm_hash;
> > +   unsigned int lpid;
> > +   unsigned long gpa;
> > +   unsigned long page_shift;
> > +};
> > +
> > +int kvmppc_hmm_hash_create(struct kvm *kvm)
> > +{
> > +   int i;
> > +
> > +   kvm->arch.hmm_hash = kzalloc(KVMPPC_HMM_HASH_SIZE *
> > +sizeof(struct hlist_head), GFP_KERNEL);
> > +   if (!kvm->arch.hmm_hash)
> > +   return -ENOMEM;
> > +
> > +   for (i = 0; i < KVMPPC_HMM_HASH_SIZE; i++)
> > +   INIT_HLIST_HEAD(&kvm->arch.hmm_hash[i]);
> > +   return 0;
> > +}
> 
> I gather that this hash table maps a guest real address (or pfn) to a
> hmm_pfn, is that right?  I think that mapping could be stored in the
> rmap arrays attached to the memslots, since there won't be any
> hypervisor-side mappings of the page while it is over in the secure
> space.  But maybe that could be a future optimization.

Hash table exists for two reasons:

- When the guest/UV requests a shared page-in, we need to quickly look up
  whether HV has ever given that page to UV as a secure page and hence
  maintains only an HMM PFN for it. We can walk the Linux page table to
  figure this out, and since only a small fraction of the guest's pages are
  shared, the cost of that lookup shouldn't cause much overhead.
- When the guest is destroyed, we need to release all HMM pages that HV
  is using to keep track of secure pages. This becomes easier if there
  is one centralized place (the hash table in this case) from which we can
  release all pages. I think using rmap arrays as you suggest should cover
  this case.

> 
> > +/*
> > + * Cleanup the HMM pages hash table when guest terminates
> > + *
> > + * Iterate over all the HMM pages hash list entries and release
> > + * reference on them. The actual freeing of the entry happens
> > + * via hmm_devmem_ops.free path.
> > + */
> > +void kvmppc_hmm_hash_destroy(struct kvm *kvm)
> > +{
> > +   int i;
> > +   struct kvmppc_hmm_pfn_entry *p;
> > +   struct page *hmm_page;
> > +
> > +   for (i = 0; i < KVMPPC_HMM_HASH_SIZE; i++) {
> > +   while (!hlist_empty(&kvm->arch.hmm_hash[i])) {
> > +   p = hlist_entry(kvm->arch.hmm_hash[i].first,
> > + 

Re: [RFC] mm: Replace all open encodings for NUMA_NO_NODE

2018-11-12 Thread Hans Verkuil
On 11/12/2018 03:41 AM, Anshuman Khandual wrote:
> At present there are multiple places where invalid node number is encoded
> as -1. Even though implicitly understood it is always better to have macros
> in there. Replace these open encodings for an invalid node number with the
> global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like
> 'invalid node' from various places redirecting them to a common definition.
> 
> Signed-off-by: Anshuman Khandual 
> ---
> Build tested this with multiple cross compiler options like alpha, sparc,
> arm64, x86, powerpc64le etc with their default config which might not have
> compiled tested all driver related changes. I will appreciate folks giving
> this a test in their respective build environment.
> 
> All these places for replacement were found by running the following grep
> patterns on the entire kernel code. Please let me know if this might have
> missed some instances. This might also have replaced some false positives.
> I will appreciate suggestions, inputs and review.

The 'node' in the drivers/media and the drivers/video sources has nothing to
do with numa. It's an index for a framebuffer instead (i.e. the X in /dev/fbX).

Regards,

Hans

> 
> 1. git grep "nid == -1"
> 2. git grep "node == -1"
> 3. git grep "nid = -1"
> 4. git grep "node = -1"
> 
>  arch/alpha/include/asm/topology.h |  2 +-
>  arch/ia64/kernel/numa.c   |  2 +-
>  arch/ia64/mm/discontig.c  |  6 +++---
>  arch/ia64/sn/kernel/io_common.c   |  2 +-
>  arch/powerpc/include/asm/pci-bridge.h |  2 +-
>  arch/powerpc/kernel/paca.c|  2 +-
>  arch/powerpc/kernel/pci-common.c  |  2 +-
>  arch/powerpc/mm/numa.c| 14 +++---
>  arch/powerpc/platforms/powernv/memtrace.c |  4 ++--
>  arch/sparc/kernel/auxio_32.c  |  2 +-
>  arch/sparc/kernel/pci_fire.c  |  2 +-
>  arch/sparc/kernel/pci_schizo.c|  2 +-
>  arch/sparc/kernel/pcic.c  |  6 +++---
>  arch/sparc/kernel/psycho_common.c |  2 +-
>  arch/sparc/kernel/sbus.c  |  2 +-
>  arch/sparc/mm/init_64.c   |  6 +++---
>  arch/sparc/prom/init_32.c |  2 +-
>  arch/sparc/prom/init_64.c |  4 ++--
>  arch/sparc/prom/tree_32.c | 12 ++--
>  arch/sparc/prom/tree_64.c | 18 +-
>  arch/x86/include/asm/pci.h|  2 +-
>  arch/x86/kernel/apic/x2apic_uv_x.c|  6 +++---
>  arch/x86/kernel/smpboot.c |  2 +-
>  arch/x86/platform/olpc/olpc_dt.c  | 16 
>  drivers/block/mtip32xx/mtip32xx.c |  4 ++--
>  drivers/dma/dmaengine.c   |  3 ++-
>  drivers/infiniband/hw/hfi1/affinity.c |  2 +-
>  drivers/infiniband/hw/hfi1/init.c |  2 +-
>  drivers/iommu/dmar.c  |  4 ++--
>  drivers/iommu/intel-iommu.c   |  2 +-
>  drivers/media/pci/ivtv/ivtvfb.c   |  2 +-
>  drivers/media/platform/vivid/vivid-osd.c  |  2 +-
>  drivers/misc/sgi-xp/xpc_uv.c  |  2 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  4 ++--
>  drivers/video/fbdev/mmp/fb/mmpfb.c|  2 +-
>  drivers/video/fbdev/pxa168fb.c|  2 +-
>  drivers/video/fbdev/w100fb.c  |  2 +-
>  fs/ocfs2/dlm/dlmcommon.h  |  2 +-
>  fs/ocfs2/dlm/dlmdomain.c  | 10 +-
>  fs/ocfs2/dlm/dlmmaster.c  |  2 +-
>  fs/ocfs2/dlm/dlmrecovery.c|  2 +-
>  fs/ocfs2/stack_user.c |  6 +++---
>  init/init_task.c  |  2 +-
>  kernel/kthread.c  |  2 +-
>  kernel/sched/fair.c   | 15 ---
>  lib/cpumask.c |  2 +-
>  mm/huge_memory.c  | 12 ++--
>  mm/hugetlb.c  |  2 +-
>  mm/ksm.c  |  2 +-
>  mm/memory.c   |  6 +++---
>  mm/memory_hotplug.c   | 12 ++--
>  mm/mempolicy.c|  2 +-
>  mm/page_alloc.c   |  4 ++--
>  mm/page_ext.c |  2 +-
>  net/core/pktgen.c |  2 +-
>  net/qrtr/qrtr.c   |  2 +-
>  tools/perf/bench/numa.c   |  6 +++---
>  57 files changed, 125 insertions(+), 123 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/topology.h 
> b/arch/alpha/include/asm/topology.h
> index e6e13a8..f6dc89c 100644
> --- a/arch/alpha/include/asm/topology.h
> +++ b/arch/alpha/include/asm/topology.h
> @@ -29,7