date:20170614

Ouch, this is a wrong one, please ignore. I'll repost in a sec.


On 15/06/17 15:06, Alexey Kardashevskiy wrote:
> Here is a patchset which Yongji was working on before
> leaving IBM LTC. Since we still want to have this functionality
> in the kernel (DPDK is the first user), here is a rebase
> on the current upstream.
> 
> 
> Current vfio-pci implementation disallows to mmap the page
> containing MSI-X table in case that users can write directly
> to MSI-X table and generate an incorrect MSIs.
> 
> However, this will cause some performance issue when there
> are some critical device registers in the same page as the
> MSI-X table. We have to handle the mmio access to these
> registers in QEMU emulation rather than in guest.
> 
> To solve this issue, this series allows to expose MSI-X table
> to userspace when hardware enables the capability of interrupt
> remapping which can ensure that a given PCI device can only
> shoot the MSIs assigned for it. And we introduce a new bus_flags
> PCI_BUS_FLAGS_MSI_REMAP to test this capability on PCI side
> for different archs.
> 
> The patch 3 are based on the proposed patchset[1].
> 
> Changelog
> v3:
> - rebased on the current upstream
> 
> v2:
> - Make the commit log more clear
> - Replace pci_bus_check_msi_remapping() with pci_bus_msi_isolated()
>   so that we could clearly know what the function does
> - Set PCI_BUS_FLAGS_MSI_REMAP in pci_create_root_bus() instead
>   of iommu_bus_notifier()
> - Reserve VFIO_REGION_INFO_FLAG_CAPS when we allow to mmap MSI-X
>   table so that we can know whether we allow to mmap MSI-X table
>   in QEMU
> 
> [1] 
> https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html
> 
> 
> This is based on sha1
> 63f700aab4c1 Linus Torvalds "Merge tag 'xtensa-20170612' of 
> git://github.com/jcmvbkbc/linux-xtensa".
> 
> Please comment. Thanks.
> 
> 
> 
> Yongji Xie (3):
>   PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag
>   pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
>   vfio-pci: Allow to expose MSI-X table to userspace if interrupt
> remapping is enabled
> 
>  include/linux/pci.h   |  1 +
>  arch/powerpc/platforms/powernv/pci-ioda.c |  8 
>  drivers/vfio/pci/vfio_pci.c   | 18 +++---
>  drivers/vfio/pci/vfio_pci_rdwr.c  |  3 ++-
>  4 files changed, 26 insertions(+), 4 deletions(-)
> 


-- 
Alexey

[PATCH kernel 3/3] vfio-pci: Allow to expose MSI-X table to userspace if interrupt remapping is enabled

From: Yongji Xie 

This patch exposes the MSI-X table to userspace if the hardware
enables interrupt remapping, which ensures that a given PCI
device can only shoot the MSIs assigned to it. In that case we
need not worry that a userspace driver can hurt other devices by
writing to the exposed MSI-X table directly.

Signed-off-by: Yongji Xie 
Signed-off-by: Alexey Kardashevskiy 
---
 drivers/vfio/pci/vfio_pci.c  | 18 +++---
 drivers/vfio/pci/vfio_pci_rdwr.c |  3 ++-
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e3a1a4..700e9d04dab5 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -564,8 +564,12 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
*vdev,
 
end = pci_resource_len(vdev->pdev, vdev->msix_bar);
 
-   /* If MSI-X table is aligned to the start or end, only one area */
-   if (((vdev->msix_offset & PAGE_MASK) == 0) ||
+   /*
+* If MSI-X table is allowed to mmap because of the capability
+* of IRQ remapping or aligned to the start or end, only one area
+*/
+   if ((vdev->pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP) ||
+   ((vdev->msix_offset & PAGE_MASK) == 0) ||
(PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
nr_areas = 1;
 
@@ -577,6 +581,12 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
*vdev,
 
sparse->nr_areas = nr_areas;
 
+   if (vdev->pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP) {
+   sparse->areas[i].offset = 0;
+   sparse->areas[i].size = end;
+   goto out;
+   }
+
if (vdev->msix_offset & PAGE_MASK) {
sparse->areas[i].offset = 0;
sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
@@ -590,6 +600,7 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
*vdev,
i++;
}
 
+out:
ret = vfio_info_add_capability(caps, VFIO_REGION_INFO_CAP_SPARSE_MMAP,
   sparse);
kfree(sparse);
@@ -1115,7 +1126,8 @@ static int vfio_pci_mmap(void *device_data, struct 
vm_area_struct *vma)
if (req_start + req_len > phys_len)
return -EINVAL;
 
-   if (index == vdev->msix_bar) {
+   if (index == vdev->msix_bar &&
+   !(pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP)) {
/*
 * Disallow mmaps overlapping the MSI-X table; users don't
 * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 357243d76f10..5378f2c3ac8e 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -164,7 +164,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char 
__user *buf,
} else
io = vdev->barmap[bar];
 
-   if (bar == vdev->msix_bar) {
+   if (bar == vdev->msix_bar &&
+   !(pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP)) {
x_start = vdev->msix_offset;
x_end = vdev->msix_offset + vdev->msix_size;
}
-- 
2.11.0
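
For readers following the msix_sparse_mmap_cap() hunks, here is a minimal
sketch of how the mmappable areas of the MSI-X BAR are laid out. It is not
the kernel code: struct mmap_area, mmap_areas() and the explicit page_size
parameter are hypothetical names made up for this illustration, and the
msi_remap argument stands in for the PCI_BUS_FLAGS_MSI_REMAP test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one sparse mmap area (offset and size in bytes). */
struct mmap_area {
	uint64_t offset;
	uint64_t size;
};

/*
 * Fill out[] with the mmappable areas of a BAR that holds the MSI-X table
 * and return how many areas were written (0, 1 or 2).
 * page_size must be a power of two.
 */
static int mmap_areas(uint64_t bar_len, uint64_t msix_offset,
		      uint64_t msix_size, uint64_t page_size,
		      bool msi_remap, struct mmap_area out[2])
{
	uint64_t mask = ~(page_size - 1);
	/* First byte of the page holding the start of the table. */
	uint64_t tbl_start = msix_offset & mask;
	/* First page boundary at or after the end of the table. */
	uint64_t tbl_end = (msix_offset + msix_size + page_size - 1) & mask;
	int n = 0;

	/* With MSI remapping the whole BAR is exposed as a single area. */
	if (msi_remap) {
		out[0].offset = 0;
		out[0].size = bar_len;
		return 1;
	}

	/* Area before the page(s) containing the table, if any. */
	if (tbl_start) {
		out[n].offset = 0;
		out[n].size = tbl_start;
		n++;
	}

	/* Area after the page(s) containing the table, if any. */
	if (tbl_end < bar_len) {
		out[n].offset = tbl_end;
		out[n].size = bar_len - tbl_end;
		n++;
	}

	return n;
}
```

With a 64K BAR, 4K pages and a 0x100 byte table at offset 0x2000, the sketch
reports two areas around the table page without remapping, and one area
covering the whole BAR with it.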

[PATCH 1/2] powerpc: Fix emulation of mcrf in emulate_step()

2017-06-14 Thread Anton Blanchard

From: Anton Blanchard 

The mcrf emulation code was looking at the CR fields in the reverse
order. It also relied on reserved fields being zero which is somewhat
fragile, so fix that too.

Cc: sta...@vger.kernel.org
Signed-off-by: Anton Blanchard 
---
 arch/powerpc/lib/sstep.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 33117f8a0882..fb84f51b1f0b 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -683,8 +683,10 @@ int analyse_instr(struct instruction_op *op, struct 
pt_regs *regs,
case 19:
switch ((instr >> 1) & 0x3ff) {
case 0: /* mcrf */
-   rd = (instr >> 21) & 0x1c;
-   ra = (instr >> 16) & 0x1c;
+   rd = 7 - ((instr >> 23) & 0x7);
+   ra = 7 - ((instr >> 18) & 0x7);
+   rd *= 4;
+   ra *= 4;
val = (regs->ccr >> ra) & 0xf;
regs->ccr = (regs->ccr & ~(0xfUL << rd)) | (val << rd);
goto instr_done;
-- 
2.11.0
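
To make the field numbering in the mcrf fix concrete, here is a small
standalone sketch. CR field 0 is the most significant nibble of the 32-bit
CR, so field n sits at bit offset (7 - n) * 4. The function names and the
encode_mcrf() helper are hypothetical and exist only for this illustration;
the sketch operates on a plain 32-bit value rather than on pt_regs.

```c
#include <stdint.h>

/* Bit offset of CR field n within the 32-bit CR; field 0 is the top nibble. */
static unsigned int cr_field_shift(unsigned int field)
{
	return (7 - field) * 4;
}

/* Emulate "mcrf BF,BFA": copy CR field BFA into CR field BF. */
static uint32_t emulate_mcrf(uint32_t instr, uint32_t ccr)
{
	unsigned int bf = (instr >> 23) & 0x7;	/* destination field */
	unsigned int bfa = (instr >> 18) & 0x7;	/* source field */
	unsigned int rd = cr_field_shift(bf);
	unsigned int ra = cr_field_shift(bfa);
	uint32_t val = (ccr >> ra) & 0xf;

	return (ccr & ~((uint32_t)0xf << rd)) | (val << rd);
}

/* Hypothetical helper: build an mcrf encoding (primary opcode 19, XO 0). */
static uint32_t encode_mcrf(unsigned int bf, unsigned int bfa)
{
	return (19u << 26) | (bf << 23) | (bfa << 18);
}
```

For a CR of 0x12345678, "mcrf 0,7" copies the lowest nibble (8) into the
highest one. Masking the 3-bit field numbers directly, as above, also avoids
depending on the reserved bits next to them being zero.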

[PATCH] powerpc: Fix /proc/cpuinfo revision for POWER9 DD2

2017-06-14 Thread Michael Neuling

The P9 PVR bits 12-15 don't indicate a revision but instead different
chip configurations.  From BookIV we have:
   Bits  Configuration
    0 :  Scale out 12 cores
    1 :  Scale out 24 cores
    2 :  Scale up  12 cores
    3 :  Scale up  24 cores

DD1 doesn't use this but DD2 does. Linux will mostly use the "Scale
out 24 core" configuration (ie. SMT4 not SMT8) which results in a PVR
of 0x004e1200. The revision in /proc/cpuinfo is hence reported
incorrectly as "18.0".

This patch fixes this by using only the bits that hold the major
revision (ie. bits 8-11) for POWER9.

Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/setup-common.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 857129acf9..94a948207c 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -335,6 +335,10 @@ static int show_cpuinfo(struct seq_file *m, void *v)
maj = ((pvr >> 8) & 0xFF) - 1;
min = pvr & 0xFF;
break;
+   case 0x004e: /* POWER9 bits 12-15 give chip type */
+   maj = (pvr >> 8) & 0x0F;
+   min = pvr & 0xFF;
+   break;
default:
maj = (pvr >> 8) & 0xFF;
min = pvr & 0xFF;
-- 
2.11.0
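
The arithmetic behind the "18.0" report can be checked with a short sketch.
It models only the POWER9 case and the default case of show_cpuinfo();
struct cpu_revision, decode_revision() and power9_chip_config() are
hypothetical names used for illustration.

```c
#include <stdint.h>

/* Upper 16 bits of the PVR identify the processor version. */
#define PVR_VER(pvr)	(((pvr) >> 16) & 0xFFFF)

struct cpu_revision {
	unsigned int maj;
	unsigned int min;
};

/* Decode major.minor from a PVR; only POWER9 and the default case are modelled. */
static struct cpu_revision decode_revision(uint32_t pvr)
{
	struct cpu_revision rev;

	switch (PVR_VER(pvr)) {
	case 0x004e:	/* POWER9: bits 12-15 give the chip type, not a revision */
		rev.maj = (pvr >> 8) & 0x0F;
		break;
	default:
		rev.maj = (pvr >> 8) & 0xFF;
		break;
	}
	rev.min = pvr & 0xFF;

	return rev;
}

/* Chip configuration field of a POWER9 PVR (0..3 in the table above). */
static unsigned int power9_chip_config(uint32_t pvr)
{
	return (pvr >> 12) & 0xF;
}
```

For PVR 0x004e1200 the old 8-bit mask yields 0x12 (18 decimal), while the
4-bit mask yields major revision 2 and the configuration field is 1
("Scale out 24 cores").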

[PATCH kernel 0/3 REPOST] vfio-pci: Add support for mmapping MSI-X table


Here is a patchset which Yongji was working on before
leaving IBM LTC. Since we still want to have this functionality
in the kernel (DPDK is the first user), here is a rebase
on the current upstream.


The current vfio-pci implementation does not allow mmapping the
page that contains the MSI-X table, because users could otherwise
write directly to the MSI-X table and generate incorrect MSIs.

However, this causes a performance issue when critical device
registers live in the same page as the MSI-X table: MMIO accesses
to these registers have to be emulated in QEMU rather than being
handled in the guest.

To solve this issue, this series allows exposing the MSI-X table
to userspace when the hardware enables interrupt remapping, which
ensures that a given PCI device can only shoot the MSIs assigned
to it. We also introduce a new bus flag,
PCI_BUS_FLAGS_MSI_REMAP, to test for this capability on the PCI
side across different archs.

Patch 3 is based on the proposed patchset [1].

Changelog
v3:
- rebased on the current upstream

v2:
- Make the commit log more clear
- Replace pci_bus_check_msi_remapping() with pci_bus_msi_isolated()
  so that we could clearly know what the function does
- Set PCI_BUS_FLAGS_MSI_REMAP in pci_create_root_bus() instead
  of iommu_bus_notifier()
- Reserve VFIO_REGION_INFO_FLAG_CAPS when we allow to mmap MSI-X
  table so that we can know whether we allow to mmap MSI-X table
  in QEMU

[1] https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html


This is based on sha1
63f700aab4c1 Linus Torvalds "Merge tag 'xtensa-20170612' of 
git://github.com/jcmvbkbc/linux-xtensa".

Please comment. Thanks.



Yongji Xie (3):
  PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag
  pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
  vfio-pci: Allow to expose MSI-X table to userspace if interrupt
remapping is enabled

 include/linux/pci.h   |  1 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  8 
 drivers/vfio/pci/vfio_pci.c   | 18 +++---
 drivers/vfio/pci/vfio_pci_rdwr.c  |  3 ++-
 4 files changed, 26 insertions(+), 4 deletions(-)

-- 
2.11.0

[PATCH kernel 3/3] vfio-pci: Allow to expose MSI-X table to userspace if interrupt remapping is enabled

From: Yongji Xie 

This patch exposes the MSI-X table to userspace if the hardware
enables interrupt remapping, which ensures that a given PCI
device can only shoot the MSIs assigned to it. In that case we
need not worry that a userspace driver can hurt other devices by
writing to the exposed MSI-X table directly.

Signed-off-by: Yongji Xie 
Signed-off-by: Michael Roth 
Signed-off-by: Paul Mackerras 
---
 drivers/vfio/pci/vfio_pci.c  | 18 +++---
 drivers/vfio/pci/vfio_pci_rdwr.c |  3 ++-
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e3a1a4..700e9d04dab5 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -564,8 +564,12 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
*vdev,
 
end = pci_resource_len(vdev->pdev, vdev->msix_bar);
 
-   /* If MSI-X table is aligned to the start or end, only one area */
-   if (((vdev->msix_offset & PAGE_MASK) == 0) ||
+   /*
+* If MSI-X table is allowed to mmap because of the capability
+* of IRQ remapping or aligned to the start or end, only one area
+*/
+   if ((vdev->pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP) ||
+   ((vdev->msix_offset & PAGE_MASK) == 0) ||
(PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
nr_areas = 1;
 
@@ -577,6 +581,12 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
*vdev,
 
sparse->nr_areas = nr_areas;
 
+   if (vdev->pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP) {
+   sparse->areas[i].offset = 0;
+   sparse->areas[i].size = end;
+   goto out;
+   }
+
if (vdev->msix_offset & PAGE_MASK) {
sparse->areas[i].offset = 0;
sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
@@ -590,6 +600,7 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
*vdev,
i++;
}
 
+out:
ret = vfio_info_add_capability(caps, VFIO_REGION_INFO_CAP_SPARSE_MMAP,
   sparse);
kfree(sparse);
@@ -1115,7 +1126,8 @@ static int vfio_pci_mmap(void *device_data, struct 
vm_area_struct *vma)
if (req_start + req_len > phys_len)
return -EINVAL;
 
-   if (index == vdev->msix_bar) {
+   if (index == vdev->msix_bar &&
+   !(pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP)) {
/*
 * Disallow mmaps overlapping the MSI-X table; users don't
 * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 357243d76f10..5378f2c3ac8e 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -164,7 +164,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char 
__user *buf,
} else
io = vdev->barmap[bar];
 
-   if (bar == vdev->msix_bar) {
+   if (bar == vdev->msix_bar &&
+   !(pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP)) {
x_start = vdev->msix_offset;
x_end = vdev->msix_offset + vdev->msix_size;
}
-- 
2.11.0

[PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge

From: Yongji Xie 

Any IODA host bridge has the capability of IRQ remapping,
so we set PCI_BUS_FLAGS_MSI_REMAP when this kind of host bridge
is detected.

Signed-off-by: Yongji Xie 
Reviewed-by: Alexey Kardashevskiy 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 283caf1070c9..b6bda1918273 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3177,6 +3177,12 @@ static void pnv_pci_ioda_fixup(void)
 #endif
 }
 
+int pnv_pci_ioda_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+   bridge->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+   return 0;
+}
+
 /*
  * Returns the alignment for I/O or memory windows for P2P
  * bridges. That actually depends on how PEs are segmented.
@@ -3861,6 +3867,8 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
 */
ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
 
+   ppc_md.pcibios_root_bridge_prepare = pnv_pci_ioda_root_bridge_prepare;
+
if (phb->type == PNV_PHB_NPU) {
hose->controller_ops = pnv_npu_ioda_controller_ops;
} else {
-- 
2.11.0

[PATCH 2/2] powerpc: Fix emulation of mfocrf in emulate_step()

2017-06-14 Thread Anton Blanchard

From: Anton Blanchard 

From POWER4 onwards, mfocrf() only places the specified CR field into
the destination GPR, and the rest of it is set to 0. The PowerPC AS
from version 3.0 now requires this behaviour.

The emulation code currently puts the entire CR into the destination GPR.
Fix it.

Cc: sta...@vger.kernel.org
Signed-off-by: Anton Blanchard 
---
 arch/powerpc/lib/sstep.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index fb84f51b1f0b..ee33327686ae 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -966,6 +966,19 @@ int analyse_instr(struct instruction_op *op, struct 
pt_regs *regs,
 #endif
 
case 19:/* mfcr */
+   if ((instr >> 20) & 1) {
+   imm = 0xf0000000UL;
+   for (sh = 0; sh < 8; ++sh) {
+   if (instr & (0x80000 >> sh)) {
+   regs->gpr[rd] = regs->ccr & imm;
+   break;
+   }
+   imm >>= 4;
+   }
+
+   goto instr_done;
+   }
+
regs->gpr[rd] = regs->ccr;
regs->gpr[rd] &= 0xffffffffUL;
goto instr_done;
-- 
2.11.0
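
A standalone sketch of the mfcr/mfocrf distinction follows. It assumes the
usual layout of eight 4-bit CR fields with field 0 in the top nibble, and an
FXM mask whose most significant bit (instruction bit 0x80000) selects field
0. emulate_mfcr() and encode_mfocrf() are hypothetical names for this
illustration; old_rt models the fact that the patch leaves the GPR untouched
when no FXM bit is set.

```c
#include <stdint.h>

/*
 * Return the new value of the target GPR for mfcr/mfocrf.
 * old_rt is returned unchanged if mfocrf has an empty FXM mask.
 */
static uint64_t emulate_mfcr(uint32_t instr, uint32_t ccr, uint64_t old_rt)
{
	if ((instr >> 20) & 1) {		/* mfocrf form */
		uint32_t mask = 0xf0000000u;	/* CR field 0 */
		unsigned int sh;

		for (sh = 0; sh < 8; ++sh) {
			/* First set FXM bit selects the field to copy. */
			if (instr & (0x80000u >> sh))
				return ccr & mask;
			mask >>= 4;
		}
		return old_rt;
	}

	return ccr;				/* plain mfcr: whole CR */
}

/* Hypothetical helper: encode "mfocrf RT,FXM" (primary opcode 31, XO 19). */
static uint32_t encode_mfocrf(unsigned int rt, unsigned int fxm)
{
	return (31u << 26) | (rt << 21) | (1u << 20) |
	       ((fxm & 0xffu) << 12) | (19u << 1);
}
```

With a CR of 0x12345678, an FXM of 0x80 returns only field 0 (0x10000000),
and an FXM of 0x01 returns only field 7 (0x8).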

[PATCH kernel 1/3] PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag

From: Yongji Xie 

We introduce a new pci_bus_flags value, PCI_BUS_FLAGS_MSI_REMAP,
which indicates that interrupts of all devices on the bus are
managed by hardware with IRQ remapping enabled (Intel naming).
When the capability is enabled, a given PCI device can only
shoot the MSIs assigned to it. In other words, the hardware
can protect the system from invalid MSIs of the device by checking
the target address and data when there is something wrong
with the MSI part of the device or the device driver.

There is an existing flag for this capability in the IOMMU space:

enum iommu_cap {
IOMMU_CAP_CACHE_COHERENCY,
--->IOMMU_CAP_INTR_REMAP,
IOMMU_CAP_NOEXEC,
};

and Eric also posted a patchset [1] to abstract it on MSI
controller side for ARM. But it would make sense to have a
more common flag like PCI_BUS_FLAGS_MSI_REMAP so that we can
use a universal flag to test this capability on PCI side for
different archs.

With this flag enabled, we can easily tell whether it is safe
to expose the MSI-X tables in PCI BARs to userspace. Some userspace
drivers such as VFIO may benefit from this.

[1] https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html

Signed-off-by: Yongji Xie 
Signed-off-by: Alexey Kardashevskiy 
---
 include/linux/pci.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8039f9f0ca05..2c6dbb3dd0da 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -200,6 +200,7 @@ enum pci_bus_flags {
PCI_BUS_FLAGS_NO_MSI= (__force pci_bus_flags_t) 1,
PCI_BUS_FLAGS_NO_MMRBC  = (__force pci_bus_flags_t) 2,
PCI_BUS_FLAGS_NO_AERSID = (__force pci_bus_flags_t) 4,
+   PCI_BUS_FLAGS_MSI_REMAP = (__force pci_bus_flags_t) 8,
 };
 
 /* These values come from the PCI Express Spec */
-- 
2.11.0
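
As an illustration of how a consumer might use the new flag, here is a
sketch of the mmap decision discussed in this series. The flag values copy
the enum pci_bus_flags hunk above; everything else (the sketch_ prefix, the
function names, and working on plain integers instead of struct pci_bus) is
an assumption made for this example and ignores page granularity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as enum pci_bus_flags in the hunk above. */
enum sketch_bus_flags {
	SKETCH_NO_MSI		= 1,
	SKETCH_NO_MMRBC		= 2,
	SKETCH_NO_AERSID	= 4,
	SKETCH_MSI_REMAP	= 8,
};

/* True if [start, start + len) overlaps the MSI-X table. */
static bool overlaps_msix(uint64_t start, uint64_t len,
			  uint64_t msix_offset, uint64_t msix_size)
{
	return !(start >= msix_offset + msix_size ||
		 start + len <= msix_offset);
}

/*
 * Decide whether an mmap request on the MSI-X BAR may proceed:
 * always when the bus isolates MSIs, otherwise only if the request
 * stays clear of the table.
 */
static bool mmap_allowed(unsigned int bus_flags,
			 uint64_t start, uint64_t len,
			 uint64_t msix_offset, uint64_t msix_size)
{
	if (bus_flags & SKETCH_MSI_REMAP)
		return true;

	return !overlaps_msix(start, len, msix_offset, msix_size);
}
```

Because each flag is a separate bit, OR-ing in the remap flag on a root bus
leaves the other quirk flags untouched.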

Re: [PATCH] powernv/npu-dma.c: Remove spurious WARN_ON when a PCI device has no of_node

On 14/06/17 14:47, Alistair Popple wrote:
> "4c3b89e powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev"
> introduced explicit warnings in pnv_pci_get_npu_dev() when a PCIe device
> has no associated device-tree node. However not all PCIe devices have an
> of_node and pnv_pci_get_npu_dev() gets indirectly called at least once for
> every PCIe device in the system. This results in spurious WARN_ON()'s so
> remove it.
> 
> The same situation should not exist for pnv_pci_get_gpu_dev() as any NPU
> based PCIe device requires a device-tree node.
> 
> Signed-off-by: Alistair Popple 
> Reported-by: Alexey Kardashevskiy 

Reviewed-by: Alexey Kardashevskiy 

> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> b/arch/powerpc/platforms/powernv/npu-dma.c
> index 78fa939..e6f444b 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -75,7 +75,8 @@ struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, 
> int index)
>   if (WARN_ON(!gpdev))
>   return NULL;
>  
> - if (WARN_ON(!gpdev->dev.of_node))
> + /* Not all PCI devices have device-tree nodes */
> + if (!gpdev->dev.of_node)
>   return NULL;
>  
>   /* Get assoicated PCI device */
> 


-- 
Alexey

[PATCH kernel 2/3] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge

From: Yongji Xie 

Any IODA host bridge has the capability of IRQ remapping,
so we set PCI_BUS_FLAGS_MSI_REMAP when this kind of host bridge
is detected.

Signed-off-by: Yongji Xie 
Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 283caf1070c9..b6bda1918273 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3177,6 +3177,12 @@ static void pnv_pci_ioda_fixup(void)
 #endif
 }
 
+int pnv_pci_ioda_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+   bridge->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+   return 0;
+}
+
 /*
  * Returns the alignment for I/O or memory windows for P2P
  * bridges. That actually depends on how PEs are segmented.
@@ -3861,6 +3867,8 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
 */
ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
 
+   ppc_md.pcibios_root_bridge_prepare = pnv_pci_ioda_root_bridge_prepare;
+
if (phb->type == PNV_PHB_NPU) {
hose->controller_ops = pnv_npu_ioda_controller_ops;
} else {
-- 
2.11.0

Re: [PATCH kernel] powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path

On 14/06/17 21:04, Michael Ellerman wrote:
> Alexey Kardashevskiy  writes:
> 
>> When trapped on WARN_ON(), report_bug() is expected to return
>> BUG_TRAP_TYPE_WARN so the caller could increment NIP by 4 and continue.
>> The __builtin_constant_p() path of the PPC's WARN_ON() calls (indirectly)
>> __WARN_FLAGS() which has BUGFLAG_WARNING set, however the other branch
>> does not which makes report_bug() report a bug rather than a warning.
>>
>> Fixes: 19d436268dde95389 ("debug: Add _ONCE() logic to report_bug()")
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>
>> Actually 19d436268dde95389 replaced __WARN_TAINT() with __WARN_FLAGS()
>> and lost BUGFLAG_TAINT() and this is not in the commit log so it is
>> unclear:
>> 1) why
> 
> I think the rename is because previously the argument was a taint value,
> whereas now it is a flags value (which is a superset of taint).
> 
>> 2) whether this particular patch should be doing
>>BUGFLAG_WARNING|BUGFLAG_TAINT(TAINT_WARN)
>>  or
>>BUGFLAG_WARNING|(flags)
> 
> There is no flags here so the latter won't work AFAICS.
> 
>> Any ideas? Thanks.
> 
> Your patch looks correct to me. I assume it works?

Yes, it does.

> 
> 
> The bug isn't introduced by 19d436268dde ("debug: Add _ONCE() logic to
> report_bug()") as far as I can see.
> 
> If you check out that revision you see that BUGFLAG_TAINT still contains
> BUGFLAG_WARNING:
> 
> #define BUGFLAG_TAINT(taint)  (BUGFLAG_WARNING | ((taint) << 8))
> 
> But that was removed in f26dee15103f ("debug: Avoid setting
> BUGFLAG_WARNING twice"). So I think the Fixes: tag should point at that
> commit.

Ah, you're right. Should I repost the patch with the updated "fixes:" clause?


> 
> cheers
> 
>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>> index f2c562a0a427..0151af6c2a50 100644
>> --- a/arch/powerpc/include/asm/bug.h
>> +++ b/arch/powerpc/include/asm/bug.h
>> @@ -104,7 +104,7 @@
>>  "1: "PPC_TLNEI" %4,0\n" \
>>  _EMIT_BUG_ENTRY \
>>  : : "i" (__FILE__), "i" (__LINE__), \
>> -  "i" (BUGFLAG_TAINT(TAINT_WARN)),  \
>> +  "i" (BUGFLAG_WARNING|BUGFLAG_TAINT(TAINT_WARN)),\
>>"i" (sizeof(struct bug_entry)),   \
>>"r" (__ret_warn_on)); \
>>  }   \
>> -- 
>> 2.11.0


-- 
Alexey
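
To illustrate the flag arithmetic discussed in this thread, here is a small
sketch. The BUGFLAG_TAINT() definition without BUGFLAG_WARNING follows from
Michael's description of f26dee15103f; the numeric values of BUGFLAG_WARNING
and TAINT_WARN and the classify_trap() helper are assumptions for this
example, and classify_trap() is only a simplified stand-in for the decision
report_bug() makes.

```c
/* Assumed values, modelled on the generic bug.h definitions. */
#define BUGFLAG_WARNING		(1 << 0)
#define BUGFLAG_TAINT(taint)	((taint) << 8)	/* no longer ORs in BUGFLAG_WARNING */
#define TAINT_WARN		9

enum bug_trap_type {
	BUG_TRAP_TYPE_NONE,
	BUG_TRAP_TYPE_WARN,
	BUG_TRAP_TYPE_BUG,
};

/* Simplified stand-in: an entry is a warning only if the warning bit is set. */
static enum bug_trap_type classify_trap(unsigned short flags)
{
	if (flags & BUGFLAG_WARNING)
		return BUG_TRAP_TYPE_WARN;

	return BUG_TRAP_TYPE_BUG;
}
```

Under these assumptions, BUGFLAG_TAINT(TAINT_WARN) alone is classified as a
bug, while BUGFLAG_WARNING|BUGFLAG_TAINT(TAINT_WARN) is classified as a
warning, which is the behaviour the patch restores for the non-builtin path.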

[PATCH kernel 0/3] vfio-pci: Add support for mmapping MSI-X table

Here is a patchset which Yongji was working on before
leaving IBM LTC. Since we still want to have this functionality
in the kernel (DPDK is the first user), here is a rebase
on the current upstream.


The current vfio-pci implementation does not allow mmapping the
page that contains the MSI-X table, because users could otherwise
write directly to the MSI-X table and generate incorrect MSIs.

However, this causes a performance issue when critical device
registers live in the same page as the MSI-X table: MMIO accesses
to these registers have to be emulated in QEMU rather than being
handled in the guest.

To solve this issue, this series allows exposing the MSI-X table
to userspace when the hardware enables interrupt remapping, which
ensures that a given PCI device can only shoot the MSIs assigned
to it. We also introduce a new bus flag,
PCI_BUS_FLAGS_MSI_REMAP, to test for this capability on the PCI
side across different archs.

Patch 3 is based on the proposed patchset [1].

Changelog
v3:
- rebased on the current upstream

v2:
- Make the commit log more clear
- Replace pci_bus_check_msi_remapping() with pci_bus_msi_isolated()
  so that we could clearly know what the function does
- Set PCI_BUS_FLAGS_MSI_REMAP in pci_create_root_bus() instead
  of iommu_bus_notifier()
- Reserve VFIO_REGION_INFO_FLAG_CAPS when we allow to mmap MSI-X
  table so that we can know whether we allow to mmap MSI-X table
  in QEMU

[1] https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html


This is based on sha1
63f700aab4c1 Linus Torvalds "Merge tag 'xtensa-20170612' of 
git://github.com/jcmvbkbc/linux-xtensa".

Please comment. Thanks.



Yongji Xie (3):
  PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag
  pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
  vfio-pci: Allow to expose MSI-X table to userspace if interrupt
remapping is enabled

 include/linux/pci.h   |  1 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  8 
 drivers/vfio/pci/vfio_pci.c   | 18 +++---
 drivers/vfio/pci/vfio_pci_rdwr.c  |  3 ++-
 4 files changed, 26 insertions(+), 4 deletions(-)

-- 
2.11.0

[PATCH kernel 1/3] PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag

From: Yongji Xie 

We introduce a new pci_bus_flags value, PCI_BUS_FLAGS_MSI_REMAP,
which indicates that interrupts of all devices on the bus are
managed by hardware with IRQ remapping enabled (Intel naming).
When the capability is enabled, a given PCI device can only
shoot the MSIs assigned to it. In other words, the hardware
can protect the system from invalid MSIs of the device by checking
the target address and data when there is something wrong
with the MSI part of the device or the device driver.

There is an existing flag for this capability in the IOMMU space:

enum iommu_cap {
IOMMU_CAP_CACHE_COHERENCY,
--->IOMMU_CAP_INTR_REMAP,
IOMMU_CAP_NOEXEC,
};

and Eric also posted a patchset [1] to abstract it on MSI
controller side for ARM. But it would make sense to have a
more common flag like PCI_BUS_FLAGS_MSI_REMAP so that we can
use a universal flag to test this capability on PCI side for
different archs.

With this flag enabled, we can easily tell whether it is safe
to expose the MSI-X tables in PCI BARs to userspace. Some userspace
drivers such as VFIO may benefit from this.

[1] https://www.mail-archive.com/linux-kernel%40vger.kernel.org/msg1138820.html

Signed-off-by: Yongji Xie 
Signed-off-by: Paul Mackerras 
---
 include/linux/pci.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8039f9f0ca05..2c6dbb3dd0da 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -200,6 +200,7 @@ enum pci_bus_flags {
PCI_BUS_FLAGS_NO_MSI= (__force pci_bus_flags_t) 1,
PCI_BUS_FLAGS_NO_MMRBC  = (__force pci_bus_flags_t) 2,
PCI_BUS_FLAGS_NO_AERSID = (__force pci_bus_flags_t) 4,
+   PCI_BUS_FLAGS_MSI_REMAP = (__force pci_bus_flags_t) 8,
 };
 
 /* These values come from the PCI Express Spec */
-- 
2.11.0

[PATCH 1/3] cpuidle: powerpc: cpuidle set polling before enabling irqs

local_irq_enable() can cause interrupts to be taken, which could
take a significant amount of processing time. The idle process
should set its polling flag before this, so that another process
that wakes it during this time does not have to send an IPI.

Expand the TIF_POLLING_NRFLAG coverage to be as large as possible.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Nicholas Piggin 
---
 drivers/cpuidle/cpuidle-powernv.c | 4 +++-
 drivers/cpuidle/cpuidle-pseries.c | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 45eaf06462ae..77bc50ad9f57 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -51,9 +51,10 @@ static int snooze_loop(struct cpuidle_device *dev,
 {
u64 snooze_exit_time;
 
-   local_irq_enable();
set_thread_flag(TIF_POLLING_NRFLAG);
 
+   local_irq_enable();
+
snooze_exit_time = get_tb() + snooze_timeout;
ppc64_runlatch_off();
HMT_very_low();
@@ -66,6 +67,7 @@ static int snooze_loop(struct cpuidle_device *dev,
ppc64_runlatch_on();
clear_thread_flag(TIF_POLLING_NRFLAG);
smp_mb();
+
return index;
 }
 
diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 166ccd711ec9..7b12bb2ea70f 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -62,9 +62,10 @@ static int snooze_loop(struct cpuidle_device *dev,
unsigned long in_purr;
u64 snooze_exit_time;
 
+   set_thread_flag(TIF_POLLING_NRFLAG);
+
idle_loop_prolog(&in_purr);
local_irq_enable();
-   set_thread_flag(TIF_POLLING_NRFLAG);
snooze_exit_time = get_tb() + snooze_timeout;
 
while (!need_resched()) {
-- 
2.11.0
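
The effect of the reordering can be pictured with a deliberately simplified,
single-threaded model. It is not how the scheduler is implemented: struct
idle_cpu and wake_idle_cpu() are hypothetical, and the model only captures
the rule that a waker can skip the IPI when the target CPU advertises that
it is polling.

```c
#include <stdbool.h>

/* Hypothetical model of the state a waker looks at on an idle CPU. */
struct idle_cpu {
	bool polling;		/* stands in for TIF_POLLING_NRFLAG */
	bool need_resched;	/* stands in for TIF_NEED_RESCHED */
	unsigned int ipis;	/* IPIs the waker had to send */
};

/*
 * Waker side: request a reschedule on the idle CPU.
 * Returns true if an IPI had to be sent.
 */
static bool wake_idle_cpu(struct idle_cpu *cpu)
{
	cpu->need_resched = true;

	/* A polling CPU will notice need_resched by itself. */
	if (cpu->polling)
		return false;

	cpu->ipis++;
	return true;
}
```

If a wakeup arrives while the idle CPU is busy in an interrupt taken right
after local_irq_enable(), the old ordering has not set the polling flag yet
and the model counts an IPI. With the flag set first, the same wakeup needs
none.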

[PATCH 0/3] powerpc (powernv and pseries) cpuidle driver improvements

Hi,

These are a few small improvements that came from doing an
optimisation pass over powerpc cpu idle paths.

Michael reminded me to cc the cpuidle maintainers. I think he
will take the patches through the powerpc tree, but any suggestion
or ack or nack would be welcome.

Thanks,
Nick

Nicholas Piggin (3):
  cpuidle: powerpc: cpuidle set polling before enabling irqs
  cpuidle: powerpc: read mostly for common globals
  cpuidle: powerpc: no memory barrier after break from idle

 drivers/cpuidle/cpuidle-powernv.c | 25 +
 drivers/cpuidle/cpuidle-pseries.c | 22 +++---
 2 files changed, 32 insertions(+), 15 deletions(-)

-- 
2.11.0

Re: [PATCH V3] cxl: Fixes for Coherent Accelerator Interface Architecture 2.0

2017-06-14 Thread christophe lombard


On 14/06/2017 at 07:01, Michael Ellerman wrote:
> Christophe Lombard  writes:
>
>> A previous set of patches "cxl: Add support for Coherent Accelerator
>> Interface Architecture 2.0" has introduced new support for the CAPI
>> cards.
>
> Which commit is that?
>
> cheers

Here are the commit ids of the patchset:
cxl: Add support for Coherent Accelerator Interface Architecture 2.0

patch1: aba81433b50350fde68bf80fe9f75d671e15b5ae
patch2: 66ef20c7834b7df18168b12a57ef01c6ae0d1a81
patch3: 6dd2d23403396d8e6d153a6c9db56e1a1012bad8
patch4: bdd2e7150644fee4de7167a3e08294ef32eeda11
patch5: 64663f372c72cedeba1b1dc86df9cc159ae5a93d
patch6: abd1d99bb3da42d6c7341c14986f5b8f4dcc6bd5
patch7: f24be42aab37c6d07c05126673138e06223a6399

Thanks


These patches have been tested in a simulation environment, and
quite a few of them have been tested on real hardware.

This patch brings new fixes after a series of tests carried out on
new equipment:
* Add POWER9 definition.
* Re-enable any masked interrupts when the AFU is not activated after
   resetting the AFU.
* Remove the api cxl_is_psl8/9 which is no longer useful.
* Do not dump CAPI1 registers.
* Rewrite cxl_is_page_fault() function.
* Do not register slb callback on P9.

Changelog[v3]
  - Rebase to latest upstream.
  - Update the patch's header.
  - Add new test in cxl_is_page_fault().

Changelog[v2]
  - Rebase to latest upstream.
  - Update cxl_is_page_fault() to handle the checkout response status.
  - Add comments.

Signed-off-by: Christophe Lombard 
---
  drivers/misc/cxl/context.c |  6 +++---
  drivers/misc/cxl/cxl.h | 18 +-
  drivers/misc/cxl/fault.c   | 23 +++
  drivers/misc/cxl/main.c| 17 +
  drivers/misc/cxl/native.c  | 29 +
  drivers/misc/cxl/pci.c | 11 ---
  6 files changed, 57 insertions(+), 47 deletions(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 4472ce1..8c32040 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -45,7 +45,7 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu 
*afu, bool master)
mutex_init(&ctx->mapping_lock);
ctx->mapping = NULL;
  
-	if (cxl_is_psl8(afu)) {
+   if (cxl_is_power8()) {
spin_lock_init(&ctx->sste_lock);
  
  		/*

@@ -189,7 +189,7 @@ int cxl_context_iomap(struct cxl_context *ctx, struct 
vm_area_struct *vma)
if (start + len > ctx->afu->adapter->ps_size)
return -EINVAL;
  
-		if (cxl_is_psl9(ctx->afu)) {

+   if (cxl_is_power9()) {
/*
 * Make sure there is a valid problem state
 * area space for this AFU.
@@ -324,7 +324,7 @@ static void reclaim_ctx(struct rcu_head *rcu)
  {
struct cxl_context *ctx = container_of(rcu, struct cxl_context, rcu);
  
-	if (cxl_is_psl8(ctx->afu))

+   if (cxl_is_power8())
free_page((u64)ctx->sstp);
if (ctx->ff_page)
__free_page(ctx->ff_page);
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index c8568ea..a03f8e7 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -357,6 +357,7 @@ static const cxl_p2n_reg_t CXL_PSL_WED_An = {0x0A0};
  #define CXL_PSL9_DSISR_An_PF_RGP  0x0090ULL  /* PTE not found 
(Radix Guest (parent)) 0b1001 */
  #define CXL_PSL9_DSISR_An_PF_HRH  0x0094ULL  /* PTE not found 
(HPT/Radix Host)   0b10010100 */
  #define CXL_PSL9_DSISR_An_PF_STEG 0x009CULL  /* PTE not found 
(STEG VA)  0b10011100 */
+#define CXL_PSL9_DSISR_An_URTCH   0x00B4ULL  /* Unsupported Radix 
Tree Configuration 0b10110100 */
  
  /** CXL_PSL_TFC_An **/

  #define CXL_PSL_TFC_An_A  (1ull << (63-28)) /* Acknowledge non-translation 
fault */
@@ -844,24 +845,15 @@ static inline bool cxl_is_power8(void)
  
  static inline bool cxl_is_power9(void)

  {
-   /* intermediate solution */
-   if (!cxl_is_power8() &&
-  (cpu_has_feature(CPU_FTRS_POWER9) ||
-   cpu_has_feature(CPU_FTR_POWER9_DD1)))
+   if (pvr_version_is(PVR_POWER9))
return true;
return false;
  }
  
-static inline bool cxl_is_psl8(struct cxl_afu *afu)

+static inline bool cxl_is_power9_dd1(void)
  {
-   if (afu->adapter->caia_major == 1)
-   return true;
-   return false;
-}
-
-static inline bool cxl_is_psl9(struct cxl_afu *afu)
-{
-   if (afu->adapter->caia_major == 2)
+   if ((pvr_version_is(PVR_POWER9)) &&
+   cpu_has_feature(CPU_FTR_POWER9_DD1))
return true;
return false;
  }
diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
index 538..c79e39b 100644
--- a/drivers/misc/cxl/fault.c
+++ b/drivers/misc/cxl/fault.c
@@ -187,7 +187,7 @@ static struct mm_struct

[PATCH] powerpc/uprobes: Implement arch_uretprobe_is_alive()

2017-06-14 Thread Naveen N. Rao

This helper is used to detect if a uprobe'd function has returned
through a setjmp/longjmp, rather than branching to the LR that was
updated previously by us. This fixes a SIGSEGV that gets generated when
programs use setjmp/longjmp with uretprobes.

We use the arm64 model (arch/arm64/kernel/probes/uprobes.c:
arch_uretprobe_is_alive()) for detecting when stack frames have been
removed from under us.

Reference:
https://marc.info/?l=linux-kernel&m=143748610330073
commit 7b868e4802a86 ("uprobes/x86: Reimplement arch_uretprobe_is_alive()")
commit db087ef69a2b1 ("uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever")

Tested with the test program from:
https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=blob;f=testsuite/systemtap.base/bz5274.c;hb=HEAD

And this script:
$ cat test.sh
#!/bin/bash

perf probe -x ./bz5274 -a bz5274_main_return=main%return
perf probe -x ./bz5274 -a bz5274_funca_return=funca%return
perf probe -x ./bz5274 -a bz5274_funcb_return=funcb%return
perf probe -x ./bz5274 -a bz5274_funcc_return=funcc%return
perf probe -x ./bz5274 -a bz5274_funcd_return=funcd%return

perf record -e 'probe_bz5274:*' -aR ./bz5274

Reported-by: Gustavo Luiz Duarte 
Reported-by: z...@redhat.com
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/uprobes.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index 003b20964ea0..5d105b8eeece 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -205,3 +205,12 @@ arch_uretprobe_hijack_return_addr(unsigned long 
trampoline_vaddr, struct pt_regs
 
return orig_ret_vaddr;
 }
+
+bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check ctx,
+   struct pt_regs *regs)
+{
+   if (ctx == RP_CHECK_CHAIN_CALL)
+   return regs->gpr[1] <= ret->stack;
+   else
+   return regs->gpr[1] < ret->stack;
+}
-- 
2.12.2
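
For readers who want to poke at the comparison outside the kernel, here is a minimal user-space sketch of the same liveness test. The names ending in _sketch are hypothetical stand-ins for enum rp_check and struct return_instance, and the stack pointer is passed in directly instead of being read from pt_regs; it assumes a downward-growing stack, as on powerpc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's enum rp_check. */
enum rp_check_sketch {
	SKETCH_CHECK_CALL,
	SKETCH_CHECK_CHAIN_CALL,
	SKETCH_CHECK_RET,
};

/* Hypothetical stand-in for struct return_instance. */
struct ret_instance_sketch {
	uintptr_t stack;	/* stack pointer recorded when the return was hijacked */
};

/*
 * The stack grows down, so an instance whose recorded stack pointer is
 * numerically below the current one belongs to a frame that has already
 * been popped, for example by longjmp().
 */
static bool uretprobe_is_alive_sketch(const struct ret_instance_sketch *ret,
				      enum rp_check_sketch ctx,
				      uintptr_t cur_sp)
{
	if (ctx == SKETCH_CHECK_CHAIN_CALL)
		return cur_sp <= ret->stack;
	return cur_sp < ret->stack;
}
```

With an instance recorded at 0x1000, a current stack pointer of 0x0ff0 counts as alive in every context, while 0x1000 counts as alive only for the chained-call check.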

Re: [PATCH v2 6/6] ima: Support module-style appended signatures for appraisal

2017-06-14 Thread Mimi Zohar

Hi Thiago,

On Wed, 2017-06-07 at 22:49 -0300, Thiago Jung Bauermann wrote:
> This patch introduces the modsig keyword to the IMA policy syntax to
> specify that a given hook should expect the file to have the IMA signature
> appended to it. Here is how it can be used in a rule:
> 
> appraise func=KEXEC_KERNEL_CHECK appraise_type=modsig|imasig
> 
> With this rule, IMA will accept either an appended signature or a signature
> stored in the extended attribute. In that case, it will first check whether
> there is an appended signature, and if not it will read it from the
> extended attribute.
> 
> The format of the appended signature is the same used for signed kernel
> modules. This means that the file can be signed with the scripts/sign-file
> tool, with a command line such as this:
> 
> $ sign-file sha256 privkey_ima.pem x509_ima.der vmlinux
> 
> This code only works for files that are hashed from a memory buffer, not
> for files that are read from disk at the time of hash calculation. In other
> words, only hooks that use kernel_read_file can support appended
> signatures. This means that only FIRMWARE_CHECK, KEXEC_KERNEL_CHECK,
> KEXEC_INITRAMFS_CHECK and POLICY_CHECK can be supported.
> 
> This feature warrants a separate config option because it depends on many
> other config options:
> 
>  ASYMMETRIC_KEY_TYPE n -> y
>  CRYPTO_RSA n -> y
>  INTEGRITY_SIGNATURE n -> y
>  MODULE_SIG_FORMAT n -> y
>  SYSTEM_DATA_VERIFICATION n -> y
> +ASN1 y
> +ASYMMETRIC_PUBLIC_KEY_SUBTYPE y
> +CLZ_TAB y
> +CRYPTO_AKCIPHER y
> +IMA_APPRAISE_MODSIG y
> +IMA_TRUSTED_KEYRING n
> +INTEGRITY_ASYMMETRIC_KEYS y
> +INTEGRITY_TRUSTED_KEYRING n
> +MPILIB y
> +OID_REGISTRY y
> +PKCS7_MESSAGE_PARSER y
> +PKCS7_TEST_KEY n
> +SECONDARY_TRUSTED_KEYRING n
> +SIGNATURE y
> +SIGNED_PE_FILE_VERIFICATION n
> +SYSTEM_EXTRA_CERTIFICATE n
> +SYSTEM_TRUSTED_KEYRING y
> +SYSTEM_TRUSTED_KEYS ""
> +X509_CERTIFICATE_PARSER y
> 
> The change in CONFIG_INTEGRITY_SIGNATURE to select CONFIG_KEYS instead of
> depending on it is to avoid a dependency recursion in
> CONFIG_IMA_APPRAISE_MODSIG, because CONFIG_MODULE_SIG_FORMAT selects
> CONFIG_KEYS and Kconfig complains that CONFIG_INTEGRITY_SIGNATURE depends
> on it.
> 
> Signed-off-by: Thiago Jung Bauermann 

Thank you, Thiago.  Appended signatures seem to be working properly now
with multiple keys on the IMA keyring.

The length of this patch description is a good indication that this
patch needs to be broken up for easier review.  A few
comments/suggestions inline below.

> ---
>  crypto/asymmetric_keys/pkcs7_parser.c |  12 +++
>  include/crypto/pkcs7.h|   3 +
>  security/integrity/Kconfig|   2 +-
>  security/integrity/digsig.c   |  28 +++--
>  security/integrity/ima/Kconfig|  13 +++
>  security/integrity/ima/Makefile   |   1 +
>  security/integrity/ima/ima.h  |  53 ++
>  security/integrity/ima/ima_api.c  |   2 +-
>  security/integrity/ima/ima_appraise.c |  41 ++--
>  security/integrity/ima/ima_main.c |  91 
>  security/integrity/ima/ima_modsig.c   | 167 
> ++
>  security/integrity/ima/ima_policy.c   |  26 +++--
>  security/integrity/ima/ima_template_lib.c |  14 ++-
>  security/integrity/integrity.h|   5 +-
>  14 files changed, 404 insertions(+), 54 deletions(-)
> 
> diff --git a/crypto/asymmetric_keys/pkcs7_parser.c 
> b/crypto/asymmetric_keys/pkcs7_parser.c
> index af4cd8649117..e41beda297a8 100644
> --- a/crypto/asymmetric_keys/pkcs7_parser.c
> +++ b/crypto/asymmetric_keys/pkcs7_parser.c
> @@ -673,3 +673,15 @@ int pkcs7_note_signed_info(void *context, size_t hdrlen,
>   return -ENOMEM;
>   return 0;
>  }
> +
> +/**
> + * pkcs7_get_message_sig - get signature in @pkcs7
> + *
> + * This function doesn't meaningfully support messages with more than one
> + * signature. It will always return the first signature.
> + */
> +const struct public_key_signature *pkcs7_get_message_sig(
> + const struct pkcs7_message *pkcs7)
> +{
> + return pkcs7->signed_infos ? pkcs7->signed_infos->sig : NULL;
> +}
> diff --git a/include/crypto/pkcs7.h b/include/crypto/pkcs7.h
> index 583f199400a3..a05a0d7214e6 100644
> --- a/include/crypto/pkcs7.h
> +++ b/include/crypto/pkcs7.h
> @@ -29,6 +29,9 @@ extern int pkcs7_get_content_data(const struct 
> pkcs7_message *pkcs7,
> const void **_data, size_t *_datalen,
> size_t *_headerlen);
> 
> +extern const struct public_key_signature *pkcs7_get_message_sig(
> + const struct pkcs7_message *pkcs7);
> +
>  /*
>   * pkcs7_trust.c
>   */
> diff --git a/security/integrity/Kconfig b/security/integrity/Kconfig
> index da9565891738..0d642e0317c7 100644
> --- a/security/integrity/Kconfig
> +++ b/security/integrity/Kconfig
> @@ -17,8

[PATCH 3/3] cpuidle: powerpc: no memory barrier after break from idle

A memory barrier is not required when the task has been woken up;
it is needed only if we clear the polling flag before a wakeup
arrives. The case where we have work to do is the important one,
so optimise for it.

Reviewed-by: Vaidyanathan Srinivasan 
Signed-off-by: Nicholas Piggin 
---
 drivers/cpuidle/cpuidle-powernv.c | 11 +--
 drivers/cpuidle/cpuidle-pseries.c | 11 +--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index abf2ffcd4a0a..722b81b03593 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -59,14 +59,21 @@ static int snooze_loop(struct cpuidle_device *dev,
ppc64_runlatch_off();
HMT_very_low();
while (!need_resched()) {
-   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time)
+   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+   /*
+* Task has not woken up but we are exiting the polling
+* loop anyway. Require a barrier after polling is
+* cleared to order subsequent test of need_resched().
+*/
+   clear_thread_flag(TIF_POLLING_NRFLAG);
+   smp_mb();
break;
+   }
}
 
HMT_medium();
ppc64_runlatch_on();
clear_thread_flag(TIF_POLLING_NRFLAG);
-   smp_mb();
 
return index;
 }
diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index a404f352d284..e9b3853d93ea 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -71,13 +71,20 @@ static int snooze_loop(struct cpuidle_device *dev,
while (!need_resched()) {
HMT_low();
HMT_very_low();
-   if (snooze_timeout_en && get_tb() > snooze_exit_time)
+   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+   /*
+* Task has not woken up but we are exiting the polling
+* loop anyway. Require a barrier after polling is
+* cleared to order subsequent test of need_resched().
+*/
+   clear_thread_flag(TIF_POLLING_NRFLAG);
+   smp_mb();
break;
+   }
}
 
HMT_medium();
clear_thread_flag(TIF_POLLING_NRFLAG);
-   smp_mb();
 
idle_loop_epilog(in_purr);
 
-- 
2.11.0
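
The control flow of the change can be tried in user space with the rough sketch below. It is not the driver code: the names ending in _sketch are hypothetical, C11 atomics stand in for the thread flag and need_resched(), atomic_thread_fence() stands in for smp_mb(), and a spin counter replaces the timebase check.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool polling_sketch;	/* stands in for TIF_POLLING_NRFLAG */
static atomic_bool need_resched_sketch;	/* stands in for need_resched() */

/* Returns true if the loop ended because work arrived, false on timeout. */
static bool snooze_loop_sketch(unsigned long max_spins)
{
	unsigned long spins = 0;
	bool woken = true;

	atomic_store_explicit(&polling_sketch, true, memory_order_relaxed);

	while (!atomic_load_explicit(&need_resched_sketch,
				     memory_order_relaxed)) {
		if (++spins > max_spins) {
			/*
			 * Timeout: nobody has woken us, so clear the polling
			 * flag and issue a full fence before need_resched is
			 * tested again by the caller.
			 */
			atomic_store_explicit(&polling_sketch, false,
					      memory_order_relaxed);
			atomic_thread_fence(memory_order_seq_cst);
			woken = false;
			break;
		}
	}

	/* Common path: work is pending, clear the flag without a fence. */
	atomic_store_explicit(&polling_sketch, false, memory_order_relaxed);
	return woken;
}
```

The point of the layout is that the fence sits only on the timeout branch, so the wake-up path that matters most pays for one flag store and nothing else.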

Re: RESEND Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc

2017-06-14 Thread Michael Bringmann

Hello:

On 06/14/2017 12:27 AM, Balbir Singh wrote:
> On Wed, Jun 14, 2017 at 3:25 PM, Balbir Singh  wrote:
>>
>>
>> On Wed, Jun 14, 2017 at 8:21 AM, Michael Bringmann 
>> wrote:
>>>
>>> On a related note, we are discussing the addition of 2 new device-tree
>>> properties
>>> with Pete Heyrman and his fellows that should simplify the determination
>>> of the
>>> set of required nodes.
>>>
>>> * One property would provide the total/max number of nodes needed by the
>>> kernel
>>>   on the current hardware.
>>
>>
> 
> Yes, that would be nice to have
> 
>>
>>>
>>> * A second property would provide the total/max number of nodes that the
>>> kernel
>>>   could use on any system to which it could be migrated.
>>>
>>
> 
> Not sure about this one, are you suggesting more memory can be added
> depending on the migration target?

We would use only one of these numbers to allocate nodes.  I have only been
on the periphery of the discussions, so I can not communicate the full
reasoning as to why both measures would be needed.  We would like to have
the first number for node allocation/initialization, but if only the second
value were provided, we would likely need to use it.

>>
>>
>>>
>>> These properties aren't available, yet, and it takes time to define new
>>> properties
>>> in the PAPR and have them implemented in pHyp and the kernel.  As an
>>> intermediary
>>> step, the systems which are doing a lot of dynamic hot-add/hot-remove
>>> configuration
>>> could provide equivalent information to the PowerPC kernel with a command
>>> line
>>> parameter.  The 'numa.c' code would then read this value and fill in the
>>> necessary
>>> entries in the 'node_possible_map'.
>>>
>>> Would you foresee any problems with using such a feature?
>>
>>
> 
> Sorry my mailer goofed up, resending
> 
> Balbir Singh
> 

Thanks.


-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com

Re: [PATCH 09/13] powerpc/64s: cpuidle set polling before enabling irqs

Nicholas Piggin  writes:

> On Wed, 14 Jun 2017 21:40:52 +1000
> Michael Ellerman  wrote:
>
>> Nicholas Piggin  writes:
>> 
>> > local_irq_enable can cause interrupts to be taken which could
>> > take a significant amount of processing time. The idle process
>> > should set its polling flag before this, so another process that
>> > wakes it during this time will not have to send an IPI.
>> >
>> > Expand the TIF_POLLING_NRFLAG coverage to as large as possible.
>> >
>> > Reviewed-by: Gautham R. Shenoy 
>> > Signed-off-by: Nicholas Piggin 
>> > ---
>> >  drivers/cpuidle/cpuidle-powernv.c | 4 +++-
>> >  drivers/cpuidle/cpuidle-pseries.c | 3 ++-
>> >  2 files changed, 5 insertions(+), 2 deletions(-)  
>> 
>> I don't think the cpuidle folks are really interested in these changes,
>> but we should Cc them to be polite.
>> 
>> Can you resend patches 9, 10, 11 with a subject like:
>> 
>>   "cpuidle: powernv: Set polling ..."
>> 
>> And Cc the cpuidle folks:
>> 
>> $ ./scripts/get_maintainer.pl -f drivers/cpuidle
>> r...@rjwysocki.net
>> daniel.lezc...@linaro.org
>> linux...@vger.kernel.org
>> linux-ker...@vger.kernel.org
>
> Yeah I can do that. I'll send them as their own series. They don't
> depend on any of the patches in this series, so I should have done
> that in the first place.

Great thanks.

cheers

[PATCH V2] cxl: Export library to support IBM XSL

2017-06-14 Thread Christophe Lombard

This patch exports an in-kernel 'library' API which can be called by
other drivers to help interact with an IBM XSL on a POWER9 system.

The XSL (Translation Service Layer) is a stripped down version of the
PSL (Power Service Layer) used in some cards such as the Mellanox CX5.
Like the PSL, it implements the CAIA architecture, but has a number
of differences, mostly in its implementation-dependent registers.

The XSL also uses a special DMA cxl mode, which uses a slightly
different init sequence for the CAPP and PHB.

Changelog[v2]
 - Rebase to latest upstream.
 - Return -EFAULT in case of NULL pointer in cxllib_handle_fault().
 - Reverse parameters when copro_handle_mm_fault() is called.

Signed-off-by: Christophe Lombard 
---

This applies on top of this patch:
http://patchwork.ozlabs.org/patch/775322/
---
 arch/powerpc/include/asm/opal-api.h |   1 +
 drivers/misc/cxl/Kconfig|   5 +
 drivers/misc/cxl/Makefile   |   2 +-
 drivers/misc/cxl/cxl.h  |   7 ++
 drivers/misc/cxl/cxllib.c   | 241 
 drivers/misc/cxl/fault.c|  25 ++--
 drivers/misc/cxl/native.c   |  16 ++-
 drivers/misc/cxl/pci.c  |  41 +++---
 include/misc/cxllib.h   | 132 
 9 files changed, 439 insertions(+), 31 deletions(-)
 create mode 100644 drivers/misc/cxl/cxllib.c
 create mode 100644 include/misc/cxllib.h

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index cb3e624..3e0be78 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -877,6 +877,7 @@ enum {
OPAL_PHB_CAPI_MODE_SNOOP_OFF= 2,
OPAL_PHB_CAPI_MODE_SNOOP_ON = 3,
OPAL_PHB_CAPI_MODE_DMA  = 4,
+   OPAL_PHB_CAPI_MODE_DMA_TVT1 = 5,
 };
 
 /* OPAL I2C request */
diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index b75cf83..93397cb 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -11,11 +11,16 @@ config CXL_AFU_DRIVER_OPS
bool
default n
 
+config CXL_LIB
+   bool
+   default n
+
 config CXL
tristate "Support for IBM Coherent Accelerators (CXL)"
depends on PPC_POWERNV && PCI_MSI && EEH
select CXL_BASE
select CXL_AFU_DRIVER_OPS
+   select CXL_LIB
default m
help
  Select this option to enable driver support for IBM Coherent
diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
index c14fd6b..0b5fd74 100644
--- a/drivers/misc/cxl/Makefile
+++ b/drivers/misc/cxl/Makefile
@@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC_WERROR)+= -Werror
 
 cxl-y  += main.o file.o irq.o fault.o native.o
 cxl-y  += context.o sysfs.o pci.o trace.o
-cxl-y  += vphb.o phb.o api.o
+cxl-y  += vphb.o phb.o api.o cxllib.o
 cxl-$(CONFIG_PPC_PSERIES)  += flash.o guest.o of.o hcalls.o
 cxl-$(CONFIG_DEBUG_FS) += debugfs.o
 obj-$(CONFIG_CXL)  += cxl.o
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index a03f8e7..81e01f0 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -1010,6 +1010,8 @@ static inline void cxl_debugfs_add_afu_regs_psl8(struct 
cxl_afu *afu, struct den
 
 void cxl_handle_fault(struct work_struct *work);
 void cxl_prefault(struct cxl_context *ctx, u64 wed);
+int cxl_handle_page_fault(bool kernel_context, struct mm_struct *mm,
+ u64 dsisr, u64 dar);
 
 struct cxl *get_cxl_adapter(int num);
 int cxl_alloc_sst(struct cxl_context *ctx);
@@ -1061,6 +1063,11 @@ int cxl_afu_slbia(struct cxl_afu *afu);
 int cxl_data_cache_flush(struct cxl *adapter);
 int cxl_afu_disable(struct cxl_afu *afu);
 int cxl_psl_purge(struct cxl_afu *afu);
+int cxl_calc_capp_routing(struct pci_dev *dev, u64 *chipid,
+ u32 *phb_index, u64 *capp_unit_id);
+int cxl_slot_is_switched(struct pci_dev *dev);
+int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg);
+u64 cxl_calculate_sr(bool master, bool kernel, bool real_mode, bool p9);
 
 void cxl_native_irq_dump_regs_psl9(struct cxl_context *ctx);
 void cxl_native_irq_dump_regs_psl8(struct cxl_context *ctx);
diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c
new file mode 100644
index 000..63b6280
--- /dev/null
+++ b/drivers/misc/cxl/cxllib.c
@@ -0,0 +1,241 @@
+/*
+ * Copyright 2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include "cxl.h"
+#include 
+#include 
+
+#define CXL_INVALID_DRA ~0ull
+#define CXL_DUMMY_READ_SIZE 128
+#define CXL_DUMMY_READ_ALIGN8
+#define

Re: [PATCH 2/4] powerpc/64: context switch avoid reservation-clearing instruction

Nicholas Piggin  writes:

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 803c3bc274c4..1f0688ad09d7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2875,6 +2875,12 @@ context_switch(struct rq *rq, struct task_struct *prev,
>   rq_unpin_lock(rq, rf);
>   spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
>  
> + /*
> +  * Some architectures require that a spin lock is taken before
> +  * _switch. The rq_lock satisfies this condition. See powerpc
> +  * _switch for details.
> +  */
> +
>   /* Here we just switch the register state and the stack. */
>   switch_to(prev, next, prev);
>   barrier();

I dropped this hunk, if you want to merge it you can resend it and get
an ack from Peterz.

cheers

[PATCH 2/3] cpuidle: powerpc: read mostly for common globals

Ensure these don't get put into bouncing cachelines.

Reviewed-by: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Nicholas Piggin 
---
 drivers/cpuidle/cpuidle-powernv.c | 10 +-
 drivers/cpuidle/cpuidle-pseries.c |  8 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 77bc50ad9f57..abf2ffcd4a0a 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -32,18 +32,18 @@ static struct cpuidle_driver powernv_idle_driver = {
.owner= THIS_MODULE,
 };
 
-static int max_idle_state;
-static struct cpuidle_state *cpuidle_state_table;
+static int max_idle_state __read_mostly;
+static struct cpuidle_state *cpuidle_state_table __read_mostly;
 
 struct stop_psscr_table {
u64 val;
u64 mask;
 };
 
-static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX];
+static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] 
__read_mostly;
 
-static u64 snooze_timeout;
-static bool snooze_timeout_en;
+static u64 snooze_timeout __read_mostly;
+static bool snooze_timeout_en __read_mostly;
 
 static int snooze_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 7b12bb2ea70f..a404f352d284 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -25,10 +25,10 @@ struct cpuidle_driver pseries_idle_driver = {
.owner= THIS_MODULE,
 };
 
-static int max_idle_state;
-static struct cpuidle_state *cpuidle_state_table;
-static u64 snooze_timeout;
-static bool snooze_timeout_en;
+static int max_idle_state __read_mostly;
+static struct cpuidle_state *cpuidle_state_table __read_mostly;
+static u64 snooze_timeout __read_mostly;
+static bool snooze_timeout_en __read_mostly;
 
 static inline void idle_loop_prolog(unsigned long *in_purr)
 {
-- 
2.11.0

Re: [PATCH 12/13] powerpc/64: runlatch CTRL[RUN] set optimisation

On Wed, 14 Jun 2017 21:38:36 +1000
Michael Ellerman  wrote:

> Nicholas Piggin  writes:
> 
> > The CTRL register is read-only except bit 63 which is the run latch
> > control. This means it can be updated with a mtspr rather than
> > mfspr/mtspr.  
> 
> Turns out this doesn't work on Cell.
> 
> There's an extra field in there:
> 
>   Thread enable bits (Read/Write)
>   The hypervisor state can suspend its own thread by setting the TE bit
>   for its thread to '0'. The hypervisor state can resume the opposite
>   thread by setting the TE bit for the opposite thread to '1'. The
>   hypervisor state cannot suspend the opposite thread by setting the TE
>   bit for the opposite thread to '0'. This setting is ignored and does not
>   cause an error.
>   
>   TE0 is the thread enable bit for thread 0.
>   TE1 is the thread enable bit for thread 1.
>   
>   If thread 0 executes the mtctrl instruction, these are the bit values:
>   
>   [TE0 TE1] Description
>     0   0   Disable or suspend thread 0; thread 1 unchanged.
>     0   1   Disable or suspend thread 0; enable or resume thread 1 if it was disabled.
>     1   0   Unchanged.
>     1   1   Enable or resume thread 1 if it was disabled.
>   
>   If thread 1 executes the mtctrl instruction, these are the bit values:
>   
>   [TE0 TE1] Description
>     0   0   Thread 0 unchanged; disable or suspend thread 1.
>     0   1   Unchanged.
>     1   0   Enable or resume thread 0 if it was disabled; disable or suspend thread 1.
>     1   1   Enable or resume thread 0 if it was disabled.
> 
> 
> So writing either 0 or CTRL_RUNLATCH (1) will disable the thread that
> does the write - :D
> 
> For now I'll just drop this.

Ouch, good catch, thanks. I'll go with something like this (below)
instead, but I'll do some testing and get numbers first. So dropping
it should be fine.

---
 arch/powerpc/kernel/process.c | 35 +++
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index baae104b16c7..f587c1fdf981 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1960,11 +1960,25 @@ void show_stack(struct task_struct *tsk, unsigned long 
*stack)
 void notrace __ppc64_runlatch_on(void)
 {
struct thread_info *ti = current_thread_info();
-   unsigned long ctrl;
 
-   ctrl = mfspr(SPRN_CTRLF);
-   ctrl |= CTRL_RUNLATCH;
-   mtspr(SPRN_CTRLT, ctrl);
+   if (cpu_has_feature(CPU_FTR_ARCH_206)) {
+   /*
+* Least significant bit (RUN) is the only writable bit of
+* the CTRL register, so we can avoid mfspr. 2.06 is not the
+* earliest ISA where this is the case, but it's convenient.
+*/
+   mtspr(SPRN_CTRLT, CTRL_RUNLATCH);
+   } else {
+   unsigned long ctrl;
+
+   /*
+* Some architectures (e.g., Cell) have writable fields other
+* than RUN, so do the read-modify-write.
+*/
+   ctrl = mfspr(SPRN_CTRLF);
+   ctrl |= CTRL_RUNLATCH;
+   mtspr(SPRN_CTRLT, ctrl);
+   }
 
ti->local_flags |= _TLF_RUNLATCH;
 }
@@ -1973,13 +1987,18 @@ void notrace __ppc64_runlatch_on(void)
 void notrace __ppc64_runlatch_off(void)
 {
struct thread_info *ti = current_thread_info();
-   unsigned long ctrl;
 
ti->local_flags &= ~_TLF_RUNLATCH;
 
-   ctrl = mfspr(SPRN_CTRLF);
-   ctrl &= ~CTRL_RUNLATCH;
-   mtspr(SPRN_CTRLT, ctrl);
+   if (cpu_has_feature(CPU_FTR_ARCH_206)) {
+   mtspr(SPRN_CTRLT, 0);
+   } else {
+   unsigned long ctrl;
+
+   ctrl = mfspr(SPRN_CTRLF);
+   ctrl &= ~CTRL_RUNLATCH;
+   mtspr(SPRN_CTRLT, ctrl);
+   }
 }
 #endif /* CONFIG_PPC64 */
 
-- 
2.11.0
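
To make the Cell problem concrete, here is a small model of what happens when thread 0 writes the TE field, following the first table quoted earlier in this message. It is only a sketch: the struct and function names are hypothetical, and the TE bits are passed as booleans rather than decoded from a CTRL value.

```c
#include <stdbool.h>

/* Hypothetical model of the two hardware threads on a Cell core. */
struct cell_threads_sketch {
	bool t0_enabled;
	bool t1_enabled;
};

/* Effect of thread 0 executing mtctrl with the given [TE0 TE1] bits. */
static void thread0_writes_te_sketch(struct cell_threads_sketch *s,
				     bool te0, bool te1)
{
	if (!te0)
		s->t0_enabled = false;	/* TE0 = 0: thread 0 suspends itself */
	if (te1)
		s->t1_enabled = true;	/* TE1 = 1: thread 1 is resumed */
}
```

A write that carries only the RUN bit has both TE bits clear, which in this model is the [0 0] row: the writing thread suspends itself. That is why the read-modify-write has to stay for anything that might be a Cell.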

Re: [PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

2017-06-14 Thread Naveen N. Rao

On 2017/06/14 04:41PM, Michael Ellerman wrote:
> "Aneesh Kumar K.V"  writes:
> > On Wednesday 14 June 2017 10:41 AM, Naveen N. Rao wrote:
> >> On 2017/06/14 08:38AM, Aneesh Kumar K.V wrote:
> >>> "Naveen N. Rao"  writes:
> >>>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> >>>> index ae418b85c17c..17ee701b8336 100644
> >>>> --- a/arch/powerpc/kernel/exceptions-64s.S
> >>>> +++ b/arch/powerpc/kernel/exceptions-64s.S
> >>>> @@ -1411,10 +1411,8 @@ USE_TEXT_SECTION()
> >>>>   .balign IFETCH_ALIGN_BYTES
> >>>>   do_hash_page:
> >>>>   #ifdef CONFIG_PPC_STD_MMU_64
> >>>> -andis.  r0,r4,0xa410 /* weird error? */
> >>>> +andis.  r0,r4,0xa450 /* weird error? */
> >>>
> >>> Can we convert that to a #define value. Ram did try to do that here.
> >>>
> >>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-June/158607.html
> >> 
> >> Hmm... I feel it will be good to do that as part of Ram's series since
> >> he has already coded it up :)
> >> 
> >> Ram's patches will anyway require a rebase and the change I do here for
> >> detecting DAWR already has a #define, so it should be a simple matter of
> >> including DSISR_DABRMATCH in DSISR_PAGE_FAULT_MASK.
> >> 
> >> But, if you really feel that I should make that change here, please do
> >> let me know and I will re-spin with those changes.
> >
> > The thing is that change from 0xa410 to 0xa450 is not clear at all. And 
> > it needs proper documentation. IMHO the best way to do that is switch to 
> > #define name for that constant.
> 
> Not in this patch. It needs to be backported, so it should be as minimal
> as possible.

Ok.

> 
> The change from 0xa410 to 0xa450 does need a mention in the changelog,
> I'll add that.

Thanks, Michael!
(emails just started flowing again...)

- Naveen

Re: [PATCH] powerpc/xive: Fix offset for store EOI MMIOs

2017-06-14 Thread Benjamin Herrenschmidt

On Wed, 2017-06-14 at 14:44 +1000, Michael Ellerman wrote:
> Benjamin Herrenschmidt  writes:
> 
> > Architecturally we should apply a 0x400 offset for these. Not doing
> > it will break future HW implementations.
> 
> Can you elaborate a bit?
> 
> You're changing a write to 0x0 to be a write to 0x400, which at face
> value appears like it breaks something, or is already broken?

The code is broken today, I didn't read the spec properly. We never had
a proper DD2 model to test until recently. The offset of 0 is supposed
to remain for "triggers" though not all sources support both trigger
and store EOI, and in P9 specifically, some sources will treat 0 as a
store EOI :-) But future chips will not. So this makes us use the
properly architected offset which should work always.

Cheers,
Ben.
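
As an illustration of the addressing, the sketch below picks the MMIO address for an EOI from the base of a source's ESB page. The two offsets are the ones defined by the patch under discussion; the helper name and the use of a plain integer for the page base are assumptions made for the example, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets within an ESB page, taken from the patch under discussion. */
#define XIVE_ESB_STORE_EOI	0x400	/* Store */
#define XIVE_ESB_LOAD_EOI	0x000	/* Load */

/*
 * Hypothetical helper: return the address to access for an EOI.
 * esb_page is the base address of the source's ESB management page.
 */
static uintptr_t xive_eoi_addr_sketch(uintptr_t esb_page, bool store_eoi)
{
	return esb_page + (store_eoi ? XIVE_ESB_STORE_EOI : XIVE_ESB_LOAD_EOI);
}
```

With a page at 0x10000, a store EOI goes to 0x10400 and a load EOI stays at 0x10000, leaving offset 0 free for stores that are meant as triggers.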

> 
> cheers
> 
> > diff --git a/arch/powerpc/include/asm/xive.h 
> > b/arch/powerpc/include/asm/xive.h
> > index c8a822a..c23ff43 100644
> > --- a/arch/powerpc/include/asm/xive.h
> > +++ b/arch/powerpc/include/asm/xive.h
> > @@ -94,11 +94,13 @@ struct xive_q {
> >   * store at 0 and some ESBs support doing a trigger via a
> >   * separate trigger page.
> >   */
> > -#define XIVE_ESB_GET   0x800
> > -#define XIVE_ESB_SET_PQ_00 0xc00
> > -#define XIVE_ESB_SET_PQ_01 0xd00
> > -#define XIVE_ESB_SET_PQ_10 0xe00
> > -#define XIVE_ESB_SET_PQ_11 0xf00
> > +#define XIVE_ESB_STORE_EOI 0x400 /* Store */
> > +#define XIVE_ESB_LOAD_EOI  0x000 /* Load */
> > +#define XIVE_ESB_GET   0x800 /* Load */
> > +#define XIVE_ESB_SET_PQ_00 0xc00 /* Load */
> > +#define XIVE_ESB_SET_PQ_01 0xd00 /* Load */
> > +#define XIVE_ESB_SET_PQ_10 0xe00 /* Load */
> > +#define XIVE_ESB_SET_PQ_11 0xf00 /* Load */
> >  
> >  #define XIVE_ESB_VAL_P 0x2
> >  #define XIVE_ESB_VAL_Q 0x1
> > diff --git a/arch/powerpc/kvm/book3s_xive_template.c 
> > b/arch/powerpc/kvm/book3s_xive_template.c
> > index 023a311..4636ca6 100644
> > --- a/arch/powerpc/kvm/book3s_xive_template.c
> > +++ b/arch/powerpc/kvm/book3s_xive_template.c
> > @@ -69,7 +69,7 @@ static void GLUE(X_PFX,source_eoi)(u32 hw_irq, struct 
> > xive_irq_data *xd)
> >  {
> > /* If the XIVE supports the new "store EOI facility, use it */
> > if (xd->flags & XIVE_IRQ_FLAG_STORE_EOI)
> > -   __x_writeq(0, __x_eoi_page(xd));
> > +   __x_writeq(0, __x_eoi_page(xd) + XIVE_ESB_STORE_EOI);
> > else if (hw_irq && xd->flags & XIVE_IRQ_FLAG_EOI_FW) {
> > opal_int_eoi(hw_irq);
> > } else {
> > @@ -89,7 +89,7 @@ static void GLUE(X_PFX,source_eoi)(u32 hw_irq, struct 
> > xive_irq_data *xd)
> >  * properly.
> >  */
> > if (xd->flags & XIVE_IRQ_FLAG_LSI)
> > -   __x_readq(__x_eoi_page(xd));
> > +   __x_readq(__x_eoi_page(xd) + XIVE_ESB_LOAD_EOI);
> > else {
> > eoi_val = GLUE(X_PFX,esb_load)(xd, XIVE_ESB_SET_PQ_00);
> >  
> > diff --git a/arch/powerpc/sysdev/xive/common.c 
> > b/arch/powerpc/sysdev/xive/common.c
> > index 9138250..8f5e303 100644
> > --- a/arch/powerpc/sysdev/xive/common.c
> > +++ b/arch/powerpc/sysdev/xive/common.c
> > @@ -297,7 +297,7 @@ void xive_do_source_eoi(u32 hw_irq, struct 
> > xive_irq_data *xd)
> >  {
> > /* If the XIVE supports the new "store EOI facility, use it */
> > if (xd->flags & XIVE_IRQ_FLAG_STORE_EOI)
> > -   out_be64(xd->eoi_mmio, 0);
> > +   out_be64(xd->eoi_mmio + XIVE_ESB_STORE_EOI, 0);
> > else if (hw_irq && xd->flags & XIVE_IRQ_FLAG_EOI_FW) {
> > /*
> >  * The FW told us to call it. This happens for some

Re: [PATCH V3] cxl: Fixes for Coherent Accelerator Interface Architecture 2.0

2017-06-14 Thread Frederic Barrat




On 13/06/2017 at 17:41, Christophe Lombard wrote:

A previous set of patches, "cxl: Add support for Coherent Accelerator
Interface Architecture 2.0", introduced new support for CAPI
cards. These patches have been tested in a simulation environment, and
quite a few of them have been tested on real hardware.

This patch brings new fixes after a series of tests carried out on
new equipment:
* Add POWER9 definition.
* Re-enable any masked interrupts when the AFU is not activated after
   resetting the AFU.
* Remove the cxl_is_psl8/9 API, which is no longer useful.
* Do not dump CAPI1 registers.
* Rewrite the cxl_is_page_fault() function.
* Do not register the SLB callback on P9.

Changelog[v3]
  - Rebase to latest upstream.
  - Update the patch's header.
  - Add new test in cxl_is_page_fault().

Changelog[v2]
  - Rebase to latest upstream.
  - Update cxl_is_page_fault() to handle the checkout response status.
  - Add comments.

Signed-off-by: Christophe Lombard 
---


Looks good to me, thanks!
Acked-by: Frederic Barrat 




  drivers/misc/cxl/context.c |  6 +++---
  drivers/misc/cxl/cxl.h | 18 +-
  drivers/misc/cxl/fault.c   | 23 +++
  drivers/misc/cxl/main.c| 17 +
  drivers/misc/cxl/native.c  | 29 +
  drivers/misc/cxl/pci.c | 11 ---
  6 files changed, 57 insertions(+), 47 deletions(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 4472ce1..8c32040 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -45,7 +45,7 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu 
*afu, bool master)
mutex_init(&ctx->mapping_lock);
ctx->mapping = NULL;

-   if (cxl_is_psl8(afu)) {
+   if (cxl_is_power8()) {
spin_lock_init(&ctx->sste_lock);

/*
@@ -189,7 +189,7 @@ int cxl_context_iomap(struct cxl_context *ctx, struct 
vm_area_struct *vma)
if (start + len > ctx->afu->adapter->ps_size)
return -EINVAL;

-   if (cxl_is_psl9(ctx->afu)) {
+   if (cxl_is_power9()) {
/*
 * Make sure there is a valid problem state
 * area space for this AFU.
@@ -324,7 +324,7 @@ static void reclaim_ctx(struct rcu_head *rcu)
  {
struct cxl_context *ctx = container_of(rcu, struct cxl_context, rcu);

-   if (cxl_is_psl8(ctx->afu))
+   if (cxl_is_power8())
free_page((u64)ctx->sstp);
if (ctx->ff_page)
__free_page(ctx->ff_page);
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index c8568ea..a03f8e7 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -357,6 +357,7 @@ static const cxl_p2n_reg_t CXL_PSL_WED_An = {0x0A0};
  #define CXL_PSL9_DSISR_An_PF_RGP  0x0090ULL  /* PTE not found 
(Radix Guest (parent)) 0b1001 */
  #define CXL_PSL9_DSISR_An_PF_HRH  0x0094ULL  /* PTE not found 
(HPT/Radix Host)   0b10010100 */
  #define CXL_PSL9_DSISR_An_PF_STEG 0x009CULL  /* PTE not found 
(STEG VA)  0b10011100 */
+#define CXL_PSL9_DSISR_An_URTCH   0x00B4ULL  /* Unsupported Radix 
Tree Configuration 0b10110100 */

  /** CXL_PSL_TFC_An **/
  #define CXL_PSL_TFC_An_A  (1ull << (63-28)) /* Acknowledge non-translation 
fault */
@@ -844,24 +845,15 @@ static inline bool cxl_is_power8(void)

  static inline bool cxl_is_power9(void)
  {
-   /* intermediate solution */
-   if (!cxl_is_power8() &&
-  (cpu_has_feature(CPU_FTRS_POWER9) ||
-   cpu_has_feature(CPU_FTR_POWER9_DD1)))
+   if (pvr_version_is(PVR_POWER9))
return true;
return false;
  }

-static inline bool cxl_is_psl8(struct cxl_afu *afu)
+static inline bool cxl_is_power9_dd1(void)
  {
-   if (afu->adapter->caia_major == 1)
-   return true;
-   return false;
-}
-
-static inline bool cxl_is_psl9(struct cxl_afu *afu)
-{
-   if (afu->adapter->caia_major == 2)
+   if ((pvr_version_is(PVR_POWER9)) &&
+   cpu_has_feature(CPU_FTR_POWER9_DD1))
return true;
return false;
  }
diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
index 538..c79e39b 100644
--- a/drivers/misc/cxl/fault.c
+++ b/drivers/misc/cxl/fault.c
@@ -187,7 +187,7 @@ static struct mm_struct *get_mem_context(struct cxl_context 
*ctx)

  static bool cxl_is_segment_miss(struct cxl_context *ctx, u64 dsisr)
  {
-   if ((cxl_is_psl8(ctx->afu)) && (dsisr & CXL_PSL_DSISR_An_DS))
+   if ((cxl_is_power8() && (dsisr & CXL_PSL_DSISR_An_DS)))
return true;

return false;
@@ -195,16 +195,23 @@ static bool cxl_is_segment_miss(struct cxl_context *ctx, 
u64 dsisr)

  static bool cxl_is_page_fault(struct cxl_context *ctx, u64

Re: [PATCH v1 1/3] powerpc, 8xx: remove support for 8xx

2017-06-14 Thread Heiko Schocher

Hello Christophe,

On 14.06.2017 at 09:40, Christophe LEROY wrote:

On 13/06/2017 at 09:37, Heiko Schocher wrote:

Hello Christophe,

On 13.06.2017 at 07:40, Christophe LEROY wrote:

On 13/06/2017 at 07:26, Christophe LEROY wrote:

There was for long time no activity in the 8xx area.
We need to go further and convert to Kconfig, but it
turned out, nobody is interested anymore in 8xx,
so remove it (with a heavy heart, knowing that I remove
here the root of U-Boot).

Signed-off-by: Heiko Schocher

Please don't do that.

Tom already applied the patch to mainline ...

Can it be reverted?

As you can see in Linux kernel activity, there has been a lot of activity related
to the 8xx,
including but not limited to:
1/ HW Crypto for the 885 (Talitos SEC1)
2/ TX NAPI in the 8xx Ethernet driver
3/ Scatter/Gather support in the 8xx Ethernet driver
4/ Hugepages
5/ Perf events
6/ hw breakpoints
7/ Linear memory mapping via Large TLBs

That's Linux ... not U-Boot!

Sure, but it shows there is still interest in that processor. The 885 is a good
recent 8xx.
U-Boot is not really something we focus on. We update it once a year; as long as
it can start our
Linux box we are happy.

Hmm...

The following links give an overview of the activity:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/crypto/talitos.c
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/arch/powerpc/kernel/head_8xx.S

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/arch/powerpc/mm/8xx_mmu.c

We have thousands of boards with mpc885 running and requiring support for at
least the next 10
years.

Ok, nice to hear!

How can the 8xx survive without U-boot support ?

Tom asked (I think a lot of times) regarding converting mpc8xx to
Kconfig / DM and nobody did the necessary steps. We also asked
our customers if they can do the necessary changes, none was interested.

I didn't know. Indeed, I was not following U-Boot activity until someone who knows
the level of
interest we have in 8xx alerted me yesterday.
It would have been nice if you had notified the linuxppc-dev list.

So, if you need mpc8xx support in U-Boot, simply add it again with
Kconfig and DM support included!

OK, I'll try to come up with a patch to convert 8xx in the coming weeks. In the
meantime, please
revert the deletion in order to avoid nightmare conflicts when the conversion
patch comes.

I vote for making a patch which adds new mpc8xx support, as we had a lot
of crap in the code, but that's Tom's decision!

bye,
Heiko
--
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

Re: [PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

2017-06-14 Thread Anshuman Khandual

On 06/14/2017 12:12 AM, Naveen N. Rao wrote:
> On P9, trying to use data breakpoints throws the splat shown below (*).
> This is because the check for a data breakpoint in DSISR is in
> do_hash_page(). Move this check to handle_page_fault() so as to catch
> data breakpoints in both hash and radix MMU modes.

Why can't we check for DSISR inside do_hash_page() on P9?

Re: [PATCH v1 1/3] powerpc, 8xx: remove support for 8xx

2017-06-14 Thread Christophe LEROY

On 13/06/2017 at 09:37, Heiko Schocher wrote:

Hello Christophe,

On 13.06.2017 at 07:40, Christophe LEROY wrote:

On 13/06/2017 at 07:26, Christophe LEROY wrote:

Signed-off-by: Heiko Schocher

Please don't do that.

Tom already applied the patch to mainline ...

Can it be reverted?

As you can see in Linux kernel activity, there has been a lot of
activity related to the 8xx,

including but not limited to:
1/ HW Crypto for the 885 (Talitos SEC1)
2/ TX NAPI in the 8xx Ethernet driver
3/ Scatter/Gather support in the 8xx Ethernet driver
4/ Hugepages
5/ Perf events
6/ hw breakpoints
7/ Linear memory mapping via Large TLBs

That's Linux ... not U-Boot!

Sure, but it shows there is still interest in that processor. The 885 is
a good recent 8xx.
U-Boot is not really something we focus on. We update it once a year; as
long as it can start our Linux box we are happy.

The following links give an overview of the activity:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/crypto/talitos.c

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/arch/powerpc/kernel/head_8xx.S

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/arch/powerpc/mm/8xx_mmu.c

We have thousands of boards with mpc885 running and requiring support
for at least the next 10 years.

Ok, nice to hear!

How can the 8xx survive without U-boot support ?

I didn't know. Indeed, I was not following U-Boot activity until someone
who knows the level of interest we have in 8xx alerted me yesterday.

It would have been nice if you had notified the linuxppc-dev list.

So, if you need mpc8xx support in U-Boot, simply add it again with
Kconfig and DM support included!

OK, I'll try to come up with a patch to convert 8xx in the coming weeks.
In the meantime, please revert the deletion in order to avoid nightmare
conflicts when the conversion patch comes.

Thanks
Christophe

Re: [PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

2017-06-14 Thread Ram Pai

On Wed, Jun 14, 2017 at 10:43:30AM +0530, Aneesh Kumar K.V wrote:
> 
> 
> On Wednesday 14 June 2017 10:41 AM, Naveen N. Rao wrote:
> >Hi Aneesh,
> >
> >On 2017/06/14 08:38AM, Aneesh Kumar K.V wrote:
> >>"Naveen N. Rao"  writes:
> >>
> >>>On P9, trying to use data breakpoints throws the splat shown below (*).
> >>>This is because the check for a data breakpoint in DSISR is in
> >>>do_hash_page(). Move this check to handle_page_fault() so as to catch
> >>>data breakpoints in both hash and radix MMU modes.
> >>>
> >>>While at it, also remove the label '11' that was made redundant by
> >>>commit a546498f3bf9aa ("powerpc: Call do_page_fault() with interrupts
> >>>off")
> >>>
> >>>(*)
> >>> Unable to handle kernel paging request for data at address 
> >>> 0xc0e19218
> >>> Faulting instruction address: 0xc01155e8
> >>> cpu 0x0: Vector: 300 (Data Access) at [c000ef1e7b20]
> >>> pc: c01155e8: find_pid_ns+0x48/0xe0
> >>> lr: c0116ac4: find_task_by_vpid+0x44/0x90
> >>> sp: c000ef1e7da0
> >>> msr: 90009033
> >>> dar: c0e19218
> >>> dsisr: 40
> >>> current = 0xc000f1f59700
> >>> paca= 0xcfd4 softe: 0 irq_happened: 0x01
> >>> pid   = 1192, comm = sh
> >>> Linux version 4.12.0-rc3-nnr (root@ea605ec2993c) (gcc version 5.4.0 
> >>> 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1) ) #74 SMP Tue Jun 13 
> >>> 16:52:49 UTC 2017
> >>> enter ? for help
> >>> [c000ef1e7dc0] c0116ac4 find_task_by_vpid+0x44/0x90
> >>> [c000ef1e7de0] c0108800 SyS_setpgid+0x80/0x220
> >>> [c000ef1e7e30] c000ba6c system_call+0x38/0xfc
> >>> --- Exception: c01 (System Call) at 7fff94480890
> >>> SP (7fffd91e7260) is in userspace
> >>>
> >>>Fixes: caca285e5ab4a ("powerpc/mm/radix: Use STD_MMU_64 to properly
> >>>isolate hash related code")
> >>>Reported-by: Shriya R. Kulkarni 
> >>>Signed-off-by: Naveen N. Rao 
> >>>---
> >>>  arch/powerpc/kernel/exceptions-64s.S | 8 
> >>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>>diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> >>>b/arch/powerpc/kernel/exceptions-64s.S
> >>>index ae418b85c17c..17ee701b8336 100644
> >>>--- a/arch/powerpc/kernel/exceptions-64s.S
> >>>+++ b/arch/powerpc/kernel/exceptions-64s.S
> >>>@@ -1411,10 +1411,8 @@ USE_TEXT_SECTION()
> >>>   .balign IFETCH_ALIGN_BYTES
> >>>  do_hash_page:
> >>>  #ifdef CONFIG_PPC_STD_MMU_64
> >>>-  andis.  r0,r4,0xa410/* weird error? */
> >>>+  andis.  r0,r4,0xa450/* weird error? */
> >>
> >>Can we convert that to a #define value. Ram did try to do that here.
> >>
> >>https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-June/158607.html
> >
> >Hmm... I feel it will be good to do that as part of Ram's series since
> >he has already coded it up :)
> >
> >Ram's patches will anyway require a rebase and the change I do here for
> >detecting DAWR already has a #define, so it should be a simple matter of
> >including DSISR_DABRMATCH in DSISR_PAGE_FAULT_MASK.
> >
> >But, if you really feel that I should make that change here, please do
> >let me know and I will re-spin with those changes.
> >
> 
> The thing is that change from 0xa410 to 0xa450 is not clear at all.
> And it needs proper documentation. IMHO the best way to do that is
> switch to #define name for that constant.

Naveen,

Feel free to take the macro from my patch. I think the magic
number is a little ugly. The earlier it goes the better.

My patch set will probably go through a couple of iterations. So I will
rebase it on top of your changes anyway.

RP

Re: [PATCH] powerpc: dts: use #include "..." to include local DT

Masahiro Yamada  writes:
> 2017-06-13 19:21 GMT+09:00 Michael Ellerman :
>> Masahiro Yamada  writes:
>>>
>>> (+Anatolij Gustschin )
>>>
>>> Ping.
>>> I am not 100% sure who is responsible for this,
>>> but somebody, could take a look at this patch, please?
>>
>> Have you tested it actually works?
>>
>> It sounds reasonable, and if it behaves as you describe there is no
>> change in behaviour, right?
>
> I do not have access to hardware,
> but it is pretty easy to test this patch.
>
> $ make O=foo ARCH=powerpc CROSS_COMPILE=powerpc-linux-  dts/ac14xx.dtb
>
> gave me the DTB output.
>
> The binary comparison matched with/without this patch,
> so I am sure there is no change in behavior.
>
> Likewise for mpc5121ads and pdm360ng.

Thanks.

Acked-by: Michael Ellerman 


cheers

Re: [PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

"Aneesh Kumar K.V"  writes:
> On Wednesday 14 June 2017 10:41 AM, Naveen N. Rao wrote:
>> On 2017/06/14 08:38AM, Aneesh Kumar K.V wrote:
>>> "Naveen N. Rao"  writes:
>>>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>>>> index ae418b85c17c..17ee701b8336 100644
>>>> --- a/arch/powerpc/kernel/exceptions-64s.S
>>>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>>>> @@ -1411,10 +1411,8 @@ USE_TEXT_SECTION()
>>>>     .balign IFETCH_ALIGN_BYTES
>>>>  do_hash_page:
>>>>  #ifdef CONFIG_PPC_STD_MMU_64
>>>> -   andis.  r0,r4,0xa410    /* weird error? */
>>>> +   andis.  r0,r4,0xa450    /* weird error? */
>>>
>>> Can we convert that to a #define value. Ram did try to do that here.
>>>
>>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-June/158607.html
>> 
>> Hmm... I feel it will be good to do that as part of Ram's series since
>> he has already coded it up :)
>> 
>> Ram's patches will anyway require a rebase and the change I do here for
>> detecting DAWR already has a #define, so it should be a simple matter of
>> including DSISR_DABRMATCH in DSISR_PAGE_FAULT_MASK.
>> 
>> But, if you really feel that I should make that change here, please do
>> let me know and I will re-spin with those changes.
>
> The thing is that change from 0xa410 to 0xa450 is not clear at all. And 
> it needs proper documentation. IMHO the best way to do that is switch to 
> #define name for that constant.

Not in this patch. It needs to be backported, so it should be as minimal
as possible.

The change from 0xa410 to 0xa450 does need a mention in the changelog,
I'll add that.
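
To make the mask change concrete: andis. ANDs the register with the 16-bit immediate shifted left by 16, so the only bit that 0xa450 tests and 0xa410 does not is 0x00400000, which should correspond to DSISR_DABRMATCH. The following is a small user-space sketch, not kernel code; the helper names are made up for illustration, and the DSISR_DABRMATCH value is copied in as an assumption rather than included from asm/reg.h.

```c
#include <stdint.h>

/* Assumed value, mirroring DSISR_DABRMATCH in asm/reg.h. */
#define DSISR_DABRMATCH 0x00400000u

/* andis. rA,rS,UI computes rS & (UI << 16); model the immediate expansion. */
static uint32_t andis_mask(uint16_t imm)
{
	return (uint32_t)imm << 16;
}

/* Bits tested by the new immediate that the old immediate did not test. */
static uint32_t added_bits(uint16_t old_imm, uint16_t new_imm)
{
	return andis_mask(new_imm) & ~andis_mask(old_imm);
}
```

Calling added_bits(0xa410, 0xa450) isolates the single extra bit that the patch adds to the check in do_hash_page.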

cheers

Re: [PATCH] recordmcount.pl: Add ppc64le to list of supported architectures

2017-06-14 Thread Kamalesh Babulal


On Wednesday 14 June 2017 10:23 AM, Michael Ellerman wrote:

I don't get this, the arch should always be powerpc.

Right. Something else is fubar for that to happen, we should fix
whatever it is.


Agree, ARCH over-ruling by reading the underlying architecture will
not work, as the expectation is to have ARCH=powerpc for all of the 
powerpc platform. Sorry for the noise, kindly ignore this patch.


--
cheers,
Kamalesh.

Re: [PATCH 03/44] dmaengine: ioat: don't use DMA_ERROR_CODE

2017-06-14 Thread Vinod Koul

On Thu, Jun 08, 2017 at 03:25:28PM +0200, Christoph Hellwig wrote:
> DMA_ERROR_CODE is not a public API and will go away.  Instead properly
> unwind based on the loop counter.

Acked-By: Vinod Koul 

-- 
~Vinod

Re: [PATCH 7/8] powerpc/perf/hv-24x7: Support v2 of the hypervisor API

2017-06-14 Thread Thiago Jung Bauermann


Hello Suka,

Thanks for your review!

Sukadev Bhattiprolu  writes:
> Thiago Jung Bauermann [bauer...@linux.vnet.ibm.com] wrote:
>> @@ -166,9 +174,12 @@ DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw);
>>  DEFINE_PER_CPU(char, hv_24x7_reqb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
>>  DEFINE_PER_CPU(char, hv_24x7_resb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
>> 
>> -#define MAX_NUM_REQUESTS((H24x7_DATA_BUFFER_SIZE - \
>> +#define MAX_NUM_REQUESTS_V1 ((H24x7_DATA_BUFFER_SIZE - \
>> +sizeof(struct hv_24x7_request_buffer)) \
>> +/ H24x7_REQUEST_SIZE_V1)
>> +#define MAX_NUM_REQUESTS_V2 ((H24x7_DATA_BUFFER_SIZE - \
>>  sizeof(struct hv_24x7_request_buffer)) \
>> -/ sizeof(struct hv_24x7_request))
>> +/ H24x7_REQUEST_SIZE_V2)
>
> Nit: Can we define MAX_NUM_REQUESTS(version) - with a version parameter ? It
> will...

That's a good idea. I created a function instead of a macro, I think it
makes the code clearer since the macro would be a bit harder to read.
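
A minimal sketch of what such a version-aware helper could look like, for readers following along. The function names, the header-size constant and all numeric sizes below are placeholders chosen for illustration; the real values come from the hv-24x7 driver headers and may differ.

```c
#include <stddef.h>

/* Placeholder sizes for illustration only, not taken from the driver. */
#define H24x7_DATA_BUFFER_SIZE	4096
#define H24x7_REQUEST_SIZE_V1	16
#define H24x7_REQUEST_SIZE_V2	32
/* Stands in for sizeof(struct hv_24x7_request_buffer). */
#define REQUEST_BUFFER_HDR_SIZE	16

/* Size of one request entry for the given interface version. */
static size_t h24x7_request_size(unsigned int interface_version)
{
	return interface_version == 1 ? H24x7_REQUEST_SIZE_V1
				      : H24x7_REQUEST_SIZE_V2;
}

/* Number of request entries that fit in the buffer after the header. */
static size_t max_num_requests(unsigned int interface_version)
{
	return (H24x7_DATA_BUFFER_SIZE - REQUEST_BUFFER_HDR_SIZE)
		/ h24x7_request_size(interface_version);
}
```

With helpers of this shape, the overflow check collapses to a single comparison of num_requests against max_num_requests(interface_version).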

>> @@ -1101,9 +1112,13 @@ static int add_event_to_24x7_request(struct 
>> perf_event *event,
>>  {
>>  u16 idx;
>>  int i;
>> +size_t req_size;
>>  struct hv_24x7_request *req;
>> 
>> -if (request_buffer->num_requests >= MAX_NUM_REQUESTS) {
>> +if ((request_buffer->interface_version == 1
>> + && request_buffer->num_requests >= MAX_NUM_REQUESTS_V1)
>> +|| (request_buffer->interface_version > 1
>> +&& request_buffer->num_requests >= MAX_NUM_REQUESTS_V2)) {
>>  pr_devel("Too many requests for 24x7 HCALL %d\n",
>
> ...simplify this check to
>
>   if (request_buffer->num_requests >= MAX_NUM_REQUESTS(version))

That's better indeed.

>>  request_buffer->num_requests);
>>  return -EINVAL;
>> @@ -1120,8 +1135,11 @@ static int add_event_to_24x7_request(struct 
>> perf_event *event,
>>  idx = event_get_vcpu(event);
>>  }
>> 
>> +req_size = request_buffer->interface_version == 1 ?
>> +   H24x7_REQUEST_SIZE_V1 : H24x7_REQUEST_SIZE_V2;
>> +
>
> Maybe similarly, with H24x7_REQUEST_SIZE(version) ?

I implemented this suggestion too.

>> +/**
>> + * get_count_from_result - get event count from the given result
>> + *
>> + * @event:  Event associated with @res.
>> + * @resb:   Result buffer containing @res.
>> + * @res:Result to work on.
>> + * @countp: Output variable containing the event count.
>> + * @next:   Optional output variable pointing to the next result in @resb.
>> + */
>> +static int get_count_from_result(struct perf_event *event,
>> + struct hv_24x7_data_result_buffer *resb,
>> + struct hv_24x7_result *res, u64 *countp,
>> + struct hv_24x7_result **next)
>> +{
>> +u16 num_elements = be16_to_cpu(res->num_elements_returned);
>> +u16 data_size = be16_to_cpu(res->result_element_data_size);
>> +unsigned int data_offset;
>> +void *element_data;
>> +int ret = 0;
>> +
>> +/*
>> + * We can bail out early if the result is empty.
>> + */
>> +if (!num_elements) {
>> +pr_debug("Result of request %hhu is empty, nothing to do\n",
>> + res->result_ix);
>> +
>> +if (next)
>> +*next = (struct hv_24x7_result *) res->elements;
>> +
>> +return -ENODATA;
>> +}
>> +
>> +/*
>> + * This code assumes that a result has only one element.
>> + */
>> +if (num_elements != 1) {
>> +pr_debug("Error: result of request %hhu has %hu elements\n",
>> + res->result_ix, num_elements);
>
> Could this happen due to a user request, or would this indicate a bug
> in the way we submitted the request? (perf should submit a separate request
> for each lpar/index - we set ->max_ix and ->max_num_lpars to cpu_be16(1).)

That's a good question. I don't know to be honest.

> Minor inconsistency with proceeding, is that if the next element passes,
> this return code is lost/over written. IOW, h_24x7_event_commit_txn()'s
> return value depends on the last element we process, even if intermediate
> ones encounter an error.

Good point. Would it be better if h_24x7_event_commit_txn interrupted
processing and returned an error instead of continuing?

The version below addresses your comments, except the one above.

Subject: [PATCH 7/8] powerpc/perf/hv-24x7: Support v2 of the hypervisor API

POWER9 introduces a new version of the hypervisor API to access the 24x7
perf counters. The new version changed some of the structures used for
requests and results.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/perf/hv-24x7.c|

Re: [PATCH v1 1/3] powerpc, 8xx: remove support for 8xx

2017-06-14 Thread Tom Rini

On Wed, Jun 14, 2017 at 09:40:18AM +0200, Christophe LEROY wrote:
> 
> 
> On 13/06/2017 at 09:37, Heiko Schocher wrote:
> >Hello Christophe,
> >
> >On 13.06.2017 at 07:40, Christophe LEROY wrote:
> >>
> >>
> >>On 13/06/2017 at 07:26, Christophe LEROY wrote:
> >>>There was for long time no activity in the 8xx area.
> >>>We need to go further and convert to Kconfig, but it
> >>>turned out, nobody is interested anymore in 8xx,
> >>>so remove it (with a heavy heart, knowing that I remove
> >>>here the root of U-Boot).
> >>>
> >>>Signed-off-by: Heiko Schocher 
> >>>
> >>
> >>Please don't do that.
> >
> >Tom already applied the patch to mainline ...
> 
> Can it be reverted?

It's not a trivial revert, but it wouldn't take too long to resolve the
conflicts, for someone that's interested in the architecture.

> >>As you can see in Linux kernel activity, there has been a lot of
> >>activity related to the 8xx,
> >>including but not limited to:
> >>1/ HW Crypto for the 885 (Talitos SEC1)
> >>2/ TX NAPI in the 8xx Ethernet driver
> >>3/ Scatter/Gather support in the 8xx Ethernet driver
> >>4/ Hugepages
> >>5/ Perf events
> >>6/ hw breakpoints
> >>7/ Linear memory mapping via Large TLBs
> >
> >That's Linux ... not U-Boot!
> 
> Sure, but it shows there is still interest in that processor. The
> 885 is a good recent 8xx.
> U-Boot is not really something we focus on. We update it once a
> year; as long as it can start our Linux box we are happy.

I suppose this highlights the risks of not upstreaming your code.  If
someone had been submitting a new board in the last few years, they
would (likely, yes, we've made a few mistakes in missing maintainers at
times) have been CC'd on one of the "Does anyone care about this still?"
messages that've been both in public and in private.

> >>The following links give an overview of the activity:
> >>
> >>https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/crypto/talitos.c
> >>
> >>https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
> >>
> >>
> >>https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/arch/powerpc/kernel/head_8xx.S
> >>
> >>
> >>https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/arch/powerpc/mm/8xx_mmu.c
> >>
> >>
> >>We have thousands of boards with mpc885 running and requiring
> >>support for at least the next 10 years.
> >
> >Ok, nice to hear!
> >
> >>How can the 8xx survive without U-boot support ?
> >
> >Tom asked (I think a lot of times) regarding converting mpc8xx to
> >Kconfig / DM and nobody did the necessary steps. We also asked
> >our customers if they can do the necessary changes, none was interested.
> 
> I didn't know. Indeed, I was not following U-Boot activity until
> someone who knows the level of interest we have in 8xx alerted me
> yesterday.
> It would have been nice if you had notified the linuxppc-dev list.
> 
> >
> >So, if you need mpc8xx support in U-Boot, simply add it again with
> >Kconfig and DM support included!
> 
> OK, I'll try to come up with a patch to convert 8xx in the coming
> weeks. In the meantime, please revert the deletion in order to avoid
> nightmare conflicts when the conversion patch comes.

Well, how about this.  Now that you know there's a problem, and a need
for a maintainer, can you please submit a patch, in the next say 2
weeks, that brings back the core of mpc8xx and lists you (or someone
else from your company) as the maintainer for mpc8xx, and we'll aim for
v2017.09 as having (a) one of your platforms, or relevant reference
platform in mainline.  I don't want "just a revert" as there's many
boards that no, we don't want anymore as no one is maintaining them.
Thanks!

-- 
Tom


Re: [PATCH 21/44] powerpc: implement ->mapping_error

Christoph Hellwig  writes:

> DMA_ERROR_CODE is going to go away, so don't rely on it.  Instead
> define a ->mapping_error method for all IOMMU based dma operation
> instances.  The direct ops don't ever return an error and don't
> need a ->mapping_error method.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/include/asm/dma-mapping.h |  4 
>  arch/powerpc/include/asm/iommu.h   |  4 
>  arch/powerpc/kernel/dma-iommu.c|  6 ++
>  arch/powerpc/kernel/iommu.c| 28 ++--
>  arch/powerpc/platforms/cell/iommu.c|  1 +
>  arch/powerpc/platforms/pseries/vio.c   |  3 ++-
>  6 files changed, 27 insertions(+), 19 deletions(-)

I also see:

  arch/powerpc/kernel/dma.c:const struct dma_map_ops dma_direct_ops = {

Which you mentioned can't fail.

  arch/powerpc/platforms/pseries/ibmebus.c:static const struct dma_map_ops 
ibmebus_dma_ops = {

Which can't fail.

And:

  arch/powerpc/platforms/powernv/npu-dma.c:static const struct dma_map_ops 
dma_npu_ops = {
  arch/powerpc/platforms/ps3/system-bus.c:static const struct dma_map_ops 
ps3_sb_dma_ops = {
  arch/powerpc/platforms/ps3/system-bus.c:static const struct dma_map_ops 
ps3_ioc0_dma_ops = {

All of which look like they definitely can fail, but return 0 on error
and don't implement ->mapping_error.
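
To show why that combination is a problem, here is a cut-down user-space model. It assumes the generic check treats ops without a ->mapping_error method as never failing; every type and function name below is a hypothetical stand-in, not the kernel's struct dma_map_ops or dma_mapping_error().

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Hypothetical, cut-down stand-in for struct dma_map_ops. */
struct mock_dma_ops {
	dma_addr_t (*map_page)(void *page);
	int (*mapping_error)(dma_addr_t addr);	/* optional */
};

/* Generic check: ops without ->mapping_error are treated as never failing. */
static int mock_dma_mapping_error(const struct mock_dma_ops *ops,
				  dma_addr_t addr)
{
	if (ops->mapping_error)
		return ops->mapping_error(addr);
	return 0;
}

/* A map routine that returns 0 when it cannot map, like the ops above. */
static dma_addr_t failing_map_page(void *page)
{
	(void)page;
	return 0;
}

/* Per-ops error check that recognises 0 as the failure cookie. */
static int zero_is_error(dma_addr_t addr)
{
	return addr == 0;
}

/* Failure is invisible to callers: no ->mapping_error is provided. */
static const struct mock_dma_ops ops_without_check = {
	.map_page = failing_map_page,
};

/* Same map routine, but the failure can now be detected. */
static const struct mock_dma_ops ops_with_check = {
	.map_page = failing_map_page,
	.mapping_error = zero_is_error,
};
```

In this model a caller using ops_without_check sees a failed mapping as success, which is the gap the TODO for the NPU code would close.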

So I guess I'm acking this and adding a TODO to fix up the NPU code at
least, the ps3 code is probably better left alone these days.

Acked-by: Michael Ellerman 

cheers

Re: [PATCH] powerpc64/hw_breakpoints: Handle data breakpoints in radix mode

Anshuman Khandual  writes:

> On 06/14/2017 12:12 AM, Naveen N. Rao wrote:
>> On P9, trying to use data breakpoints throws the splat shown below (*).
>> This is because the check for a data breakpoint in DSISR is in
>> do_hash_page(). Move this check to handle_page_fault() so as to catch
>> data breakpoints in both hash and radix MMU modes.
>
> Why can't we check for DSISR inside do_hash_page() on P9?

We can.

But we also need to check inside handle_page_fault(), because when we're
in Radix mode we don't go via do_hash_page().

So rather than doing it in two places, the patch changes the logic so we
check in handle_page_fault(), and teaches the do_hash_page() code to go
there if DSISR_DABRMATCH is set.
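
The routing described above can be modelled in a few lines of C. This is a simplified sketch of the control flow only, not the assembly in exceptions-64s.S; the enum, the function names and the mask macro are invented for illustration, and the DSISR_DABRMATCH value is an assumption copied from asm/reg.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define DSISR_DABRMATCH	0x00400000u	/* assumed, as in asm/reg.h */
#define HASH_BAIL_MASK	0xa4500000u	/* 0xa450 << 16, from the patch */

enum fault_path {
	PATH_HASH_FAULT,	/* resolved by the hash page table code */
	PATH_PAGE_FAULT,	/* generic page fault handling */
	PATH_BREAKPOINT,	/* data breakpoint handling */
};

/* The single DABRMATCH check, shared by hash and radix modes. */
static enum fault_path model_handle_page_fault(uint32_t dsisr)
{
	if (dsisr & DSISR_DABRMATCH)
		return PATH_BREAKPOINT;
	return PATH_PAGE_FAULT;
}

/* Hash mode passes through do_hash_page first; radix mode skips it. */
static enum fault_path model_data_access(uint32_t dsisr, bool radix)
{
	if (!radix && !(dsisr & HASH_BAIL_MASK))
		return PATH_HASH_FAULT;
	return model_handle_page_fault(dsisr);
}
```

Because DSISR_DABRMATCH is part of the bail-out mask, a breakpoint hit in hash mode ends up at the same check that radix mode reaches directly.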

cheers

Re: [PATCH kernel] powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path

Alexey Kardashevskiy  writes:

> When trapped on WARN_ON(), report_bug() is expected to return
> BUG_TRAP_TYPE_WARN so the caller could increment NIP by 4 and continue.
> The __builtin_constant_p() path of the PPC's WARN_ON() calls (indirectly)
> __WARN_FLAGS() which has BUGFLAG_WARNING set, however the other branch
> does not which makes report_bug() report a bug rather than a warning.
>
> Fixes: 19d436268dde95389 ("debug: Add _ONCE() logic to report_bug()")
> Signed-off-by: Alexey Kardashevskiy 
> ---
>
> Actually 19d436268dde95389 replaced __WARN_TAINT() with __WARN_FLAGS()
> and lost BUGFLAG_TAINT() and this is not in the commit log so it is
> unclear:
> 1) why

I think the rename is because previously the argument was a taint value,
whereas now it is a flags value (which is a superset of taint).

> 2) whether this particular patch should be doing
>BUGFLAG_WARNING|BUGFLAG_TAINT(TAINT_WARN)
>  or
>BUGFLAG_WARNING|(flags)

There is no flags here so the latter won't work AFAICS.

> Any ideas? Thanks.

Your patch looks correct to me. I assume it works?

The bug isn't introduced by 19d436268dde ("debug: Add _ONCE() logic to
report_bug()") as far as I can see.

If you check out that revision you see that BUGFLAG_TAINT still contains
BUGFLAG_WARNING:

#define BUGFLAG_TAINT(taint)(BUGFLAG_WARNING | ((taint) << 8))

But that was removed in f26dee15103f ("debug: Avoid setting
BUGFLAG_WARNING twice"). So I think the Fixes: tag should point at that
commit.
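
A quick way to see the effect is to compare the two definitions side by side. The sketch below is user-space C with renamed macros; the values of BUGFLAG_WARNING and TAINT_WARN and the post-f26dee15103f definition are assumptions recalled from the kernel headers of that era, and is_warning() is only a stand-in for the flag test in report_bug().

```c
/* Assumed values, recalled from the kernel headers of that era. */
#define BUGFLAG_WARNING	(1u << 0)
#define TAINT_WARN	9

/* Definition quoted above, before f26dee15103f. */
#define BUGFLAG_TAINT_OLD(taint)	(BUGFLAG_WARNING | ((taint) << 8))
/* Assumed definition after f26dee15103f: WARNING is no longer implied. */
#define BUGFLAG_TAINT_NEW(taint)	((taint) << 8)

/* Stand-in for the warning-vs-bug decision made from the flags word. */
static int is_warning(unsigned int flags)
{
	return (flags & BUGFLAG_WARNING) != 0;
}
```

Under these assumptions, BUGFLAG_TAINT_NEW(TAINT_WARN) alone is not classified as a warning, while ORing in BUGFLAG_WARNING, as the patch does, restores the old classification.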

cheers

> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index f2c562a0a427..0151af6c2a50 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -104,7 +104,7 @@
>   "1: "PPC_TLNEI" %4,0\n" \
>   _EMIT_BUG_ENTRY \
>   : : "i" (__FILE__), "i" (__LINE__), \
> -   "i" (BUGFLAG_TAINT(TAINT_WARN)),  \
> +   "i" (BUGFLAG_WARNING|BUGFLAG_TAINT(TAINT_WARN)),\
> "i" (sizeof(struct bug_entry)),   \
> "r" (__ret_warn_on)); \
>   }   \
> -- 
> 2.11.0

Re: [PATCH 03/13] powerpc/64s: idle process interrupts from system reset wakeup

Nicholas Piggin  writes:

> On Tue, 13 Jun 2017 23:05:47 +1000
> Nicholas Piggin  wrote:
>
>> diff --git a/arch/powerpc/include/asm/hw_irq.h 
>> b/arch/powerpc/include/asm/hw_irq.h
>> index f06112cf8734..8366bdc69988 100644
>> --- a/arch/powerpc/include/asm/hw_irq.h
>> +++ b/arch/powerpc/include/asm/hw_irq.h
>> @@ -32,6 +32,7 @@
>>  #ifndef __ASSEMBLY__
>>  
>>  extern void __replay_interrupt(unsigned int vector);
>> +extern void __replay_wakeup_interrupt(unsigned long srr1);
>>  
>>  extern void timer_interrupt(struct pt_regs *);
>>  extern void performance_monitor_exception(struct pt_regs *regs);
>
> Oops, just noticed this remnant from an earlier version. Please
> ignore this hunk.

Done.

cheers

Re: [PATCH 12/13] powerpc/64: runlatch CTRL[RUN] set optimisation

Nicholas Piggin  writes:

> The CTRL register is read-only except bit 63 which is the run latch
> control. This means it can be updated with a mtspr rather than
> mfspr/mtspr.

Turns out this doesn't work on Cell.

There's an extra field in there:

  Thread enable bits (Read/Write)
  The hypervisor state can suspend its own thread by setting the TE bit
  for its thread to '0'. The hypervisor state can resume the opposite
  thread by setting the TE bit for the opposite thread to '1'. The
  hypervisor state cannot suspend the opposite thread by setting the TE
  bit for the opposite thread to '0'. This setting is ignored and does not
  cause an error.

  TE0 is the thread enable bit for thread 0.
  TE1 is the thread enable bit for thread 1.

  If thread 0 executes the mtctrl instruction, these are the bit values:

  [TE0 TE1] Description
    0   0   Disable or suspend thread 0; thread 1 unchanged.
    0   1   Disable or suspend thread 0; enable or resume thread 1 if it was disabled.
    1   0   Unchanged.
    1   1   Enable or resume thread 1 if it was disabled.

  If thread 1 executes the mtctrl instruction, these are the bit values:

  [TE0 TE1] Description
    0   0   Thread 0 unchanged; disable or suspend thread 1.
    0   1   Unchanged.
    1   0   Enable or resume thread 0 if it was disabled; disable or suspend thread 1.
    1   1   Enable or resume thread 0 if it was disabled.

So writing either 0 or CTRL_RUNLATCH (1) will disable the thread that
does the write - :D
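
The two tables reduce to one rule: a 0 in the writer's own TE bit suspends the writer, and a 1 in the other thread's TE bit resumes that thread. A small model of that rule follows; it is purely illustrative, and the struct and function names are invented, not taken from any Cell or kernel header.

```c
#include <stdbool.h>

/* Run state of the two hardware threads. */
struct threads {
	bool running[2];
};

/*
 * Apply the TE0/TE1 bits of a mtctrl executed by thread 'writer' (0 or 1):
 * a 0 in the writer's own TE bit suspends the writer, a 1 in the other
 * thread's TE bit resumes that thread, and every other combination leaves
 * the state unchanged.
 */
static struct threads apply_te(struct threads s, int writer,
			       bool te0, bool te1)
{
	bool te[2] = { te0, te1 };
	int other = 1 - writer;

	if (!te[writer])
		s.running[writer] = false;
	if (te[other])
		s.running[other] = true;
	return s;
}
```

Writing 0 or CTRL_RUNLATCH leaves both TE bits clear, which in this model is apply_te(s, writer, false, false): the writer suspends itself and the other thread is untouched.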

For now I'll just drop this.

cheers

Re: [PATCH 09/13] powerpc/64s: cpuidle set polling before enabling irqs

Nicholas Piggin  writes:

> local_irq_enable can cause interrupts to be taken which could
> take significant amount of processing time. The idle process
> should set its polling flag before this, so another process that
> wakes it during this time will not have to send an IPI.
>
> Expand the TIF_POLLING_NRFLAG coverage to as large as possible.
>
> Reviewed-by: Gautham R. Shenoy 
> Signed-off-by: Nicholas Piggin 
> ---
>  drivers/cpuidle/cpuidle-powernv.c | 4 +++-
>  drivers/cpuidle/cpuidle-pseries.c | 3 ++-
>  2 files changed, 5 insertions(+), 2 deletions(-)

I don't think the cpuidle folks are really interested in these changes,
but we should Cc them to be polite.

Can you resend patches 9, 10, 11 with a subject like:

  "cpuidle: powernv: Set polling ..."

And Cc the cpuidle folks:

$ ./scripts/get_maintainer.pl -f drivers/cpuidle
r...@rjwysocki.net
daniel.lezc...@linaro.org
linux...@vger.kernel.org
linux-ker...@vger.kernel.org

cheers

Re: [PATCH] powerpc/configs: fix default values for NF_CT_PROTO_*

Davide Caratti  writes:

> On Tue, 2017-06-13 at 20:49 +1000, Michael Ellerman wrote:
>> Davide Caratti  writes:
>> 
>> > NF_CT_PROTO_{SCTP,UDPLITE,DCCP} can't be set to 'm' anymore, since they
>> > have been redefined as 'bool': fix defconfig for linkstation, mvme5100 and
>> > ppc6xx platforms accordingly.
>> 
>> Since when? ie. which commit changed the symbols to bool from tristate?
>
> hello Michael,
>
> the commits are:
>
> a85406afeb3e ("netfilter: conntrack: built-in support for SCTP")
> c51d39010a1b ("netfilter: conntrack: built-in support for DCCP")
> 9b91c96c5d1f ("netfilter: conntrack: built-in support for UDPlite")
>
> they were causing a "warning symbol value 'm' invalid" in kconfig: sorry
> for not noticing this before.

No worries. Thanks for sending the fix.

My auto builder did hit the warning, but it doesn't fail for those
warnings, eg:

http://kisskb.ellerman.id.au/kisskb/buildresult/13060278/


I'm actually going through our configs at the moment so I'll add this to
that series and merge it for 4.13.

cheers

Re: [PATCH v2 2/3] EDAC: altera: simplify calculation of total memory

2017-06-14 Thread Borislav Petkov

On Mon, Jun 12, 2017 at 01:34:05PM -0500, Thor Thayer wrote:
> On 06/06/2017 06:54 PM, Chris Packham wrote:
> > Use of_address_to_resource() and resource_size() instead of manually
> > parsing the "reg" property from the "memory" node(s).
> > 
> > Signed-off-by: Chris Packham 
> > ---

...

> Nice change! Tested on Cyclone5 DevKit & Arria10 DevKit.
> 
> Tested-by: Thor Thayer 

Applied, thanks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 09/13] powerpc/64s: cpuidle set polling before enabling irqs