Re: [PATCH v2] Remove duplicate setting of the B field in tlbie

2016-09-26 Thread Paul Mackerras
On Fri, Sep 16, 2016 at 05:25:50PM +1000, Balbir Singh wrote:
> 
> Remove duplicate setting of the "B" field when doing a tlbie(l).
> In compute_tlbie_rb(), the "B" field is set again just before
> returning the rb value to be used for tlbie(l).
> 
> Signed-off-by: Balbir Singh 

Thanks, applied to kvm-ppc-next.

Paul.


Re: [patch] KVM: PPC: fix a sanity check

2016-09-26 Thread Paul Mackerras
On Thu, Jul 14, 2016 at 01:15:46PM +0300, Dan Carpenter wrote:
> We use logical negate where bitwise negate was intended.  It means that
> we never return -EINVAL here.
> 
> Fixes: ce11e48b7fdd ('KVM: PPC: E500: Add userspace debug stub support')
> Signed-off-by: Dan Carpenter 

Thanks, applied to kvm-ppc-next.

Paul.


Re: [PATCH] KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register

2016-09-26 Thread Paul Mackerras
On Wed, Sep 21, 2016 at 03:06:45PM +0200, Thomas Huth wrote:
> The MMCR2 register is available twice, one time with number 785
> (privileged access), and one time with number 769 (unprivileged,
> but it can be disabled completely). In former times, the Linux
> kernel was using the unprivileged register 769 only, but since
> commit 8dd75ccb571f3c92c ("powerpc: Use privileged SPR number
> for MMCR2"), it uses the privileged register 785 instead.
> The KVM-PR code then of course also switched to use the SPR 785,
> but this is causing older guest kernels to crash, since these
> kernels still access 769 instead. So to support older kernels
> with KVM-PR again, we have to support register 769 in KVM-PR, too.
> 
> Fixes: 8dd75ccb571f3c92c48014b3dabd3d51a115ab41
> Cc: sta...@vger.kernel.org # v3.10+
> Signed-off-by: Thomas Huth 

Thanks, applied to kvm-ppc-next.

Paul.


Re: [PATCH] powernv: Search for new flash DT node location

2016-09-26 Thread Stewart Smith
Michael Ellerman  writes:
> Jack Miller  writes:
>
>> On Wed, Aug 03, 2016 at 05:16:34PM +1000, Michael Ellerman wrote:
>>> We could instead just search for all nodes that are compatible with
>>> "ibm,opal-flash". We do that for i2c, see opal_i2c_create_devs().
>>> 
>>> Is there a particular reason not to do that?
>>
>> I'm actually surprised that this is preferred. Jeremy mentioned something
>> similar, but I guess I just don't like the idea of finding devices in weird
>> places in the tree.
>
> But where is "weird"? Arguably "/opal/flash" is weird. What does it
> mean? There's a bus called "opal" and a device on it called "flash"? No.
>
> Point being the structure is fairly arbitrary, or at least debatable, so
> tying the code 100% to the structure is inflexible. As we have discovered.
>
> Our other option is to tell skiboot to get stuffed, and leave the flash
> node where it was on P8.
>
>> Then again, if we can't trust the DT we're in bigger
>> trouble than erroneous flash nodes =).
>
> Quite :)
>
>> If we really just want to find compatible nodes anywhere, let's simplify i2c
>> and pdev_init into one function and make that behavior consistent with this
>> new patch.
>
> That seems OK to me.
>
> We should get an ack from Stewart though for the other node types.

For finding nodes based on compatible no matter where they are in the tree,

Acked-by: Stewart Smith 

(and yes, includes other nodes too)

The exact location then isn't too important, and I think we shouldn't
rule out having a /flash node that's ibm,opal-flash and allows some
other driver to bind to it.

-- 
Stewart Smith
OPAL Architect, IBM.



RE: [v6,2/2] QE: remove PPCisms for QE

2016-09-26 Thread Qiang Zhao
On Tue, Sep 27, 2016 at 7:12AM -0500, Scott Wood wrote:

> -Original Message-
> From: Scott Wood [mailto:o...@buserror.net]
> Sent: Tuesday, September 27, 2016 7:12 AM
> To: Qiang Zhao 
> Cc: linuxppc-dev@lists.ozlabs.org; pku@gmail.com; X.B. Xie
> ; linux-ker...@vger.kernel.org
> Subject: Re: [v6,2/2] QE: remove PPCisms for QE
> 
> On Mon, 2016-09-26 at 01:46 +, Qiang Zhao wrote:
> > On Sun, Sep 25, 2016 at 12:19PM -0500, Scott Wood wrote:
> >
> > >
> > > -Original Message-
> > > From: Scott Wood [mailto:o...@buserror.net]
> > > Sent: Sunday, September 25, 2016 12:19 PM
> > > To: Qiang Zhao 
> > > Cc: linuxppc-dev@lists.ozlabs.org; pku@gmail.com; X.B. Xie
> > > ; linux-ker...@vger.kernel.org
> > > Subject: Re: [v6,2/2] QE: remove PPCisms for QE
> > >
> > > On Sat, Sep 24, 2016 at 11:14:11PM -0500, Scott Wood wrote:
> > > >
> > > > On Fri, Sep 23, 2016 at 10:20:32AM +0800, Zhao Qiang wrote:
> > > > >
> > > > > QE was supported on PowerPC and dependent on PPC. Now it is
> > > > > supported on other platforms, so remove the PPCisms.
> > > > >
> > > > > Signed-off-by: Zhao Qiang 
> > > > > ---
> > > > > Changes for v2:
> > > > >   - na
> > > > > Changes for v3:
> > > > >   - add NO_IRQ
> > > > > Changes for v4:
> > > > >   - modify spin_event_timeout to open-coded timeout loop
> > > > >   - remove NO_IRQ
> > > > >   - modify virq_to_hw to open-coded code
> > > > > Changes for v5:
> > > > >   - modify commit msg
> > > > >   - modify depends of QUICC_ENGINE
> > > > >   - add kerneldoc header for qe_issue_cmd
> > > > > Changes for v6:
> > > > >   - add dependency on FSL_SOC and PPC32 for drivers
> > > > >     depending on QUICC_ENGINE but not available on ARM
> > > > >
> > > > >  drivers/irqchip/qe_ic.c| 28 +++-
> > > > >  drivers/net/ethernet/freescale/Kconfig | 10 ++---
> > > > >  drivers/soc/fsl/qe/Kconfig |  2 +-
> > > > >  drivers/soc/fsl/qe/qe.c| 80
> > > > > -
> > > > > -
> > > > >  drivers/soc/fsl/qe/qe_io.c | 42 --
> > > > >  drivers/soc/fsl/qe/qe_tdm.c|  8 ++--
> > > > >  drivers/soc/fsl/qe/ucc.c   | 10 ++---
> > > > >  drivers/soc/fsl/qe/ucc_fast.c  | 68
> > > > > ++---
> > > > > 
> > > > >  drivers/tty/serial/Kconfig |  2 +-
> > > > >  drivers/usb/gadget/udc/Kconfig |  2 +-
> > > > >  drivers/usb/host/Kconfig   |  2 +-
> > > > >  include/soc/fsl/qe/qe.h|  1 -
> > > > >  include/soc/fsl/qe/qe_ic.h | 12 ++---
> > > > >  13 files changed, 141 insertions(+), 126 deletions(-)
> > > > I assume this means you'll be updating
> > > > http://patchwork.ozlabs.org/patch/654473/
> > > > to be based on top of this...
> > > Apparently that assumption was wrong, since I now see that you're
> > > patching drivers/irqchip/qe_ic.c rather than drivers/soc/fsl/qe/qe_ic.c.
> > > Please keep the drivers/irqchip stuff separate and send to the
> > > appropriate maintainers.
> > >
> > You mean separate the drivers/irqchip/qe_ic.c part from this patch and
> > send it with the other qe_ic patches?
> > Is it acceptable if I modify qe_ic.c with drivers/soc/fsl/qe/qe_ic.c,
> > then send qe_ic patches to move qe_ic to drivers/irqchip?
> 
> I'd recommend against it.  It would complicate getting the drivers/irqchip
> patchset merged.  In any case, it's too late for 4.9.

Ok, thank you for your recommendation.

BR
-Zhao Qiang


Re: [bug] crypto/vmx/p8_ghash memory corruption in 4.8-rc7

2016-09-26 Thread Herbert Xu
On Mon, Sep 26, 2016 at 02:43:17PM -0300, Marcelo Cerri wrote:
> 
> Wouldn't be enough to provide a pair of import/export functions as the
> padlock-sha driver does?

I don't think that will help as ultimately you need to call the
export function on the fallback and that's what requires the extra
memory.  In fact every operation involving the fallback will need
that extra memory too.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v5 0/4] PCI: Introduce a way to enforce all MMIO BARs not to share PAGE_SIZE

2016-09-26 Thread Yongji Xie

Hi Bjorn,

Kindly Ping... Any comment on V5?

Thanks,
Yongji

On 2016/9/13 17:00, Yongji Xie wrote:

This series introduces a way for PCI resource allocator to force
MMIO BARs not to share PAGE_SIZE. This would make sense for the VFIO
driver, because the current VFIO implementation disallows mmap of
sub-page (size < PAGE_SIZE) MMIO BARs, which may share the same page
with other BARs, for security reasons. Thus, we have to handle MMIO
access to these BARs in QEMU emulation rather than in the guest, which
causes some performance loss.

In our solution, we try to make use of the existing code path of
resource_alignment kernel parameter and add a macro to set default
alignment for it. Thus we can define this macro by default on archs
that may easily hit this performance issue because of their 64K page
size.

In this series, patches 1 and 2 fix bugs in the use of
resource_alignment; patch 3 adds a new option for resource_alignment
to use IORESOURCE_STARTALIGN to specify the alignment of PCI BARs;
patch 4 adds a macro to set the default alignment of all MMIO BARs.
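
As a concrete illustration of the existing code path the series builds on, per-device alignment can already be requested on the kernel command line; the format is documented in kernel-parameters.txt, and the device address below is made up:

```
pci=resource_alignment=16@0000:03:00.0
```

Here 16 is the order of the requested alignment, i.e. 2^16 = 64K, for the device at domain 0000, bus 03, slot 00, function 0.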

Changelog v5:
- Rebased against v4.8-rc6
- Drop the patch that forbidding disable memory decoding in
   pci_reassigndev_resource_alignment()

Changelog v4:
- Rebased against v4.8-rc1
- Drop one irrelevant patch
- Drop the patch that adding wildcard to resource_alignment to enforce
   the alignment of all MMIO BARs to be at least PAGE_SIZE
- Change the format of option "noresize" of resource_alignment
- Code style improvements

Changelog v3:
- Ignore enforced alignment to fixed BARs
- Fix issue that disabling memory decoding when reassigning the alignment
- Only enable default alignment on PowerNV platform

Changelog v2:
- Ignore enforced alignment to VF BARs on pci_reassigndev_resource_alignment()

Yongji Xie (4):
   PCI: Ignore enforced alignment when kernel uses existing firmware setup
   PCI: Ignore enforced alignment to VF BARs
   PCI: Add a new option for resource_alignment to reassign alignment
   PCI: Add a macro to set default alignment for all PCI devices

  Documentation/kernel-parameters.txt |9 +++--
  arch/powerpc/include/asm/pci.h  |4 +++
  drivers/pci/pci.c   |   63 +--
  3 files changed, 63 insertions(+), 13 deletions(-)





Re: [RFC PATCH] powerpc/mm: THP page cache support

2016-09-26 Thread Balbir Singh


On 27/09/16 01:53, Aneesh Kumar K.V wrote:
>>>  
>>> +void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
>>
>> static?
> 
> Ok I will fix that.

inline as well?

Balbir Singh.



Re: [PATCH] crypto: sha1-powerpc: little-endian support

2016-09-26 Thread Paulo Flabiano Smorigo
Fri, Sep 23, 2016 at 04:31:56PM -0300, Marcelo Cerri wrote:
> The driver does not handle endianness properly when loading the input
> data.

Indeed. I tested in both endiannesses and it's working fine. Thanks!

Herbert, can we go ahead with this fix?

> 
> Signed-off-by: Marcelo Cerri 
> ---
>  arch/powerpc/crypto/sha1-powerpc-asm.S | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/crypto/sha1-powerpc-asm.S 
> b/arch/powerpc/crypto/sha1-powerpc-asm.S
> index 125e165..82ddc9b 100644
> --- a/arch/powerpc/crypto/sha1-powerpc-asm.S
> +++ b/arch/powerpc/crypto/sha1-powerpc-asm.S
> @@ -7,6 +7,15 @@
>  #include 
>  #include 
> 
> +#ifdef __BIG_ENDIAN__
> +#define LWZ(rt, d, ra)   \
> + lwz rt,d(ra)
> +#else
> +#define LWZ(rt, d, ra)   \
> + li  rt,d;   \
> + lwbrx   rt,rt,ra
> +#endif
> +
>  /*
>   * We roll the registers for T, A, B, C, D, E around on each
>   * iteration; T on iteration t is A on iteration t+1, and so on.
> @@ -23,7 +32,7 @@
>  #define W(t) (((t)%16)+16)
> 
>  #define LOADW(t) \
> - lwz W(t),(t)*4(r4)
> + LWZ(W(t),(t)*4,r4)
> 
>  #define STEPD0_LOAD(t)   \
>   andcr0,RD(t),RB(t); \
> @@ -33,7 +42,7 @@
>   add r0,RE(t),r15;   \
>   add RT(t),RT(t),r6; \
>   add r14,r0,W(t);\
> - lwz W((t)+4),((t)+4)*4(r4); \
> + LWZ(W((t)+4),((t)+4)*4,r4); \
>   rotlwi  RB(t),RB(t),30; \
>   add RT(t),RT(t),r14
> 
> -- 
> 2.7.4
> 

-- 
Paulo Flabiano Smorigo
IBM Linux Technology Center



Re: [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc

2016-09-26 Thread Reza Arbab

On Tue, Sep 27, 2016 at 07:15:41AM +1000, Benjamin Herrenschmidt wrote:

What is that business with a command line argument? Does that mean that
we'll need some magic command line argument to properly handle LPC memory
on CAPI devices or GPUs ? If yes that's bad ... kernel arguments should
be a last resort.


Well, movable_node is just a boolean, meaning "allow nodes which contain 
only movable memory". It's _not_ like "movable_node=10,13-15,17", if 
that's what you were thinking.



We should have all the information we need from the device-tree.

Note also that we shouldn't need to create those nodes at boot time,
we need to add the ability to create the whole thing at runtime, we may know
that there's an NPU with an LPC window in the system but we won't know if it's
used until it is and for CAPI we just simply don't know until some PCI device
gets turned into CAPI mode and starts claiming LPC memory...


Yes, this is what is planned for, if I'm understanding you correctly.

In the dt, the PCI device node has a phandle pointing to the memory 
node. The memory node describes the window into which we can hotplug at 
runtime.


--
Reza Arbab



Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node

2016-09-26 Thread Reza Arbab

On Tue, Sep 27, 2016 at 07:12:31AM +1000, Benjamin Herrenschmidt wrote:
In any case, if the memory hasn't been hotplugged, this shouldn't be 
necessary as we shouldn't be considering it for allocation.


Right. To be clear, the background info I put in the commit log refers 
to x86, where the SRAT can describe movable nodes which exist at boot.  
They're trying to avoid allocations from those nodes before they've been 
identified.


On power, movable nodes can only exist via hotplug, so that scenario 
can't happen. We can immediately go back to top-down allocation. That is 
the missing call being added in the patch.


--
Reza Arbab



Re: [PATCH] PCI: Add parameter @mmio_force_on to pci_update_resource()

2016-09-26 Thread Gavin Shan
On Mon, Sep 19, 2016 at 09:53:30AM +1000, Gavin Shan wrote:
>In pci_update_resource(), the PCI device's memory decoding (0x2 in
>PCI_COMMAND) is disabled when 64-bits memory BAR is updated if the
>PCI device's memory space wasn't asked to be always on via
>@pdev->mmio_always_on. The PF's memory decoding might be disabled
>when updating its IOV BARs in the following path. Actually, the PF's
>memory decoding shouldn't be disabled in this scenario, as the PF
>has already started providing services:
>
>   sriov_numvfs_store
>   pdev->driver->sriov_configure
>   mlx5_core_sriov_configure
>   pci_enable_sriov
>   sriov_enable
>   pcibios_sriov_enable
>   pnv_pci_sriov_enable
>   pnv_pci_vf_resource_shift
>   pci_update_resource
>
>This change keeps the PF's memory decoding enabled in that path by
>introducing an additional parameter (@mmio_force_on) to pci_update_resource().
>
>Reported-by: Carol Soto 
>Signed-off-by: Gavin Shan 
>Tested-by: Carol Soto 
>---

Bjorn, could you please have a quick review of this when you have available
time? We're running into an SRIOV issue that is fixed by this patch.

Thanks,
Gavin 

> arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
> drivers/pci/iov.c | 2 +-
> drivers/pci/pci.c | 2 +-
> drivers/pci/setup-res.c   | 9 +
> include/linux/pci.h   | 2 +-
> 5 files changed, 9 insertions(+), 8 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>b/arch/powerpc/platforms/powernv/pci-ioda.c
>index bc0c91e..2d6a2b7 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -999,7 +999,7 @@ static int pnv_pci_vf_resource_shift(struct pci_dev *dev, 
>int offset)
>   dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (%sabling %d 
> VFs shifted by %d)\n",
>i, , res, (offset > 0) ? "En" : "Dis",
>num_vfs, offset);
>-  pci_update_resource(dev, i + PCI_IOV_RESOURCES);
>+  pci_update_resource(dev, i + PCI_IOV_RESOURCES, true);
>   }
>   return 0;
> }
>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>index 2194b44..117aae6 100644
>--- a/drivers/pci/iov.c
>+++ b/drivers/pci/iov.c
>@@ -511,7 +511,7 @@ static void sriov_restore_state(struct pci_dev *dev)
>   return;
>
>   for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++)
>-  pci_update_resource(dev, i);
>+  pci_update_resource(dev, i, false);
>
>   pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
>   pci_iov_set_numvfs(dev, iov->num_VFs);
>diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>index aab9d51..87a33c0 100644
>--- a/drivers/pci/pci.c
>+++ b/drivers/pci/pci.c
>@@ -545,7 +545,7 @@ static void pci_restore_bars(struct pci_dev *dev)
>   return;
>
>   for (i = 0; i < PCI_BRIDGE_RESOURCES; i++)
>-  pci_update_resource(dev, i);
>+  pci_update_resource(dev, i, false);
> }
>
> static const struct pci_platform_pm_ops *pci_platform_pm;
>diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
>index 66c4d8f..e8a50ff 100644
>--- a/drivers/pci/setup-res.c
>+++ b/drivers/pci/setup-res.c
>@@ -26,7 +26,7 @@
> #include "pci.h"
>
>
>-void pci_update_resource(struct pci_dev *dev, int resno)
>+void pci_update_resource(struct pci_dev *dev, int resno, bool mmio_force_on)
> {
>   struct pci_bus_region region;
>   bool disable;
>@@ -81,7 +81,8 @@ void pci_update_resource(struct pci_dev *dev, int resno)
>* disable decoding so that a half-updated BAR won't conflict
>* with another device.
>*/
>-  disable = (res->flags & IORESOURCE_MEM_64) && !dev->mmio_always_on;
>+  disable = (res->flags & IORESOURCE_MEM_64) &&
>+!mmio_force_on && !dev->mmio_always_on;
>   if (disable) {
>   pci_read_config_word(dev, PCI_COMMAND, );
>   pci_write_config_word(dev, PCI_COMMAND,
>@@ -310,7 +311,7 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
>   res->flags &= ~IORESOURCE_STARTALIGN;
>   dev_info(&dev->dev, "BAR %d: assigned %pR\n", resno, res);
>   if (resno < PCI_BRIDGE_RESOURCES)
>-  pci_update_resource(dev, resno);
>+  pci_update_resource(dev, resno, false);
>
>   return 0;
> }
>@@ -350,7 +351,7 @@ int pci_reassign_resource(struct pci_dev *dev, int resno, 
>resource_size_t addsiz
>   dev_info(&dev->dev, "BAR %d: reassigned %pR (expanded by %#llx)\n",
>resno, res, (unsigned long long) addsize);
>   if (resno < PCI_BRIDGE_RESOURCES)
>-  pci_update_resource(dev, resno);
>+  pci_update_resource(dev, resno, false);
>
>   return 0;
> }
>diff --git a/include/linux/pci.h b/include/linux/pci.h
>index 0ab8359..99231d1 100644
>--- a/include/linux/pci.h
>+++ b/include/linux/pci.h
>@@ -1039,7 

Re: [v6,2/2] QE: remove PPCisms for QE

2016-09-26 Thread Scott Wood
On Mon, 2016-09-26 at 01:46 +, Qiang Zhao wrote:
> On Sun, Sep 25, 2016 at 12:19PM -0500, Scott Wood wrote:
> 
> > 
> > -Original Message-
> > From: Scott Wood [mailto:o...@buserror.net]
> > Sent: Sunday, September 25, 2016 12:19 PM
> > To: Qiang Zhao 
> > Cc: linuxppc-dev@lists.ozlabs.org; pku@gmail.com; X.B. Xie
> > ; linux-ker...@vger.kernel.org
> > Subject: Re: [v6,2/2] QE: remove PPCisms for QE
> > 
> > On Sat, Sep 24, 2016 at 11:14:11PM -0500, Scott Wood wrote:
> > > 
> > > On Fri, Sep 23, 2016 at 10:20:32AM +0800, Zhao Qiang wrote:
> > > > 
> > > > > QE was supported on PowerPC and dependent on PPC. Now it is
> > > > > supported on other platforms, so remove the PPCisms.
> > > > 
> > > > Signed-off-by: Zhao Qiang 
> > > > ---
> > > > Changes for v2:
> > > > - na
> > > > Changes for v3:
> > > > - add NO_IRQ
> > > > Changes for v4:
> > > > - modify spin_event_timeout to open-coded timeout loop
> > > > - remove NO_IRQ
> > > > - modify virq_to_hw to open-coded code
> > > > Changes for v5:
> > > > - modify commit msg
> > > > - modify depends of QUICC_ENGINE
> > > > - add kerneldoc header for qe_issue_cmd
> > > > Changes for v6:
> > > > - add dependency on FSL_SOC and PPC32 for drivers
> > > >   depending on QUICC_ENGINE but not available on ARM
> > > > 
> > > >  drivers/irqchip/qe_ic.c| 28 +++-
> > > >  drivers/net/ethernet/freescale/Kconfig | 10 ++---
> > > >  drivers/soc/fsl/qe/Kconfig |  2 +-
> > > >  drivers/soc/fsl/qe/qe.c| 80 -
> > > > -
> > > >  drivers/soc/fsl/qe/qe_io.c | 42 --
> > > >  drivers/soc/fsl/qe/qe_tdm.c|  8 ++--
> > > >  drivers/soc/fsl/qe/ucc.c   | 10 ++---
> > > >  drivers/soc/fsl/qe/ucc_fast.c  | 68 ++---
> > > > 
> > > >  drivers/tty/serial/Kconfig |  2 +-
> > > >  drivers/usb/gadget/udc/Kconfig |  2 +-
> > > >  drivers/usb/host/Kconfig   |  2 +-
> > > >  include/soc/fsl/qe/qe.h|  1 -
> > > >  include/soc/fsl/qe/qe_ic.h | 12 ++---
> > > >  13 files changed, 141 insertions(+), 126 deletions(-)
> > > I assume this means you'll be updating
> > > http://patchwork.ozlabs.org/patch/654473/
> > > to be based on top of this...
> > Apparently that assumption was wrong, since I now see that you're patching
> > drivers/irqchip/qe_ic.c rather than drivers/soc/fsl/qe/qe_ic.c.
> > Please keep the drivers/irqchip stuff separate and send to the appropriate
> > maintainers.
> > 
> You mean separate the drivers/irqchip/qe_ic.c part from this patch and send it
> with the other qe_ic patches?
> Is it acceptable if I modify qe_ic.c with drivers/soc/fsl/qe/qe_ic.c, then
> send qe_ic patches to move qe_ic to drivers/irqchip?

I'd recommend against it.  It would complicate getting the drivers/irqchip
patchset merged.  In any case, it's too late for 4.9.

-Scott



Re: powerpc64: Enable CONFIG_E500 and CONFIG_PPC_E500MC for e5500/e6500

2016-09-26 Thread Scott Wood
On Mon, 2016-09-26 at 10:48 +0200, David Engraf wrote:
> Am 25.09.2016 um 08:20 schrieb Scott Wood:
> > 
> > On Mon, Aug 22, 2016 at 04:46:43PM +0200, David Engraf wrote:
> > > 
> > > The PowerPC e5500/e6500 architecture is based on the e500mc core. Enable
> > > CONFIG_E500 and CONFIG_PPC_E500MC when e5500/e6500 is used.
> > > 
> > > This will also fix using CONFIG_PPC_QEMU_E500 on PPC64.
> > > 
> > > Signed-off-by: David Engraf 
> > > ---
> > >  arch/powerpc/platforms/Kconfig.cputype | 6 --
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/platforms/Kconfig.cputype
> > > b/arch/powerpc/platforms/Kconfig.cputype
> > > index f32edec..0382da7 100644
> > > --- a/arch/powerpc/platforms/Kconfig.cputype
> > > +++ b/arch/powerpc/platforms/Kconfig.cputype
> > > @@ -125,11 +125,13 @@ config POWER8_CPU
> > > 
> > >  config E5500_CPU
> > >   bool "Freescale e5500"
> > > - depends on E500
> > > + select E500
> > > + select PPC_E500MC
> > > 
> > >  config E6500_CPU
> > >   bool "Freescale e6500"
> > > - depends on E500
> > > + select E500
> > > + select PPC_E500MC
> > These config symbols are for setting -mcpu.  Kernels built with
> > CONFIG_GENERIC_CPU should also work on e5500/e6500.
> I don't think so.

I do think so.  It's what you get when you run "make corenet64_smp_defconfig"
and that kernel works on e5500/e6500.

>  At least on QEMU it is not working because e5500/e6500 
> is based on the e500mc core and the option CONFIG_PPC_E500MC also 
> controls the cpu features (check cputable.h).

Again, this is only a problem when you have CONFIG_PPC_QEMU_E500 without
CONFIG_CORENET_GENERIC, and the fix for that is to have CONFIG_PPC_QEMU_E500
select CONFIG_E500 (and you need to manually turn on CONFIG_PPC_E500MC if
applicable, since CONFIG_PPC_QEMU_E500 can also be used with e500v2).

I wouldn't be opposed to also adding "select PPC_E500MC if PPC64" to
CONFIG_PPC_QEMU_E500.
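
Scott's alternative would look roughly like this Kconfig fragment (a hedged sketch; the option names come from the discussion, the exact placement is hypothetical):

```
config PPC_QEMU_E500
	bool "QEMU generic e500 platform"
	select E500
	select PPC_E500MC if PPC64
```

Unlike `depends on E500` in the cputype menu, `select` pulls the core options in automatically, without making CONFIG_GENERIC_CPU kernels stop working on e5500/e6500.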

> > 
> > The problem is that CONFIG_PPC_QEMU_E500 doesn't select E500 (I didn't
> > notice it before because usually CORENET_GENERIC is enabled as well).
> I noticed that as well, but I think it makes more sense to select 
> E500/PPC_E500MC within the cputype menu instead of having a dependency 
> which might not be clear to the user.

Again, that breaks CONFIG_GENERIC_CPU.  Unlike 32-bit, all 64-bit book3e
targets are supposed to be supportable with a single kernel image.

-Scott



Re: [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc

2016-09-26 Thread Benjamin Herrenschmidt
On Sun, 2016-09-25 at 13:36 -0500, Reza Arbab wrote:
> To create a movable node, we need to hotplug all of its memory into
> ZONE_MOVABLE.
> 
> Note that to do this, auto_online_blocks should be off. Since the memory
> will first be added to the default zone, we must explicitly use
> online_movable to online.
> 
> Because such a node contains no normal memory, can_online_high_movable()
> will only allow us to do the onlining if CONFIG_MOVABLE_NODE is set.
> Enable the use of this config option on PPC64 platforms.

What is that business with a command line argument? Does that mean that
we'll need some magic command line argument to properly handle LPC memory
on CAPI devices or GPUs ? If yes that's bad ... kernel arguments should
be a last resort.

We should have all the information we need from the device-tree.

Note also that we shouldn't need to create those nodes at boot time,
we need to add the ability to create the whole thing at runtime, we may know
that there's an NPU with an LPC window in the system but we won't know if it's
used until it is and for CAPI we just simply don't know until some PCI device
gets turned into CAPI mode and starts claiming LPC memory...

Ben.

> Signed-off-by: Reza Arbab 
> ---
>  Documentation/kernel-parameters.txt | 2 +-
>  mm/Kconfig  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index a4f4d69..3d8460d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2344,7 +2344,7 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
> >     that the amount of memory usable for all allocations
> >     is not too small.
>  
> > > - movable_node[KNL,X86] Boot-time switch to enable the effects
> > > + movable_node[KNL,X86,PPC] Boot-time switch to enable the effects
> >     of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
>  
> > >   MTD_Partition=  [MTD]
> diff --git a/mm/Kconfig b/mm/Kconfig
> index be0ee11..4b19cd3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -153,7 +153,7 @@ config MOVABLE_NODE
> >     bool "Enable to assign a node which has only movable memory"
> >     depends on HAVE_MEMBLOCK
> >     depends on NO_BOOTMEM
> > -   depends on X86_64
> > +   depends on X86_64 || PPC64
> >     depends on NUMA
> >     default n
> >     help


Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node

2016-09-26 Thread Benjamin Herrenschmidt
On Sun, 2016-09-25 at 13:36 -0500, Reza Arbab wrote:
> At boot, the movable_node option sets bottom-up memblock allocation.
> 
> This reduces the chance that, in the window before movable memory has
> been identified, an allocation for the kernel might come from a movable
> node. By going bottom-up, early allocations will most likely come from
> the same node as the kernel image, which is necessarily in a nonmovable
> node.
> 
> Then, once any known hotplug memory has been marked, allocation can be
> reset back to top-down. On x86, this is done in numa_init(). This patch
> does the same on power, in numa initmem_init().

That's fragile and a bit gross.

But then I'm not *that* fan of making accelerator memory be "memory" nodes
in the first place. Oh well...

In any case, if the memory hasn't been hotplugged, this shouldn't be necessary
as we shouldn't be considering it for allocation.

If we want to prevent it for other reason, we should add logic for that
in memblock, or reserve it early or something like that.

Just relying magically on the direction of the allocator is bad, really bad.

Ben.

> Signed-off-by: Reza Arbab 
> ---
>  arch/powerpc/mm/numa.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index d7ac419..fdf1e69 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -945,6 +945,9 @@ void __init initmem_init(void)
> >     max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
> >     max_pfn = max_low_pfn;
>  
> > +   /* bottom-up allocation may have been set by movable_node */
> > +   memblock_set_bottom_up(false);
> +
> >     if (parse_numa_properties())
> >     setup_nonnuma();
> >     else


Re: [PATCH] i2c_powermac: shut up lockdep warning

2016-09-26 Thread Benjamin Herrenschmidt
On Mon, 2016-09-26 at 14:00 +0300, Denis Kirjanov wrote:
> 
> 
> On Wednesday, September 21, 2016, Denis Kirjanov wrote:
> > That's unclear why lockdep shows the following warning but adding a
> > lockdep class to struct pmac_i2c_bus solves it
> 
> HI Ben, 
> 
> could you give any comments on this? 

I haven't quite figured out the real reason for the lockdep complaint;
the output is, as usual, terribly hard to parse. But adding a lock
class doesn't sound like a wrong thing to do, so ...

Ben.

> Thanks!
> > 
> > [   20.507795]
> > ==
> > [   20.507796] [ INFO: possible circular locking dependency
> > detected ]
> > [   20.507800] 4.8.0-rc7-00037-gd2ffb01 #21 Not tainted
> > [   20.507801] 
> > ---
> > [   20.507803] swapper/0/1 is trying to acquire lock:
> > [   20.507818]  (>mutex){+.+.+.}, at: []
> > .pmac_i2c_open+0x30/0x100
> > [   20.507819]
> > [   20.507819] but task is already holding lock:
> > [   20.507829]  (>rwsem){+.+.+.}, at: []
> > .cpufreq_online+0x1ac/0x9d0
> > [   20.507830]
> > [   20.507830] which lock already depends on the new lock.
> > [   20.507830]
> > [   20.507832]
> > [   20.507832] the existing dependency chain (in reverse order) is:
> > [   20.507837]
> > [   20.507837] -> #4 (>rwsem){+.+.+.}:
> > [   20.507844]        [] .down_write+0x6c/0x110
> > [   20.507849]        []
> > .cpufreq_online+0x1ac/0x9d0
> > [   20.507855]        []
> > .subsys_interface_register+0xb8/0x110
> > [   20.507860]        []
> > .cpufreq_register_driver+0x1d0/0x250
> > [   20.507866]        []
> > .g5_cpufreq_init+0x9cc/0xa28
> > [   20.507872]        []
> > .do_one_initcall+0x5c/0x1d0
> > [   20.507878]        []
> > .kernel_init_freeable+0x1ac/0x28c
> > [   20.507883]        [] .kernel_init+0x1c/0x140
> > [   20.507887]        []
> > .ret_from_kernel_thread+0x58/0x64
> > [   20.507894]
> > [   20.507894] -> #3 (subsys mutex#2){+.+.+.}:
> > [   20.507899]        []
> > .mutex_lock_nested+0xa8/0x590
> > [   20.507903]        []
> > .bus_probe_device+0x44/0xe0
> > [   20.507907]        [] .device_add+0x508/0x730
> > [   20.507911]        []
> > .register_cpu+0x118/0x190
> > [   20.507916]        []
> > .topology_init+0x148/0x248
> > [   20.507921]        []
> > .do_one_initcall+0x5c/0x1d0
> > [   20.507925]        []
> > .kernel_init_freeable+0x1ac/0x28c
> > [   20.507929]        [] .kernel_init+0x1c/0x140
> > [   20.507934]        []
> > .ret_from_kernel_thread+0x58/0x64
> > [   20.507939]
> > [   20.507939] -> #2 (cpu_add_remove_lock){+.+.+.}:
> > [   20.507944]        []
> > .mutex_lock_nested+0xa8/0x590
> > [   20.507950]        []
> > .register_cpu_notifier+0x2c/0x70
> > [   20.507955]        []
> > .spawn_ksoftirqd+0x18/0x4c
> > [   20.507959]        []
> > .do_one_initcall+0x5c/0x1d0
> > [   20.507964]        []
> > .kernel_init_freeable+0xb0/0x28c
> > [   20.507968]        [] .kernel_init+0x1c/0x140
> > [   20.507972]        []
> > .ret_from_kernel_thread+0x58/0x64
> > [   20.507978]
> > [   20.507978] -> #1 (>mutex){+.+.+.}:
> > [   20.507982]        []
> > .mutex_lock_nested+0xa8/0x590
> > [   20.507987]        [] .kw_i2c_open+0x18/0x30
> > [   20.507991]        []
> > .pmac_i2c_open+0x94/0x100
> > [   20.507995]        []
> > .smp_core99_probe+0x260/0x410
> > [   20.507999]        []
> > .smp_prepare_cpus+0x280/0x2ac
> > [   20.508003]        []
> > .kernel_init_freeable+0x88/0x28c
> > [   20.508008]        [] .kernel_init+0x1c/0x140
> > [   20.508012]        []
> > .ret_from_kernel_thread+0x58/0x64
> > [   20.508018]
> > [   20.508018] -> #0 (>mutex){+.+.+.}:
> > [   20.508023]        [] .lock_acquire+0x84/0x100
> > [   20.508027]        []
> > .mutex_lock_nested+0xa8/0x590
> > [   20.508032]        []
> > .pmac_i2c_open+0x30/0x100
> > [   20.508037]        []
> > .pmac_i2c_do_begin+0x34/0x120
> > [   20.508040]        [] .pmf_call_one+0x50/0xd0
> > [   20.508045]        []
> > .g5_pfunc_switch_volt+0x2c/0xc0
> > [   20.508050]        []
> > .g5_pfunc_switch_freq+0x1cc/0x1f0
> > [   20.508054]        []
> > .g5_cpufreq_target+0x2c/0x40
> > [   20.508058]        []
> > .__cpufreq_driver_target+0x23c/0x840
> > [   20.508062]        []
> > .cpufreq_gov_performance_limits+0x18/0x30
> > [   20.508067]        []
> > .cpufreq_start_governor+0xac/0x100
> > [   20.508071]        []
> > .cpufreq_set_policy+0x208/0x260
> > [   20.508076]        []
> > .cpufreq_init_policy+0x6c/0xb0
> > [   20.508081]        []
> > .cpufreq_online+0x250/0x9d0
> > [   20.508085]        []
> > .subsys_interface_register+0xb8/0x110
> > [   20.508090]        []
> > .cpufreq_register_driver+0x1d0/0x250
> > [   20.508094]        []
> > .g5_cpufreq_init+0x9cc/0xa28
> > [   20.508099]        []
> > .do_one_initcall+0x5c/0x1d0
> > [   20.508103]        []
> > .kernel_init_freeable+0x1ac/0x28c
> > [   20.508107]        [] .kernel_init+0x1c/0x140
> > [   20.508112]        []
> > 

Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node

2016-09-26 Thread Reza Arbab

On Mon, Sep 26, 2016 at 09:17:43PM +0530, Aneesh Kumar K.V wrote:

+   /* bottom-up allocation may have been set by movable_node */
+   memblock_set_bottom_up(false);
+


By then we have done few memblock allocation right ?


Yes, some allocations do occur while bottom-up is set.

IMHO, we should do this early enough in prom.c after we do 
parse_early_param, with a comment there explaining that, we don't 
really support hotplug memblock and when we do that, this should be 
moved to a place where we can handle memblock allocation such that we 
avoid spreading memblock allocation to movable node.


Sure, we can do it earlier. The only consideration is that any potential 
calls to memblock_mark_hotplug() happen before we reset to top-down.  
Since we don't do that at all on power, the call can go anywhere.


--
Reza Arbab



Re: [PATCH v2] powerpc/mm: export current mmu mode info

2016-09-26 Thread Hari Bathini

Hi Michael/Aneesh,

Thanks for reviewing the patch..


On Friday 23 September 2016 04:40 PM, Michael Ellerman wrote:

Hari Bathini  writes:


diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index e2fb408..558987c 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -199,6 +199,21 @@ static inline void mmu_clear_feature(unsigned long feature)
  
  extern unsigned int __start___mmu_ftr_fixup, __stop___mmu_ftr_fixup;
  
+/*

+ * Possible MMU modes
+ */
+#define MMU_MODE_NONE   0
+#define MMU_MODE_RADIX  1
+#define MMU_MODE_HASH   2
+#define MMU_MODE_HASH32 3
+#define MMU_MODE_NOHASH 4
+#define MMU_MODE_NOHASH32   5

These are already defined in the same file:

/*
  * MMU families
  */
#define MMU_FTR_HPTE_TABLE  ASM_CONST(0x0001)
#define MMU_FTR_TYPE_8xxASM_CONST(0x0002)
#define MMU_FTR_TYPE_40xASM_CONST(0x0004)
#define MMU_FTR_TYPE_44xASM_CONST(0x0008)
#define MMU_FTR_TYPE_FSL_E  ASM_CONST(0x0010)
#define MMU_FTR_TYPE_47xASM_CONST(0x0020)
#define MMU_FTR_TYPE_RADIX  ASM_CONST(0x0040)

And the values for the current CPU are in cur_cpu_spec->mmu_features.


I primarily tried to introduce this patch as crash tool doesn't have 
access to

offset info (which is needed to access structure member mmu_features) early
in it's initialization process.


So if you must export anything, make it that value, and hopefully the
rest of the patch goes away.


On second thought, as long as we can get the vmemmap start address, for 
which
we have a variable already, we can push finding of MMU type for later. I 
may need
no kernel patch in that case. Working on patches for crash & 
makedumpfile tools

accordingly. Will post a v3 only if that doesn't work out..

Thanks
Hari



Re: [PATHC v2 0/9] ima: carry the measurement list across kexec

2016-09-26 Thread Thiago Jung Bauermann
Hello Eric,

Am Dienstag, 20 September 2016, 11:07:29 schrieb Eric W. Biederman:
> Thiago Jung Bauermann  writes:
> > Am Samstag, 17 September 2016, 00:17:37 schrieb Eric W. Biederman:
> >> Thiago Jung Bauermann  writes:
> > Is this what you had in mind?
> 
> Sort of.
> 
> I was just thinking that instead of having the boot path verify your ima
> list matches what is in the tpm and halting the boot there, we could
> test that on reboot.  Which would give a clean failure without the nasty
> poking into a prepared image.  The downside is that we have already run
> the shutdown scripts so it wouldn't be much cleaner, than triggering a
> machine reboot from elsewhere.
> 
> But I don't think we should spend too much time on that.  It was a
> passing thought.  We should focus on getting a non-poked ima buffer
> cleanly into kexec and we can worry about the rest later.

I was thinking of this as something orthogonal to the ima buffer feature.
But you're right, it's better not to discuss this now. I'll post a separate 
patch for this later.

> >> So from 10,000 feet I think that is correct.
> >> 
> >> I am not quite certain why a new mechanism is being invented.  We have
> >> other information that is already passed (much of it architecture
> >> specific) like the flattened device tree.  If you remove the need to
> >> update the information can you just append this information to the
> >> flattened device tree without a new special mechanism to pass the data?
> >> 
> >> I am just reluctant to invent a new mechanism when there is an existing
> >> mechanism that looks like it should work without problems.
> > 
> > Michael Ellerman suggested putting the buffer contents inside the device
> > tree itself, but the s390 people are also planning to implement this
> > feature. That architecture doesn't use device trees, so a solution that
> > depends on DTs won't help them.
> > 
> > With this mechanism each architecture will still need its own way of
> > communicating to the next kernel where the buffer is, but I think it's
> > easier to pass a base address and length than to pass a whole buffer.
> 
> A base address and length pair is fine.  There are several other pieces
> of data that we pass that way.
> 
> > I suppose we could piggyback the ima measurements buffer at the end of
> > one of the other segments such as the kernel or, in the case of
> > powerpc, the dtb but it looks hackish to me. I think it's cleaner to
> > put it in its own segment.
> 
> The boot protocol unfortunately is different on different architectures,
> and for each one we will have to implement and document the change.
> Because when you get into boot protocol issues you can't assume the
> kernel you are booting is the same version as the kernel that is booting
> it.
> 
> Where I run into a problem is you added a semi-generic concept a
> hand-over buffer.  Not a ima data buffer but a hand-over buffer.
> 
> The data falling in it's own dedicated area of memory and being added
> with kexec_add_buffer is completely fine.  I can see a dedicated pointer
> in struct kimage if necessary.
> 
> A semi-generic concept called a hand-over buffer seems to be a
> construction of infrustructure for no actual reason that will just
> result in confusion.  There are lots of things that are handed over, the
> flattend device tree, ramdisks, bootparams on x86, etc, etc.  ima is not
> special in this execpt for being perhaps the first addition that we are
> going to want the option of including on most architectures.

Ok, I understand. I decided to implement a generic concept because I thought 
that proposing a feature that is more useful than what I need it for would  
increase its chance of being accepted. It's interesting to see that it had 
the opposite effect.

I reworked and simplified the code and folded the hand-over buffer patches 
into Mimi's patch series to carry the measurement list across kexec. The 
kexec buffer code is in the following patches now:

[PATCH v5 01/10] powerpc: ima: Get the kexec buffer passed by the previous 
kernel
[PATCH v5 05/10] powerpc: ima: Send the kexec buffer to the next kernel

Each patch has a changelog listing what I changed to make it specific to 
IMA.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center



[PATCH v5 03/10] ima: permit duplicate measurement list entries

2016-09-26 Thread Mimi Zohar
Measurements carried across kexec need to be added to the IMA
measurement list, but should not prevent measurements of the newly
booted kernel from being added to the measurement list. This patch
adds support for allowing duplicate measurements.

The "boot_aggregate" measurement entry is the delimiter between soft
boots.

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima_queue.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/security/integrity/ima/ima_queue.c 
b/security/integrity/ima/ima_queue.c
index 4b1bb77..12d1b04 100644
--- a/security/integrity/ima/ima_queue.c
+++ b/security/integrity/ima/ima_queue.c
@@ -65,11 +65,12 @@ static struct ima_queue_entry *ima_lookup_digest_entry(u8 
*digest_value,
 }
 
 /* ima_add_template_entry helper function:
- * - Add template entry to measurement list and hash table.
+ * - Add template entry to the measurement list and hash table, for
+ *   all entries except those carried across kexec.
  *
  * (Called with ima_extend_list_mutex held.)
  */
-static int ima_add_digest_entry(struct ima_template_entry *entry)
+static int ima_add_digest_entry(struct ima_template_entry *entry, int flags)
 {
struct ima_queue_entry *qe;
unsigned int key;
@@ -85,8 +86,10 @@ static int ima_add_digest_entry(struct ima_template_entry 
*entry)
list_add_tail_rcu(>later, _measurements);
 
atomic_long_inc(_htable.len);
-   key = ima_hash_key(entry->digest);
-   hlist_add_head_rcu(>hnext, _htable.queue[key]);
+   if (flags) {
+   key = ima_hash_key(entry->digest);
+   hlist_add_head_rcu(>hnext, _htable.queue[key]);
+   }
return 0;
 }
 
@@ -126,7 +129,7 @@ int ima_add_template_entry(struct ima_template_entry 
*entry, int violation,
}
}
 
-   result = ima_add_digest_entry(entry);
+   result = ima_add_digest_entry(entry, 1);
if (result < 0) {
audit_cause = "ENOMEM";
audit_info = 0;
@@ -155,7 +158,7 @@ int ima_restore_measurement_entry(struct ima_template_entry 
*entry)
int result = 0;
 
mutex_lock(_extend_list_mutex);
-   result = ima_add_digest_entry(entry);
+   result = ima_add_digest_entry(entry, 0);
mutex_unlock(_extend_list_mutex);
return result;
 }
-- 
2.1.0



[PATCH v5 02/10] ima: on soft reboot, restore the measurement list

2016-09-26 Thread Mimi Zohar
The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
of the running kernel must be saved and restored on boot.  This patch
restores the measurement list.

Changelog v5:
- replace CONFIG_KEXEC_FILE with architecture CONFIG_HAVE_IMA_KEXEC (Thiago)
- replace kexec_get_handover_buffer() with ima_get_kexec_buffer() (Thiago)
- replace kexec_free_handover_buffer() with ima_free_kexec_buffer() (Thiago)
- remove unnecessary includes from ima_kexec.c (Thiago)
- fix off-by-one error when checking hdr_v1->template_name_len (Colin King)

Changelog v2:
- redefined ima_kexec_hdr to use types with well defined sizes (M. Ellerman)
- defined missing ima_load_kexec_buffer() stub function

Changelog v1:
- call ima_load_kexec_buffer() (Thiago)

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/Makefile   |   1 +
 security/integrity/ima/ima.h  |  21 +
 security/integrity/ima/ima_init.c |   2 +
 security/integrity/ima/ima_kexec.c|  44 +
 security/integrity/ima/ima_queue.c|  10 ++
 security/integrity/ima/ima_template.c | 170 ++
 6 files changed, 248 insertions(+)
 create mode 100644 security/integrity/ima/ima_kexec.c

diff --git a/security/integrity/ima/Makefile b/security/integrity/ima/Makefile
index 9aeaeda..29f198b 100644
--- a/security/integrity/ima/Makefile
+++ b/security/integrity/ima/Makefile
@@ -8,4 +8,5 @@ obj-$(CONFIG_IMA) += ima.o
 ima-y := ima_fs.o ima_queue.o ima_init.o ima_main.o ima_crypto.o ima_api.o \
 ima_policy.o ima_template.o ima_template_lib.o
 ima-$(CONFIG_IMA_APPRAISE) += ima_appraise.o
+ima-$(CONFIG_HAVE_IMA_KEXEC) += ima_kexec.o
 obj-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index db25f54..51dc8d5 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -28,6 +28,10 @@
 
 #include "../integrity.h"
 
+#ifdef CONFIG_HAVE_IMA_KEXEC
+#include 
+#endif
+
 enum ima_show_type { IMA_SHOW_BINARY, IMA_SHOW_BINARY_NO_FIELD_LEN,
 IMA_SHOW_BINARY_OLD_STRING_FMT, IMA_SHOW_ASCII };
 enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8 };
@@ -102,6 +106,21 @@ struct ima_queue_entry {
 };
 extern struct list_head ima_measurements;  /* list of all measurements */
 
+/* Some details preceding the binary serialized measurement list */
+struct ima_kexec_hdr {
+   u16 version;
+   u16 _reserved0;
+   u32 _reserved1;
+   u64 buffer_size;
+   u64 count;
+};
+
+#ifdef CONFIG_HAVE_IMA_KEXEC
+void ima_load_kexec_buffer(void);
+#else
+static inline void ima_load_kexec_buffer(void) {}
+#endif /* CONFIG_HAVE_IMA_KEXEC */
+
 /* Internal IMA function definitions */
 int ima_init(void);
 int ima_fs_init(void);
@@ -122,6 +141,8 @@ int ima_init_crypto(void);
 void ima_putc(struct seq_file *m, void *data, int datalen);
 void ima_print_digest(struct seq_file *m, u8 *digest, u32 size);
 struct ima_template_desc *ima_template_desc_current(void);
+int ima_restore_measurement_entry(struct ima_template_entry *entry);
+int ima_restore_measurement_list(loff_t bufsize, void *buf);
 int ima_init_template(void);
 
 /*
diff --git a/security/integrity/ima/ima_init.c 
b/security/integrity/ima/ima_init.c
index 32912bd..3ba0ca4 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -128,6 +128,8 @@ int __init ima_init(void)
if (rc != 0)
return rc;
 
+   ima_load_kexec_buffer();
+
rc = ima_add_boot_aggregate();  /* boot aggregate must be first entry */
if (rc != 0)
return rc;
diff --git a/security/integrity/ima/ima_kexec.c 
b/security/integrity/ima/ima_kexec.c
new file mode 100644
index 000..36afd0f
--- /dev/null
+++ b/security/integrity/ima/ima_kexec.c
@@ -0,0 +1,44 @@
+/*
+ * Copyright (C) 2016 IBM Corporation
+ *
+ * Authors:
+ * Thiago Jung Bauermann 
+ * Mimi Zohar 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#include "ima.h"
+
+/*
+ * Restore the measurement list from the previous kernel.
+ */
+void ima_load_kexec_buffer(void)
+{
+   void *kexec_buffer = NULL;
+   size_t kexec_buffer_size = 0;
+   int rc;
+
+   rc = ima_get_kexec_buffer(_buffer, _buffer_size);
+   switch (rc) {
+   case 0:
+   rc = ima_restore_measurement_list(kexec_buffer_size,
+ kexec_buffer);
+   if (rc != 0)
+   pr_err("Failed to restore the measurement list: %d\n",
+   rc);
+
+   ima_free_kexec_buffer();
+   break;
+ 

Re: [bug] crypto/vmx/p8_ghash memory corruption in 4.8-rc7

2016-09-26 Thread Jan Stancek



- Original Message -
> From: "Marcelo Cerri" 
> To: "Jan Stancek" 
> Cc: "rui y wang" , herb...@gondor.apana.org.au, 
> mhce...@linux.vnet.ibm.com,
> leosi...@linux.vnet.ibm.com, pfsmor...@linux.vnet.ibm.com, 
> linux-cry...@vger.kernel.org,
> linuxppc-dev@lists.ozlabs.org, linux-ker...@vger.kernel.org
> Sent: Monday, 26 September, 2016 4:15:10 PM
> Subject: Re: [bug] crypto/vmx/p8_ghash memory corruption in 4.8-rc7
> 
> Hi Jan,
> 
> Just out of curiosity, have you tried to use "76" on both values to
> check if the problem still happens?

I did, I haven't seen any panics with such patch:

@@ -211,7 +212,7 @@ struct shash_alg p8_ghash_alg = {
.update = p8_ghash_update,
.final = p8_ghash_final,
.setkey = p8_ghash_setkey,
-   .descsize = sizeof(struct p8_ghash_desc_ctx),
+   .descsize = sizeof(struct p8_ghash_desc_ctx) + 20,
.base = {
 .cra_name = "ghash",
 .cra_driver_name = "p8_ghash",


[PATCH v5 04/10] ima: maintain memory size needed for serializing the measurement list

2016-09-26 Thread Mimi Zohar
In preparation for serializing the binary_runtime_measurements, this patch
maintains the amount of memory required.

Changelog v5:
- replace CONFIG_KEXEC_FILE with architecture CONFIG_HAVE_IMA_KEXEC (Thiago)

Changelog v3:
- include the ima_kexec_hdr size in the binary_runtime_measurement size.

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/Kconfig | 12 +
 security/integrity/ima/ima.h   |  1 +
 security/integrity/ima/ima_queue.c | 53 --
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index 5487827..370eb2f 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -27,6 +27,18 @@ config IMA
  to learn more about IMA.
  If unsure, say N.
 
+config IMA_KEXEC
+   bool "Enable carrying the IMA measurement list across a soft boot"
+   depends on IMA && TCG_TPM && HAVE_IMA_KEXEC
+   default n
+   help
+  TPM PCRs are only reset on a hard reboot.  In order to validate
+  a TPM's quote after a soft boot, the IMA measurement list of the
+  running kernel must be saved and restored on boot.
+
+  Depending on the IMA policy, the measurement list can grow to
+  be very large.
+
 config IMA_MEASURE_PCR_IDX
int
depends on IMA
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 51dc8d5..ea1dcc4 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -143,6 +143,7 @@ void ima_print_digest(struct seq_file *m, u8 *digest, u32 
size);
 struct ima_template_desc *ima_template_desc_current(void);
 int ima_restore_measurement_entry(struct ima_template_entry *entry);
 int ima_restore_measurement_list(loff_t bufsize, void *buf);
+unsigned long ima_get_binary_runtime_size(void);
 int ima_init_template(void);
 
 /*
diff --git a/security/integrity/ima/ima_queue.c 
b/security/integrity/ima/ima_queue.c
index 12d1b04..3a3cc2a 100644
--- a/security/integrity/ima/ima_queue.c
+++ b/security/integrity/ima/ima_queue.c
@@ -29,6 +29,11 @@
 #define AUDIT_CAUSE_LEN_MAX 32
 
 LIST_HEAD(ima_measurements);   /* list of all measurements */
+#ifdef CONFIG_IMA_KEXEC
+static unsigned long binary_runtime_size;
+#else
+static unsigned long binary_runtime_size = ULONG_MAX;
+#endif
 
 /* key: inode (before secure-hashing a file) */
 struct ima_h_table ima_htable = {
@@ -64,6 +69,24 @@ static struct ima_queue_entry *ima_lookup_digest_entry(u8 
*digest_value,
return ret;
 }
 
+/*
+ * Calculate the memory required for serializing a single
+ * binary_runtime_measurement list entry, which contains a
+ * couple of variable length fields (e.g template name and data).
+ */
+static int get_binary_runtime_size(struct ima_template_entry *entry)
+{
+   int size = 0;
+
+   size += sizeof(u32);/* pcr */
+   size += sizeof(entry->digest);
+   size += sizeof(int);/* template name size field */
+   size += strlen(entry->template_desc->name);
+   size += sizeof(entry->template_data_len);
+   size += entry->template_data_len;
+   return size;
+}
+
 /* ima_add_template_entry helper function:
  * - Add template entry to the measurement list and hash table, for
  *   all entries except those carried across kexec.
@@ -90,9 +113,30 @@ static int ima_add_digest_entry(struct ima_template_entry 
*entry, int flags)
key = ima_hash_key(entry->digest);
hlist_add_head_rcu(>hnext, _htable.queue[key]);
}
+
+   if (binary_runtime_size != ULONG_MAX) {
+   int size;
+
+   size = get_binary_runtime_size(entry);
+   binary_runtime_size = (binary_runtime_size < ULONG_MAX - size) ?
+binary_runtime_size + size : ULONG_MAX;
+   }
return 0;
 }
 
+/*
+ * Return the amount of memory required for serializing the
+ * entire binary_runtime_measurement list, including the ima_kexec_hdr
+ * structure.
+ */
+unsigned long ima_get_binary_runtime_size(void)
+{
+   if (binary_runtime_size >= (ULONG_MAX - sizeof(struct ima_kexec_hdr)))
+   return ULONG_MAX;
+   else
+   return binary_runtime_size + sizeof(struct ima_kexec_hdr);
+};
+
 static int ima_pcr_extend(const u8 *hash, int pcr)
 {
int result = 0;
@@ -106,8 +150,13 @@ static int ima_pcr_extend(const u8 *hash, int pcr)
return result;
 }
 
-/* Add template entry to the measurement list and hash table,
- * and extend the pcr.
+/*
+ * Add template entry to the measurement list and hash table, and
+ * extend the pcr.
+ *
+ * On systems which support carrying the IMA measurement list across
+ * kexec, maintain the total memory size required for serializing the
+ * binary_runtime_measurements.
  */
 int ima_add_template_entry(struct ima_template_entry *entry, int violation,
   const char *op, struct inode 

Re: [bug] crypto/vmx/p8_ghash memory corruption in 4.8-rc7

2016-09-26 Thread Marcelo Cerri
Herbert,

Wouldn't be enough to provide a pair of import/export functions as the
padlock-sha driver does?

-- 
Regards,
Marcelo

On Mon, Sep 26, 2016 at 10:59:34PM +0800, Herbert Xu wrote:
> On Fri, Sep 23, 2016 at 08:22:27PM -0400, Jan Stancek wrote:
> >
> > This seems to directly correspond with:
> >   p8_ghash_alg.descsize = sizeof(struct p8_ghash_desc_ctx) == 56
> >   shash_tfm->descsize = sizeof(struct p8_ghash_desc_ctx) + 
> > crypto_shash_descsize(fallback) == 56 + 20
> > where 20 is presumably coming from "ghash_alg.descsize".
> > 
> > My gut feeling was that these 2 should match, but I'd love to hear
> > what crypto people think.
> 
> Indeed.  The vmx driver is broken.  It is allocating a fallback
> but is not providing any space for the state of the fallback.
> 
> Unfortunately our interface doesn't really provide a way to provide
> the state size dynamically.  So what I'd suggest is to fix the
> fallback to the generic ghash implementation and export its state
> size like we do for md5/sha.
> 
> Cheers,
> -- 
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


signature.asc
Description: PGP signature


[PATCH v5 10/10] ima: platform-independent hash value

2016-09-26 Thread Mimi Zohar
From: Andreas Steffen 

For remote attestion it is important for the ima measurement values
to be platform-independent. Therefore integer fields to be hashed
must be converted to canonical format.

Changelog:
- Define canonical format as little endian (Mimi)

Signed-off-by: Andreas Steffen 
Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima_crypto.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/security/integrity/ima/ima_crypto.c 
b/security/integrity/ima/ima_crypto.c
index 38f2ed8..802d5d2 100644
--- a/security/integrity/ima/ima_crypto.c
+++ b/security/integrity/ima/ima_crypto.c
@@ -477,11 +477,13 @@ static int ima_calc_field_array_hash_tfm(struct 
ima_field_data *field_data,
u8 buffer[IMA_EVENT_NAME_LEN_MAX + 1] = { 0 };
u8 *data_to_hash = field_data[i].data;
u32 datalen = field_data[i].len;
+   u32 datalen_to_hash =
+   !ima_canonical_fmt ? datalen : cpu_to_le32(datalen);
 
if (strcmp(td->name, IMA_TEMPLATE_IMA_NAME) != 0) {
rc = crypto_shash_update(shash,
-   (const u8 *) _data[i].len,
-   sizeof(field_data[i].len));
+   (const u8 *) _to_hash,
+   sizeof(datalen_to_hash));
if (rc)
break;
} else if (strcmp(td->fields[i]->field_id, "n") == 0) {
-- 
2.1.0



[PATCH v5 06/10] ima: on soft reboot, save the measurement list

2016-09-26 Thread Mimi Zohar
The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
of the running kernel must be saved and restored on boot.

This patch uses the kexec buffer passing mechanism to pass the
serialized IMA binary_runtime_measurements to the next kernel.

Changelog v5:
- move writing the IMA measurement list to kexec load and remove
  from kexec execute.
- remove registering notifier to call update on kexec execute
- add includes needed by code in this patch to ima_kexec.c (Thiago)
- fold patch "ima: serialize the binary_runtime_measurements"
into this patch.

Changelog v4:
- Revert the skip_checksum change.  Instead calculate the checksum
with the measurement list segment, on update validate the existing
checksum before re-calulating a new checksum with the updated
measurement list.

Changelog v3:
- Request a kexec segment for storing the measurement list a half page,
not a full page, more than needed for additional measurements.
- Added binary_runtime_size overflow test
- Limit maximum number of pages needed for kexec_segment_size to half
of totalram_pages. (Dave Young)

Changelog v2:
- Fix build issue by defining a stub ima_add_kexec_buffer and stub
  struct kimage when CONFIG_IMA=n and CONFIG_IMA_KEXEC=n. (Fenguang Wu)
- removed kexec_add_handover_buffer() checksum argument.
- added skip_checksum member to kexec_buf
- only register reboot notifier once

Changelog v1:
- updated to call IMA functions  (Mimi)
- move code from ima_template.c to ima_kexec.c (Mimi)

Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Mimi Zohar 
---
 include/linux/ima.h|  12 
 kernel/kexec_file.c|   4 ++
 security/integrity/ima/ima.h   |   1 +
 security/integrity/ima/ima_fs.c|   2 +-
 security/integrity/ima/ima_kexec.c | 117 +
 5 files changed, 135 insertions(+), 1 deletion(-)

diff --git a/include/linux/ima.h b/include/linux/ima.h
index 0eb7c2e..7f6952f 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -11,6 +11,7 @@
 #define _LINUX_IMA_H
 
 #include 
+#include 
 struct linux_binprm;
 
 #ifdef CONFIG_IMA
@@ -23,6 +24,10 @@ extern int ima_post_read_file(struct file *file, void *buf, 
loff_t size,
  enum kernel_read_file_id id);
 extern void ima_post_path_mknod(struct dentry *dentry);
 
+#ifdef CONFIG_IMA_KEXEC
+extern void ima_add_kexec_buffer(struct kimage *image);
+#endif
+
 #else
 static inline int ima_bprm_check(struct linux_binprm *bprm)
 {
@@ -62,6 +67,13 @@ static inline void ima_post_path_mknod(struct dentry *dentry)
 
 #endif /* CONFIG_IMA */
 
+#ifndef CONFIG_IMA_KEXEC
+struct kimage;
+
+static inline void ima_add_kexec_buffer(struct kimage *image)
+{}
+#endif
+
 #ifdef CONFIG_IMA_APPRAISE
 extern void ima_inode_post_setattr(struct dentry *dentry);
 extern int ima_inode_setxattr(struct dentry *dentry, const char *xattr_name,
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 0c2df7f..b56a558 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -132,6 +133,9 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
return ret;
image->kernel_buf_len = size;
 
+   /* IMA needs to pass the measurement list to the next kernel. */
+   ima_add_kexec_buffer(image);
+
/* Call arch image probe handlers */
ret = arch_kexec_kernel_image_probe(image, image->kernel_buf,
image->kernel_buf_len);
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index ea1dcc4..139dec6 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -143,6 +143,7 @@ void ima_print_digest(struct seq_file *m, u8 *digest, u32 
size);
 struct ima_template_desc *ima_template_desc_current(void);
 int ima_restore_measurement_entry(struct ima_template_entry *entry);
 int ima_restore_measurement_list(loff_t bufsize, void *buf);
+int ima_measurements_show(struct seq_file *m, void *v);
 unsigned long ima_get_binary_runtime_size(void);
 int ima_init_template(void);
 
diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
index c07a384..66e5dd5 100644
--- a/security/integrity/ima/ima_fs.c
+++ b/security/integrity/ima/ima_fs.c
@@ -116,7 +116,7 @@ void ima_putc(struct seq_file *m, void *data, int datalen)
  *   [eventdata length]
  *   eventdata[n]=template specific data
  */
-static int ima_measurements_show(struct seq_file *m, void *v)
+int ima_measurements_show(struct seq_file *m, void *v)
 {
/* the list never shrinks, so we don't need a lock here */
struct ima_queue_entry *qe = v;
diff --git a/security/integrity/ima/ima_kexec.c 
b/security/integrity/ima/ima_kexec.c
index 36afd0f..2c4824a 100644
--- 

[PATCH v5 09/10] ima: define a canonical binary_runtime_measurements list format

2016-09-26 Thread Mimi Zohar
The IMA binary_runtime_measurements list is currently in platform native
format.

To allow restoring a measurement list carried across kexec with a
different endianness than the targeted kernel, this patch defines
little-endian as the canonical format.  For big endian systems wanting
to save/restore the measurement list from a system with a different
endianness, a new boot command line parameter named "ima_canonical_fmt"
is defined.

Considerations: use of the "ima_canonical_fmt" boot command line
option will break existing userspace applications on big endian systems
expecting the binary_runtime_measurements list to be in platform native
format.

Changelog v3:
- restore PCR value properly

Signed-off-by: Mimi Zohar 
---
 Documentation/kernel-parameters.txt   |  4 
 security/integrity/ima/ima.h  |  6 ++
 security/integrity/ima/ima_fs.c   | 28 +---
 security/integrity/ima/ima_kexec.c| 11 +--
 security/integrity/ima/ima_template.c | 24 ++--
 security/integrity/ima/ima_template_lib.c |  7 +--
 6 files changed, 67 insertions(+), 13 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index a4f4d69..8be3ac8 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1580,6 +1580,10 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
The builtin appraise policy appraises all files
owned by uid=0.
 
+   ima_canonical_fmt [IMA]
+   Use the canonical format for the binary runtime
+   measurements, instead of host native format.
+
ima_hash=   [IMA]
Format: { md5 | sha1 | rmd160 | sha256 | sha384
   | sha512 | ... }
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 6b0540a..5e6180a 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -122,6 +122,12 @@ void ima_load_kexec_buffer(void);
 static inline void ima_load_kexec_buffer(void) {}
 #endif /* CONFIG_HAVE_IMA_KEXEC */
 
+/*
+ * The default binary_runtime_measurements list format is defined as the
+ * platform native format.  The canonical format is defined as little-endian.
+ */
+extern bool ima_canonical_fmt;
+
 /* Internal IMA function definitions */
 int ima_init(void);
 int ima_fs_init(void);
diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
index 66e5dd5..2bcad99 100644
--- a/security/integrity/ima/ima_fs.c
+++ b/security/integrity/ima/ima_fs.c
@@ -28,6 +28,16 @@
 
 static DEFINE_MUTEX(ima_write_mutex);
 
+bool ima_canonical_fmt;
+static int __init default_canonical_fmt_setup(char *str)
+{
+#ifdef __BIG_ENDIAN
+   ima_canonical_fmt = 1;
+#endif
+   return 1;
+}
+__setup("ima_canonical_fmt", default_canonical_fmt_setup);
+
 static int valid_policy = 1;
 #define TMPBUFLEN 12
 static ssize_t ima_show_htable_value(char __user *buf, size_t count,
@@ -122,7 +132,7 @@ int ima_measurements_show(struct seq_file *m, void *v)
struct ima_queue_entry *qe = v;
struct ima_template_entry *e;
char *template_name;
-   int namelen;
+   u32 pcr, namelen, template_data_len; /* temporary fields */
bool is_ima_template = false;
int i;
 
@@ -139,25 +149,29 @@ int ima_measurements_show(struct seq_file *m, void *v)
 * PCR used defaults to the same (config option) in
 * little-endian format, unless set in policy
 */
-   ima_putc(m, >pcr, sizeof(e->pcr));
+   pcr = !ima_canonical_fmt ? e->pcr : cpu_to_le32(e->pcr);
+   ima_putc(m, , sizeof(e->pcr));
 
/* 2nd: template digest */
ima_putc(m, e->digest, TPM_DIGEST_SIZE);
 
/* 3rd: template name size */
-   namelen = strlen(template_name);
+   namelen = !ima_canonical_fmt ? strlen(template_name) :
+   cpu_to_le32(strlen(template_name));
ima_putc(m, , sizeof(namelen));
 
/* 4th:  template name */
-   ima_putc(m, template_name, namelen);
+   ima_putc(m, template_name, strlen(template_name));
 
/* 5th:  template length (except for 'ima' template) */
if (strcmp(template_name, IMA_TEMPLATE_IMA_NAME) == 0)
is_ima_template = true;
 
-   if (!is_ima_template)
-   ima_putc(m, &e->template_data_len,
-sizeof(e->template_data_len));
+   if (!is_ima_template) {
+   template_data_len = !ima_canonical_fmt ? e->template_data_len :
+   cpu_to_le32(e->template_data_len);
+   ima_putc(m, &template_data_len, sizeof(e->template_data_len));
+   }
 
/* 6th:  template specific data */
for (i = 0; i < e->template_desc->num_fields; i++) {
diff --git a/security/integrity/ima/ima_kexec.c 

[PATCH v5 08/10] ima: support restoring multiple template formats

2016-09-26 Thread Mimi Zohar
The configured IMA measurement list template format can be replaced at
runtime on the boot command line, including a custom template format.
This patch adds support for restoring a measurement list containing
multiple builtin/custom template formats.

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima_template.c | 53 +--
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/security/integrity/ima/ima_template.c 
b/security/integrity/ima/ima_template.c
index c0d808c..e57b468 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -155,9 +155,14 @@ static int template_desc_init_fields(const char 
*template_fmt,
 {
const char *template_fmt_ptr;
struct ima_template_field *found_fields[IMA_TEMPLATE_NUM_FIELDS_MAX];
-   int template_num_fields = template_fmt_size(template_fmt);
+   int template_num_fields;
int i, len;
 
+   if (num_fields && *num_fields > 0) /* already initialized? */
+   return 0;
+
+   template_num_fields = template_fmt_size(template_fmt);
+
if (template_num_fields > IMA_TEMPLATE_NUM_FIELDS_MAX) {
pr_err("format string '%s' contains too many fields\n",
   template_fmt);
@@ -237,6 +242,35 @@ int __init ima_init_template(void)
return result;
 }
 
+static struct ima_template_desc *restore_template_fmt(char *template_name)
+{
+   struct ima_template_desc *template_desc = NULL;
+   int ret;
+
+   ret = template_desc_init_fields(template_name, NULL, NULL);
+   if (ret < 0) {
+   pr_err("attempting to initialize the template \"%s\" failed\n",
+   template_name);
+   goto out;
+   }
+
+   template_desc = kzalloc(sizeof(*template_desc), GFP_KERNEL);
+   if (!template_desc)
+   goto out;
+
+   template_desc->name = "";
+   template_desc->fmt = kstrdup(template_name, GFP_KERNEL);
+   if (!template_desc->fmt)
+   goto out;
+
+   spin_lock(&template_list);
+   list_add_tail_rcu(&template_desc->list, &defined_templates);
+   spin_unlock(&template_list);
+   synchronize_rcu();
+out:
+   return template_desc;
+}
+
 static int ima_restore_template_data(struct ima_template_desc *template_desc,
 void *template_data,
 int template_data_size,
@@ -367,10 +401,23 @@ int ima_restore_measurement_list(loff_t size, void *buf)
}
data_v1 = bufp += (u_int8_t)hdr_v1->template_name_len;
 
-   /* get template format */
template_desc = lookup_template_desc(template_name);
if (!template_desc) {
-   pr_err("template \"%s\" not found\n", template_name);
+   template_desc = restore_template_fmt(template_name);
+   if (!template_desc)
+   break;
+   }
+
+   /*
+* Only the running system's template format is initialized
+* on boot.  As needed, initialize the other template formats.
+*/
+   ret = template_desc_init_fields(template_desc->fmt,
+   &(template_desc->fields),
+   &(template_desc->num_fields));
+   if (ret < 0) {
+   pr_err("attempting to restore the template fmt \"%s\" \
+   failed\n", template_desc->fmt);
ret = -EINVAL;
break;
}
-- 
2.1.0



[PATCH v5 07/10] ima: store the builtin/custom template definitions in a list

2016-09-26 Thread Mimi Zohar
The builtin and single custom templates are currently stored in an
array.  In preparation for being able to restore a measurement list
containing multiple builtin/custom templates, this patch stores the
builtin and custom templates as a linked list.  This will permit
defining more than one custom template per boot.

Changelog v4:
- fix "spinlock bad magic" BUG - reported by Dmitry Vyukov

Changelog v3:
- initialize template format list in ima_template_desc_current(), as it
might be called during __setup before normal initialization. (kernel
test robot)
- remove __init annotation of ima_init_template_list()

Changelog v2:
- fix lookup_template_desc() preemption imbalance (kernel test robot)

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima.h  |  2 ++
 security/integrity/ima/ima_main.c |  1 +
 security/integrity/ima/ima_template.c | 52 +++
 3 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 139dec6..6b0540a 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -85,6 +85,7 @@ struct ima_template_field {
 
 /* IMA template descriptor definition */
 struct ima_template_desc {
+   struct list_head list;
char *name;
char *fmt;
int num_fields;
@@ -146,6 +147,7 @@ int ima_restore_measurement_list(loff_t bufsize, void *buf);
 int ima_measurements_show(struct seq_file *m, void *v);
 unsigned long ima_get_binary_runtime_size(void);
 int ima_init_template(void);
+void ima_init_template_list(void);
 
 /*
  * used to protect h_table and sha_table
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 596ef61..592f318 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -418,6 +418,7 @@ static int __init init_ima(void)
 {
int error;
 
+   ima_init_template_list();
hash_setup(CONFIG_IMA_DEFAULT_HASH);
error = ima_init();
if (!error) {
diff --git a/security/integrity/ima/ima_template.c 
b/security/integrity/ima/ima_template.c
index 37f972c..c0d808c 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -15,16 +15,20 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
 #include "ima.h"
 #include "ima_template_lib.h"
 
-static struct ima_template_desc defined_templates[] = {
+static struct ima_template_desc builtin_templates[] = {
{.name = IMA_TEMPLATE_IMA_NAME, .fmt = IMA_TEMPLATE_IMA_FMT},
{.name = "ima-ng", .fmt = "d-ng|n-ng"},
{.name = "ima-sig", .fmt = "d-ng|n-ng|sig"},
{.name = "", .fmt = ""},/* placeholder for a custom format */
 };
 
+static LIST_HEAD(defined_templates);
+static DEFINE_SPINLOCK(template_list);
+
 static struct ima_template_field supported_fields[] = {
{.field_id = "d", .field_init = ima_eventdigest_init,
 .field_show = ima_show_template_digest},
@@ -53,6 +57,8 @@ static int __init ima_template_setup(char *str)
if (ima_template)
return 1;
 
+   ima_init_template_list();
+
/*
 * Verify that a template with the supplied name exists.
 * If not, use CONFIG_IMA_DEFAULT_TEMPLATE.
@@ -81,7 +87,7 @@ __setup("ima_template=", ima_template_setup);
 
 static int __init ima_template_fmt_setup(char *str)
 {
-   int num_templates = ARRAY_SIZE(defined_templates);
+   int num_templates = ARRAY_SIZE(builtin_templates);
 
if (ima_template)
return 1;
@@ -92,22 +98,28 @@ static int __init ima_template_fmt_setup(char *str)
return 1;
}
 
-   defined_templates[num_templates - 1].fmt = str;
-   ima_template = defined_templates + num_templates - 1;
+   builtin_templates[num_templates - 1].fmt = str;
+   ima_template = builtin_templates + num_templates - 1;
+
return 1;
 }
 __setup("ima_template_fmt=", ima_template_fmt_setup);
 
 static struct ima_template_desc *lookup_template_desc(const char *name)
 {
-   int i;
+   struct ima_template_desc *template_desc;
+   int found = 0;
 
-   for (i = 0; i < ARRAY_SIZE(defined_templates); i++) {
-   if (strcmp(defined_templates[i].name, name) == 0)
-   return defined_templates + i;
+   rcu_read_lock();
+   list_for_each_entry_rcu(template_desc, &defined_templates, list) {
+   if ((strcmp(template_desc->name, name) == 0) ||
+   (strcmp(template_desc->fmt, name) == 0)) {
+   found = 1;
+   break;
+   }
}
-
-   return NULL;
+   rcu_read_unlock();
+   return found ? template_desc : NULL;
 }
 
 static struct ima_template_field *lookup_template_field(const char *field_id)
@@ -183,11 +195,29 @@ static int template_desc_init_fields(const char 
*template_fmt,
return 0;
 }
 
+void 

[PATCH v5 05/10] powerpc: ima: Send the kexec buffer to the next kernel

2016-09-26 Thread Mimi Zohar
From: Thiago Jung Bauermann 

The IMA kexec buffer allows the currently running kernel to pass
the measurement list via a kexec segment to the kernel that will be
kexec'd.

This is the architecture-specific part of setting up the IMA kexec
buffer for the next kernel. It will be used in the next patch.

Changelog v5:
- New patch in this version. This code was previously in the kexec buffer
  handover patch series.

Changelog relative to kexec handover patches v5:
- Moved code to arch/powerpc/kernel/ima_kexec.c.
- Renamed functions and struct members to variations of ima_kexec_buffer
  instead of variations of kexec_handover_buffer.
- Use a single property /chosen/linux,ima-kexec-buffer containing
  the buffer address and length, instead of
  /chosen/linux,kexec-handover-buffer-{start,end}.
- Use #address-cells and #size-cells to write the DT property.
- Use size_t instead of unsigned long for size arguments.
- Use CONFIG_IMA_KEXEC to build this code only when necessary.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/ima.h | 16 ++
 arch/powerpc/include/asm/kexec.h   | 15 +-
 arch/powerpc/kernel/ima_kexec.c| 91 ++
 arch/powerpc/kernel/kexec_elf_64.c |  2 +-
 arch/powerpc/kernel/machine_kexec_64.c | 12 +++--
 5 files changed, 130 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/ima.h b/arch/powerpc/include/asm/ima.h
index d5a72dd..2313bdf 100644
--- a/arch/powerpc/include/asm/ima.h
+++ b/arch/powerpc/include/asm/ima.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_POWERPC_IMA_H
 #define _ASM_POWERPC_IMA_H
 
+struct kimage;
+
 int ima_get_kexec_buffer(void **addr, size_t *size);
 int ima_free_kexec_buffer(void);
 
@@ -10,4 +12,18 @@ void remove_ima_buffer(void *fdt, int chosen_node);
 static inline void remove_ima_buffer(void *fdt, int chosen_node) {}
 #endif
 
+#ifdef CONFIG_IMA_KEXEC
+int arch_ima_add_kexec_buffer(struct kimage *image, unsigned long load_addr,
+ size_t size);
+
+int setup_ima_buffer(const struct kimage *image, void *fdt, int chosen_node);
+#else
+static inline int setup_ima_buffer(const struct kimage *image, void *fdt,
+  int chosen_node)
+{
+   remove_ima_buffer(fdt, chosen_node);
+   return 0;
+}
+#endif /* CONFIG_IMA_KEXEC */
+
 #endif /* _ASM_POWERPC_IMA_H */
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 5dda514..8848eab 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -92,12 +92,23 @@ static inline bool kdump_in_progress(void)
 }
 
 #ifdef CONFIG_KEXEC_FILE
+
+#ifdef CONFIG_IMA_KEXEC
+#define ARCH_HAS_KIMAGE_ARCH
+
+struct kimage_arch {
+   phys_addr_t ima_buffer_addr;
+   size_t ima_buffer_size;
+};
+#endif
+
 int setup_purgatory(struct kimage *image, const void *slave_code,
const void *fdt, unsigned long kernel_load_addr,
unsigned long fdt_load_addr, unsigned long stack_top,
int debug);
-int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
- unsigned long initrd_len, const char *cmdline);
+int setup_new_fdt(const struct kimage *image, void *fdt,
+ unsigned long initrd_load_addr, unsigned long initrd_len,
+ const char *cmdline);
 bool find_debug_console(const void *fdt);
 int delete_fdt_mem_rsv(void *fdt, unsigned long start, unsigned long size);
 #endif /* CONFIG_KEXEC_FILE */
diff --git a/arch/powerpc/kernel/ima_kexec.c b/arch/powerpc/kernel/ima_kexec.c
index 36e5a5d..5ea42c9 100644
--- a/arch/powerpc/kernel/ima_kexec.c
+++ b/arch/powerpc/kernel/ima_kexec.c
@@ -130,3 +130,94 @@ void remove_ima_buffer(void *fdt, int chosen_node)
if (!ret)
pr_debug("Removed old IMA buffer reservation.\n");
 }
+
+#ifdef CONFIG_IMA_KEXEC
+/**
+ * arch_ima_add_kexec_buffer - do arch-specific steps to add the IMA buffer
+ *
+ * Architectures should use this function to pass on the IMA buffer
+ * information to the next kernel.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int arch_ima_add_kexec_buffer(struct kimage *image, unsigned long load_addr,
+ size_t size)
+{
+   image->arch.ima_buffer_addr = load_addr;
+   image->arch.ima_buffer_size = size;
+
+   return 0;
+}
+
+static int write_number(void *p, u64 value, int cells)
+{
+   if (cells == 1) {
+   u32 tmp;
+
+   if (value > U32_MAX)
+   return -EINVAL;
+
+   tmp = cpu_to_be32(value);
+   memcpy(p, &tmp, sizeof(tmp));
+   } else if (cells == 2) {
+   u64 tmp;
+
+   tmp = cpu_to_be64(value);
+   memcpy(p, &tmp, sizeof(tmp));
+   } else
+   return -EINVAL;
+
+   return 0;
+}
+
+/**
+ * setup_ima_buffer - add IMA buffer information to the fdt
+ 

[PATCH v5 01/10] powerpc: ima: Get the kexec buffer passed by the previous kernel

2016-09-26 Thread Mimi Zohar
From: Thiago Jung Bauermann 

The IMA kexec buffer allows the currently running kernel to pass
the measurement list via a kexec segment to the kernel that will be
kexec'd. The second kernel can check whether the previous kernel sent
the buffer and retrieve it.

This is the architecture-specific part which enables IMA to receive the
measurement list passed by the previous kernel. It will be used in the
next patch.

The change in machine_kexec_64.c is to factor out the logic of removing
an FDT memory reservation so that it can be used by remove_ima_buffer.

Changelog v5:
- New patch in this version. This code was previously in the kexec buffer
  handover patch series.

Changelog relative to kexec handover patches v5:
- Added CONFIG_HAVE_IMA_KEXEC.
- Added arch/powerpc/include/asm/ima.h.
- Moved code to arch/powerpc/kernel/ima_kexec.c.
- Renamed functions to variations of ima_kexec_buffer instead of
  variations of kexec_handover_buffer.
- Use a single property /chosen/linux,ima-kexec-buffer containing
  the buffer address and length, instead of
  /chosen/linux,kexec-handover-buffer-{start,end}.
- Use #address-cells and #size-cells to read the DT property.
- Use size_t instead of unsigned long for size arguments.
- Always remove linux,ima-kexec-buffer and its memory reservation
  when preparing a device tree for kexec_file_load.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/Kconfig   |   3 +
 arch/powerpc/Kconfig   |   1 +
 arch/powerpc/include/asm/ima.h |  13 
 arch/powerpc/include/asm/kexec.h   |   1 +
 arch/powerpc/kernel/Makefile   |   4 +
 arch/powerpc/kernel/ima_kexec.c| 132 +
 arch/powerpc/kernel/machine_kexec_64.c | 106 +-
 7 files changed, 208 insertions(+), 52 deletions(-)
 create mode 100644 arch/powerpc/include/asm/ima.h
 create mode 100644 arch/powerpc/kernel/ima_kexec.c

diff --git a/arch/Kconfig b/arch/Kconfig
index e9c9334..60283562 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -5,6 +5,9 @@
 config KEXEC_CORE
bool
 
+config HAVE_IMA_KEXEC
+   bool
+
 config OPROFILE
tristate "OProfile system profiling"
depends on PROFILING
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d1ba864..17fff29 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -458,6 +458,7 @@ config KEXEC
 config KEXEC_FILE
bool "kexec file based system call"
select KEXEC_CORE
+   select HAVE_IMA_KEXEC
select BUILD_BIN2C
depends on PPC64
depends on CRYPTO=y
diff --git a/arch/powerpc/include/asm/ima.h b/arch/powerpc/include/asm/ima.h
new file mode 100644
index 000..d5a72dd
--- /dev/null
+++ b/arch/powerpc/include/asm/ima.h
@@ -0,0 +1,13 @@
+#ifndef _ASM_POWERPC_IMA_H
+#define _ASM_POWERPC_IMA_H
+
+int ima_get_kexec_buffer(void **addr, size_t *size);
+int ima_free_kexec_buffer(void);
+
+#ifdef CONFIG_IMA
+void remove_ima_buffer(void *fdt, int chosen_node);
+#else
+static inline void remove_ima_buffer(void *fdt, int chosen_node) {}
+#endif
+
+#endif /* _ASM_POWERPC_IMA_H */
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 73f88b5..5dda514 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -99,6 +99,7 @@ int setup_purgatory(struct kimage *image, const void 
*slave_code,
 int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
  unsigned long initrd_len, const char *cmdline);
 bool find_debug_console(const void *fdt);
+int delete_fdt_mem_rsv(void *fdt, unsigned long start, unsigned long size);
 #endif /* CONFIG_KEXEC_FILE */
 
 #else /* !CONFIG_KEXEC_CORE */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 0ee1b3f..5e0bacc 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -109,6 +109,10 @@ obj-$(CONFIG_PCI_MSI)  += msi.o
 obj-$(CONFIG_KEXEC_CORE)   += machine_kexec.o crash.o \
   machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_FILE)   += kexec_elf_$(BITS).o
+ifeq ($(CONFIG_HAVE_IMA_KEXEC)$(CONFIG_IMA),yy)
+obj-y  += ima_kexec.o
+endif
+
 obj-$(CONFIG_AUDIT)+= audit.o
 obj64-$(CONFIG_AUDIT)  += compat_audit.o
 
diff --git a/arch/powerpc/kernel/ima_kexec.c b/arch/powerpc/kernel/ima_kexec.c
new file mode 100644
index 000..36e5a5d
--- /dev/null
+++ b/arch/powerpc/kernel/ima_kexec.c
@@ -0,0 +1,132 @@
+/*
+ * Copyright (C) 2016 IBM Corporation
+ *
+ * Authors:
+ * Thiago Jung Bauermann 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 

[PATCH v5 00/10] ima: carry the measurement list across kexec

2016-09-26 Thread Mimi Zohar
The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
of the running kernel must be saved and then restored on the subsequent
boot, possibly of a different architecture.

The existing securityfs binary_runtime_measurements file conveniently
provides a serialized format of the IMA measurement list. This patch
set serializes the measurement list in this format and restores it.

Up to now, the binary_runtime_measurements was defined as architecture
native format.  The assumption being that userspace could and would
handle any architecture conversions.  With the ability of carrying the
measurement list across kexec, possibly from one architecture to a
different one, the per boot architecture information is lost and with it
the ability of recalculating the template digest hash.  To resolve this
problem, without breaking the existing ABI, this patch set introduces
the boot command line option "ima_canonical_fmt", which is arbitrarily
defined as little endian.

The need for this boot command line option will be limited to the
existing version 1 format of the binary_runtime_measurements.
Subsequent formats will be defined as canonical format (eg. TPM 2.0
support for larger digests).
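As a hedged illustration of what "canonical" means here (my sketch, not code from the patch set): a 32-bit header field in the canonical format is always stored least-significant byte first, independent of host byte order, which is exactly what the cpu_to_le32() conversions in the ima_fs.c hunk achieve on big-endian machines.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only -- not code from this patch set.  put_le32() stores
 * a 32-bit value least-significant byte first, i.e. the "canonical"
 * little-endian layout described above.  On a little-endian host this
 * matches the native layout; on a big-endian host it swaps bytes, just
 * as cpu_to_le32() does in the kernel.
 */
static void put_le32(uint32_t v, uint8_t out[4])
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}
```

With a fixed byte order like this, the template digest can be recalculated on the restoring kernel even when it runs on an architecture with the opposite endianness.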

A simplified method of Thiago Bauermann's "kexec buffer handover" patch
series for carrying the IMA measurement list across kexec is included
in this patch set.  The simplified method requires all file measurements
be taken prior to executing the kexec load, as subsequent measurements
will not be carried across the kexec and restored.

These patches can also be found in the next-kexec-restore branch of:
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git

Changelog v5:
- Included patches from Thiago Bauermann's "kexec buffer handover"
patch series for carrying the IMA measurement list across kexec.
- Added CONFIG_HAVE_IMA_KEXEC
- Renamed functions to variations of ima_kexec_buffer instead of
variations of kexec_handover_buffer

Changelog v4:
- Fixed "spinlock bad magic" BUG - reported by Dmitry Vyukov
- Rebased on Thiago Bauermann's v5 patch set
- Removed the skip_checksum initialization  

Changelog v3:
- Cleaned up the code for calculating the requested kexec segment size
needed for the IMA measurement list, limiting the segment size to half
of the totalram_pages.
- Fixed kernel test robot reports as enumerated in the respective
patch changelog.

Changelog v2:
- Canonical measurement list support added
- Redefined the ima_kexec_hdr struct to use well defined sizes

Andreas Steffen (1):
  ima: platform-independent hash value

Mimi Zohar (7):
  ima: on soft reboot, restore the measurement list
  ima: permit duplicate measurement list entries
  ima: maintain memory size needed for serializing the measurement list
  ima: on soft reboot, save the measurement list
  ima: store the builtin/custom template definitions in a list
  ima: support restoring multiple template formats
  ima: define a canonical binary_runtime_measurements list format

Thiago Jung Bauermann (2):
  powerpc: ima: Get the kexec buffer passed by the previous kernel
  powerpc: ima: Send the kexec buffer to the next kernel

 Documentation/kernel-parameters.txt   |   4 +
 arch/Kconfig  |   3 +
 arch/powerpc/Kconfig  |   1 +
 arch/powerpc/include/asm/ima.h|  29 +++
 arch/powerpc/include/asm/kexec.h  |  16 +-
 arch/powerpc/kernel/Makefile  |   4 +
 arch/powerpc/kernel/ima_kexec.c   | 223 +++
 arch/powerpc/kernel/kexec_elf_64.c|   2 +-
 arch/powerpc/kernel/machine_kexec_64.c| 116 ++--
 include/linux/ima.h   |  12 ++
 kernel/kexec_file.c   |   4 +
 security/integrity/ima/Kconfig|  12 ++
 security/integrity/ima/Makefile   |   1 +
 security/integrity/ima/ima.h  |  31 
 security/integrity/ima/ima_crypto.c   |   6 +-
 security/integrity/ima/ima_fs.c   |  30 ++-
 security/integrity/ima/ima_init.c |   2 +
 security/integrity/ima/ima_kexec.c| 168 +
 security/integrity/ima/ima_main.c |   1 +
 security/integrity/ima/ima_queue.c|  76 +++-
 security/integrity/ima/ima_template.c | 293 --
 security/integrity/ima/ima_template_lib.c |   7 +-
 22 files changed, 952 insertions(+), 89 deletions(-)
 create mode 100644 arch/powerpc/include/asm/ima.h
 create mode 100644 arch/powerpc/kernel/ima_kexec.c
 create mode 100644 security/integrity/ima/ima_kexec.c

-- 
2.1.0



Re: [PATCH v21 00/20] perf, tools: Add support for PMU events in JSON format

2016-09-26 Thread Andi Kleen
On Mon, Sep 26, 2016 at 12:03:43PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 26, 2016 at 10:35:33AM +0200, Jiri Olsa escreveu:
> > ping.. is that working for you? IMO we can include this
> > as additional patch to the set..
> 
> No, it doesn't: it fails to build on the first cross env I tried. Fixing it
> now; resulting patch:

Yes it shouldn't be difficult to fix cross building. I don't think
there are any fundamental problems.

-Andi


Re: [RFC PATCH] powerpc/mm: THP page cache support

2016-09-26 Thread Aneesh Kumar K.V
"Kirill A. Shutemov"  writes:

> On Thu, Sep 22, 2016 at 09:32:40PM +0530, Aneesh Kumar K.V wrote:
>> Update the arch hook in the generic THP page cache code, that will
>> deposit and withdraw the preallocated page table. Archs like ppc64 use
>> this preallocated table to store the hash pte slot information.
>> 
>> This is an RFC patch and I am sharing this early to get feedback on the
>> approach taken. I have used stress-ng mmap-file operation and that
>> resulted in some thp_file_mmap as shown below.
>> 
>> [/mnt/stress]$ grep thp_file /proc/vmstat
>> thp_file_alloc 25403
>> thp_file_mapped 16967
>> [/mnt/stress]$
>> 
>> I did observe a wrong nr_ptes count once. I need to recreate the problem
>> again.
>
> I don't see anything that could cause that.
>

I still need to debug this.

> The patch looks good to me (apart from nr_ptes issue). Few minor nitpicks
> below.
>
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>  arch/powerpc/include/asm/book3s/64/pgtable.h |  3 ++
>>  include/asm-generic/pgtable.h|  8 +++-
>>  mm/Kconfig   |  6 +--
>>  mm/huge_memory.c | 19 +-
>>  mm/khugepaged.c  | 21 ++-
>>  mm/memory.c  | 56 
>> +++-
>>  6 files changed, 93 insertions(+), 20 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
>> b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 263bf39ced40..1f45b06ce78e 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -1017,6 +1017,9 @@ static inline int pmd_move_must_withdraw(struct 
>> spinlock *new_pmd_ptl,
>>   */
>>  return true;
>>  }
>> +
>> +#define arch_needs_pgtable_deposit() (true)
>> +
>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>  #endif /* __ASSEMBLY__ */
>>  #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
>> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
>> index d4458b6dbfb4..0d1e400e82a2 100644
>> --- a/include/asm-generic/pgtable.h
>> +++ b/include/asm-generic/pgtable.h
>> @@ -660,11 +660,17 @@ static inline int pmd_move_must_withdraw(spinlock_t 
>> *new_pmd_ptl,
>>  /*
>>   * With split pmd lock we also need to move preallocated
>>   * PTE page table if new_pmd is on different PMD page table.
>> + *
>> + * We also don't deposit and withdraw tables for file pages.
>>   */
>> -return new_pmd_ptl != old_pmd_ptl;
>> +return (new_pmd_ptl != old_pmd_ptl) && vma_is_anonymous(vma);
>>  }
>>  #endif
>>  
>> +#ifndef arch_needs_pgtable_deposit
>> +#define arch_needs_pgtable_deposit() (false)
>> +#endif
>> +
>>  /*
>>   * This function is meant to be used by sites walking pagetables with
>>   * the mmap_sem hold in read mode to protect against MADV_DONTNEED and
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index be0ee11fa0d9..0a279d399722 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -447,13 +447,9 @@ choice
>>benefit.
>>  endchoice
>>  
>> -#
>> -# We don't deposit page tables on file THP mapping,
>> -# but Power makes use of them to address MMU quirk.
>> -#
>>  config  TRANSPARENT_HUGE_PAGECACHE
>>  def_bool y
>> -depends on TRANSPARENT_HUGEPAGE && !PPC
>> +depends on TRANSPARENT_HUGEPAGE
>>  
>>  #
>>  # UP and nommu archs use km based percpu allocator
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index a6abd76baa72..37176f455d16 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1320,6 +1320,14 @@ out_unlocked:
>>  return ret;
>>  }
>>  
>> +void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
>
> static?

Ok I will fix that.
>
>> +{
>> +pgtable_t pgtable;
>> +pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>> +pte_free(mm, pgtable);
>> +atomic_long_dec(&mm->nr_ptes);
>> +}
>> +
>>  int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>   pmd_t *pmd, unsigned long addr)
>>  {
>> @@ -1359,6 +1367,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct 
>> vm_area_struct *vma,
>>  atomic_long_dec(&tlb->mm->nr_ptes);
>>  add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
>>  } else {
>> +if (arch_needs_pgtable_deposit())
>
> Just hide the arch_needs_pgtable_deposit() check in zap_deposited_table().


ok.

>
>> +zap_deposited_table(tlb->mm, pmd);
>>  add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PMD_NR);
>>  }
>>  spin_unlock(ptl);

-aneesh



Re: [PATCH v3 5/5] mm: enable CONFIG_MOVABLE_NODE on powerpc

2016-09-26 Thread Aneesh Kumar K.V
Reza Arbab  writes:

> To create a movable node, we need to hotplug all of its memory into
> ZONE_MOVABLE.
>
> Note that to do this, auto_online_blocks should be off. Since the memory
> will first be added to the default zone, we must explicitly use
> online_movable to online.
>
> Because such a node contains no normal memory, can_online_high_movable()
> will only allow us to do the onlining if CONFIG_MOVABLE_NODE is set.
> Enable the use of this config option on PPC64 platforms.
>

Reviewed-by: Aneesh Kumar K.V 

> Signed-off-by: Reza Arbab 
> ---
>  Documentation/kernel-parameters.txt | 2 +-
>  mm/Kconfig  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index a4f4d69..3d8460d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2344,7 +2344,7 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
>   that the amount of memory usable for all allocations
>   is not too small.
>
> - movable_node[KNL,X86] Boot-time switch to enable the effects
> + movable_node[KNL,X86,PPC] Boot-time switch to enable the effects
>   of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
>
>   MTD_Partition=  [MTD]
> diff --git a/mm/Kconfig b/mm/Kconfig
> index be0ee11..4b19cd3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -153,7 +153,7 @@ config MOVABLE_NODE
>   bool "Enable to assign a node which has only movable memory"
>   depends on HAVE_MEMBLOCK
>   depends on NO_BOOTMEM
> - depends on X86_64
> + depends on X86_64 || PPC64
>   depends on NUMA
>   default n
>   help
> -- 
> 1.8.3.1



Re: [PATCH v3 4/5] powerpc/mm: restore top-down allocation when using movable_node

2016-09-26 Thread Aneesh Kumar K.V
Reza Arbab  writes:

> At boot, the movable_node option sets bottom-up memblock allocation.
>
> This reduces the chance that, in the window before movable memory has
> been identified, an allocation for the kernel might come from a movable
> node. By going bottom-up, early allocations will most likely come from
> the same node as the kernel image, which is necessarily in a nonmovable
> node.
>
> Then, once any known hotplug memory has been marked, allocation can be
> reset back to top-down. On x86, this is done in numa_init(). This patch
> does the same on power, in numa initmem_init().
>
> Signed-off-by: Reza Arbab 
> ---
>  arch/powerpc/mm/numa.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index d7ac419..fdf1e69 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -945,6 +945,9 @@ void __init initmem_init(void)
>   max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
>   max_pfn = max_low_pfn;
>
> + /* bottom-up allocation may have been set by movable_node */
> + memblock_set_bottom_up(false);
> +

By then we have already done a few memblock allocations, right? IMHO, we
should do this early enough in prom.c, after we do parse_early_param, with
a comment there explaining that we don't really support hotplug memblock.
When we do, this should be moved to a place where we can handle memblock
allocation such that we avoid spreading memblock allocations to the
movable node.


>   if (parse_numa_properties())
>   setup_nonnuma();
>   else
> -- 
> 1.8.3.1

-aneesh
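A toy C model of the ordering being debated (my illustration, not kernel code): with movable_node, memblock goes bottom-up at early-param time, and the question is how soon to flip back to top-down.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model only -- not kernel code.  Any allocation made between
 * parse_early_param() (which enables bottom-up for movable_node) and
 * the reset to top-down can land bottom-up.  The patch resets in
 * initmem_init(); Aneesh suggests resetting earlier, in prom.c, right
 * after the hotpluggable ranges are known, to shrink that window.
 */
struct boot_state {
	bool bottom_up;
	int allocs_while_bottom_up;
};

static void toy_parse_early_param(struct boot_state *s)
{
	s->bottom_up = true;		/* movable_node on the command line */
}

static void toy_memblock_alloc(struct boot_state *s)
{
	if (s->bottom_up)
		s->allocs_while_bottom_up++;
}

static void toy_mark_hotplug_and_reset(struct boot_state *s)
{
	s->bottom_up = false;		/* back to top-down */
}
```

The later the reset runs, the more early allocations are made while still bottom-up, i.e. while they could still be placed in a would-be movable node.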



Re: [PATCH v21 00/20] perf, tools: Add support for PMU events in JSON format

2016-09-26 Thread Arnaldo Carvalho de Melo
Em Mon, Sep 26, 2016 at 10:35:33AM +0200, Jiri Olsa escreveu:
> ping.. is that working for you? IMO we can include this
> as additional patch to the set..

No, it doesn't: it fails to build on the first cross env I tried. Fixing it
now; resulting patch:

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 72edf83d76b7..9365c155c6f3 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -758,6 +758,10 @@ ifndef NO_AUXTRACE
   endif
 endif
 
+ifndef CROSS_COMPILE
+  CFLAGS += -DHAVE_PMU_EVENTS_SUPPORT
+endif
+
 # Among the variables below, these:
 #   perfexecdir
 #   template_dir
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 26dbee50b36c..ee86dbf2814e 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -349,7 +349,14 @@ include $(srctree)/tools/build/Makefile.include
 
 JEVENTS   := $(OUTPUT)pmu-events/jevents
 JEVENTS_IN:= $(OUTPUT)pmu-events/jevents-in.o
+
+#
+# Disabling pmu-events for cross compile, as
+# we don't support host CC tools building yet.
+#
+ifndef CROSS_COMPILE
 PMU_EVENTS_IN := $(OUTPUT)pmu-events/pmu-events-in.o
+endif
 
 export JEVENTS
 
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 2babcdf62839..37f74fcc9ca2 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -377,6 +377,8 @@ static int pmu_alias_terms(struct perf_pmu_alias *alias,
return 0;
 }
 
+#ifdef HAVE_PMU_EVENTS_SUPPORT
+
 /*
  * Reading/parsing the default pmu type value, which should be
  * located at:
@@ -473,6 +475,23 @@ static struct cpu_map *pmu_cpumask(const char *name)
return cpus;
 }
 
+#else
+static int pmu_type(const char *name __maybe_unused, __u32 *type)
+{
+   *type = 0;
+   return 0;
+}
+
+static void pmu_read_sysfs(void)
+{
+}
+
+static struct cpu_map *pmu_cpumask(const char *name __maybe_unused)
+{
+   return NULL;
+}
+#endif /* HAVE_PMU_EVENTS_SUPPORT */
+
 struct perf_event_attr * __weak
 perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
 {


Re: [bug] crypto/vmx/p8_ghash memory corruption in 4.8-rc7

2016-09-26 Thread Herbert Xu
On Fri, Sep 23, 2016 at 08:22:27PM -0400, Jan Stancek wrote:
>
> This seems to directly correspond with:
>   p8_ghash_alg.descsize = sizeof(struct p8_ghash_desc_ctx) == 56
>   shash_tfm->descsize = sizeof(struct p8_ghash_desc_ctx) + 
> crypto_shash_descsize(fallback) == 56 + 20
> where 20 is presumably coming from "ghash_alg.descsize".
> 
> My gut feeling was that these 2 should match, but I'd love to hear
> what crypto people think.

Indeed.  The vmx driver is broken.  It is allocating a fallback
but is not providing any space for the state of the fallback.

Unfortunately our interface doesn't really provide a way to provide
the state size dynamically.  So what I'd suggest is to fix the
fallback to the generic ghash implementation and export its state
size like we do for md5/sha.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [bug] crypto/vmx/p8_ghash memory corruption in 4.8-rc7

2016-09-26 Thread Marcelo Cerri
Hi Jan,

Just out of curiosity, have you tried to use "76" on both values to
check if the problem still happens?

-- 
Regards,
Marcelo

On Fri, Sep 23, 2016 at 08:22:27PM -0400, Jan Stancek wrote:
> Hi,
> 
> I'm chasing a memory corruption with 4.8-rc7 as I'm observing random Oopses
> on ppc BE/LE systems (lpars, KVM guests). About 30% of the issues are that
> the module list gets corrupted, and "cat /proc/modules" or "lsmod" triggers
> an Oops, for example:
> 
> [   88.486041] Unable to handle kernel paging request for data at address 
> 0x0020
> ...
> [   88.487658] NIP [c020f820] m_show+0xa0/0x240
> [   88.487689] LR [c020f834] m_show+0xb4/0x240
> [   88.487719] Call Trace:
> [   88.487736] [c004b605bbb0] [c020f834] m_show+0xb4/0x240 
> (unreliable)
> [   88.487796] [c004b605bc50] [c045e73c] seq_read+0x36c/0x520
> [   88.487843] [c004b605bcf0] [c04e1014] proc_reg_read+0x84/0x120
> [   88.487889] [c004b605bd30] [c040df88] vfs_read+0xf8/0x380
> [   88.487934] [c004b605bde0] [c040fd40] SyS_read+0x60/0x110
> [   88.487981] [c004b605be30] [c0009590] system_call+0x38/0xec
> 
> 0x20 offset is module_use->source, module_use is NULL because 
> module.source_list
> gets corrupted.
> 
> The source of corruption appears to originate from a 'ahash' test for 
> p8_ghash:
> 
> cryptomgr_test
>  alg_test
>   alg_test_hash
>test_hash
> __test_hash
>  ahash_partial_update
>   shash_async_export
>memcpy
> 
> With some extra traces [1], I'm seeing that ahash_partial_update() allocates 
> 56 bytes
> for 'state', and then crypto_ahash_export() writes 76 bytes into it:
> 
> [5.970887] __test_hash alg name p8_ghash, result: c4333ac0, key: 
> c004b860a500, req: c004b860a380
> [5.970963] state: c4333f00, statesize: 56
> [5.970995] shash_default_export memcpy c4333f00 c004b860a3e0, 
> len: 76
> 
> This seems to directly correspond with:
>   p8_ghash_alg.descsize = sizeof(struct p8_ghash_desc_ctx) == 56
>   shash_tfm->descsize = sizeof(struct p8_ghash_desc_ctx) + 
> crypto_shash_descsize(fallback) == 56 + 20
> where 20 is presumably coming from "ghash_alg.descsize".
> 
> My gut feeling was that these 2 should match, but I'd love to hear
> what crypto people think.
> 
> Thank you,
> Jan
> 
> [1]
> diff --git a/crypto/shash.c b/crypto/shash.c
> index a051541..49fe182 100644
> --- a/crypto/shash.c
> +++ b/crypto/shash.c
> @@ -188,6 +188,8 @@ EXPORT_SYMBOL_GPL(crypto_shash_digest);
> 
>  static int shash_default_export(struct shash_desc *desc, void *out)
>  {
> +   int len = crypto_shash_descsize(desc->tfm);
> +   printk("shash_default_export memcpy %p %p, len: %d\n", out, 
> shash_desc_ctx(desc), len);
> memcpy(out, shash_desc_ctx(desc), crypto_shash_descsize(desc->tfm));
> return 0;
>  }
> diff --git a/crypto/testmgr.c b/crypto/testmgr.c
> index 5c9d5a5..2e54579 100644
> --- a/crypto/testmgr.c
> +++ b/crypto/testmgr.c
> @@ -218,6 +218,8 @@ static int ahash_partial_update(struct ahash_request 
> **preq,
> pr_err("alt: hash: Failed to alloc state for %s\n", algo);
> goto out_nostate;
> }
> +   printk("state: %p, statesize: %d\n", state, statesize);
> +
> ret = crypto_ahash_export(req, state);
> if (ret) {
> pr_err("alt: hash: Failed to export() for %s\n", algo);
> @@ -288,6 +290,7 @@ static int __test_hash(struct crypto_ahash *tfm, struct 
> hash_testvec *template,
>"%s\n", algo);
> goto out_noreq;
> }
> +   printk("__test_hash alg name %s, result: %p, key: %p, req: %p\n", 
> algo, result, key, req);
> ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
>tcrypt_complete, );
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


signature.asc
Description: PGP signature


Re: [PATCH 4/4] drivers/pci/hotplug: Support surprise hotplug

2016-09-26 Thread Bjorn Helgaas
On Mon, Sep 26, 2016 at 11:08:02PM +1000, Gavin Shan wrote:
> On Wed, Sep 21, 2016 at 11:57:03AM -0500, Bjorn Helgaas wrote:
> >Hi Gavin,
> >
> >You don't need my ack for any of these, and I assume you'll merge them
> >through the powerpc tree.
> >
> >Minor comments below, feel free to ignore them.
> >
> >On Wed, Sep 21, 2016 at 10:15:30PM +1000, Gavin Shan wrote:
> >> ...
> >> @@ -536,9 +565,16 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct 
> >> device_node *dn)
> >>if (unlikely(!php_slot))
> >>return NULL;
> >>  
> >> +  php_slot->event = kzalloc(sizeof(struct pnv_php_event), GFP_KERNEL);
> >> +  if (unlikely(!php_slot->event)) {
> >> +  kfree(php_slot);
> >> +  return NULL;
> >> +  }
> >
> >Since you *always* allocate the event when allocating the php_slot,
> >making the event a member of php_slot (instead of keeping a pointer to
> >it) would simplify your memory management a bit.
> >
> >It seems to be the style in this file to use "unlikely" liberally, but
> >I really doubt there's any performance consideration in this code.  To
> >me it adds more clutter than usefulness.
> >
> >> +static irqreturn_t pnv_php_interrupt(int irq, void *data)
> >> +{
> >> +  struct pnv_php_slot *php_slot = data;
> >> +  struct pci_dev *pchild, *pdev = php_slot->pdev;
> >> +  struct eeh_dev *edev;
> >> +  struct eeh_pe *pe;
> >> +  struct pnv_php_event *event;
> >> +  u16 sts, lsts;
> >> +  u8 presence;
> >> +  bool added;
> >> +  unsigned long flags;
> >> +  int ret;
> >> +
> >> +  pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, );
> >> +  sts &= (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
> >> +  pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, sts);
> >
> >I didn't realize that this is some sort of hybrid of native PCIe
> >hotplug and PowerNV-specific stuff.  Wonder if there's any opportunity
> >to combine with or leverage pciehp.  That seems pretty blue-sky
> >though, since there's so much PowerNV special sauce here.
> >
> 
> Bjorn, thanks a lot for your comments. All comments except last one
> (leverage pciehp) are covered in v2 which wasn't copied to linux-pci@
> list to avoid unnecessary traffic. Yeah, the driver contains too many
> PowerNV platform-specific things, which makes it hard to build on top of
> pciehp.

Sounds good, thanks!


Re: [PATCH 4/4] drivers/pci/hotplug: Support surprise hotplug

2016-09-26 Thread Gavin Shan
On Wed, Sep 21, 2016 at 11:57:03AM -0500, Bjorn Helgaas wrote:
>Hi Gavin,
>
>You don't need my ack for any of these, and I assume you'll merge them
>through the powerpc tree.
>
>Minor comments below, feel free to ignore them.
>
>On Wed, Sep 21, 2016 at 10:15:30PM +1000, Gavin Shan wrote:
>> ...
>> @@ -536,9 +565,16 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct 
>> device_node *dn)
>>  if (unlikely(!php_slot))
>>  return NULL;
>>  
>> +php_slot->event = kzalloc(sizeof(struct pnv_php_event), GFP_KERNEL);
>> +if (unlikely(!php_slot->event)) {
>> +kfree(php_slot);
>> +return NULL;
>> +}
>
>Since you *always* allocate the event when allocating the php_slot,
>making the event a member of php_slot (instead of keeping a pointer to
>it) would simplify your memory management a bit.
>
>It seems to be the style in this file to use "unlikely" liberally, but
>I really doubt there's any performance consideration in this code.  To
>me it adds more clutter than usefulness.
>
>> +static irqreturn_t pnv_php_interrupt(int irq, void *data)
>> +{
>> +struct pnv_php_slot *php_slot = data;
>> +struct pci_dev *pchild, *pdev = php_slot->pdev;
>> +struct eeh_dev *edev;
>> +struct eeh_pe *pe;
>> +struct pnv_php_event *event;
>> +u16 sts, lsts;
>> +u8 presence;
>> +bool added;
>> +unsigned long flags;
>> +int ret;
>> +
>> +pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, );
>> +sts &= (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
>> +pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, sts);
>
>I didn't realize that this is some sort of hybrid of native PCIe
>hotplug and PowerNV-specific stuff.  Wonder if there's any opportunity
>to combine with or leverage pciehp.  That seems pretty blue-sky
>though, since there's so much PowerNV special sauce here.
>

Bjorn, thanks a lot for your comments. All comments except last one
(leverage pciehp) are covered in v2 which wasn't copied to linux-pci@
list to avoid unnecessary traffic. Yeah, the driver contains too many
PowerNV platform-specific things, which makes it hard to build on top of
pciehp.

Thanks,
Gavin



[PATCH v2 2/5] powerpc/eeh: Export eeh_pe_state_mark()

2016-09-26 Thread Gavin Shan
This exports eeh_pe_state_mark(). It will be used to mark a surprise
hot-removed PE as isolated, to avoid unexpected EEH error reporting in
the surprise removal path.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh_pe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index f0520da..de7d091 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -581,6 +581,7 @@ void eeh_pe_state_mark(struct eeh_pe *pe, int state)
 {
eeh_pe_traverse(pe, __eeh_pe_state_mark, );
 }
+EXPORT_SYMBOL_GPL(eeh_pe_state_mark);
 
 static void *__eeh_pe_dev_mode_mark(void *data, void *flag)
 {
-- 
2.1.0



[PATCH v2 3/5] powerpc/powernv: Unfreeze PE on allocation

2016-09-26 Thread Gavin Shan
This unfreezes the PE when it is initialized, because the PE might have
been put into the frozen state by the last hot remove. It is harmless to
do so if the PE is already in the unfrozen state.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 38a5c65..841395e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -133,9 +133,21 @@ static inline bool pnv_pci_is_m64_flags(unsigned long 
resource_flags)
 
 static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
 {
+   s64 rc;
+
phb->ioda.pe_array[pe_no].phb = phb;
phb->ioda.pe_array[pe_no].pe_number = pe_no;
 
+   /* Clear the PE frozen state as it might be put into frozen state
+* in the last PCI remove path. It's not harmful to do so when the
+* PE is already in unfrozen state.
+*/
+   rc = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no,
+  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+   if (rc != OPAL_SUCCESS)
+   pr_warn("%s: Error %lld unfreezing PHB#%d-PE#%d\n",
+   __func__, rc, phb->hose->global_number, pe_no);
+
return >ioda.pe_array[pe_no];
 }
 
-- 
2.1.0



[PATCH v2 4/5] drivers/pci/hotplug: Remove likely() and unlikely() in powernv driver

2016-09-26 Thread Gavin Shan
This removes likely() and unlikely() in pnv_php.c, as the code doesn't
run in a hot path. Those macros, which influence the CPU's branch
prediction, don't help performance much here. I had used them to mark
which cases are likely or unlikely to happen. No logical changes are
introduced.

Signed-off-by: Gavin Shan 
---
 drivers/pci/hotplug/pnv_php.c | 56 +--
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index e6245b0..182f218 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -122,7 +122,7 @@ static void pnv_php_detach_device_nodes(struct device_node 
*parent)
 
of_node_put(dn);
refcount = atomic_read(>kobj.kref.refcount);
-   if (unlikely(refcount != 1))
+   if (refcount != 1)
pr_warn("Invalid refcount %d on <%s>\n",
refcount, of_node_full_name(dn));
 
@@ -184,11 +184,11 @@ static int pnv_php_populate_changeset(struct of_changeset 
*ocs,
 
for_each_child_of_node(dn, child) {
ret = of_changeset_attach_node(ocs, child);
-   if (unlikely(ret))
+   if (ret)
break;
 
ret = pnv_php_populate_changeset(ocs, child);
-   if (unlikely(ret))
+   if (ret)
break;
}
 
@@ -201,7 +201,7 @@ static void *pnv_php_add_one_pdn(struct device_node *dn, 
void *data)
struct pci_dn *pdn;
 
pdn = pci_add_device_node_info(hose, dn);
-   if (unlikely(!pdn))
+   if (!pdn)
return ERR_PTR(-ENOMEM);
 
return NULL;
@@ -224,21 +224,21 @@ static int pnv_php_add_devtree(struct pnv_php_slot 
*php_slot)
 * fits the real size.
 */
fdt1 = kzalloc(0x1, GFP_KERNEL);
-   if (unlikely(!fdt1)) {
+   if (!fdt1) {
ret = -ENOMEM;
dev_warn(_slot->pdev->dev, "Cannot alloc FDT blob\n");
goto out;
}
 
ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x1);
-   if (unlikely(ret)) {
+   if (ret) {
dev_warn(_slot->pdev->dev, "Error %d getting FDT blob\n",
 ret);
goto free_fdt1;
}
 
fdt = kzalloc(fdt_totalsize(fdt1), GFP_KERNEL);
-   if (unlikely(!fdt)) {
+   if (!fdt) {
ret = -ENOMEM;
dev_warn(_slot->pdev->dev, "Cannot %d bytes memory\n",
 fdt_totalsize(fdt1));
@@ -248,7 +248,7 @@ static int pnv_php_add_devtree(struct pnv_php_slot 
*php_slot)
/* Unflatten device tree blob */
memcpy(fdt, fdt1, fdt_totalsize(fdt1));
dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
-   if (unlikely(!dt)) {
+   if (!dt) {
ret = -EINVAL;
dev_warn(_slot->pdev->dev, "Cannot unflatten FDT\n");
goto free_fdt;
@@ -258,7 +258,7 @@ static int pnv_php_add_devtree(struct pnv_php_slot 
*php_slot)
of_changeset_init(_slot->ocs);
pnv_php_reverse_nodes(php_slot->dn);
ret = pnv_php_populate_changeset(_slot->ocs, php_slot->dn);
-   if (unlikely(ret)) {
+   if (ret) {
pnv_php_reverse_nodes(php_slot->dn);
dev_warn(_slot->pdev->dev, "Error %d populating 
changeset\n",
 ret);
@@ -267,7 +267,7 @@ static int pnv_php_add_devtree(struct pnv_php_slot 
*php_slot)
 
php_slot->dn->child = NULL;
ret = of_changeset_apply(_slot->ocs);
-   if (unlikely(ret)) {
+   if (ret) {
dev_warn(_slot->pdev->dev, "Error %d applying changeset\n",
 ret);
goto destroy_changeset;
@@ -301,7 +301,7 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
int ret;
 
ret = pnv_pci_set_power_state(php_slot->id, state, );
-   if (likely(ret > 0)) {
+   if (ret > 0) {
if (be64_to_cpu(msg.params[1]) != php_slot->dn->phandle ||
be64_to_cpu(msg.params[2]) != state ||
be64_to_cpu(msg.params[3]) != OPAL_SUCCESS) {
@@ -311,7 +311,7 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
 be64_to_cpu(msg.params[3]));
return -ENOMSG;
}
-   } else if (unlikely(ret < 0)) {
+   } else if (ret < 0) {
dev_warn(_slot->pdev->dev, "Error %d powering %s\n",
 ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" : "off");
return ret;
@@ -338,7 +338,7 @@ static int pnv_php_get_power_state(struct hotplug_slot 
*slot, u8 *state)
 * be on.
 */
ret = pnv_pci_get_power_state(php_slot->id, _state);
-   if (unlikely(ret)) {
+   if (ret) {
  

[PATCH v2 1/5] powerpc/eeh: Allow to freeze PE in eeh_pe_set_option()

2016-09-26 Thread Gavin Shan
Function eeh_pe_set_option() is used to apply the requested options
(enable, disable, unfreeze) in the EEH virtualization path. The
semantics of this function aren't complete until freezing is supported.

This allows freezing the indicated PE. The new semantics are going to
be used in the PCI surprise hot remove path, to freeze removed PCI
devices (PEs) to avoid unexpected EEH error reporting.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 7429556..0699f15 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1502,6 +1502,7 @@ int eeh_pe_set_option(struct eeh_pe *pe, int option)
break;
case EEH_OPT_THAW_MMIO:
case EEH_OPT_THAW_DMA:
+   case EEH_OPT_FREEZE_PE:
if (!eeh_ops || !eeh_ops->set_option) {
ret = -ENOENT;
break;
-- 
2.1.0



[PATCH v2 5/5] drivers/pci/hotplug: Support surprise hotplug in powernv driver

2016-09-26 Thread Gavin Shan
This supports PCI surprise hotplug. The design is highlighted as
below:

   * The PCI slot's surprise hotplug capability is exposed through
 device node property "ibm,slot-surprise-pluggable", meaning
 PCI surprise hotplug will be disabled if skiboot doesn't support
 it yet.
   * The interrupt because of presence or link state change is raised
 on surprise hotplug event. One event is allocated and queued to
  the PCI slot for the workqueue to pick up and process in serialized
  fashion. The code flow for surprise hotplug is the same as that for
  managed hotplug, except that the affected PEs are put into the frozen
  state to avoid unexpected EEH error reporting in the surprise hot
  remove path.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/pnv-pci.h |   2 +
 drivers/pci/hotplug/pnv_php.c  | 212 +
 2 files changed, 214 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-pci.h 
b/arch/powerpc/include/asm/pnv-pci.h
index 0cbd813..17e89dd 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -60,6 +60,8 @@ struct pnv_php_slot {
 #define PNV_PHP_STATE_POPULATED2
 #define PNV_PHP_STATE_OFFLINE  3
int state;
+   int irq;
+   struct workqueue_struct *wq;
struct device_node  *dn;
struct pci_dev  *pdev;
struct pci_bus  *bus;
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 182f218..ea4ec72d 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -22,6 +22,12 @@
 #define DRIVER_AUTHOR  "Gavin Shan, IBM Corporation"
 #define DRIVER_DESC"PowerPC PowerNV PCI Hotplug Driver"
 
+struct pnv_php_event {
+   booladded;
+   struct pnv_php_slot *php_slot;
+   struct work_struct  work;
+};
+
 static LIST_HEAD(pnv_php_slot_list);
 static DEFINE_SPINLOCK(pnv_php_lock);
 
@@ -29,12 +35,40 @@ static void pnv_php_register(struct device_node *dn);
 static void pnv_php_unregister_one(struct device_node *dn);
 static void pnv_php_unregister(struct device_node *dn);
 
+static void pnv_php_disable_irq(struct pnv_php_slot *php_slot)
+{
+   struct pci_dev *pdev = php_slot->pdev;
+   u16 ctrl;
+
+   if (php_slot->irq > 0) {
+   pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, );
+   ctrl &= ~(PCI_EXP_SLTCTL_HPIE |
+ PCI_EXP_SLTCTL_PDCE |
+ PCI_EXP_SLTCTL_DLLSCE);
+   pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, ctrl);
+
+   free_irq(php_slot->irq, php_slot);
+   php_slot->irq = 0;
+   }
+
+   if (php_slot->wq) {
+   destroy_workqueue(php_slot->wq);
+   php_slot->wq = NULL;
+   }
+
+   if (pdev->msix_enabled)
+   pci_disable_msix(pdev);
+   else if (pdev->msi_enabled)
+   pci_disable_msi(pdev);
+}
+
 static void pnv_php_free_slot(struct kref *kref)
 {
struct pnv_php_slot *php_slot = container_of(kref,
struct pnv_php_slot, kref);
 
WARN_ON(!list_empty(_slot->children));
+   pnv_php_disable_irq(php_slot);
kfree(php_slot->name);
kfree(php_slot);
 }
@@ -609,6 +643,179 @@ static int pnv_php_register_slot(struct pnv_php_slot 
*php_slot)
return 0;
 }
 
+static int pnv_php_enable_msix(struct pnv_php_slot *php_slot)
+{
+   struct pci_dev *pdev = php_slot->pdev;
+   struct msix_entry entry;
+   int nr_entries, ret;
+   u16 pcie_flag;
+
+   /* Get total number of MSIx entries */
+   nr_entries = pci_msix_vec_count(pdev);
+   if (nr_entries < 0)
+   return nr_entries;
+
+   /* Check hotplug MSIx entry is in range */
+   pcie_capability_read_word(pdev, PCI_EXP_FLAGS, _flag);
+   entry.entry = (pcie_flag & PCI_EXP_FLAGS_IRQ) >> 9;
+   if (entry.entry >= nr_entries)
+   return -ERANGE;
+
+   /* Enable MSIx */
+   ret = pci_enable_msix_exact(pdev, , 1);
+   if (ret) {
+   dev_warn(>dev, "Error %d enabling MSIx\n", ret);
+   return ret;
+   }
+
+   return entry.vector;
+}
+
+static void pnv_php_event_handler(struct work_struct *work)
+{
+   struct pnv_php_event *event =
+   container_of(work, struct pnv_php_event, work);
+   struct pnv_php_slot *php_slot = event->php_slot;
+
+   if (event->added)
+   pnv_php_enable_slot(_slot->slot);
+   else
+   pnv_php_disable_slot(_slot->slot);
+
+   kfree(event);
+}
+
+static irqreturn_t pnv_php_interrupt(int irq, void *data)
+{
+   struct pnv_php_slot *php_slot = data;
+   struct pci_dev *pchild, *pdev = php_slot->pdev;
+   struct eeh_dev *edev;
+   struct eeh_pe *pe;
+  

[PATCH v2 0/5] powerpc/powernv: PCI Surprise Hotplug Support

2016-09-26 Thread Gavin Shan
This series of patches supports PCI surprise hotplug on PowerNV platform.
Without the corresponding skiboot patches, this feature won't be enabled
and workable.

   * The skiboot patches can be found in below link (PATCH[01/16):
 https://patchwork.ozlabs.org/project/skiboot/list/?submitter=63923
   * This newly added functionality depends on skiboot's changes. However,
 the functionality is simply disabled when skiboot doesn't support it.
 For one specific slot, property "ibm,slot-surprise-pluggable" of the
 slot's device node is set to 1 when surprise hotplug is claimed by
 skiboot.
   * The interrupts because of presence and link state change are enabled
 in order to support PCI surprise hotplug. The surprise hotplug events
 are queued to the PCI slot and they're picked up for further processing
  in serialized fashion. The surprise and managed hotplug paths share the
  same code flow, except that the affected PEs are put into the frozen state
  to avoid unexpected EEH error reporting in the surprise hot remove path.

PATCH[1/5] and PATCH[2/5] allow freezing PEs to avoid unexpected EEH error
reporting in the PCI surprise hot remove path. PATCH[3/5] clears a PE's frozen
state on initialization, because the PE might have been left frozen by the last
PCI surprise hot remove. PATCH[4/5] removes likely() and unlikely() in pnv_php.c
as they are not very useful. PATCH[5/5] adds PCI surprise hotplug support to the
PowerNV PCI hotplug driver.

Changelog
=
v2:
   * Add one patch to remove likely() and unlikely() in pnv_php.c.
   * Remove likely() and unlikely() in PATCH[v1 4/4].
   * The event isn't pre-allocated. It's always allocated from slab
 in the interrupt handler. The removed PE is put into frozen state
 before the event is allocated.

Gavin Shan (5):
  powerpc/eeh: Allow to freeze PE in eeh_pe_set_option()
  powerpc/eeh: Export eeh_pe_state_mark()
  powerpc/powernv: Unfreeze PE on allocation
  drivers/pci/hotplug: Remove likely() and unlikely() in powernv driver
  drivers/pci/hotplug: Support surprise hotplug in powernv driver

 arch/powerpc/include/asm/pnv-pci.h|   2 +
 arch/powerpc/kernel/eeh.c |   1 +
 arch/powerpc/kernel/eeh_pe.c  |   1 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  12 ++
 drivers/pci/hotplug/pnv_php.c | 268 ++
 5 files changed, 256 insertions(+), 28 deletions(-)

-- 
2.1.0



Re: [PATCH] i2c_powermac: shut up lockdep warning

2016-09-26 Thread Denis Kirjanov
On Wednesday, September 21, 2016, Denis Kirjanov 
wrote:

> It's unclear why lockdep shows the following warning, but adding a
> lockdep class to struct pmac_i2c_bus solves it.


Hi Ben,

could you give any comments on this?

Thanks!

>
> [   20.507795] ==
> [   20.507796] [ INFO: possible circular locking dependency detected ]
> [   20.507800] 4.8.0-rc7-00037-gd2ffb01 #21 Not tainted
> [   20.507801] ---
> [   20.507803] swapper/0/1 is trying to acquire lock:
> [   20.507818]  (>mutex){+.+.+.}, at: []
> .pmac_i2c_open+0x30/0x100
> [   20.507819]
> [   20.507819] but task is already holding lock:
> [   20.507829]  (>rwsem){+.+.+.}, at: []
> .cpufreq_online+0x1ac/0x9d0
> [   20.507830]
> [   20.507830] which lock already depends on the new lock.
> [   20.507830]
> [   20.507832]
> [   20.507832] the existing dependency chain (in reverse order) is:
> [   20.507837]
> [   20.507837] -> #4 (>rwsem){+.+.+.}:
> [   20.507844][] .down_write+0x6c/0x110
> [   20.507849][] .cpufreq_online+0x1ac/0x9d0
> [   20.507855][] .subsys_interface_register+
> 0xb8/0x110
> [   20.507860][] .cpufreq_register_driver+
> 0x1d0/0x250
> [   20.507866][] .g5_cpufreq_init+0x9cc/0xa28
> [   20.507872][] .do_one_initcall+0x5c/0x1d0
> [   20.507878][] .kernel_init_freeable+0x1ac/
> 0x28c
> [   20.507883][] .kernel_init+0x1c/0x140
> [   20.507887][] .ret_from_kernel_thread+0x58/
> 0x64
> [   20.507894]
> [   20.507894] -> #3 (subsys mutex#2){+.+.+.}:
> [   20.507899][] .mutex_lock_nested+0xa8/0x590
> [   20.507903][] .bus_probe_device+0x44/0xe0
> [   20.507907][] .device_add+0x508/0x730
> [   20.507911][] .register_cpu+0x118/0x190
> [   20.507916][] .topology_init+0x148/0x248
> [   20.507921][] .do_one_initcall+0x5c/0x1d0
> [   20.507925][] .kernel_init_freeable+0x1ac/
> 0x28c
> [   20.507929][] .kernel_init+0x1c/0x140
> [   20.507934][] .ret_from_kernel_thread+0x58/
> 0x64
> [   20.507939]
> [   20.507939] -> #2 (cpu_add_remove_lock){+.+.+.}:
> [   20.507944][] .mutex_lock_nested+0xa8/0x590
> [   20.507950][] .register_cpu_notifier+0x2c/
> 0x70
> [   20.507955][] .spawn_ksoftirqd+0x18/0x4c
> [   20.507959][] .do_one_initcall+0x5c/0x1d0
> [   20.507964][] .kernel_init_freeable+0xb0/
> 0x28c
> [   20.507968][] .kernel_init+0x1c/0x140
> [   20.507972][] .ret_from_kernel_thread+0x58/
> 0x64
> [   20.507978]
> [   20.507978] -> #1 (>mutex){+.+.+.}:
> [   20.507982][] .mutex_lock_nested+0xa8/0x590
> [   20.507987][] .kw_i2c_open+0x18/0x30
> [   20.507991][] .pmac_i2c_open+0x94/0x100
> [   20.507995][] .smp_core99_probe+0x260/0x410
> [   20.507999][] .smp_prepare_cpus+0x280/0x2ac
> [   20.508003][] .kernel_init_freeable+0x88/
> 0x28c
> [   20.508008][] .kernel_init+0x1c/0x140
> [   20.508012][] .ret_from_kernel_thread+0x58/
> 0x64
> [   20.508018]
> [   20.508018] -> #0 (>mutex){+.+.+.}:
> [   20.508023][] .lock_acquire+0x84/0x100
> [   20.508027][] .mutex_lock_nested+0xa8/0x590
> [   20.508032][] .pmac_i2c_open+0x30/0x100
> [   20.508037][] .pmac_i2c_do_begin+0x34/0x120
> [   20.508040][] .pmf_call_one+0x50/0xd0
> [   20.508045][] .g5_pfunc_switch_volt+0x2c/0xc0
> [   20.508050][] .g5_pfunc_switch_freq+0x1cc/
> 0x1f0
> [   20.508054][] .g5_cpufreq_target+0x2c/0x40
> [   20.508058][] .__cpufreq_driver_target+
> 0x23c/0x840
> [   20.508062][] .cpufreq_gov_performance_
> limits+0x18/0x30
> [   20.508067][] .cpufreq_start_governor+0xac/
> 0x100
> [   20.508071][] .cpufreq_set_policy+0x208/0x260
> [   20.508076][] .cpufreq_init_policy+0x6c/0xb0
> [   20.508081][] .cpufreq_online+0x250/0x9d0
> [   20.508085][] .subsys_interface_register+
> 0xb8/0x110
> [   20.508090][] .cpufreq_register_driver+
> 0x1d0/0x250
> [   20.508094][] .g5_cpufreq_init+0x9cc/0xa28
> [   20.508099][] .do_one_initcall+0x5c/0x1d0
> [   20.508103][] .kernel_init_freeable+0x1ac/
> 0x28c
> [   20.508107][] .kernel_init+0x1c/0x140
> [   20.508112][] .ret_from_kernel_thread+0x58/
> 0x64
> [   20.508113]
> [   20.508113] other info that might help us debug this:
> [   20.508113]
> [   20.508121] Chain exists of:
> [   20.508121]   >mutex --> subsys mutex#2 --> >rwsem
> [   20.508121]
> [   20.508123]  Possible unsafe locking scenario:
> [   20.508123]
> [   20.508124]CPU0CPU1
> [   20.508125]
> [   20.508128]   lock(>rwsem);
> [   20.508132]lock(subsys mutex#2);
> [   20.508135]lock(>rwsem);
> [   20.508138]  

Re: ehea crash on boot

2016-09-26 Thread Denis Kirjanov
On Monday, September 26, 2016, Mathieu Malaterre <
mathieu.malate...@gmail.com> wrote:

> On Fri, Sep 23, 2016 at 2:50 PM, Denis Kirjanov  > wrote:
> > Heh, another thing to debug :)
> >
> > mm: Hashing failure ! EA=0xd80080124040 access=0x800e
> > current=NetworkManager
> > trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2
> pte=0xc0003bc0300301ae
> > mm: Hashing failure ! EA=0xd80080124040 access=0x800e
> > current=NetworkManager
> > trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2
> pte=0xc0003bc0300301ae
> > Unable to handle kernel paging request for data at address
> 0xd80080124040
> > Faulting instruction address: 0xc06f21a0
> > cpu 0x8: Vector: 300 (Data Access) at [c005a8b92b50]
> > pc: c06f21a0: .ehea_create_cq+0x160/0x230
> > lr: c06f2164: .ehea_create_cq+0x124/0x230
> > sp: c005a8b92dd0
> > msr: 80009032
> > dar: d80080124040
> > dsisr: 4200
> > current = 0xc005a8b68200
> > paca = 0xcea94000 softe: 0 irq_happened: 0x01
> > pid = 6787, comm = NetworkManager
> > Linux version 4.8.0-rc6-00214-g4cea877 (kda@ps700) (gcc version 4.8.5
> > 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Fri Sep 23 15:01:08 MSK 2016
> > enter ? for help
> > [c005a8b92dd0] c06f2140 .ehea_create_cq+0x100/0x230
> (unreliable)
> > [c005a8b92e70] c06ed448 .ehea_up+0x288/0xed0
> > [c005a8b92fe0] c06ee314 .ehea_open+0x44/0x130
> > [c005a8b93070] c0812324 .__dev_open+0x154/0x220
> > [c005a8b93110] c0812734 .__dev_change_flags+0xd4/0x1e0
> > [c005a8b931b0] c081286c .dev_change_flags+0x2c/0x80
> > [c005a8b93240] c0829f0c .do_setlink+0x37c/0xe50
> > [c005a8b933c0] c082c884 .rtnl_newlink+0x5e4/0x9b0
> > [c005a8b936d0] c082cd08 .rtnetlink_rcv_msg+0xb8/0x2f0
> > [c005a8b937a0] c084e25c .netlink_rcv_skb+0x12c/0x150
> > [c005a8b93830] c0829458 .rtnetlink_rcv+0x38/0x60
> > [c005a8b938b0] c084d814 .netlink_unicast+0x1e4/0x350
> > [c005a8b93960] c084def8 .netlink_sendmsg+0x418/0x480
> > [c005a8b93a40] c07defac .sock_sendmsg+0x2c/0x60
> > [c005a8b93ab0] c07e0cbc .___sys_sendmsg+0x30c/0x320
> > [c005a8b93c90] c07e21bc .__sys_sendmsg+0x4c/0xb0
> > [c005a8b93d80] c07e2dec .SyS_socketcall+0x34c/0x3d0
> > [c005a8b93e30] c000946c system_call+0x38/0x108
>
> Can you turn UBSAN on for this ?


I'll get back to the problem and send a fix when I finish my trip.


> --
> Mathieu
>


Re: [RFC PATCH] powerpc/mm: THP page cache support

2016-09-26 Thread Kirill A. Shutemov
On Thu, Sep 22, 2016 at 09:32:40PM +0530, Aneesh Kumar K.V wrote:
> Update the arch hook in the generic THP page cache code that will
> deposit and withdraw the preallocated page table. Archs like ppc64 use
> this preallocated table to store the hash PTE slot information.
> 
> This is an RFC patch and I am sharing it early to get feedback on the
> approach taken. I have used the stress-ng mmap-file operation, and that
> resulted in some thp_file_mapped events, as shown below.
> 
> [/mnt/stress]$ grep thp_file /proc/vmstat
> thp_file_alloc 25403
> thp_file_mapped 16967
> [/mnt/stress]$
> 
> I did observe a wrong nr_ptes count once. I need to recreate the
> problem again.

I don't see anything that could cause that.

The patch looks good to me (apart from nr_ptes issue). Few minor nitpicks
below.

> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  3 ++
>  include/asm-generic/pgtable.h|  8 +++-
>  mm/Kconfig   |  6 +--
>  mm/huge_memory.c | 19 +-
>  mm/khugepaged.c  | 21 ++-
>  mm/memory.c  | 56 
> +++-
>  6 files changed, 93 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 263bf39ced40..1f45b06ce78e 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -1017,6 +1017,9 @@ static inline int pmd_move_must_withdraw(struct 
> spinlock *new_pmd_ptl,
>*/
>   return true;
>  }
> +
> +#define arch_needs_pgtable_deposit() (true)
> +
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  #endif /* __ASSEMBLY__ */
>  #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index d4458b6dbfb4..0d1e400e82a2 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -660,11 +660,17 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
>   /*
>* With split pmd lock we also need to move preallocated
>* PTE page table if new_pmd is on different PMD page table.
> +  *
> +  * We also don't deposit and withdraw tables for file pages.
>*/
> - return new_pmd_ptl != old_pmd_ptl;
> + return (new_pmd_ptl != old_pmd_ptl) && vma_is_anonymous(vma);
>  }
>  #endif
>  
> +#ifndef arch_needs_pgtable_deposit
> +#define arch_needs_pgtable_deposit() (false)
> +#endif
> +
>  /*
>   * This function is meant to be used by sites walking pagetables with
>   * the mmap_sem hold in read mode to protect against MADV_DONTNEED and
> diff --git a/mm/Kconfig b/mm/Kconfig
> index be0ee11fa0d9..0a279d399722 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -447,13 +447,9 @@ choice
> benefit.
>  endchoice
>  
> -#
> -# We don't deposit page tables on file THP mapping,
> -# but Power makes use of them to address MMU quirk.
> -#
>  config   TRANSPARENT_HUGE_PAGECACHE
>   def_bool y
> - depends on TRANSPARENT_HUGEPAGE && !PPC
> + depends on TRANSPARENT_HUGEPAGE
>  
>  #
>  # UP and nommu archs use km based percpu allocator
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index a6abd76baa72..37176f455d16 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1320,6 +1320,14 @@ out_unlocked:
>   return ret;
>  }
>  
> +void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)

static?

> +{
> + pgtable_t pgtable;
> + pgtable = pgtable_trans_huge_withdraw(mm, pmd);
> + pte_free(mm, pgtable);
> + atomic_long_dec(&mm->nr_ptes);
> +}
> +
>  int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>pmd_t *pmd, unsigned long addr)
>  {
> @@ -1359,6 +1367,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>   atomic_long_dec(&tlb->mm->nr_ptes);
>   add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
>   } else {
> + if (arch_needs_pgtable_deposit())

Just hide the arch_needs_pgtable_deposit() check in zap_deposited_table().

> + zap_deposited_table(tlb->mm, pmd);
>   add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PMD_NR);
>   }
>   spin_unlock(ptl);
-- 
 Kirill A. Shutemov


Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-26 Thread Naveen N. Rao
On 2016/09/26 11:00AM, Daniel Borkmann wrote:
> On 09/26/2016 10:56 AM, Naveen N. Rao wrote:
> > On 2016/09/24 03:30AM, Alexei Starovoitov wrote:
> > > On Sat, Sep 24, 2016 at 12:33:54AM +0200, Daniel Borkmann wrote:
> > > > On 09/23/2016 10:35 PM, Naveen N. Rao wrote:
> > > > > Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
> > > > > programs. This can be achieved either by:
> > > > > (1) retaining the stack setup by the first eBPF program and having all
> > > > > subsequent eBPF programs re-using it, or,
> > > > > (2) by unwinding/tearing down the stack and having each eBPF program
> > > > > deal with its own stack as it sees fit.
> > > > > 
> > > > > To ensure that this does not create loops, there is a limit to how 
> > > > > many
> > > > > tail calls can be done (currently 32). This requires the JIT'ed code 
> > > > > to
> > > > > maintain a count of the number of tail calls done so far.
> > > > > 
> > > > > Approach (1) is simple, but requires every eBPF program to have 
> > > > > (almost)
> > > > > the same prologue/epilogue, regardless of whether they need it. This 
> > > > > is
> > > > > inefficient for small eBPF programs which may not sometimes need a
> > > > > prologue at all. As such, to minimize impact of tail call
> > > > > implementation, we use approach (2) here which needs each eBPF program
> > > > > in the chain to use its own prologue/epilogue. This is not ideal when
> > > > > many tail calls are involved and when all the eBPF programs in the 
> > > > > chain
> > > > > have similar prologue/epilogue. However, the impact is restricted to
> > > > > programs that do tail calls. Individual eBPF programs are not 
> > > > > affected.
> > > > > 
> > > > > We maintain the tail call count in a fixed location on the stack and
> > > > > updated tail call count values are passed in through this. The very
> > > > > first eBPF program in a chain sets this up to 0 (the first 2
> > > > > instructions). Subsequent tail calls skip the first two eBPF JIT
> > > > > instructions to maintain the count. For programs that don't do tail
> > > > > calls themselves, the first two instructions are NOPs.
> > > > > 
> > > > > Signed-off-by: Naveen N. Rao 
> > > > 
> > > > Thanks for adding support, Naveen, that's really great! I think 2) seems
> > > > fine as well in this context as prologue size can vary quite a bit here,
> > > > and depending on program types likelihood of tail call usage as well 
> > > > (but
> > > > I wouldn't expect deep nesting). Thanks a lot!
> > > 
> > > Great stuff. In this circumstances approach 2 makes sense to me as well.
> > 
> > Alexei, Daniel,
> > Thanks for the quick review!
> 
> The patches would go via Michael's tree (same way as with the JIT itself
> in the past), right?

Yes, this set is contained within arch/powerpc, so Michael can take this 
through his tree.

The other set with updates to samples/bpf can probably go through 
David's tree.

- Naveen



Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-26 Thread Daniel Borkmann

On 09/26/2016 10:56 AM, Naveen N. Rao wrote:
> On 2016/09/24 03:30AM, Alexei Starovoitov wrote:
> > On Sat, Sep 24, 2016 at 12:33:54AM +0200, Daniel Borkmann wrote:
> > > On 09/23/2016 10:35 PM, Naveen N. Rao wrote:
> > > > Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
> > > > programs. [...]
> > >
> > > Thanks for adding support, Naveen, that's really great! I think 2) seems
> > > fine as well in this context as prologue size can vary quite a bit here,
> > > and depending on program types likelihood of tail call usage as well (but
> > > I wouldn't expect deep nesting). Thanks a lot!
> >
> > Great stuff. In this circumstances approach 2 makes sense to me as well.
>
> Alexei, Daniel,
> Thanks for the quick review!

The patches would go via Michael's tree (same way as with the JIT itself
in the past), right?


Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-26 Thread Naveen N. Rao
On 2016/09/24 03:30AM, Alexei Starovoitov wrote:
> On Sat, Sep 24, 2016 at 12:33:54AM +0200, Daniel Borkmann wrote:
> > On 09/23/2016 10:35 PM, Naveen N. Rao wrote:
> > >Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
> > >programs. This can be achieved either by:
> > >[...]
> > >
> > >Signed-off-by: Naveen N. Rao 
> > 
> > Thanks for adding support, Naveen, that's really great! I think 2) seems
> > fine as well in this context as prologue size can vary quite a bit here,
> > and depending on program types likelihood of tail call usage as well (but
> > I wouldn't expect deep nesting). Thanks a lot!
> 
> Great stuff. In this circumstances approach 2 makes sense to me as well.

Alexei, Daniel,
Thanks for the quick review!

- Naveen



Re: powerpc64: Enable CONFIG_E500 and CONFIG_PPC_E500MC for e5500/e6500

2016-09-26 Thread David Engraf

On 25.09.2016 at 08:20, Scott Wood wrote:
> On Mon, Aug 22, 2016 at 04:46:43PM +0200, David Engraf wrote:
> > The PowerPC e5500/e6500 architecture is based on the e500mc core. Enable
> > CONFIG_E500 and CONFIG_PPC_E500MC when e5500/e6500 is used.
> >
> > This will also fix using CONFIG_PPC_QEMU_E500 on PPC64.
> >
> > Signed-off-by: David Engraf 
> > ---
> >  arch/powerpc/platforms/Kconfig.cputype | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> > index f32edec..0382da7 100644
> > --- a/arch/powerpc/platforms/Kconfig.cputype
> > +++ b/arch/powerpc/platforms/Kconfig.cputype
> > @@ -125,11 +125,13 @@ config POWER8_CPU
> >
> >  config E5500_CPU
> > bool "Freescale e5500"
> > -   depends on E500
> > +   select E500
> > +   select PPC_E500MC
> >
> >  config E6500_CPU
> > bool "Freescale e6500"
> > -   depends on E500
> > +   select E500
> > +   select PPC_E500MC
>
> These config symbols are for setting -mcpu.  Kernels built with
> CONFIG_GENERIC_CPU should also work on e5500/e6500.

I don't think so. At least on QEMU it is not working, because e5500/e6500
is based on the e500mc core and the option CONFIG_PPC_E500MC also
controls the CPU features (check cputable.h).

> The problem is that CONFIG_PPC_QEMU_E500 doesn't select E500 (I didn't
> notice it before because usually CORENET_GENERIC is enabled as well).

I noticed that as well, but I think it makes more sense to select
E500/PPC_E500MC within the cputype menu instead of having a dependency
which might not be clear to the user. Right now the way to configure
such a BSP is not obvious: you need to open "Processor support" and
select the "Processor Type", then switch to "Platform support" to
select the BSP, and afterwards go back to "Processor support" to switch
from the generic CPU type to e5500/e6500.

> Note that your patch, by eliminating the dependency on E500, would make
> it possible to build a book3s kernel with E5500_CPU/E6500_CPU, which
> doesn't make any sense.

You're right. The attached version fixes this.

- David

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index f32edec..abd345e 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -125,11 +125,15 @@ config POWER8_CPU
 
 config E5500_CPU
 	bool "Freescale e5500"
-	depends on E500
+	depends on PPC_BOOK3E_64
+	select E500
+	select PPC_E500MC
 
 config E6500_CPU
 	bool "Freescale e6500"
-	depends on E500
+	depends on PPC_BOOK3E_64
+	select E500
+	select PPC_E500MC
 
 endchoice
 


Re: [PATCH v21 00/20] perf, tools: Add support for PMU events in JSON format

2016-09-26 Thread Jiri Olsa
On Thu, Sep 22, 2016 at 06:27:13PM +0200, Jiri Olsa wrote:
> On Thu, Sep 22, 2016 at 05:00:22PM +0200, Jiri Olsa wrote:
> > On Mon, Sep 19, 2016 at 09:28:20PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Sep 19, 2016 at 09:02:58PM -0300, Arnaldo Carvalho de Melo 
> > > escreveu:
> > > > Em Mon, Sep 19, 2016 at 08:37:53PM -0300, Arnaldo Carvalho de Melo 
> > > > escreveu:
> > > > > yeah, changing that typedef + true def to plain include 
> > > > > makes it progress to the next failure, which is in cross compilation
> > > > > environments, such as using fedora 24 + the Android NDK to try to 
> > > > > build
> > > > > a ARM android binary.
> > > 
> > > > 14 fedora:24-x-ARC-uClibc: FAIL
> > > >   GEN  /tmp/build/perf/pmu-events/pmu-events.c
> > > > /bin/sh: /tmp/build/perf/pmu-events/jevents: cannot execute binary 
> > > > file: Exec format error
> > > > pmu-events/Build:11: recipe for target 
> > > > '/tmp/build/perf/pmu-events/pmu-events.c' failed
> > > > make[2]: *** [/tmp/build/perf/pmu-events/pmu-events.c] Error 126
> > > > Makefile.perf:461: recipe for target 
> > > > '/tmp/build/perf/pmu-events/pmu-events-in.o' failed
> > > > make[1]: *** [/tmp/build/perf/pmu-events/pmu-events-in.o] Error 2
> > > > make[1]: *** Waiting for unfinished jobs
> > > 
> > > Jiri, we need something similar to scripts/Makefile.host :-\
> > > 
> > > Calling it a day, perhaps, for now, we should just detect that it is a
> > > cross-compile env (CROSS_COMPILE is set) and exclude all this code from
> > > the build, emitting a warning.
> > > 
> > > I left what I did at the tmp.perf/core branch of my repo at
> > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git.
> > 
> > as discussed on IRC, we will disable it for cross builds for now,
> > because we don't have a good solution at the moment. It's a
> > similar case as for the fixdep tool:
> > 
> > 3a70fcd3a4db tools build: Fix cross compile build
> > ...
> > We need to add support for host side tools build, meanwhile
> > disabling fixdep usage for cross arch builds.
> > 
> > I'll make a change to disable this for crossbuild and
> > work on common solution later
> 
> could you please give it a try with the patch below?
> I tested it, but not with a proper cross build...
> 
> also, did you want some message during the cross build that pmu-events are
> not included?

ping.. is that working for you? IMO we can include this
as an additional patch to the set..

thanks,
jirka
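[Editor's note] For reference, the usual way out of this class of problem is to build the generator with the host toolchain so it can execute during the build, as kbuild's scripts/Makefile.host does. A hypothetical Makefile fragment — the paths, variable names, and the jevents argument list here are illustrative, not the fix that was eventually merged:

```make
# jevents runs on the *build* machine, so compile it with HOSTCC even
# when CC/CROSS_COMPILE target another architecture.
HOSTCC ?= gcc

$(OUTPUT)pmu-events/jevents: pmu-events/jevents.c
	$(HOSTCC) $(HOSTCFLAGS) -o $@ $<

# Generate pmu-events.c by running the freshly built host binary.
$(OUTPUT)pmu-events/pmu-events.c: $(OUTPUT)pmu-events/jevents
	$< $(ARCH) pmu-events/arch $@
```

The interim workaround discussed above simply disables the generation for cross builds instead of adding this host-tool infrastructure.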


Re: ehea crash on boot

2016-09-26 Thread Mathieu Malaterre
On Fri, Sep 23, 2016 at 2:50 PM, Denis Kirjanov  wrote:
> Heh, another thing to debug :)
>
> mm: Hashing failure ! EA=0xd80080124040 access=0x800e
> current=NetworkManager
> trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003bc0300301ae
> mm: Hashing failure ! EA=0xd80080124040 access=0x800e
> current=NetworkManager
> trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003bc0300301ae
> Unable to handle kernel paging request for data at address 0xd80080124040
> Faulting instruction address: 0xc06f21a0
> cpu 0x8: Vector: 300 (Data Access) at [c005a8b92b50]
> pc: c06f21a0: .ehea_create_cq+0x160/0x230
> lr: c06f2164: .ehea_create_cq+0x124/0x230
> sp: c005a8b92dd0
> msr: 80009032
> dar: d80080124040
> dsisr: 4200
> current = 0xc005a8b68200
> paca = 0xcea94000 softe: 0 irq_happened: 0x01
> pid = 6787, comm = NetworkManager
> Linux version 4.8.0-rc6-00214-g4cea877 (kda@ps700) (gcc version 4.8.5
> 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Fri Sep 23 15:01:08 MSK 2016
> enter ? for help
> [c005a8b92dd0] c06f2140 .ehea_create_cq+0x100/0x230 (unreliable)
> [c005a8b92e70] c06ed448 .ehea_up+0x288/0xed0
> [c005a8b92fe0] c06ee314 .ehea_open+0x44/0x130
> [c005a8b93070] c0812324 .__dev_open+0x154/0x220
> [c005a8b93110] c0812734 .__dev_change_flags+0xd4/0x1e0
> [c005a8b931b0] c081286c .dev_change_flags+0x2c/0x80
> [c005a8b93240] c0829f0c .do_setlink+0x37c/0xe50
> [c005a8b933c0] c082c884 .rtnl_newlink+0x5e4/0x9b0
> [c005a8b936d0] c082cd08 .rtnetlink_rcv_msg+0xb8/0x2f0
> [c005a8b937a0] c084e25c .netlink_rcv_skb+0x12c/0x150
> [c005a8b93830] c0829458 .rtnetlink_rcv+0x38/0x60
> [c005a8b938b0] c084d814 .netlink_unicast+0x1e4/0x350
> [c005a8b93960] c084def8 .netlink_sendmsg+0x418/0x480
> [c005a8b93a40] c07defac .sock_sendmsg+0x2c/0x60
> [c005a8b93ab0] c07e0cbc .___sys_sendmsg+0x30c/0x320
> [c005a8b93c90] c07e21bc .__sys_sendmsg+0x4c/0xb0
> [c005a8b93d80] c07e2dec .SyS_socketcall+0x34c/0x3d0
> [c005a8b93e30] c000946c system_call+0x38/0x108

Can you turn UBSAN on for this?

-- 
Mathieu