Re: [PATCH] soc: fsl/qe: fix Oops on CPM1 (and likely CPM2)

2016-08-11 Thread Christophe Leroy



On 12/08/2016 at 01:29, Scott Wood wrote:

On Mon, 2016-08-08 at 18:08 +0200, Christophe Leroy wrote:

Commit 0e6e01ff694ee ("CPM/QE: use genalloc to manage CPM/QE muram")
has changed the way muram is managed.
genalloc uses kmalloc(), hence requires the SLAB to be up and running.

On powerpc 8xx, cpm_reset() is called early during startup.
cpm_reset() then calls cpm_muram_init() before SLAB is available,
hence the following Oops.

cpm_reset() cannot be called during initcalls because the CPM is
needed for the console.

This patch splits cpm_muram_init() into two parts. The first part,
related to mappings, is kept as cpm_muram_init().
The second part is named cpm_muram_pool_init() and is called
the first time cpm_muram_alloc() is used.


Why do you need to split it, versus calling the full cpm_muram_init() on
demand?



There are drivers, for instance the i2c-cpm driver, that call
cpm_muram_addr() before calling cpm_muram_alloc().

Therefore, we need muram_vbase and muram_pbase set.

So if we want to keep a single function, it means we also have to call 
it on demand from cpm_muram_addr(), cpm_muram_offset() and cpm_muram_dma().


Is that what you recommend ?

Christophe
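
To make the split discussed above concrete, here is a minimal user-space sketch. It is a model under assumptions, not the kernel code: the names muram_init(), muram_pool_init(), muram_addr() and muram_alloc() are hypothetical stand-ins, a static buffer replaces the real mapping, and a trivial bump allocator replaces genalloc. It only shows that the address helper needs the mapping step alone, while the pool step can wait until the first allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; not the kernel's definitions. */
static unsigned char fake_muram[256]; /* stands in for the mapped muram */
static unsigned char *muram_vbase;    /* set by the early mapping step */
static size_t muram_size;
static bool pool_ready;               /* set by the deferred pool step */
static size_t pool_next;              /* cursor of a trivial bump allocator */

/* Part 1: mappings only. Needs no allocator, so it can run early. */
int muram_init(void)
{
	muram_vbase = fake_muram;
	muram_size = sizeof(fake_muram);
	return 0;
}

/* Part 2: pool setup. In the kernel this is the step that needs kmalloc(). */
static int muram_pool_init(void)
{
	pool_next = 0;
	pool_ready = true;
	return 0;
}

/* Address helper: depends only on the mapping from part 1. */
void *muram_addr(size_t offset)
{
	assert(muram_vbase != NULL); /* the mapping must already exist */
	return muram_vbase + offset;
}

/* Allocator: runs part 2 on first use. Returns an offset, or -1 on failure. */
long muram_alloc(size_t size)
{
	long off;

	if (!pool_ready && muram_pool_init() != 0)
		return -1;
	if (size > muram_size - pool_next)
		return -1;
	off = (long)pool_next;
	pool_next += size;
	return off;
}
```

With a single on-demand function instead, muram_addr() would also have to trigger the full initialisation, which is the alternative raised in the question above.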


Re: [PATCH kernel 14/15] vfio/spapr_tce: Export container API for external users

2016-08-11 Thread Alexey Kardashevskiy
On 12/08/16 15:46, David Gibson wrote:
> On Wed, Aug 10, 2016 at 10:46:30AM -0600, Alex Williamson wrote:
>> On Wed, 10 Aug 2016 15:37:17 +1000
>> Alexey Kardashevskiy  wrote:
>>
>>> On 09/08/16 22:16, Alex Williamson wrote:
 On Tue, 9 Aug 2016 15:19:39 +1000
 Alexey Kardashevskiy  wrote:
   
> On 09/08/16 02:43, Alex Williamson wrote:  
>> On Wed,  3 Aug 2016 18:40:55 +1000
>> Alexey Kardashevskiy  wrote:
>> 
>>> This exports helpers which are needed to keep a VFIO container in
>>> memory while there are external users such as KVM.
>>>
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>>  drivers/vfio/vfio.c | 30 ++
>>>  drivers/vfio/vfio_iommu_spapr_tce.c | 16 +++-
>>>  include/linux/vfio.h|  6 ++
>>>  3 files changed, 51 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>>> index d1d70e0..baf6a9c 100644
>>> --- a/drivers/vfio/vfio.c
>>> +++ b/drivers/vfio/vfio.c
>>> @@ -1729,6 +1729,36 @@ long vfio_external_check_extension(struct 
>>> vfio_group *group, unsigned long arg)
>>>  EXPORT_SYMBOL_GPL(vfio_external_check_extension);
>>>  
>>>  /**
>>> + * External user API for containers, exported by symbols to be linked
>>> + * dynamically.
>>> + *
>>> + */
>>> +struct vfio_container *vfio_container_get_ext(struct file *filep)
>>> +{
>>> +   struct vfio_container *container = filep->private_data;
>>> +
>>> +   if (filep->f_op != &vfio_fops)
>>> +   return ERR_PTR(-EINVAL);
>>> +
>>> +   vfio_container_get(container);
>>> +
>>> +   return container;
>>> +}
>>> +EXPORT_SYMBOL_GPL(vfio_container_get_ext);
>>> +
>>> +void vfio_container_put_ext(struct vfio_container *container)
>>> +{
>>> +   vfio_container_put(container);
>>> +}
>>> +EXPORT_SYMBOL_GPL(vfio_container_put_ext);
>>> +
>>> +void *vfio_container_get_iommu_data_ext(struct vfio_container 
>>> *container)
>>> +{
>>> +   return container->iommu_data;
>>> +}
>>> +EXPORT_SYMBOL_GPL(vfio_container_get_iommu_data_ext);
>>> +
>>> +/**
>>>   * Sub-module support
>>>   */
>>>  /*
>>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
>>> b/drivers/vfio/vfio_iommu_spapr_tce.c
>>> index 3594ad3..fceea3d 100644
>>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>>> @@ -1331,6 +1331,21 @@ const struct vfio_iommu_driver_ops 
>>> tce_iommu_driver_ops = {
>>> .detach_group   = tce_iommu_detach_group,
>>>  };
>>>  
>>> +struct iommu_table *vfio_container_spapr_tce_table_get_ext(void 
>>> *iommu_data,
>>> +   u64 offset)
>>> +{
>>> +   struct tce_container *container = iommu_data;
>>> +   struct iommu_table *tbl = NULL;
>>> +
>>> +   if (tce_iommu_find_table(container, offset, &tbl) < 0)
>>> +   return NULL;
>>> +
>>> +   iommu_table_get(tbl);
>>> +
>>> +   return tbl;
>>> +}
>>> +EXPORT_SYMBOL_GPL(vfio_container_spapr_tce_table_get_ext);
>>> +
>>>  static int __init tce_iommu_init(void)
>>>  {
>>> return vfio_register_iommu_driver(&tce_iommu_driver_ops);
>>> @@ -1348,4 +1363,3 @@ MODULE_VERSION(DRIVER_VERSION);
>>>  MODULE_LICENSE("GPL v2");
>>>  MODULE_AUTHOR(DRIVER_AUTHOR);
>>>  MODULE_DESCRIPTION(DRIVER_DESC);
>>> -
>>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>>> index 0ecae0b..1c2138a 100644
>>> --- a/include/linux/vfio.h
>>> +++ b/include/linux/vfio.h
>>> @@ -91,6 +91,12 @@ extern void vfio_group_put_external_user(struct 
>>> vfio_group *group);
>>>  extern int vfio_external_user_iommu_id(struct vfio_group *group);
>>>  extern long vfio_external_check_extension(struct vfio_group *group,
>>>   unsigned long arg);
>>> +extern struct vfio_container *vfio_container_get_ext(struct file 
>>> *filep);
>>> +extern void vfio_container_put_ext(struct vfio_container *container);
>>> +extern void *vfio_container_get_iommu_data_ext(
>>> +   struct vfio_container *container);
>>> +extern struct iommu_table *vfio_container_spapr_tce_table_get_ext(
>>> +   void *iommu_data, u64 offset);
>>>  
>>>  /*
>>>   * Sub-module helpers
>>
>>
>> I think you need to take a closer look of the lifecycle of a container,
>> having a reference means the container itself won't go away, but only
>> having a group set within that container holds the actual IOMMU
>> references.  container->iommu_data is going to be NULL once the
>> groups are lost.  Thanks,
>
>
> Container own

Re: [PATCH v4] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-11 Thread Andrew Donnellan

On 12/08/16 06:25, Mauricio Faria de Oliveira wrote:

This patch leverages 'struct pci_host_bridge' from the PCI subsystem
in order to free the pci_controller only after the last reference to
its devices is dropped (avoiding an oops in pcibios_release_device()
if the last reference is dropped after pcibios_free_controller()).

The patch relies on pci_host_bridge.release_fn() (and .release_data),
which is called automatically by the PCI subsystem when the root bus
is released (i.e., the last reference is dropped).  Those fields are
set via pci_set_host_bridge_release() (e.g. in the platform-specific
implementation of pcibios_root_bridge_prepare()).

It introduces the 'pcibios_free_controller_deferred()' .release_fn()
and it expects .release_data to hold a pointer to the pci_controller.

The function implicitly calls 'pcibios_free_controller()', so a user
must *NOT* explicitly call it if using the new _deferred() callback.

The functionality is enabled for pseries (although it isn't platform
specific, and may be used by cxl).

Details on not-so-elegant design choices:

 - Use 'pci_host_bridge.release_data' field as pointer to associated
   'struct pci_controller' so *not* to 'pci_bus_to_host(bridge->bus)'
   in pcibios_free_controller_deferred().

   That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL'
   (so, if the last reference is released after pci_remove_root_bus()
   runs, which eventually reaches pcibios_free_controller_deferred(),
   that would hit a null pointer dereference).

   The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks
   are interested in this fix.
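
The design described above can be sketched in user space as follows. This is only a model under assumptions: struct bridge and struct controller are simplified stand-ins for struct pci_host_bridge and struct pci_controller, and bridge_get(), bridge_put(), bridge_set_release() and remove_root_bus() are hypothetical helpers that mimic reference counting. The point it illustrates is that the release callback reads release_data and never touches the bus pointer, which is already NULL by then.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct controller { int domain; };     /* stand-in for struct pci_controller */

struct bridge {                         /* stand-in for struct pci_host_bridge */
	int refcount;
	void *bus;                      /* cleared when the root bus is removed */
	void (*release_fn)(struct bridge *);
	void *release_data;
};

static int freed_domain = -1;           /* records which controller was freed */

static void free_controller(struct controller *phb)
{
	freed_domain = phb->domain;
	free(phb);
}

/* Release callback: uses release_data, never bridge->bus. */
static void free_controller_deferred(struct bridge *b)
{
	free_controller(b->release_data);
}

static void bridge_set_release(struct bridge *b,
			       void (*fn)(struct bridge *), void *data)
{
	b->release_fn = fn;
	b->release_data = data;
}

static void bridge_get(struct bridge *b)
{
	b->refcount++;
}

/* Dropping the last reference triggers the release callback. */
static void bridge_put(struct bridge *b)
{
	assert(b->refcount > 0);
	if (--b->refcount == 0 && b->release_fn)
		b->release_fn(b);
}

static void remove_root_bus(struct bridge *b)
{
	b->bus = NULL;  /* mirrors 'host_bridge->bus = NULL' from the text */
	bridge_put(b);  /* drop the reference held on behalf of the bus */
}
```

If a device still holds a reference when remove_root_bus() runs, the controller is freed only on the later bridge_put(), which corresponds to test-case #1 below.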

Test-case #1 (hold references)

  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.0
  <...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...>

  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.1
  <...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...>

  # cat >/dev/sdaa & pid1=$!
  # cat >/dev/sdab & pid2=$!

  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
  Validating PHB DLPAR capability...yes.
  [  594.306719] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
  [  594.306738] pci_hp_remove_devices:Removing 0021:01:00.0...
  ...
  [  598.236381] pci_hp_remove_devices:Removing 0021:01:00.1...
  ...
  [  611.972077] pci_bus 0021:01: busn_res: [bus 01-ff] is released
  [  611.972140] rpadlpar_io: slot PHB 33 removed

  # kill -9 $pid1
  # kill -9 $pid2
  [  632.918088] pcibios_free_controller_deferred: domain 33, dynamic 1

Test-case #2 (don't hold references)

  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
  Validating PHB DLPAR capability...yes.
  [  916.357363] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
  [  916.357386] pci_hp_remove_devices:Removing 0021:01:00.0...
  ...
  [  920.566527] pci_hp_remove_devices:Removing 0021:01:00.1...
  ...
  [  933.955873] pci_bus 0021:01: busn_res: [bus 01-ff] is released
  [  933.955977] pcibios_free_controller_deferred: domain 33, dynamic 1
  [  933.955999] rpadlpar_io: slot PHB 33 removed

Suggested-By: Gavin Shan 
Signed-off-by: Mauricio Faria de Oliveira 


Reviewed-by: Andrew Donnellan 
Tested-by: Andrew Donnellan  # cxl

Does this justify a Cc: stable?

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH v4] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-11 Thread Andrew Donnellan

On 12/08/16 15:54, Gavin Shan wrote:

It might be nicer for users to implement their own
pcibios_free_controller_deferred(), meaning pSeries would need its own
implementation for now. The reason is that more user-specific (pSeries)
objects could be released together with the PHB. However, I'm still fine
if this comment is not addressed.


That's probably not a bad idea, though from a cxl perspective I'm fine 
with using the current version.


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH v4] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-11 Thread Gavin Shan
On Thu, Aug 11, 2016 at 05:25:40PM -0300, Mauricio Faria de Oliveira wrote:
>This patch leverages 'struct pci_host_bridge' from the PCI subsystem
>in order to free the pci_controller only after the last reference to
>its devices is dropped (avoiding an oops in pcibios_release_device()
>if the last reference is dropped after pcibios_free_controller()).
>
>The patch relies on pci_host_bridge.release_fn() (and .release_data),
>which is called automatically by the PCI subsystem when the root bus
>is released (i.e., the last reference is dropped).  Those fields are
>set via pci_set_host_bridge_release() (e.g. in the platform-specific
>implementation of pcibios_root_bridge_prepare()).
>
>It introduces the 'pcibios_free_controller_deferred()' .release_fn()
>and it expects .release_data to hold a pointer to the pci_controller.
>
>The function implicitly calls 'pcibios_free_controller()', so a user
>must *NOT* explicitly call it if using the new _deferred() callback.
>
>The functionality is enabled for pseries (although it isn't platform
>specific, and may be used by cxl).
>
>Details on not-so-elegant design choices:
>
> - Use 'pci_host_bridge.release_data' field as pointer to associated
>   'struct pci_controller' so *not* to 'pci_bus_to_host(bridge->bus)'
>   in pcibios_free_controller_deferred().
>
>   That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL'
>   (so, if the last reference is released after pci_remove_root_bus()
>   runs, which eventually reaches pcibios_free_controller_deferred(),
>   that would hit a null pointer dereference).
>
>   The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks
>   are interested in this fix.
>
>Test-case #1 (hold references)
>
>  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.0
>  <...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...>
>
>  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.1
>  <...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...>
>
>  # cat >/dev/sdaa & pid1=$!
>  # cat >/dev/sdab & pid2=$!
>
>  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
>  Validating PHB DLPAR capability...yes.
>  [  594.306719] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
>  [  594.306738] pci_hp_remove_devices:Removing 0021:01:00.0...
>  ...
>  [  598.236381] pci_hp_remove_devices:Removing 0021:01:00.1...
>  ...
>  [  611.972077] pci_bus 0021:01: busn_res: [bus 01-ff] is released
>  [  611.972140] rpadlpar_io: slot PHB 33 removed
>
>  # kill -9 $pid1
>  # kill -9 $pid2
>  [  632.918088] pcibios_free_controller_deferred: domain 33, dynamic 1
>
>Test-case #2 (don't hold references)
>
>  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
>  Validating PHB DLPAR capability...yes.
>  [  916.357363] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
>  [  916.357386] pci_hp_remove_devices:Removing 0021:01:00.0...
>  ...
>  [  920.566527] pci_hp_remove_devices:Removing 0021:01:00.1...
>  ...
>  [  933.955873] pci_bus 0021:01: busn_res: [bus 01-ff] is released
>  [  933.955977] pcibios_free_controller_deferred: domain 33, dynamic 1
>  [  933.955999] rpadlpar_io: slot PHB 33 removed
>
>Suggested-By: Gavin Shan 
>Signed-off-by: Mauricio Faria de Oliveira 

I have no further comments apart from the one nitpick below:

Reviewed-by: Gavin Shan 

>---
>Changelog:
> - v4: improve usability/design/documentation:
>   - rename function to pcibios_free_controller_deferred()
>   - from function call pcibios_free_controller()
>   - no more struct pci_controller.bridge field
>   thanks: Gavin Shan, Andrew Donnellan
> - v3: different approach: struct pci_host_bridge.release_fn()
> - v2: different approach: struct pci_controller.refcount 
>
> arch/powerpc/include/asm/pci-bridge.h  |  1 +
> arch/powerpc/kernel/pci-common.c   | 36 ++
> arch/powerpc/platforms/pseries/pci.c   |  4 
> arch/powerpc/platforms/pseries/pci_dlpar.c |  7 --
> 4 files changed, 46 insertions(+), 2 deletions(-)
>
>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>b/arch/powerpc/include/asm/pci-bridge.h
>index b5e88e4..c0309c5 100644
>--- a/arch/powerpc/include/asm/pci-bridge.h
>+++ b/arch/powerpc/include/asm/pci-bridge.h
>@@ -301,6 +301,7 @@ extern void pci_process_bridge_OF_ranges(struct 
>pci_controller *hose,
> /* Allocate & free a PCI host bridge structure */
> extern struct pci_controller *pcibios_alloc_controller(struct device_node 
> *dev);
> extern void pcibios_free_controller(struct pci_controller *phb);
>+extern void pcibios_free_controller_deferred(struct pci_host_bridge *bridge);
>
> #ifdef CONFIG_PCI
> extern int pcibios_vaddr_is_ioport(void __iomem *address);
>diff --git a/arch/powerpc/kernel/pci-common.c 
>b/arch/powerpc/kernel/pci-common.c
>index a5c0153..8c48a78 100644
>--- a/arch/powerpc/kernel/pci-common.c
>+++ b/arch/powerpc/kernel/pci-common.c
>@@ -151,6 +151,42 @@ void pcibios_free_controller(struct pci_controller *phb)
> EXPORT_SYMBOL_GPL(pcibios_free_controller);
>
> /*
>+ * 

Re: [PATCH] powerpc: populate the default bus with machine_arch_initcall

2016-08-11 Thread Kevin Hao
On Fri, Aug 12, 2016 at 02:39:32PM +1000, Michael Ellerman wrote:
> Kevin Hao  writes:
> 
> > With the commit 44a7185c2ae6 ("of/platform: Add common method to
> > populate default bus"), a default function is introduced to populate
> > the default bus and this function is invoked at the arch_initcall_sync
> > level. This will override the arch specific population of default bus
> > which runs at a lower level than arch_initcall_sync. Since not all
> > powerpc specific buses are added to the of_default_bus_match_table[],
> > this causes some powerpc specific buses not to be probed. Fix this by
> > using an earlier initcall.
> >
> > Signed-off-by: Kevin Hao 
> > ---
> > Of course we can adjust the powerpc arch code to use
> > of_platform_default_populate_init(), but that carries a high risk of
> > breaking other boards given the complicated powerpc specific buses. So I
> > would like to just fix the broken boards in the current release, and cook
> > a patch to change to of_platform_default_populate_init() for linux-next.
> >
> > Only boot-tested on an mpc8315erdb board.
> >
> >  arch/powerpc/platforms/40x/ep405.c   | 2 +-
> >  arch/powerpc/platforms/40x/ppc40x_simple.c   | 2 +-
> >  arch/powerpc/platforms/40x/virtex.c  | 2 +-
> >  arch/powerpc/platforms/40x/walnut.c  | 2 +-
> >  arch/powerpc/platforms/44x/canyonlands.c | 2 +-
> >  arch/powerpc/platforms/44x/ebony.c   | 2 +-
> >  arch/powerpc/platforms/44x/iss4xx.c  | 2 +-
> >  arch/powerpc/platforms/44x/ppc44x_simple.c   | 2 +-
> >  arch/powerpc/platforms/44x/ppc476.c  | 2 +-
> >  arch/powerpc/platforms/44x/sam440ep.c| 2 +-
> >  arch/powerpc/platforms/44x/virtex.c  | 2 +-
> >  arch/powerpc/platforms/44x/warp.c| 2 +-
> >  arch/powerpc/platforms/82xx/ep8248e.c| 2 +-
> >  arch/powerpc/platforms/82xx/km82xx.c | 2 +-
> >  arch/powerpc/platforms/82xx/mpc8272_ads.c| 2 +-
> >  arch/powerpc/platforms/82xx/pq2fads.c| 2 +-
> >  arch/powerpc/platforms/83xx/mpc831x_rdb.c| 2 +-
> >  arch/powerpc/platforms/83xx/mpc834x_itx.c| 2 +-
> >  arch/powerpc/platforms/85xx/ppa8548.c| 2 +-
> >  arch/powerpc/platforms/8xx/adder875.c| 2 +-
> >  arch/powerpc/platforms/8xx/ep88xc.c  | 2 +-
> >  arch/powerpc/platforms/8xx/mpc86xads_setup.c | 2 +-
> >  arch/powerpc/platforms/8xx/mpc885ads_setup.c | 2 +-
> >  arch/powerpc/platforms/8xx/tqm8xx_setup.c| 2 +-
> >  arch/powerpc/platforms/cell/setup.c  | 2 +-
> >  arch/powerpc/platforms/embedded6xx/gamecube.c| 2 +-
> >  arch/powerpc/platforms/embedded6xx/linkstation.c | 2 +-
> >  arch/powerpc/platforms/embedded6xx/mvme5100.c| 2 +-
> >  arch/powerpc/platforms/embedded6xx/storcenter.c  | 2 +-
> >  arch/powerpc/platforms/embedded6xx/wii.c | 2 +-
> >  arch/powerpc/platforms/pasemi/setup.c| 2 +-
> 
> That's not a very minimal fix.
> 
> Every one of those initcall changes could be introducing a bug, by
> changing the order vs other init calls.
> 
> Can we just go back to the old behaviour on ppc?

Sure. How about this one?

From 4362b4cdd8a6198df4cc46c628473f0d44e03fa8 Mon Sep 17 00:00:00 2001
From: Kevin Hao 
Date: Fri, 12 Aug 2016 13:30:03 +0800
Subject: [PATCH v2] of/platform: disable the
 of_platform_default_populate_init() for all the ppc boards

With the commit 44a7185c2ae6 ("of/platform: Add common method to
populate default bus"), a default function is introduced to populate
the default bus and this function is invoked at the arch_initcall_sync
level. But a lot of ppc boards use machine_device_initcall() to
populate the default bus. This means that the default populate function
has higher priority and would override the arch specific population of
the bus. The side effect is that some arch specific buses are not probed,
which causes various malfunctions because some devices are missing. Since
it is quite possible to introduce bugs if we simply change the initcall
level for all these boards (about 30+), this just disables the default
function for all the ppc boards.

Signed-off-by: Kevin Hao 
---
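
For illustration only, the ordering problem described above can be modelled in user space as below. Everything here is an assumption made for the sketch: the level numbers are just an ordering, default_populate() and machine_populate() are hypothetical stand-ins, and "the later call finds the root already populated and does nothing" is a simplification of how the default function takes precedence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative ordering only: a smaller value runs earlier. */
enum level { LEVEL_ARCH_SYNC = 3, LEVEL_DEVICE = 6 };

struct initcall {
	enum level level;
	void (*fn)(void);
};

static bool root_populated;   /* models "the default bus was populated" */
static bool arch_bus_probed;  /* models a powerpc specific bus being probed */

/* Default populate: its match table does not include the arch bus. */
static void default_populate(void)
{
	if (root_populated)
		return;
	root_populated = true;
}

/* Board specific populate: would also probe the arch specific bus. */
static void machine_populate(void)
{
	if (root_populated)
		return; /* too late, the default function already ran */
	root_populated = true;
	arch_bus_probed = true;
}

/* Run the calls in ascending level order, whatever the array order is. */
static void run_initcalls(const struct initcall *calls, size_t n)
{
	for (int lvl = 0; lvl <= 7; lvl++)
		for (size_t i = 0; i < n; i++)
			if ((int)calls[i].level == lvl)
				calls[i].fn();
}

static void reset_state(void)
{
	root_populated = false;
	arch_bus_probed = false;
}
```

Dropping the default call from the list, which is the effect of the #ifndef CONFIG_PPC in the patch, lets the board specific populate run as before.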
 drivers/of/platform.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 8aa197691074..f39ccd5aa701 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -497,6 +497,7 @@ int of_platform_default_populate(struct device_node *root,
 }
 EXPORT_SYMBOL_GPL(of_platform_default_populate);
 
+#ifndef CONFIG_PPC
 static int __init of_platform_default_populate_init(void)
 {
struct device_node *node;
@@ -521,6 +522,7 @@ static int __init of_platform_default_populate_init(void)
return 0;
 }
 arch_initcall_sync(of_platform_default_populate_init);
+#endif
 
 static int of_platform_device_destroy(struct device *dev, void *data)
 {
-- 
2.8.1

Thanks,
Kevin




Re: [PATCH kernel 14/15] vfio/spapr_tce: Export container API for external users

2016-08-11 Thread David Gibson
On Wed, Aug 03, 2016 at 06:40:55PM +1000, Alexey Kardashevskiy wrote:
> This exports helpers which are needed to keep a VFIO container in
> memory while there are external users such as KVM.
> 
> Signed-off-by: Alexey Kardashevskiy 

I'll address Alex W's broader concerns in a different mail.  But
there are some more superficial problems with this as well.

> ---
>  drivers/vfio/vfio.c | 30 ++
>  drivers/vfio/vfio_iommu_spapr_tce.c | 16 +++-
>  include/linux/vfio.h|  6 ++
>  3 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index d1d70e0..baf6a9c 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1729,6 +1729,36 @@ long vfio_external_check_extension(struct vfio_group 
> *group, unsigned long arg)
>  EXPORT_SYMBOL_GPL(vfio_external_check_extension);
>  
>  /**
> + * External user API for containers, exported by symbols to be linked
> + * dynamically.
> + *
> + */
> +struct vfio_container *vfio_container_get_ext(struct file *filep)
> +{
> + struct vfio_container *container = filep->private_data;
> +
> + if (filep->f_op != &vfio_fops)
> + return ERR_PTR(-EINVAL);
> +
> + vfio_container_get(container);
> +
> + return container;
> +}
> +EXPORT_SYMBOL_GPL(vfio_container_get_ext);
> +
> +void vfio_container_put_ext(struct vfio_container *container)
> +{
> + vfio_container_put(container);
> +}
> +EXPORT_SYMBOL_GPL(vfio_container_put_ext);
> +
> +void *vfio_container_get_iommu_data_ext(struct vfio_container *container)
> +{
> + return container->iommu_data;
> +}
> +EXPORT_SYMBOL_GPL(vfio_container_get_iommu_data_ext);
> +
> +/**
>   * Sub-module support
>   */
>  /*
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 3594ad3..fceea3d 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -1331,6 +1331,21 @@ const struct vfio_iommu_driver_ops 
> tce_iommu_driver_ops = {
>   .detach_group   = tce_iommu_detach_group,
>  };
>  
> +struct iommu_table *vfio_container_spapr_tce_table_get_ext(void *iommu_data,
> + u64 offset)

I really dislike this name.  I was confused for a while why this
existed on top of vfio_container_get_ext(), the names are so similar.


Making it take a void * is also really nasty since that void * has to
be something specific.  It would be better to have this take a
vfio_container *, verify that the container really does have an
spapr_tce backend, then look up the tce_container and the actual IOMMU
tables within.

That might also let you drop vfio_container_get_iommu_data_ext()
entirely.
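
A rough user-space sketch of the shape suggested above, under stated assumptions: all the *_model types, the ops pointer used to identify the backend, and container_tce_table_get() are hypothetical names invented for this illustration, not existing VFIO interfaces. The accessor takes the container, checks the backend and the presence of iommu_data itself, and only then looks up the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical models; none of these are the real VFIO or powerpc types. */
struct iommu_table_model {
	int refs;
	unsigned long long offset;
};

struct tce_container_model {
	struct iommu_table_model *tables;
	size_t ntables;
};

struct iommu_ops_model { const char *name; };

static const struct iommu_ops_model spapr_tce_ops = { "spapr_tce" };
static const struct iommu_ops_model other_ops = { "other" };

struct container_model {
	const struct iommu_ops_model *ops; /* which backend owns iommu_data */
	void *iommu_data;                  /* NULL when no group is attached */
};

/*
 * Takes the container rather than a bare void *, so the backend check
 * happens in one place. Returns NULL for a foreign backend, a container
 * without IOMMU data, or an unknown offset.
 */
struct iommu_table_model *
container_tce_table_get(struct container_model *c, unsigned long long offset)
{
	struct tce_container_model *tce;

	if (!c || c->ops != &spapr_tce_ops || !c->iommu_data)
		return NULL;

	tce = c->iommu_data;
	for (size_t i = 0; i < tce->ntables; i++) {
		if (tce->tables[i].offset == offset) {
			tce->tables[i].refs++; /* take a table reference */
			return &tce->tables[i];
		}
	}
	return NULL;
}
```

With this shape the caller never handles the untyped pointer, so a separate accessor for iommu_data is not needed in the model.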

> +{
> + struct tce_container *container = iommu_data;
> + struct iommu_table *tbl = NULL;
> +
> + if (tce_iommu_find_table(container, offset, &tbl) < 0)
> + return NULL;
> +
> + iommu_table_get(tbl);
> +
> + return tbl;
> +}
> +EXPORT_SYMBOL_GPL(vfio_container_spapr_tce_table_get_ext);
> +
>  static int __init tce_iommu_init(void)
>  {
>   return vfio_register_iommu_driver(&tce_iommu_driver_ops);
> @@ -1348,4 +1363,3 @@ MODULE_VERSION(DRIVER_VERSION);
>  MODULE_LICENSE("GPL v2");
>  MODULE_AUTHOR(DRIVER_AUTHOR);
>  MODULE_DESCRIPTION(DRIVER_DESC);
> -
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 0ecae0b..1c2138a 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -91,6 +91,12 @@ extern void vfio_group_put_external_user(struct vfio_group 
> *group);
>  extern int vfio_external_user_iommu_id(struct vfio_group *group);
>  extern long vfio_external_check_extension(struct vfio_group *group,
> unsigned long arg);
> +extern struct vfio_container *vfio_container_get_ext(struct file *filep);
> +extern void vfio_container_put_ext(struct vfio_container *container);
> +extern void *vfio_container_get_iommu_data_ext(
> + struct vfio_container *container);
> +extern struct iommu_table *vfio_container_spapr_tce_table_get_ext(
> + void *iommu_data, u64 offset);
>  
>  /*
>   * Sub-module helpers

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH kernel 14/15] vfio/spapr_tce: Export container API for external users

2016-08-11 Thread David Gibson
On Wed, Aug 10, 2016 at 10:46:30AM -0600, Alex Williamson wrote:
> On Wed, 10 Aug 2016 15:37:17 +1000
> Alexey Kardashevskiy  wrote:
> 
> > On 09/08/16 22:16, Alex Williamson wrote:
> > > On Tue, 9 Aug 2016 15:19:39 +1000
> > > Alexey Kardashevskiy  wrote:
> > >   
> > >> On 09/08/16 02:43, Alex Williamson wrote:  
> > >>> On Wed,  3 Aug 2016 18:40:55 +1000
> > >>> Alexey Kardashevskiy  wrote:
> > >>> 
> >  This exports helpers which are needed to keep a VFIO container in
> >  memory while there are external users such as KVM.
> > 
> >  Signed-off-by: Alexey Kardashevskiy 
> >  ---
> >   drivers/vfio/vfio.c | 30 
> >  ++
> >   drivers/vfio/vfio_iommu_spapr_tce.c | 16 +++-
> >   include/linux/vfio.h|  6 ++
> >   3 files changed, 51 insertions(+), 1 deletion(-)
> > 
> >  diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> >  index d1d70e0..baf6a9c 100644
> >  --- a/drivers/vfio/vfio.c
> >  +++ b/drivers/vfio/vfio.c
> >  @@ -1729,6 +1729,36 @@ long vfio_external_check_extension(struct 
> >  vfio_group *group, unsigned long arg)
> >   EXPORT_SYMBOL_GPL(vfio_external_check_extension);
> >   
> >   /**
> >  + * External user API for containers, exported by symbols to be linked
> >  + * dynamically.
> >  + *
> >  + */
> >  +struct vfio_container *vfio_container_get_ext(struct file *filep)
> >  +{
> >  +  struct vfio_container *container = filep->private_data;
> >  +
> >  +  if (filep->f_op != &vfio_fops)
> >  +  return ERR_PTR(-EINVAL);
> >  +
> >  +  vfio_container_get(container);
> >  +
> >  +  return container;
> >  +}
> >  +EXPORT_SYMBOL_GPL(vfio_container_get_ext);
> >  +
> >  +void vfio_container_put_ext(struct vfio_container *container)
> >  +{
> >  +  vfio_container_put(container);
> >  +}
> >  +EXPORT_SYMBOL_GPL(vfio_container_put_ext);
> >  +
> >  +void *vfio_container_get_iommu_data_ext(struct vfio_container 
> >  *container)
> >  +{
> >  +  return container->iommu_data;
> >  +}
> >  +EXPORT_SYMBOL_GPL(vfio_container_get_iommu_data_ext);
> >  +
> >  +/**
> >    * Sub-module support
> >    */
> >   /*
> >  diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> >  b/drivers/vfio/vfio_iommu_spapr_tce.c
> >  index 3594ad3..fceea3d 100644
> >  --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >  +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >  @@ -1331,6 +1331,21 @@ const struct vfio_iommu_driver_ops 
> >  tce_iommu_driver_ops = {
> > .detach_group   = tce_iommu_detach_group,
> >   };
> >   
> >  +struct iommu_table *vfio_container_spapr_tce_table_get_ext(void 
> >  *iommu_data,
> >  +  u64 offset)
> >  +{
> >  +  struct tce_container *container = iommu_data;
> >  +  struct iommu_table *tbl = NULL;
> >  +
> >  +  if (tce_iommu_find_table(container, offset, &tbl) < 0)
> >  +  return NULL;
> >  +
> >  +  iommu_table_get(tbl);
> >  +
> >  +  return tbl;
> >  +}
> >  +EXPORT_SYMBOL_GPL(vfio_container_spapr_tce_table_get_ext);
> >  +
> >   static int __init tce_iommu_init(void)
> >   {
> > return vfio_register_iommu_driver(&tce_iommu_driver_ops);
> >  @@ -1348,4 +1363,3 @@ MODULE_VERSION(DRIVER_VERSION);
> >   MODULE_LICENSE("GPL v2");
> >   MODULE_AUTHOR(DRIVER_AUTHOR);
> >   MODULE_DESCRIPTION(DRIVER_DESC);
> >  -
> >  diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> >  index 0ecae0b..1c2138a 100644
> >  --- a/include/linux/vfio.h
> >  +++ b/include/linux/vfio.h
> >  @@ -91,6 +91,12 @@ extern void vfio_group_put_external_user(struct 
> >  vfio_group *group);
> >   extern int vfio_external_user_iommu_id(struct vfio_group *group);
> >   extern long vfio_external_check_extension(struct vfio_group *group,
> >   unsigned long arg);
> >  +extern struct vfio_container *vfio_container_get_ext(struct file 
> >  *filep);
> >  +extern void vfio_container_put_ext(struct vfio_container *container);
> >  +extern void *vfio_container_get_iommu_data_ext(
> >  +  struct vfio_container *container);
> >  +extern struct iommu_table *vfio_container_spapr_tce_table_get_ext(
> >  +  void *iommu_data, u64 offset);
> >   
> >   /*
> >    * Sub-module helpers
> > >>>
> > >>>
> > >>> I think you need to take a closer look of the lifecycle of a container,
> > >>> having a reference means the container itself won't go away, but only
> > >>> having a group set within that container holds the actual IOMMU
> > >>> references.  container->iommu_data is going to be NULL once the
>

[PATCH v4 4/5] PCI: Add a new option for resource_alignment to reassign alignment

2016-08-11 Thread Yongji Xie
When using the resource_alignment kernel parameter, the current
implementation reassigns the alignment by changing the resources' size,
which can potentially break some drivers. For example, a driver may
use the size to locate a register whose length is related
to the size.

This patch adds a new option "noresize" for the parameter to
solve this problem.

Signed-off-by: Yongji Xie 
---
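
As a standalone illustration of the option parsing added in the diff below, here is a small user-space sketch. The helper name parse_noresize() and the sample device strings are made up for the example; the logic follows the strncmp() test in the patch, applied after any alignment order prefix has already been consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Consume an optional "noresize@" prefix from a device specification.
 * Sets *resize to false when the prefix is present, true otherwise,
 * and returns a pointer to the remainder of the specification.
 */
const char *parse_noresize(const char *p, bool *resize)
{
	static const char opt[] = "noresize@";
	const size_t len = sizeof(opt) - 1; /* 9 characters, no terminator */

	if (strncmp(p, opt, len) == 0) {
		*resize = false;
		return p + len;
	}
	*resize = true;
	return p;
}
```

The flag is always written, so the caller does not depend on its own initial value.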
 Documentation/kernel-parameters.txt |9 ++---
 drivers/pci/pci.c   |   37 +--
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 46c030a..c64e439 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3023,15 +3023,18 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
window. The default value is 64 megabytes.
resource_alignment=
Format:
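-   [<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...]
-   [<order of align>@]pci:<vendor>:<device>\
-   [:<subvendor>:<subdevice>][; ...]
+   [<order of align>@][noresize@][<domain>:]
+   <bus>:<slot>.<func>[; ...]
+   [<order of align>@][noresize@]pci:<vendor>:<device>
+   [:<subvendor>:<subdevice>][; ...]
Specifies alignment and device to reassign
aligned memory resources.
If <order of align> is not specified,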
PAGE_SIZE is used as alignment.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
+   noresize: Don't change the resources' sizes when
+   reassigning alignment.
ecrc=   Enable/disable PCIe ECRC (transaction layer
end-to-end CRC checking).
bios: Use BIOS/firmware settings. This is the
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index caa0894..d895be7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4946,11 +4946,13 @@ static DEFINE_SPINLOCK(resource_alignment_lock);
 /**
  * pci_specified_resource_alignment - get resource alignment specified by user.
  * @dev: the PCI device to get
+ * @resize: whether or not to change resources' size when reassigning alignment
  *
  * RETURNS: Resource alignment if it is specified.
  *  Zero if it is not specified.
  */
-static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
+static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
+   bool *resize)
 {
int seg, bus, slot, func, align_order, count;
unsigned short vendor, device, subsystem_vendor, subsystem_device;
@@ -4974,6 +4976,13 @@ static resource_size_t 
pci_specified_resource_alignment(struct pci_dev *dev)
} else {
align_order = -1;
}
+
+   if (!strncmp(p, "noresize@", 9)) {
+   *resize = false;
+   p += 9;
+   } else
+   *resize = true;
+
if (strncmp(p, "pci:", 4) == 0) {
/* PCI vendor/device (subvendor/subdevice) ids are 
specified */
p += 4;
@@ -5045,6 +5054,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
 {
int i;
struct resource *r;
+   bool resize = true;
resource_size_t align, size;
 
/*
@@ -5057,7 +5067,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
return;
 
/* check if specified PCI is target device to reassign */
-   align = pci_specified_resource_alignment(dev);
+   align = pci_specified_resource_alignment(dev, &resize);
if (!align)
return;
 
@@ -5080,15 +5090,22 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
*dev)
}
 
size = resource_size(r);
-   if (size < align) {
-   size = align;
-   dev_info(&dev->dev,
-   "Rounding up size of resource #%d to %#llx.\n",
-   i, (unsigned long long)size);
+   if (resize) {
+   if (size < align) {
+   size = align;
+   dev_info(&dev->dev,
+   "Rounding up size of resource #%d to 
%#llx.\n",
+   i, (unsigned long long)size);
+   }
+   r->flags |= IORESOURCE_UNSET;
+   r->end = size - 1;
+   r->start = 0;
+   } else {
+   r->flags &= ~IORESOURCE_SIZEALIGN;
+   

[PATCH v4 5/5] PCI: Add a macro to set default alignment for all PCI devices

2016-08-11 Thread Yongji Xie
When VFIO passes through a PCI device whose MMIO BARs are
smaller than PAGE_SIZE, the guest cannot access those BARs
directly, so every MMIO access is emulated in the host.

This is because VFIO refuses to pass through a BAR's MMIO page
if that page may be shared with other BARs. Otherwise, a guest
could use the shared page as a backdoor to access the BARs of
another guest.

This patch adds a macro that sets the default alignment for all
PCI devices. Platforms that easily hit this issue because of
their 64K page size, such as PowerNV, can then solve it by
defining this macro as PAGE_SIZE.

Signed-off-by: Yongji Xie 
---
 arch/powerpc/include/asm/pci.h |4 
 drivers/pci/pci.c  |4 
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index e9bd6cf..5e31bc2 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -28,6 +28,10 @@
 #define PCIBIOS_MIN_IO 0x1000
 #define PCIBIOS_MIN_MEM 0x1000
 
+#ifdef CONFIG_PPC_POWERNV
+#define PCIBIOS_DEFAULT_ALIGNMENT  PAGE_SIZE
+#endif
+
 struct pci_dev;
 
 /* Values for the `which' argument to sys_pciconfig_iobase syscall.  */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d895be7..feae59e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4959,6 +4959,10 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
resource_size_t align = 0;
char *p;
 
+#ifdef PCIBIOS_DEFAULT_ALIGNMENT
+   align = PCIBIOS_DEFAULT_ALIGNMENT;
+   *resize = false;
+#endif
spin_lock(&resource_alignment_lock);
p = resource_alignment_param;
if (pci_has_flag(PCI_PROBE_ONLY)) {
-- 
1.7.9.5
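
The effect of the #ifdef in this patch can be sketched in plain user-space C. This is only an illustration under stated assumptions, not kernel code: DEMO_DEFAULT_ALIGNMENT and demo_initial_alignment() are hypothetical names, and the 64K value stands in for PAGE_SIZE on a 64K-page platform. In the patch itself the caller initialises resize to true; the sketch does that locally so it is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PCIBIOS_DEFAULT_ALIGNMENT on a 64K-page platform. */
#define DEMO_DEFAULT_ALIGNMENT (UINT64_C(64) * 1024)

/*
 * Mirrors the start of pci_specified_resource_alignment() after this patch:
 * begin with the platform default, if one is defined, and ask the caller
 * not to resize the BAR. A matching resource_alignment= entry parsed later
 * could still override both values.
 */
static uint64_t demo_initial_alignment(bool *resize)
{
	uint64_t align = 0;

	assert(resize != NULL);
	*resize = true;		/* assumption: behaviour without a default */
#ifdef DEMO_DEFAULT_ALIGNMENT
	align = DEMO_DEFAULT_ALIGNMENT;
	*resize = false;	/* align the BAR start, keep its size */
#endif
	return align;
}
```

With the macro defined, every device gets a non-zero alignment even when no resource_alignment= parameter is given, which is why the caller no longer returns early for such devices.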



[PATCH v4 3/5] PCI: Do not disable memory decoding in pci_reassigndev_resource_alignment()

2016-08-11 Thread Yongji Xie
We should not disable memory decoding when we reassign alignment
in pci_reassigndev_resource_alignment(). It is unnecessary and
has side effects. For example, we found it breaks this kind of
P2P bridge:

0001:02:02.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane,
5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev aa)

It may also break PCI devices that have the mmio_always_on
bit set.

Besides, some fixup functions such as fixup_vga() do not expect
memory decoding to be disabled. fixup_vga() reads PCI_COMMAND_MEMORY
to tell whether a device has been initialized by the firmware.
With memory decoding disabled, the adapter initialized by firmware
may not be chosen as the default VGA device when more than one
graphics adapter is present.

Signed-off-by: Yongji Xie 
---
 drivers/pci/pci.c |8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b8357d7..caa0894 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5046,7 +5046,6 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
int i;
struct resource *r;
resource_size_t align, size;
-   u16 command;
 
/*
 * VF BARs are RO zero according to SR-IOV spec 3.4.1.11. Their
@@ -5069,12 +5068,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
return;
}
 
-   dev_info(&dev->dev,
-   "Disabling memory decoding and releasing memory resources.\n");
-   pci_read_config_word(dev, PCI_COMMAND, &command);
-   command &= ~PCI_COMMAND_MEMORY;
-   pci_write_config_word(dev, PCI_COMMAND, command);
-
+   dev_info(&dev->dev, "Releasing memory resources.\n");
for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
r = &dev->resource[i];
if (!(r->flags & IORESOURCE_MEM))
-- 
1.7.9.5



[PATCH v4 2/5] PCI: Ignore enforced alignment to VF BARs

2016-08-11 Thread Yongji Xie
VF BARs are read-only zeroes according to the SR-IOV spec, so
the normal way of allocating resources (writing the BARs) does
not apply to VFs. VF resources are allocated when the SR-IOV
capability is enabled. So we should not try to reassign
alignment after VFs are enabled. Doing so is pointless and
releases the already allocated resources, which leads to a bug.

Signed-off-by: Yongji Xie 
---
 drivers/pci/pci.c |9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2d85a96..b8357d7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5048,6 +5048,15 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
resource_size_t align, size;
u16 command;
 
+   /*
+* VF BARs are RO zero according to SR-IOV spec 3.4.1.11. Their
+* resources would be allocated when we enable them and not be
+* re-allocated any more. So we should never try to reassign
+* VF's alignment here.
+*/
+   if (dev->is_virtfn)
+   return;
+
/* check if specified PCI is target device to reassign */
align = pci_specified_resource_alignment(dev);
if (!align)
-- 
1.7.9.5



[PATCH v4 0/5] PCI: Introduce a way to enforce all MMIO BARs not to share PAGE_SIZE

2016-08-11 Thread Yongji Xie
This series introduces a way for the PCI resource allocator to force
MMIO BARs not to share a page (PAGE_SIZE). This matters for the VFIO
driver: for security reasons, the current VFIO implementation does not
allow mmap of sub-page (size < PAGE_SIZE) MMIO BARs that may share a
page with other BARs. MMIO accesses to these BARs therefore have to be
emulated in QEMU rather than handled in the guest, which costs some
performance.

Our solution reuses the existing code path of the resource_alignment
kernel parameter and adds a macro that sets a default alignment for
it. The macro can then be defined by default on architectures that
easily hit the performance issue because of their 64K page size.

In this series, patches 1, 2 and 3 fix bugs in the use of
resource_alignment; patch 4 adds a new option for resource_alignment
that uses IORESOURCE_STARTALIGN to specify the alignment of PCI BARs;
patch 5 adds a macro to set the default alignment of all MMIO BARs.

Changelog v4:
- Rebased against v4.8-rc1
- Drop one irrelevant patch
- Drop the patch that added a wildcard to resource_alignment to enforce
  the alignment of all MMIO BARs to be at least PAGE_SIZE
- Change the format of option "noresize" of resource_alignment
- Code style improvements

Changelog v3:
- Ignore enforced alignment to fixed BARs
- Fix issue that disabling memory decoding when reassigning the alignment
- Only enable default alignment on PowerNV platform

Changelog v2:
- Ignore enforced alignment to VF BARs on pci_reassigndev_resource_alignment()

Yongji Xie (5):
  PCI: Ignore enforced alignment when kernel uses existing firmware setup
  PCI: Ignore enforced alignment to VF BARs
  PCI: Do not disable memory decoding in pci_reassigndev_resource_alignment()
  PCI: Add a new option for resource_alignment to reassign alignment
  PCI: Add a macro to set default alignment for all PCI devices

 Documentation/kernel-parameters.txt |9 +++--
 arch/powerpc/include/asm/pci.h  |4 ++
 drivers/pci/pci.c   |   71 ++-
 3 files changed, 64 insertions(+), 20 deletions(-)

-- 
1.7.9.5
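
To make the page-sharing problem concrete, here is a small user-space sketch in C. It is an illustration under stated assumptions only: struct demo_bar and the demo_* helpers are hypothetical and do not exist in the kernel. It checks whether two MMIO BARs touch a common page for a given page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one MMIO BAR: start address and size in bytes. */
struct demo_bar {
	uint64_t start;
	uint64_t size;
};

/* Index of the first page touched by a BAR. */
static uint64_t demo_first_page(const struct demo_bar *b, uint64_t page_size)
{
	return b->start / page_size;
}

/* Index of the last page touched by a BAR. */
static uint64_t demo_last_page(const struct demo_bar *b, uint64_t page_size)
{
	return (b->start + b->size - 1) / page_size;
}

/* True if the two BARs have at least one page in common. */
static bool demo_bars_share_page(const struct demo_bar *a,
				 const struct demo_bar *b,
				 uint64_t page_size)
{
	/* page_size must be a non-zero power of two; BAR sizes must be non-zero */
	assert(page_size && (page_size & (page_size - 1)) == 0);
	assert(a->size && b->size);

	/* Two page ranges overlap when each one starts before the other ends. */
	return demo_first_page(a, page_size) <= demo_last_page(b, page_size) &&
	       demo_first_page(b, page_size) <= demo_last_page(a, page_size);
}
```

Two adjacent 4K BARs at 0x10000 and 0x11000 do not share a page with 4K pages, but they do with 64K pages. Aligning each BAR start to PAGE_SIZE removes the overlap in the second case.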



[PATCH v4 1/5] PCI: Ignore enforced alignment when kernel uses existing firmware setup

2016-08-11 Thread Yongji Xie
The PCI resource allocator uses the firmware setup and does not try
to reassign resources when PCI_PROBE_ONLY or IORESOURCE_PCI_FIXED
is set.

The enforced alignment in pci_reassigndev_resource_alignment()
should be ignored in this case. Otherwise, some PCI devices'
resources would be released here and not re-allocated.

Signed-off-by: Yongji Xie 
---
 drivers/pci/pci.c |   13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aab9d51..2d85a96 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4959,6 +4959,13 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev)
 
spin_lock(&resource_alignment_lock);
p = resource_alignment_param;
+   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (*p)
+   pr_info_once("PCI: resource_alignment ignored with PCI_PROBE_ONLY\n");
+   spin_unlock(&resource_alignment_lock);
+   return 0;
+   }
+
while (*p) {
count = 0;
if (sscanf(p, "%d%n", &align_order, &count) == 1 &&
@@ -5063,6 +5070,12 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev)
r = &dev->resource[i];
if (!(r->flags & IORESOURCE_MEM))
continue;
+   if (r->flags & IORESOURCE_PCI_FIXED) {
+   dev_info(&dev->dev, "No alignment for fixed BAR%d: %pR\n",
+   i, r);
+   continue;
+   }
+
size = resource_size(r);
if (size < align) {
size = align;
-- 
1.7.9.5



[PATCH 1/1] pci: host: pci-layerscape: add missing of_node_put after calling of_parse_phandle

2016-08-11 Thread Peter Chen
of_node_put() needs to be called once the device node obtained
from of_parse_phandle() is no longer in use.

Cc: Minghuan Lian 
Cc: Mingkai Hu 
Cc: Roy Zang 
Signed-off-by: Peter Chen 
---
 drivers/pci/host/pci-layerscape.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/host/pci-layerscape.c b/drivers/pci/host/pci-layerscape.c
index 114ba81..573b996 100644
--- a/drivers/pci/host/pci-layerscape.c
+++ b/drivers/pci/host/pci-layerscape.c
@@ -173,6 +173,8 @@ static int ls_pcie_msi_host_init(struct pcie_port *pp,
return -EINVAL;
}
 
+   of_node_put(msi_node);
+
return 0;
 }
 
-- 
1.9.1
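
The get/put pairing that this patch restores can be sketched in user-space C. This is a mock under stated assumptions, not the device-tree API: struct demo_node, demo_parse_phandle(), demo_node_put() and demo_msi_host_init() are hypothetical stand-ins that model only the reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node: only a reference count. */
struct demo_node {
	int refcount;
};

/* Like of_parse_phandle(): returns the node with its refcount raised. */
static struct demo_node *demo_parse_phandle(struct demo_node *target)
{
	if (target)
		target->refcount++;
	return target;
}

/* Like of_node_put(): drops one reference; NULL is accepted. */
static void demo_node_put(struct demo_node *node)
{
	if (!node)
		return;
	assert(node->refcount > 0);
	node->refcount--;
}

/* Shape of the fixed function: a successful lookup is followed by a put. */
static int demo_msi_host_init(struct demo_node *msi_target)
{
	struct demo_node *msi_node = demo_parse_phandle(msi_target);

	if (!msi_node)
		return -1;	/* no reference was taken, nothing to put */

	/* ... msi_node would be used here ... */

	demo_node_put(msi_node);	/* the call the patch adds */
	return 0;
}
```

Without the put, each call would leave the count one higher than before, so the node could never be freed.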



Re: [PATCH kernel 05/15] powerpc/iommu: Stop using @current in mm_iommu_xxx

2016-08-11 Thread Alexey Kardashevskiy
On 12/08/16 12:57, David Gibson wrote:
> On Wed, Aug 03, 2016 at 06:40:46PM +1000, Alexey Kardashevskiy wrote:
>> In some situations the userspace memory context may live longer than
>> the userspace process itself so if we need to do proper memory context
>> cleanup, we better cache @mm and use it later when the process is gone
>> (@current or @current->mm are NULL).
>>
>> This changes mm_iommu_xxx API to receive mm_struct instead of using one
>> from @current.
>>
>> This is needed by the following patch to do proper cleanup in time.
>> This depends on "powerpc/powernv/ioda: Fix endianness when reading TCEs"
>> to do proper cleanup via tce_iommu_clear() patch.
>>
>> To keep API consistent, this replaces mm_context_t with mm_struct;
>> we stick to mm_struct as mm_iommu_adjust_locked_vm() helper needs
>> access to &mm->mmap_sem.
>>
>> This should cause no behavioral change.
>>
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  arch/powerpc/include/asm/mmu_context.h | 20 +++--
>>  arch/powerpc/kernel/setup-common.c |  2 +-
>>  arch/powerpc/mm/mmu_context_book3s64.c |  4 +--
>>  arch/powerpc/mm/mmu_context_iommu.c| 54 
>> ++
>>  drivers/vfio/vfio_iommu_spapr_tce.c| 41 --
>>  5 files changed, 62 insertions(+), 59 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> index 9d2cd0c..b85cc7b 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -18,16 +18,18 @@ extern void destroy_context(struct mm_struct *mm);
>>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>>  struct mm_iommu_table_group_mem_t;
>>  
>> -extern bool mm_iommu_preregistered(void);
>> -extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>> +extern bool mm_iommu_preregistered(struct mm_struct *mm);
>> +extern long mm_iommu_get(struct mm_struct *mm,
>> +unsigned long ua, unsigned long entries,
>>  struct mm_iommu_table_group_mem_t **pmem);
>> -extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
>> -extern void mm_iommu_init(mm_context_t *ctx);
>> -extern void mm_iommu_cleanup(mm_context_t *ctx);
>> -extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
>> -unsigned long size);
>> -extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua,
>> -unsigned long entries);
>> +extern long mm_iommu_put(struct mm_struct *mm,
>> +struct mm_iommu_table_group_mem_t *mem);
>> +extern void mm_iommu_init(struct mm_struct *mm);
>> +extern void mm_iommu_cleanup(struct mm_struct *mm);
>> +extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct *mm,
>> +unsigned long ua, unsigned long size);
>> +extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
>> +unsigned long ua, unsigned long entries);
>>  extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>>  unsigned long ua, unsigned long *hpa);
>>  extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
>> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
>> index 714b4ba..e90b68a 100644
>> --- a/arch/powerpc/kernel/setup-common.c
>> +++ b/arch/powerpc/kernel/setup-common.c
>> @@ -905,7 +905,7 @@ void __init setup_arch(char **cmdline_p)
>>  init_mm.context.pte_frag = NULL;
>>  #endif
>>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>> -mm_iommu_init(&init_mm.context);
>> +mm_iommu_init(&init_mm);
>>  #endif
>>  irqstack_early_init();
>>  exc_lvl_early_init();
>> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
>> index b114f8b..ad82735 100644
>> --- a/arch/powerpc/mm/mmu_context_book3s64.c
>> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
>> @@ -115,7 +115,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
>>  mm->context.pte_frag = NULL;
>>  #endif
>>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>> -mm_iommu_init(&mm->context);
>> +mm_iommu_init(mm);
>>  #endif
>>  return 0;
>>  }
>> @@ -160,7 +160,7 @@ static inline void destroy_pagetable_page(struct mm_struct *mm)
>>  void destroy_context(struct mm_struct *mm)
>>  {
>>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>> -mm_iommu_cleanup(&mm->context);
>> +mm_iommu_cleanup(mm);
>>  #endif
>>  
>>  #ifdef CONFIG_PPC_ICSWX
>> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
>> index da6a216..ee6685b 100644
>> --- a/arch/powerpc/mm/mmu_context_iommu.c
>> +++ b/arch/powerpc/mm/mmu_context_iommu.c
>> @@ -53,7 +53,7 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
>>  }
>>  
>>  pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
>> -current->pid,
>> +current ? current->pid : 0,
>>  incr ? '+' : '-',
>> 

Re: [PATCH kernel 13/15] KVM: PPC: Pass kvm* to kvmppc_find_table()

2016-08-11 Thread David Gibson
On Wed, Aug 03, 2016 at 06:40:54PM +1000, Alexey Kardashevskiy wrote:
> The guest view TCE tables are per KVM anyway (not per VCPU) so pass kvm*
> there. This will be used in the following patches where we will be
> attaching VFIO containers to LIOBNs via ioctl() to KVM (rather than
> to VCPU).
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
>  arch/powerpc/kvm/book3s_64_vio.c|  7 ---
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 13 +++--
>  3 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 2544eda..7f1abe9 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -167,7 +167,7 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>   struct kvm_create_spapr_tce_64 *args);
>  extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
> - struct kvm_vcpu *vcpu, unsigned long liobn);
> + struct kvm *kvm, unsigned long liobn);
>  extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>   unsigned long ioba, unsigned long npages);
>  extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index c379ff5..15df8ae 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -212,12 +212,13 @@ fail:
>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> unsigned long ioba, unsigned long tce)
>  {
> - struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
> + struct kvmppc_spapr_tce_table *stt;
>   long ret;
>  
>   /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
>   /*  liobn, ioba, tce); */
>  
> + stt = kvmppc_find_table(vcpu->kvm, liobn);
>   if (!stt)
>   return H_TOO_HARD;
>  
> @@ -245,7 +246,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>   u64 __user *tces;
>   u64 tce;
>  
> - stt = kvmppc_find_table(vcpu, liobn);
> + stt = kvmppc_find_table(vcpu->kvm, liobn);
>   if (!stt)
>   return H_TOO_HARD;
>  
> @@ -299,7 +300,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>   struct kvmppc_spapr_tce_table *stt;
>   long i, ret;
>  
> - stt = kvmppc_find_table(vcpu, liobn);
> + stt = kvmppc_find_table(vcpu->kvm, liobn);
>   if (!stt)
>   return H_TOO_HARD;
>  
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index a3be4bd..8a6834e 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -49,10 +49,9 @@
>   * WARNING: This will be called in real or virtual mode on HV KVM and virtual
>   *  mode on PR KVM
>   */
> -struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> +struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm *kvm,
>   unsigned long liobn)
>  {
> - struct kvm *kvm = vcpu->kvm;
>   struct kvmppc_spapr_tce_table *stt;
>  
>   list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)
> @@ -194,12 +193,13 @@ static struct mm_iommu_table_group_mem_t *kvmppc_rm_iommu_lookup(
>  long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>   unsigned long ioba, unsigned long tce)
>  {
> - struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
> + struct kvmppc_spapr_tce_table *stt;
>   long ret;
>  
>   /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
>   /*  liobn, ioba, tce); */
>  
> + stt = kvmppc_find_table(vcpu->kvm, liobn);
>   if (!stt)
>   return H_TOO_HARD;
>  
> @@ -252,7 +252,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>   unsigned long tces, entry, ua = 0;
>   unsigned long *rmap = NULL;
>  
> - stt = kvmppc_find_table(vcpu, liobn);
> + stt = kvmppc_find_table(vcpu->kvm, liobn);
>   if (!stt)
>   return H_TOO_HARD;
>  
> @@ -335,7 +335,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>   struct kvmppc_spapr_tce_table *stt;
>   long i, ret;
>  
> - stt = kvmppc_find_table(vcpu, liobn);
> + stt = kvmppc_find_table(vcpu->kvm, liobn);
>   if (!stt)
>   return H_TOO_HARD;
>  
> @@ -356,12 +356,13 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>  long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> unsigned long ioba)
>  {
> - struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
> + struct kvmppc_spapr_tce_table *stt;
>   long ret;
>   unsigned long idx;
>   struct page *page;
>   u64

Re: [PATCH kernel 11/15] powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange()

2016-08-11 Thread David Gibson
On Wed, Aug 03, 2016 at 06:40:52PM +1000, Alexey Kardashevskiy wrote:
> In real mode, TCE tables are invalidated using special
> cache-inhibited store instructions which are not available in
> virtual mode
> 
> This defines and implements exchange_rm() callback. This does not
> define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
> exchange/exchange_rm are only to be used by KVM for VFIO.
> 
> The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.
> 
> This replaces list_for_each_entry_rcu with its lockless version as
> from now on pnv_pci_ioda2_tce_invalidate() can be called in
> the real mode too.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/include/asm/iommu.h  |  7 +++
>  arch/powerpc/kernel/iommu.c   | 23 +++
>  arch/powerpc/platforms/powernv/pci-ioda.c | 26 +-
>  3 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index cd4df44..a13d207 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -64,6 +64,11 @@ struct iommu_table_ops {
>   long index,
>   unsigned long *hpa,
>   enum dma_data_direction *direction);
> + /* Real mode */
> + int (*exchange_rm)(struct iommu_table *tbl,
> + long index,
> + unsigned long *hpa,
> + enum dma_data_direction *direction);
>  #endif
>   void (*clear)(struct iommu_table *tbl,
>   long index, long npages);
> @@ -209,6 +214,8 @@ extern void iommu_del_device(struct device *dev);
>  extern int __init tce_iommu_bus_notifier_init(void);
>  extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
>   unsigned long *hpa, enum dma_data_direction *direction);
> +extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> + unsigned long *hpa, enum dma_data_direction *direction);
>  #else
>  static inline void iommu_register_group(struct iommu_table_group *table_group,
>   int pci_domain_number,
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index a8f017a..65b2dac 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1020,6 +1020,29 @@ long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
>  }
>  EXPORT_SYMBOL_GPL(iommu_tce_xchg);
>  
> +long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> + unsigned long *hpa, enum dma_data_direction *direction)
> +{
> + long ret;
> +
> + ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
> +
> + if (!ret && ((*direction == DMA_FROM_DEVICE) ||
> + (*direction == DMA_BIDIRECTIONAL))) {
> + struct page *pg = realmode_pfn_to_page(*hpa >> PAGE_SHIFT);
> +
> + if (likely(pg)) {
> + SetPageDirty(pg);
> + } else {

Isn't there a race here, if someone else updates this TCE entry
between your initial exchange and the rollback exchange below?

> + tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
> + ret = -EFAULT;
> + }
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_tce_xchg_rm);
> +
>  int iommu_take_ownership(struct iommu_table *tbl)
>  {
>   unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c04afd2..a0b5ea6 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1827,6 +1827,17 @@ static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, long index,
>  
>   return ret;
>  }
> +
> +static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index,
> + unsigned long *hpa, enum dma_data_direction *direction)
> +{
> + long ret = pnv_tce_xchg(tbl, index, hpa, direction);
> +
> + if (!ret)
> + pnv_pci_p7ioc_tce_invalidate(tbl, index, 1, true);
> +
> + return ret;
> +}
>  #endif
>  
>  static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
> @@ -1841,6 +1852,7 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
>   .set = pnv_ioda1_tce_build,
>  #ifdef CONFIG_IOMMU_API
>   .exchange = pnv_ioda1_tce_xchg,
> + .exchange_rm = pnv_ioda1_tce_xchg_rm,
>  #endif
>   .clear = pnv_ioda1_tce_free,
>   .get = pnv_tce_get,
> @@ -1915,7 +1927,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
>  {
>   struct iommu_table_group_link *tgl;
>  
> - list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) {
> + list_for_each_entry_lockless(tgl, &tbl->it_group_list, next) {

So.. IIUC, previously this had a bool rm parameter, bu

Re: [PATCH kernel 12/15] KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently

2016-08-11 Thread David Gibson
On Wed, Aug 03, 2016 at 06:40:53PM +1000, Alexey Kardashevskiy wrote:
> It does not make much sense to have KVM in book3s-64 and
> not to have IOMMU bits for PCI pass through support as it costs little
> and allows VFIO to function on book3s KVM.
> 
> Having IOMMU_API always enabled makes it unnecessary to have a lot of
> "#ifdef IOMMU_API" in arch/powerpc/kvm/book3s_64_vio*. With those
> ifdef's we could have only user space emulated devices accelerated
> (but not VFIO) which do not seem to be very useful.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/kvm/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index b7c494b..63b60a8 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -65,6 +65,7 @@ config KVM_BOOK3S_64
>   select KVM
>   select KVM_BOOK3S_PR_POSSIBLE if !KVM_BOOK3S_HV_POSSIBLE
>   select KVM_VFIO if VFIO
> + select SPAPR_TCE_IOMMU if IOMMU_SUPPORT
>   ---help---
> Support running unmodified book3s_64 and book3s_32 guest kernels
> in virtual machines on book3s_64 host processors.

I don't quite see how this change accomplishes the stated goal.
AFAICT even with this change you can still turn off IOMMU_SUPPORT,
which will break the IOMMU for VFIO passthrough, but not IOMMU
acceleration for emulated devices (since that requires no interaction
with the hardware IOMMU).

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH kernel 09/15] powerpc/mmu: Add real mode support for IOMMU preregistered memory

2016-08-11 Thread David Gibson
On Wed, Aug 03, 2016 at 06:40:50PM +1000, Alexey Kardashevskiy wrote:
> This makes mm_iommu_lookup() able to work in realmode by replacing
> list_for_each_entry_rcu() (which can do debug stuff which can fail in
> real mode) with list_for_each_entry_lockless().
> 
> This adds realmode version of mm_iommu_ua_to_hpa() which adds
> explicit vmalloc'd-to-linear address conversion.
> Unlike mm_iommu_ua_to_hpa(), mm_iommu_ua_to_hpa_rm() can fail.
> 
> This changes mm_iommu_preregistered() to receive @mm as in real mode
> @current does not always have a correct pointer.
> 
> This adds realmode version of mm_iommu_lookup() which receives @mm
> (for the same reason as for mm_iommu_preregistered()) and uses
> lockless version of list_for_each_entry_rcu().
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/include/asm/mmu_context.h |  4 
>  arch/powerpc/mm/mmu_context_iommu.c| 39 
> ++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index a4c4ed5..939030c 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -27,10 +27,14 @@ extern long mm_iommu_put(struct mm_struct *mm,
>  extern void mm_iommu_init(struct mm_struct *mm);
>  extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct *mm,
>   unsigned long ua, unsigned long size);
> +extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(
> + struct mm_struct *mm, unsigned long ua, unsigned long size);
>  extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
>   unsigned long ua, unsigned long entries);
>  extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>   unsigned long ua, unsigned long *hpa);
> +extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
> + unsigned long ua, unsigned long *hpa);
>  extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
>  extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
>  #endif
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index 10f01fe..36a906c 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -242,6 +242,25 @@ struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct *mm,
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_lookup);
>  
> +struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(struct mm_struct *mm,
> + unsigned long ua, unsigned long size)
> +{
> + struct mm_iommu_table_group_mem_t *mem, *ret = NULL;
> +
> + list_for_each_entry_lockless(mem, &mm->context.iommu_group_mem_list,
> + next) {
> + if ((mem->ua <= ua) &&
> + (ua + size <= mem->ua +
> +  (mem->entries << PAGE_SHIFT))) {
> + ret = mem;
> + break;
> + }
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_lookup_rm);
> +
>  struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
>   unsigned long ua, unsigned long entries)
>  {
> @@ -273,6 +292,26 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
>  
> +long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
> + unsigned long ua, unsigned long *hpa)
> +{
> + const long entry = (ua - mem->ua) >> PAGE_SHIFT;
> + void *va = &mem->hpas[entry];
> + unsigned long *ra;
> +
> + if (entry >= mem->entries)
> + return -EFAULT;
> +
> + ra = (void *) vmalloc_to_phys(va);
> + if (!ra)
> + return -EFAULT;
> +
> + *hpa = *ra | (ua & ~PAGE_MASK);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa_rm);
> +
>  long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem)
>  {
>   if (atomic64_inc_not_zero(&mem->mapped))

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH kernel 10/15] KVM: PPC: Use preregistered memory API to access TCE list

2016-08-11 Thread David Gibson
On Wed, Aug 03, 2016 at 06:40:51PM +1000, Alexey Kardashevskiy wrote:
> VFIO on sPAPR already implements guest memory pre-registration
> when the entire guest RAM gets pinned. This can be used to translate
> the physical address of a guest page containing the TCE list
> from H_PUT_TCE_INDIRECT.
> 
> This makes use of the pre-registered memory API to access TCE list
> pages in order to avoid unnecessary locking on the KVM memory
> reverse map as we know that all of guest memory is pinned and
> we have a flat array mapping GPA to HPA which makes it simpler and
> quicker to index into that array (even with looking up the
> kernel page tables in vmalloc_to_phys) than it is to find the memslot,
> lock the rmap entry, look up the user page tables, and unlock the rmap
> entry. Note that the rmap pointer is initialized to NULL where declared
> (not in this patch).
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> Changes:
> v2:
> * updated the commit log with Paul's comment
> ---
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 65 
> -
>  1 file changed, 49 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index d461c44..a3be4bd 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -180,6 +180,17 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>  EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
>  
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +static inline bool kvmppc_preregistered(struct kvm_vcpu *vcpu)
> +{
> + return mm_iommu_preregistered(vcpu->kvm->mm);
> +}
> +
> +static struct mm_iommu_table_group_mem_t *kvmppc_rm_iommu_lookup(
> + struct kvm_vcpu *vcpu, unsigned long ua, unsigned long size)
> +{
> + return mm_iommu_lookup_rm(vcpu->kvm->mm, ua, size);
> +}
> +
>  long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>   unsigned long ioba, unsigned long tce)
>  {
> @@ -260,23 +271,44 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>   if (ret != H_SUCCESS)
>   return ret;
>  
> - if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> - return H_TOO_HARD;
> + if (kvmppc_preregistered(vcpu)) {
> + /*
> +  * We get here if guest memory was pre-registered which
> +  * is normally VFIO case and gpa->hpa translation does not
> +  * depend on hpt.
> +  */
> + struct mm_iommu_table_group_mem_t *mem;
>  
> - rmap = (void *) vmalloc_to_phys(rmap);
> + if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL))
> + return H_TOO_HARD;

Wouldn't it be clearer to put the gpa->ua lookup outside the if?
You'd have to throw away the rmap you get in the prereg case, but it
shouldn't be harmful, should it?

>  
> - /*
> -  * Synchronize with the MMU notifier callbacks in
> -  * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
> -  * While we have the rmap lock, code running on other CPUs
> -  * cannot finish unmapping the host real page that backs
> -  * this guest real page, so we are OK to access the host
> -  * real page.
> -  */
> - lock_rmap(rmap);
> - if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> - ret = H_TOO_HARD;
> - goto unlock_exit;
> + mem = kvmppc_rm_iommu_lookup(vcpu, ua, IOMMU_PAGE_SIZE_4K);
> + if (!mem || mm_iommu_ua_to_hpa_rm(mem, ua, &tces))
> + return H_TOO_HARD;

This doesn't fall back to the rmap approach if it can't locate the
page in question in the prereg map.  IIUC that means that this will
now work less well than previously if you have a userspace which
preregisters some memory, but not all of guest RAM.  I'm not sure if
we care about that, since no such userspace currently exists.
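
To make the concern concrete, here is a minimal user-space model of a translation path that tries the preregistered map first and falls back to the rmap-style path on a miss. It is only a sketch, not kernel code: `struct region`, `prereg_lookup()`, `rmap_translate()` and `translate()` are hypothetical stand-ins for the prereg lookup and the lock_rmap() path in the patch, and the return values are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative return codes; only their distinctness matters here. */
#define H_SUCCESS  0L
#define H_TOO_HARD 9999L

/* Hypothetical model of one preregistered userspace range. */
struct region {
	unsigned long ua;   /* start userspace address */
	unsigned long size; /* length in bytes */
	unsigned long hpa;  /* host physical address backing 'ua' */
};

/* Stand-in for the prereg lookup: find the region covering [ua, ua + size). */
static const struct region *prereg_lookup(const struct region *regs, size_t n,
					  unsigned long ua, unsigned long size)
{
	for (size_t i = 0; i < n; i++) {
		if (ua >= regs[i].ua &&
		    ua + size <= regs[i].ua + regs[i].size)
			return &regs[i];
	}
	return NULL;
}

/* Stand-in for the rmap path; it fakes a translation by adding an offset. */
static bool rmap_translate(unsigned long ua, unsigned long *hpa)
{
	if (ua == 0)
		return false; /* model an address that cannot be translated */
	*hpa = ua + 0x1000000UL;
	return true;
}

/* Try the preregistered map first, then fall back to the rmap path. */
static long translate(const struct region *regs, size_t n,
		      unsigned long ua, unsigned long size,
		      unsigned long *hpa, bool *used_prereg)
{
	const struct region *r;

	assert(hpa != NULL && used_prereg != NULL);

	r = prereg_lookup(regs, n, ua, size);
	if (r) {
		*hpa = r->hpa + (ua - r->ua);
		*used_prereg = true;
		return H_SUCCESS;
	}

	/* Not preregistered: fall back instead of failing outright. */
	*used_prereg = false;
	if (!rmap_translate(ua, hpa))
		return H_TOO_HARD;
	return H_SUCCESS;
}
```

With this shape, a userspace that preregisters only part of guest RAM would still get translations for the rest, at the cost of keeping both paths reachable in one call.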


> + } else {
> + /*
> +  * This is emulated devices case.

This is a bit misleading - this case will only be triggered if there
are *no* prereg-ed VFIO devices.  The case above can be used even for
emulated devices, if there happen to also be VFIO devices present
which have preregistered guest RAM.

> +  * We do not require memory to be preregistered in this case
> +  * so lock rmap and do __find_linux_pte_or_hugepte().
> +  */
> + if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> + return H_TOO_HARD;
> +
> + rmap = (void *) vmalloc_to_phys(rmap);
> +
> + /*
> +  * Synchronize with the MMU notifier callbacks in
> +  * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
> +  * While we have the rmap lock, code running on other CPUs
> +  * cannot finish unmapping the host real page that backs
> +  * this guest real page, so we are OK to access the host
> +  * real page.
> +  */

Re: [PATCH] powerpc: populate the default bus with machine_arch_initcall

2016-08-11 Thread Michael Ellerman
Kevin Hao  writes:

> With the commit 44a7185c2ae6 ("of/platform: Add common method to
> populate default bus"), a default function is introduced to populate
> the default bus and this function is invoked at the arch_initcall_sync
> level. This will override the arch specific population of default bus
> which run at a lower level than arch_initcall_sync. Since not all
> powerpc specific buses are added to the of_default_bus_match_table[],
> this causes some powerpc specific bus are not probed. Fix this by
> using a more preceding initcall.
>
> Signed-off-by: Kevin Hao 
> ---
> Of course we can adjust the powerpc arch codes to use the
> of_platform_default_populate_init(), but it has high risk to break
> other boards given the complicated powerpc specific buses. So I would
> like just to fix the broken boards in the current release, and cook 
> a patch to change to of_platform_default_populate_init() for linux-next.
>
> Only boot test on a mpc8315erdb board.
>
>  arch/powerpc/platforms/40x/ep405.c   | 2 +-
>  arch/powerpc/platforms/40x/ppc40x_simple.c   | 2 +-
>  arch/powerpc/platforms/40x/virtex.c  | 2 +-
>  arch/powerpc/platforms/40x/walnut.c  | 2 +-
>  arch/powerpc/platforms/44x/canyonlands.c | 2 +-
>  arch/powerpc/platforms/44x/ebony.c   | 2 +-
>  arch/powerpc/platforms/44x/iss4xx.c  | 2 +-
>  arch/powerpc/platforms/44x/ppc44x_simple.c   | 2 +-
>  arch/powerpc/platforms/44x/ppc476.c  | 2 +-
>  arch/powerpc/platforms/44x/sam440ep.c| 2 +-
>  arch/powerpc/platforms/44x/virtex.c  | 2 +-
>  arch/powerpc/platforms/44x/warp.c| 2 +-
>  arch/powerpc/platforms/82xx/ep8248e.c| 2 +-
>  arch/powerpc/platforms/82xx/km82xx.c | 2 +-
>  arch/powerpc/platforms/82xx/mpc8272_ads.c| 2 +-
>  arch/powerpc/platforms/82xx/pq2fads.c| 2 +-
>  arch/powerpc/platforms/83xx/mpc831x_rdb.c| 2 +-
>  arch/powerpc/platforms/83xx/mpc834x_itx.c| 2 +-
>  arch/powerpc/platforms/85xx/ppa8548.c| 2 +-
>  arch/powerpc/platforms/8xx/adder875.c| 2 +-
>  arch/powerpc/platforms/8xx/ep88xc.c  | 2 +-
>  arch/powerpc/platforms/8xx/mpc86xads_setup.c | 2 +-
>  arch/powerpc/platforms/8xx/mpc885ads_setup.c | 2 +-
>  arch/powerpc/platforms/8xx/tqm8xx_setup.c| 2 +-
>  arch/powerpc/platforms/cell/setup.c  | 2 +-
>  arch/powerpc/platforms/embedded6xx/gamecube.c| 2 +-
>  arch/powerpc/platforms/embedded6xx/linkstation.c | 2 +-
>  arch/powerpc/platforms/embedded6xx/mvme5100.c| 2 +-
>  arch/powerpc/platforms/embedded6xx/storcenter.c  | 2 +-
>  arch/powerpc/platforms/embedded6xx/wii.c | 2 +-
>  arch/powerpc/platforms/pasemi/setup.c| 2 +-

That's not a very minimal fix.

Every one of those initcall changes could be introducing a bug, by
changing the order vs other init calls.

Can we just go back to the old behaviour on ppc?
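
The ordering risk can be shown with a toy model of initcall levels. This is a sketch under simplifying assumptions: the enum lists only a few levels in their relative order, and the call names used in the usage note are made up for illustration; none of this is the kernel's actual initcall machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A few initcall levels in run order (earlier value runs first). */
enum level { LVL_ARCH, LVL_ARCH_SYNC, LVL_SUBSYS, LVL_FS, LVL_DEVICE };

struct initcall {
	const char *name;
	enum level level;
};

/* Stable insertion sort by level: calls at the same level keep link order. */
static void order_initcalls(struct initcall *calls, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		struct initcall key = calls[i];
		size_t j = i;

		while (j > 0 && calls[j - 1].level > key.level) {
			calls[j] = calls[j - 1];
			j--;
		}
		calls[j] = key;
	}
}

/* Position of 'name' in the array, or n if it is absent. */
static size_t position_of(const struct initcall *calls, size_t n,
			  const char *name)
{
	assert(name != NULL);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(calls[i].name, name) == 0)
			return i;
	}
	return n;
}
```

Moving a hypothetical `board_publish_devices` from `LVL_DEVICE` to `LVL_ARCH` places it ahead of the default populate call, which is the intent, but also ahead of every subsys-level call it used to run after, which is the kind of side effect being warned about.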

cheers


Re: [PATCH kernel 08/15] powerpc/vfio_spapr_tce: Add reference counting to iommu_table

2016-08-11 Thread David Gibson
On Wed, Aug 03, 2016 at 06:40:49PM +1000, Alexey Kardashevskiy wrote:
> So far iommu_table objects were only used in virtual mode and had
> a single owner. We are going to change by implementing in-kernel
> acceleration of DMA mapping requests, including real mode.
> 
> This adds a kref to iommu_table and defines new helpers to update it.
> This replaces iommu_free_table() with iommu_table_put() and makes
> iommu_free_table() static. iommu_table_get() is not used in this patch
> but will be in the following one.
> 
> While we are here, this removes @node_name parameter as it has never been
> really useful on powernv and carrying it for the pseries platform code to
> iommu_free_table() seems to be quite useless too.
> 
> This should cause no behavioral change.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 
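
For readers unfamiliar with the pattern, the get/put scheme described above can be modeled in user space as follows. This is a sketch only: `struct kref`, `kref_get()`, `kref_put()` and `container_of()` are simplified, non-atomic stand-ins for the kernel facilities of the same name, and `struct table` with its helpers is a hypothetical substitute for iommu_table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, non-atomic stand-in for the kernel's struct kref. */
struct kref {
	unsigned int refcount;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void kref_init(struct kref *k)
{
	k->refcount = 1;
}

static void kref_get(struct kref *k)
{
	assert(k->refcount > 0); /* taking a reference on a dead object is a bug */
	k->refcount++;
}

/* Drops one reference; returns 1 if this call released the object. */
static int kref_put(struct kref *k, void (*release)(struct kref *))
{
	assert(k->refcount > 0);
	if (--k->refcount == 0) {
		release(k);
		return 1;
	}
	return 0;
}

/* Hypothetical refcounted table. */
struct table {
	unsigned long size;
	struct kref ref;
};

static int tables_freed; /* bookkeeping for the demo only */

/* Release callback: recover the enclosing table from its embedded kref. */
static void table_free(struct kref *k)
{
	struct table *tbl = container_of(k, struct table, ref);

	free(tbl);
	tables_freed++;
}

static struct table *table_alloc(unsigned long size)
{
	struct table *tbl = calloc(1, sizeof(*tbl));

	if (!tbl)
		return NULL;
	tbl->size = size;
	kref_init(&tbl->ref); /* the creator holds the first reference */
	return tbl;
}

static void table_get(struct table *tbl)
{
	kref_get(&tbl->ref);
}

/* NULL-tolerant put, mirroring the NULL check kept in the put helper. */
static int table_put(struct table *tbl)
{
	if (!tbl)
		return 0;
	return kref_put(&tbl->ref, table_free);
}
```

The free routine becomes private and is reachable only through the final put, so a second owner can hold the table safely as long as it took its own reference.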

> ---
>  arch/powerpc/include/asm/iommu.h  |  5 +++--
>  arch/powerpc/kernel/iommu.c   | 24 +++-
>  arch/powerpc/kernel/vio.c |  2 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c | 14 +++---
>  arch/powerpc/platforms/powernv/pci.c  |  1 +
>  arch/powerpc/platforms/pseries/iommu.c|  3 ++-
>  drivers/vfio/vfio_iommu_spapr_tce.c   |  2 +-
>  7 files changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h 
> b/arch/powerpc/include/asm/iommu.h
> index f49a72a..cd4df44 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -114,6 +114,7 @@ struct iommu_table {
>   struct list_head it_group_list;/* List of iommu_table_group_link */
>   unsigned long *it_userspace; /* userspace view of the table */
>   struct iommu_table_ops *it_ops;
> + struct kref it_kref;
>  };
>  
>  #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
> @@ -146,8 +147,8 @@ static inline void *get_iommu_table_base(struct device 
> *dev)
>  
>  extern int dma_iommu_dma_supported(struct device *dev, u64 mask);
>  
> -/* Frees table for an individual device node */
> -extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
> +extern void iommu_table_get(struct iommu_table *tbl);
> +extern void iommu_table_put(struct iommu_table *tbl);
>  
>  /* Initializes an iommu_table based in values set in the passed-in
>   * structure
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 13263b0..a8f017a 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -710,13 +710,13 @@ struct iommu_table *iommu_init_table(struct iommu_table 
> *tbl, int nid)
>   return tbl;
>  }
>  
> -void iommu_free_table(struct iommu_table *tbl, const char *node_name)
> +static void iommu_table_free(struct kref *kref)
>  {
>   unsigned long bitmap_sz;
>   unsigned int order;
> + struct iommu_table *tbl;
>  
> - if (!tbl)
> - return;
> + tbl = container_of(kref, struct iommu_table, it_kref);
>  
>   if (tbl->it_ops->free)
>   tbl->it_ops->free(tbl);
> @@ -735,7 +735,7 @@ void iommu_free_table(struct iommu_table *tbl, const char 
> *node_name)
>  
>   /* verify that table contains no entries */
>   if (!bitmap_empty(tbl->it_map, tbl->it_size))
> - pr_warn("%s: Unexpected TCEs for %s\n", __func__, node_name);
> + pr_warn("%s: Unexpected TCEs\n", __func__);
>  
>   /* calculate bitmap size in bytes */
>   bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
> @@ -747,7 +747,21 @@ void iommu_free_table(struct iommu_table *tbl, const 
> char *node_name)
>   /* free table */
>   kfree(tbl);
>  }
> -EXPORT_SYMBOL_GPL(iommu_free_table);
> +
> +void iommu_table_get(struct iommu_table *tbl)
> +{
> + kref_get(&tbl->it_kref);
> +}
> +EXPORT_SYMBOL_GPL(iommu_table_get);
> +
> +void iommu_table_put(struct iommu_table *tbl)
> +{
> + if (!tbl)
> + return;
> +
> + kref_put(&tbl->it_kref, iommu_table_free);
> +}
> +EXPORT_SYMBOL_GPL(iommu_table_put);
>  
>  /* Creates TCEs for a user provided buffer.  The user buffer must be
>   * contiguous real kernel storage (not vmalloc).  The address passed here
> diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
> index 8d7358f..188f452 100644
> --- a/arch/powerpc/kernel/vio.c
> +++ b/arch/powerpc/kernel/vio.c
> @@ -1318,7 +1318,7 @@ static void vio_dev_release(struct device *dev)
>   struct iommu_table *tbl = get_iommu_table_base(dev);
>  
>   if (tbl)
> - iommu_free_table(tbl, of_node_full_name(dev->of_node));
> + iommu_table_put(tbl);
>   of_node_put(dev->of_node);
>   kfree(to_vio_dev(dev));
>  }
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 74ab8382..c04afd2 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1394,7 +1394,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct 

Re: [PATCH kernel 06/15] powerpc/mm/iommu: Put pages on process exit

2016-08-11 Thread David Gibson
On Wed, Aug 03, 2016 at 06:40:47PM +1000, Alexey Kardashevskiy wrote:
> At the moment VFIO IOMMU SPAPR v2 driver pins all guest RAM pages when
> the userspace starts using VFIO.

This doesn't sound accurate.  Isn't it userspace that decides what
gets pinned, not the VFIO driver?

>When the userspace process finishes,
> all the pinned pages need to be put; this is done as a part of
> the userspace memory context (MM) destruction which happens on
> the very last mmdrop().
> 
> This approach has a problem that a MM of the userspace process
> may live longer than the userspace process itself as kernel threads
> use userspace process MMs which was running on a CPU where
> the kernel thread was scheduled to. If this happened, the MM remains
> referenced until this exact kernel thread wakes up again
> and releases the very last reference to the MM, on an idle system this
> can take even hours.
> 
> This references and caches MM once per container and adds tracking
> how many times each preregistered area was registered in
> a specific container. This way we do not depend on @current pointing to
> a valid task descriptor.

The handling of @current and refcounting the mm sounds more like it's
describing the previous patch.

The description of counting how many times each prereg area is
registered doesn't seem accurate, since you block multiple
registration with an EBUSY.  Or else it's describing the 'used'
counter in the lower-level mm_iommu_table_group_mem_t tracking,
rather than anything changed by this patch.
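
The behaviour being discussed, where a second registration of the same area in one container is refused rather than counted, can be sketched as below. This is a user-space model under stated assumptions: `struct container`, `struct prereg` and the three helpers are hypothetical, a fixed array replaces the list, and the caller is assumed to serialize access (the patch relies on container->lock for that).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PREREG 8

/* Hypothetical record of one area preregistered through a container. */
struct prereg {
	unsigned long ua;
	unsigned long entries;
	int in_use;
};

struct container {
	struct prereg areas[MAX_PREREG];
};

static void container_init(struct container *c)
{
	for (size_t i = 0; i < MAX_PREREG; i++)
		c->areas[i].in_use = 0;
}

static struct prereg *find_area(struct container *c, unsigned long ua,
				unsigned long entries)
{
	for (size_t i = 0; i < MAX_PREREG; i++) {
		struct prereg *p = &c->areas[i];

		if (p->in_use && p->ua == ua && p->entries == entries)
			return p;
	}
	return NULL;
}

/* Registering an area twice in the same container is refused with -EBUSY. */
static int register_area(struct container *c, unsigned long ua,
			 unsigned long entries)
{
	assert(c != NULL);

	if (find_area(c, ua, entries))
		return -EBUSY;

	for (size_t i = 0; i < MAX_PREREG; i++) {
		struct prereg *p = &c->areas[i];

		if (!p->in_use) {
			p->ua = ua;
			p->entries = entries;
			p->in_use = 1;
			return 0;
		}
	}
	return -ENOMEM;
}

static int unregister_area(struct container *c, unsigned long ua,
			   unsigned long entries)
{
	struct prereg *p = find_area(c, ua, entries);

	if (!p)
		return -ENOENT;
	p->in_use = 0;
	return 0;
}
```

In this model there is no per-container registration count at all, only presence or absence, which is why a commit message that talks about "how many times" an area was registered reads as inaccurate.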

> This changes the userspace interface to return EBUSY if memory is
> already registered (mm_iommu_get() used to increment the counter);
> however it should not have any practical effect as the only
> userspace tool available now does register memory area once per
> container anyway.
> 
> As tce_iommu_register_pages/tce_iommu_unregister_pages are called
> under container->lock, this does not need additional locking.
> 
> Signed-off-by: Alexey Kardashevskiy 
> 
> # Conflicts:
> # arch/powerpc/include/asm/mmu_context.h
> # arch/powerpc/mm/mmu_context_book3s64.c
> # arch/powerpc/mm/mmu_context_iommu.c

Looks like some lines to be cleaned up in the message.

> ---
>  arch/powerpc/include/asm/mmu_context.h |  1 -
>  arch/powerpc/mm/mmu_context_book3s64.c |  4 ---
>  arch/powerpc/mm/mmu_context_iommu.c| 11 ---
>  drivers/vfio/vfio_iommu_spapr_tce.c| 52 
> +-
>  4 files changed, 51 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h 
> b/arch/powerpc/include/asm/mmu_context.h
> index b85cc7b..a4c4ed5 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -25,7 +25,6 @@ extern long mm_iommu_get(struct mm_struct *mm,
>  extern long mm_iommu_put(struct mm_struct *mm,
>   struct mm_iommu_table_group_mem_t *mem);
>  extern void mm_iommu_init(struct mm_struct *mm);
> -extern void mm_iommu_cleanup(struct mm_struct *mm);
>  extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct 
> *mm,
>   unsigned long ua, unsigned long size);
>  extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c 
> b/arch/powerpc/mm/mmu_context_book3s64.c
> index ad82735..1a07969 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> @@ -159,10 +159,6 @@ static inline void destroy_pagetable_page(struct 
> mm_struct *mm)
>  
>  void destroy_context(struct mm_struct *mm)
>  {
> -#ifdef CONFIG_SPAPR_TCE_IOMMU
> - mm_iommu_cleanup(mm);
> -#endif
> -
>  #ifdef CONFIG_PPC_ICSWX
>   drop_cop(mm->context.acop, mm);
>   kfree(mm->context.cop_lockp);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c 
> b/arch/powerpc/mm/mmu_context_iommu.c
> index ee6685b..10f01fe 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -293,14 +293,3 @@ void mm_iommu_init(struct mm_struct *mm)
>  {
>   INIT_LIST_HEAD_RCU(&mm->context.iommu_group_mem_list);
>  }
> -
> -void mm_iommu_cleanup(struct mm_struct *mm)
> -{
> - struct mm_iommu_table_group_mem_t *mem, *tmp;
> -
> - list_for_each_entry_safe(mem, tmp, &mm->context.iommu_group_mem_list,
> - next) {
> - list_del_rcu(&mem->next);
> - mm_iommu_do_free(mem);
> - }
> -}
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 9752e77..40e71a0 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -89,6 +89,15 @@ struct tce_iommu_group {
>  };
>  
>  /*
> + * A container needs to remember which preregistered areas and how many times
> + * it has referenced to do proper cleanup at the userspace process exit.
> + */
> +struct tce_iommu_prereg {
> + struct list_head next;
> + struct mm_iommu_table_group_mem_t *mem

Re: [PATCH kernel 07/15] powerpc/iommu: Cleanup iommu_table disposal

2016-08-11 Thread David Gibson
On Wed, Aug 03, 2016 at 06:40:48PM +1000, Alexey Kardashevskiy wrote:
> At the moment iommu_table could be disposed by either calling
> iommu_table_free() directly or it_ops::free() which only implementation
> for IODA2 calls iommu_table_free() anyway.
> 
> As we are going to have reference counting on tables, we need an unified
> way of disposing tables.
> 
> This moves it_ops::free() call into iommu_free_table() and makes use
> of the latter everywhere. The free() callback now handles only
> platform-specific data.
> 
> This should cause no behavioral change.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 
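
The unified disposal path described in the commit message can be modeled as a single dispose function that calls an optional platform hook before freeing the common parts. A minimal sketch, assuming hypothetical names throughout (`struct table`, `struct table_ops`, `table_dispose()` and friends are illustrative and not the kernel's definitions):

```c
#include <assert.h>
#include <stdlib.h>

struct table;

/* Optional platform hook; it frees platform-specific data only. */
struct table_ops {
	void (*free)(struct table *tbl);
};

struct table {
	const struct table_ops *ops;
	void *platform_pages; /* owned by the platform code */
	unsigned long *map;   /* common allocation bitmap */
};

static int platform_frees; /* bookkeeping for the demo only */

static void platform_free(struct table *tbl)
{
	free(tbl->platform_pages);
	tbl->platform_pages = NULL;
	platform_frees++;
}

static const struct table_ops platform_ops = { .free = platform_free };
static const struct table_ops generic_ops = { .free = NULL };

static struct table *table_create(const struct table_ops *ops, int with_pages)
{
	struct table *tbl = calloc(1, sizeof(*tbl));

	if (!tbl)
		return NULL;
	tbl->ops = ops;
	tbl->map = calloc(4, sizeof(*tbl->map));
	if (with_pages)
		tbl->platform_pages = malloc(64);
	return tbl;
}

/* The single disposal entry point: platform hook first, common parts after. */
static void table_dispose(struct table *tbl)
{
	if (!tbl)
		return;

	assert(tbl->ops != NULL);
	if (tbl->ops->free)
		tbl->ops->free(tbl);

	free(tbl->map);
	free(tbl);
}
```

Because the hook no longer frees the table itself, every caller goes through the same function, which is the property later reference counting needs.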

> ---
>  arch/powerpc/kernel/iommu.c   | 4 
>  arch/powerpc/platforms/powernv/pci-ioda.c | 6 ++
>  drivers/vfio/vfio_iommu_spapr_tce.c   | 2 +-
>  3 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index a8e3490..13263b0 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -718,6 +718,9 @@ void iommu_free_table(struct iommu_table *tbl, const char 
> *node_name)
>   if (!tbl)
>   return;
>  
> + if (tbl->it_ops->free)
> + tbl->it_ops->free(tbl);
> +
>   if (!tbl->it_map) {
>   kfree(tbl);
>   return;
> @@ -744,6 +747,7 @@ void iommu_free_table(struct iommu_table *tbl, const char 
> *node_name)
>   /* free table */
>   kfree(tbl);
>  }
> +EXPORT_SYMBOL_GPL(iommu_free_table);
>  
>  /* Creates TCEs for a user provided buffer.  The user buffer must be
>   * contiguous real kernel storage (not vmalloc).  The address passed here
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 59c7e7d..74ab8382 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1394,7 +1394,6 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev 
> *dev, struct pnv_ioda_pe
>   iommu_group_put(pe->table_group.group);
>   BUG_ON(pe->table_group.group);
>   }
> - pnv_pci_ioda2_table_free_pages(tbl);
>   iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>  }
>  
> @@ -1987,7 +1986,6 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, 
> long index,
>  static void pnv_ioda2_table_free(struct iommu_table *tbl)
>  {
>   pnv_pci_ioda2_table_free_pages(tbl);
> - iommu_free_table(tbl, "pnv");
>  }
>  
>  static struct iommu_table_ops pnv_ioda2_iommu_ops = {
> @@ -2313,7 +2311,7 @@ static long pnv_pci_ioda2_setup_default_config(struct 
> pnv_ioda_pe *pe)
>   if (rc) {
>   pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
>   rc);
> - pnv_ioda2_table_free(tbl);
> + iommu_free_table(tbl, "");
>   return rc;
>   }
>  
> @@ -2399,7 +2397,7 @@ static void pnv_ioda2_take_ownership(struct 
> iommu_table_group *table_group)
>  
>   pnv_pci_ioda2_set_bypass(pe, false);
>   pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> - pnv_ioda2_table_free(tbl);
> + iommu_free_table(tbl, "pnv");
>  }
>  
>  static void pnv_ioda2_release_ownership(struct iommu_table_group 
> *table_group)
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 40e71a0..79f26c7 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -660,7 +660,7 @@ static void tce_iommu_free_table(struct iommu_table *tbl)
>   unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT;
>  
>   tce_iommu_userspace_view_free(tbl);
> - tbl->it_ops->free(tbl);
> + iommu_free_table(tbl, "");
>   decrement_locked_vm(pages);
>  }
>  

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH kernel 05/15] powerpc/iommu: Stop using @current in mm_iommu_xxx

2016-08-11 Thread David Gibson
On Wed, Aug 03, 2016 at 06:40:46PM +1000, Alexey Kardashevskiy wrote:
> In some situations the userspace memory context may live longer than
> the userspace process itself so if we need to do proper memory context
> cleanup, we better cache @mm and use it later when the process is gone
> (@current or @current->mm are NULL).
> 
> This changes mm_iommu_xxx API to receive mm_struct instead of using one
> from @current.
> 
> This is needed by the following patch to do proper cleanup in time.
> This depends on "powerpc/powernv/ioda: Fix endianness when reading TCEs"
> to do proper cleanup via tce_iommu_clear() patch.
> 
> To keep API consistent, this replaces mm_context_t with mm_struct;
> we stick to mm_struct as mm_iommu_adjust_locked_vm() helper needs
> access to &mm->mmap_sem.
> 
> This should cause no behavioral change.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/include/asm/mmu_context.h | 20 +++--
>  arch/powerpc/kernel/setup-common.c |  2 +-
>  arch/powerpc/mm/mmu_context_book3s64.c |  4 +--
>  arch/powerpc/mm/mmu_context_iommu.c| 54 
> ++
>  drivers/vfio/vfio_iommu_spapr_tce.c| 41 --
>  5 files changed, 62 insertions(+), 59 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h 
> b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..b85cc7b 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,16 +18,18 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
>  
> -extern bool mm_iommu_preregistered(void);
> -extern long mm_iommu_get(unsigned long ua, unsigned long entries,
> +extern bool mm_iommu_preregistered(struct mm_struct *mm);
> +extern long mm_iommu_get(struct mm_struct *mm,
> + unsigned long ua, unsigned long entries,
>   struct mm_iommu_table_group_mem_t **pmem);
> -extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
> -extern void mm_iommu_init(mm_context_t *ctx);
> -extern void mm_iommu_cleanup(mm_context_t *ctx);
> -extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
> - unsigned long size);
> -extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua,
> - unsigned long entries);
> +extern long mm_iommu_put(struct mm_struct *mm,
> + struct mm_iommu_table_group_mem_t *mem);
> +extern void mm_iommu_init(struct mm_struct *mm);
> +extern void mm_iommu_cleanup(struct mm_struct *mm);
> +extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct 
> *mm,
> + unsigned long ua, unsigned long size);
> +extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
> + unsigned long ua, unsigned long entries);
>  extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>   unsigned long ua, unsigned long *hpa);
>  extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
> diff --git a/arch/powerpc/kernel/setup-common.c 
> b/arch/powerpc/kernel/setup-common.c
> index 714b4ba..e90b68a 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -905,7 +905,7 @@ void __init setup_arch(char **cmdline_p)
>   init_mm.context.pte_frag = NULL;
>  #endif
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
> - mm_iommu_init(&init_mm.context);
> + mm_iommu_init(&init_mm);
>  #endif
>   irqstack_early_init();
>   exc_lvl_early_init();
> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c 
> b/arch/powerpc/mm/mmu_context_book3s64.c
> index b114f8b..ad82735 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> @@ -115,7 +115,7 @@ int init_new_context(struct task_struct *tsk, struct 
> mm_struct *mm)
>   mm->context.pte_frag = NULL;
>  #endif
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
> - mm_iommu_init(&mm->context);
> + mm_iommu_init(mm);
>  #endif
>   return 0;
>  }
> @@ -160,7 +160,7 @@ static inline void destroy_pagetable_page(struct 
> mm_struct *mm)
>  void destroy_context(struct mm_struct *mm)
>  {
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
> - mm_iommu_cleanup(&mm->context);
> + mm_iommu_cleanup(mm);
>  #endif
>  
>  #ifdef CONFIG_PPC_ICSWX
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c 
> b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..ee6685b 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -53,7 +53,7 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
>   }
>  
>   pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
> - current->pid,
> + current ? current->pid : 0,
>   incr ? '+' : '-',
>   npages << PAGE_SHIFT,
>   mm->locked_vm << PAGE_SHIFT,
> @@ -63,28 +63,22 @@ static long mm_iommu_adjus

Re: [PATCH 0/2] ibmvfc: FC-TAPE Support

2016-08-11 Thread Martin K. Petersen
> "Tyrel" == Tyrel Datwyler  writes:

Tyrel> On 08/03/2016 02:36 PM, Tyrel Datwyler wrote:
>> This patchset introduces optional FC-TAPE/FC Class 3 Error Recovery
>> to the ibmvfc client driver.
>> 
>> Tyrel Datwyler (2): ibmvfc: Set READ FCP_XFER_READY DISABLED bit in
>> PRLI ibmvfc: add FC Class 3 Error Recovery support
>> 
>> drivers/scsi/ibmvscsi/ibmvfc.c | 11 +++
>> drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 12
>> insertions(+)
>> 

Tyrel> ping?

-ENOREVIEWS

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2 02/20] powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use

2016-08-11 Thread kbuild test robot
Hi Cyril,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.8-rc1 next-20160811]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Cyril-Bur/Consistent-TM-structures/20160812-075557
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-sbc834x_defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All error/warnings (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/processor.h:13:0,
from arch/powerpc/include/asm/thread_info.h:33,
from include/linux/thread_info.h:54,
from include/asm-generic/preempt.h:4,
from ./arch/powerpc/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:59,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/sched.h:19,
from arch/powerpc/kernel/process.c:18:
   arch/powerpc/kernel/process.c: In function 'restore_fp':
>> arch/powerpc/include/asm/reg.h:64:23: error: left shift count >= width of 
>> type [-Werror=shift-count-overflow]
#define __MASK(X) (1UL<<(X))
  ^
>> arch/powerpc/include/asm/reg.h:116:18: note: in expansion of macro '__MASK'
#define MSR_TS_T __MASK(MSR_TS_T_LG) /*  Transaction Transactional */
 ^
>> arch/powerpc/include/asm/reg.h:117:22: note: in expansion of macro 'MSR_TS_T'
#define MSR_TS_MASK (MSR_TS_T | MSR_TS_S)   /* Transaction State bits */
 ^
>> arch/powerpc/include/asm/reg.h:118:34: note: in expansion of macro 
>> 'MSR_TS_MASK'
#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? 
*/
 ^
>> arch/powerpc/kernel/process.c:211:29: note: in expansion of macro 
>> 'MSR_TM_ACTIVE'
 if (tsk->thread.load_fp || MSR_TM_ACTIVE(tsk->thread.regs->msr)) {
^
>> arch/powerpc/include/asm/reg.h:64:23: error: left shift count >= width of 
>> type [-Werror=shift-count-overflow]
#define __MASK(X) (1UL<<(X))
  ^
   arch/powerpc/include/asm/reg.h:115:18: note: in expansion of macro '__MASK'
#define MSR_TS_S __MASK(MSR_TS_S_LG) /*  Transaction Suspended */
 ^
>> arch/powerpc/include/asm/reg.h:117:33: note: in expansion of macro 'MSR_TS_S'
#define MSR_TS_MASK (MSR_TS_T | MSR_TS_S)   /* Transaction State bits */
^
>> arch/powerpc/include/asm/reg.h:118:34: note: in expansion of macro 
>> 'MSR_TS_MASK'
#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? 
*/
 ^
>> arch/powerpc/kernel/process.c:211:29: note: in expansion of macro 
>> 'MSR_TM_ACTIVE'
 if (tsk->thread.load_fp || MSR_TM_ACTIVE(tsk->thread.regs->msr)) {
^
   arch/powerpc/kernel/process.c: In function 'restore_math':
>> arch/powerpc/include/asm/reg.h:64:23: error: left shift count >= width of 
>> type [-Werror=shift-count-overflow]
#define __MASK(X) (1UL<<(X))
  ^
>> arch/powerpc/include/asm/reg.h:116:18: note: in expansion of macro '__MASK'
#define MSR_TS_T __MASK(MSR_TS_T_LG) /*  Transaction Transactional */
 ^
>> arch/powerpc/include/asm/reg.h:117:22: note: in expansion of macro 'MSR_TS_T'
#define MSR_TS_MASK (MSR_TS_T | MSR_TS_S)   /* Transaction State bits */
 ^
>> arch/powerpc/include/asm/reg.h:118:34: note: in expansion of macro 
>> 'MSR_TS_MASK'
#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? 
*/
 ^
   arch/powerpc/kernel/process.c:468:7: note: in expansion of macro 
'MSR_TM_ACTIVE'
 if (!MSR_TM_ACTIVE(regs->msr) &&
  ^
>> arch/powerpc/include/asm/reg.h:64:23: error: left shift count >= width of 
>> type [-Werror=shift-count-overflow]
#define __MASK(X) (1UL<<(X))
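
The errors above are what one would expect when `1UL << n` is evaluated in a build where `unsigned long` is 32 bits wide and `n` is 32 or more. The sketch below illustrates the arithmetic only. The bit positions `TS_S_BIT` and `TS_T_BIT` are assumptions (the report does not show the values of MSR_TS_S_LG and MSR_TS_T_LG), and `shift_fits_ulong()`, `mask64()` and `tm_active()` are illustrative helpers, not a proposed fix.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed bit positions of the two transaction-state bits. */
#define TS_S_BIT 33
#define TS_T_BIT 34

/* True if '1UL << bit' is well defined for this build's unsigned long. */
static int shift_fits_ulong(unsigned int bit)
{
	return bit < sizeof(unsigned long) * CHAR_BIT;
}

/* A mask built from a 64-bit constant is valid whatever the width of long. */
static uint64_t mask64(unsigned int bit)
{
	assert(bit < 64);
	return UINT64_C(1) << bit;
}

/* Model of an "is a transaction active" test on a 64-bit MSR value. */
static int tm_active(uint64_t msr)
{
	return (msr & (mask64(TS_S_BIT) | mask64(TS_T_BIT))) != 0;
}
```

On a 32-bit configuration such as the one in this report, `shift_fits_ulong(TS_T_BIT)` would be false, which matches the compiler's complaint that the shift count is at least the width of the type.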

Re: [PATCH] powerpc: populate the default bus with machine_arch_initcall

2016-08-11 Thread Kevin Hao
On Thu, Aug 11, 2016 at 08:17:52AM -0500, Rob Herring wrote:
> On Thu, Aug 11, 2016 at 6:09 AM, Kevin Hao  wrote:
> > With the commit 44a7185c2ae6 ("of/platform: Add common method to
> > populate default bus"), a default function is introduced to populate
> > the default bus and this function is invoked at the arch_initcall_sync
> > level. This will override the arch specific population of default bus
> > which run at a lower level than arch_initcall_sync. Since not all
> > powerpc specific buses are added to the of_default_bus_match_table[],
> > this causes some powerpc specific bus are not probed. Fix this by
> > using a more preceding initcall.
> >
> > Signed-off-by: Kevin Hao 
> > ---
> > Of course we can adjust the powerpc arch codes to use the
> > of_platform_default_populate_init(), but it has high risk to break
> > other boards given the complicated powerpc specific buses. So I would
> > like just to fix the broken boards in the current release, and cook
> > a patch to change to of_platform_default_populate_init() for linux-next.
> 
> The patch that broke things was sitting in -next for some time and no
> one reported anything. Are all these boards broken?

At least in theory. :-)
The effect may differ depending on which devices are missed. For me, the
Gianfar Ethernet on my mpc8315erdb board malfunctions because the MDIO bus
is not probed.

> 
> I'm fine to just disable the default call for PPC instead if there's
> some chance this does not fix some boards.

I have tried to cover every invocation of of_platform_bus_probe() made via
machine_device_initcall(). Yes, I may have missed some boards. But wouldn't
we want to take this as a step towards using the default populate function,
since it does remove some duplicated code?

> There could be some other
> initcall ordering dependencies.
> 
> >
> > Only boot test on a mpc8315erdb board.
> 
> Curious, what would it take to remove the of_platform_bus_probe and
> use the default here? We can add additional bus compatibles to match.

I thought about this. But the list of bus compatibles is rather long, and
adding all these buses may cause side effects on some boards, so that
change seems a bit aggressive to me. It looks more like a feature for
linux-next. The following is the list of bus matches that would need to be
added to the default match table if we want to fix all the currently broken
boards:
{ .compatible = "fsl,ep8248e-bcsr", },
{ .compatible = "fsl,pq2pro-localbus", },
{ .compatible = "fsl,qe", },
{ .compatible = "fsl,srio", },
{ .compatible = "gianfar", },
{ .compatible = "gpio-leds", },
{ .compatible = "hawk-bridge", },
{ .compatible = "ibm,ebc", },
{ .compatible = "ibm,opb", },
{ .compatible = "ibm,plb3", },
{ .compatible = "ibm,plb4", },
{ .compatible = "ibm,plb6", },
{ .compatible = "nintendo,flipper", },
{ .compatible = "nintendo,hollywood", },
{ .compatible = "pasemi,localbus", },
{ .compatible = "pasemi,sdc", },
{ .compatible = "soc", },
{ .compatible = "xlnx,compound", },
{ .compatible = "xlnx,dcr-v29-1.00.a", },
{ .compatible = "xlnx,opb-v20-1.10.c", },
{ .compatible = "xlnx,plb-v34-1.01.a", },
{ .compatible = "xlnx,plb-v34-1.02.a", },
{ .compatible = "xlnx,plb-v46-1.00.a", },
{ .compatible = "xlnx,plb-v46-1.02.a", },
{ .name = "cpm", },
{ .name = "localbus", },
{ .name = "soc", },
{ .type = "axon", },
{ .type = "ebc", },
{ .type = "opb", },
{ .type = "plb4", },
{ .type = "plb5", },
{ .type = "qe", },
{ .type = "soc", },
{ .type = "spider", },
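
How a table like the one above would be consulted can be sketched with a simplified matcher. This is a model under stated assumptions, not the kernel's of_match code: `struct bus_id`, `struct node` and `match_bus()` are hypothetical, a node carries a single compatible string rather than a list, and an entry matches when every field it specifies equals the node's corresponding property.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified match entry: unset (NULL) fields are wildcards. */
struct bus_id {
	const char *name;
	const char *type;
	const char *compatible;
};

/* Simplified device node with at most one compatible string. */
struct node {
	const char *name;
	const char *type;
	const char *compatible;
};

/* A specified field must be present on the node and equal to it. */
static int field_ok(const char *want, const char *have)
{
	if (!want)
		return 1;
	return have != NULL && strcmp(want, have) == 0;
}

/* Return the first table entry matching the node, or NULL if none does. */
static const struct bus_id *match_bus(const struct bus_id *table, size_t n,
				      const struct node *np)
{
	assert(np != NULL);

	for (size_t i = 0; i < n; i++) {
		if (field_ok(table[i].name, np->name) &&
		    field_ok(table[i].type, np->type) &&
		    field_ok(table[i].compatible, np->compatible))
			return &table[i];
	}
	return NULL;
}
```

Entries that match only on name or type, such as the `.name = "soc"` and `.type = "soc"` ones, are the broad ones in this model: they accept any node with that name or type regardless of its compatible string, which is one way a long default table could have side effects on unrelated boards.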

Of course I can choose to use the default function if all you guys think it is
better. :-)

> The difference between of_platform_bus_probe and
> of_platform_bus_populate is the former will match root nodes with no
> compatible string. Most platforms should not need that behavior and it
> would be nice to know which ones.

I don't think this difference would cause any real side effect for these boards.

Thanks,
Kevin




Re: [PATCH] mm: Initialize per_cpu_nodestats for hotadded pgdats

2016-08-11 Thread Balbir Singh


On 12/08/16 02:04, Reza Arbab wrote:
> The following oops occurs after a pgdat is hotadded:
> 
> [   86.839956] Unable to handle kernel paging request for data at address 
> 0x00c30001
> [   86.840132] Faulting instruction address: 0xc022f8f4
> [   86.840328] Oops: Kernel access of bad area, sig: 11 [#1]
> [   86.840468] SMP NR_CPUS=2048 NUMA pSeries
> [   86.840612] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 
> ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp 
> llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
> nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter 
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
> nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter 
> nls_utf8 isofs sg virtio_balloon uio_pdrv_genirq uio ip_tables xfs libcrc32c 
> sr_mod cdrom sd_mod virtio_net ibmvscsi scsi_transport_srp virtio_pci 
> virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
> [   86.842955] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW 
> 4.8.0-rc1-device #110
> [   86.843140] task: c0ef3080 task.stack: c0f6c000
> [   86.843323] NIP: c022f8f4 LR: c022f948 CTR: 
> 
> [   86.843595] REGS: c0f6fa50 TRAP: 0300   Tainted: GW 
> (4.8.0-rc1-device)
> [   86.843889] MSR: 80010280b033   
> CR: 84002028  XER: 2000
> [   86.844624] CFAR: d1d2013c DAR: 00c30001 DSISR: 4000 
> SOFTE: 0
> GPR00: c022f948 c0f6fcd0 c0f71400 0001
> GPR04: 0100   00c3
> GPR08:  0001 00c3 
> GPR12: 2200 c130 c0faefb4 c0faefa8
> GPR16: c0f6c000 c0f6c080 c0bf15b0 c0f6c080
> GPR20: c0bf4928  0003 c0bf4968
> GPR24: c000ffed   c0f6fd58
> GPR28: 0001 0001 c0f6fcf0 c000ffed9c08
> [   86.847747] NIP [c022f8f4] refresh_cpu_vm_stats+0x1a4/0x2f0
> [   86.847897] LR [c022f948] refresh_cpu_vm_stats+0x1f8/0x2f0
> [   86.848060] Call Trace:
> [   86.848183] [c0f6fcd0] [c022f948] 
> refresh_cpu_vm_stats+0x1f8/0x2f0 (unreliable)
> 
> Add per_cpu_nodestats initialization to the hotplug codepath.
> 
> Signed-off-by: Reza Arbab 
> ---
>  mm/memory_hotplug.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3894b65..41266dc 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1219,6 +1219,7 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 
> start)
>  
>   /* init node's zones as empty zones, we don't have any present pages.*/
>   free_area_init_node(nid, zones_size, start_pfn, zholes_size);
> + pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat);
>  
>   /*
>* The node we allocated has no zone fallback lists. For avoiding
> @@ -1249,6 +1250,7 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 
> start)
>  static void rollback_node_hotadd(int nid, pg_data_t *pgdat)
>  {
>   arch_refresh_nodedata(nid, NULL);
> + free_percpu(pgdat->per_cpu_nodestats);
>   arch_free_nodedata(pgdat);
>   return;
>  }
> 

I wonder if node_set_online() should do the allocation and offline should free.
But that would be a larger change

Balbir


Re: [PATCH 3/4] powerpc/mm: allow memory hotplug into a memoryless node

2016-08-11 Thread Balbir Singh


On 09/08/16 04:27, Reza Arbab wrote:
> Remove the check which prevents us from hotplugging into an empty node.
> 
> Signed-off-by: Reza Arbab 
> ---
>  arch/powerpc/mm/numa.c | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 80d067d..bc70c4f 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1127,7 +1127,7 @@ static int hot_add_node_scn_to_nid(unsigned long 
> scn_addr)
>  int hot_add_scn_to_nid(unsigned long scn_addr)
>  {
>   struct device_node *memory = NULL;
> - int nid, found = 0;
> + int nid;
>  

Do we want to do this only for ibm,hotplug-aperture compatible ranges?

I'm OK either way.

Acked-by: Balbir Singh 


Re: [PATCH 2/4] powerpc/mm: create numa nodes for hotplug memory

2016-08-11 Thread Balbir Singh


On 09/08/16 04:27, Reza Arbab wrote:
> When scanning the device tree to initialize the system NUMA topology,
> process dt elements with compatible id "ibm,hotplug-aperture" to create
> memoryless numa nodes.
> 
> These nodes will be filled when hotplug occurs within the associated
> address range.
> 
> Signed-off-by: Reza Arbab 
> ---

Looks good to me

Acked-by: Balbir Singh 


Re: [PATCH v5 11/13] powerpc: Allow userspace to set device tree properties in kexec_file_load

2016-08-11 Thread Thiago Jung Bauermann
Hello Sam,

Thanks for the quick response.

On Friday, 12 August 2016, 10:45:00, Samuel Mendoza-Jonas wrote:
> On Thu, 2016-08-11 at 20:08 -0300, Thiago Jung Bauermann wrote:
> > @@ -908,4 +909,245 @@ bool find_debug_console(const void *fdt, int
> > chosen_node)
> > return false;
> >  }
> >  
> > +/**
> > + * struct allowed_node - a node in the whitelist and its allowed properties.
> > + * @name:  node name or full node path
> > + * @properties:NULL-terminated array of names or name=value pairs
> > + *
> > + * If name starts with /, then the node has to be at the specified path in
> > + * the device tree (including unit addresses for all nodes in the path).
> > + * If it doesn't, then the node can be anywhere in the device tree.
> > + *
> > + * An entry in properties can specify a string value that the property must
> > + * have by using the "name=value" format. If the entry ends with =, it means
> > + * that the property must be empty.
> > + */
> > +static struct allowed_node {
> > +   const char *name;
> > +   const char *properties[9];
> > +} allowed_nodes[] = {
> > +   {
> > +   .name = "/chosen",
> > +   .properties = {
> > +   "stdout-path",
> > +   "linux,stdout-path",
> > +   NULL,
> > +   }
> > +   },
> > +   {
> > +   .name = "vga",
> > +   .properties = {
> > +   "device_type=display",
> > +   "assigned-addresses",
> > +   "width",
> > +   "height",
> > +   "depth",
> > +   "little-endian=",
> > +   "linux,opened=",
> > +   "linux,boot-display=",
> > +   NULL,
> > +   }
> > +   },
> > +};
> 
> Hi Thiago,
> 
> As much as this solves problems for *me*, I suspect adding 'vga' here
> might be the subject of some discussion. Having /chosen whitelisted makes
> sense on its own, but 'vga' and its properties are very specific without
> much explanation.
> 
> If everyone's happy to have it there, cool! If not, I have the majority
> of a patch that handles the original reason for these property updates
> separately in the kernel rather than from userspace. If needed I'll clean
> it up and we can handle it that way.

Ok, that's good to know. I'm fine with it either way. In any case, 'vga' in 
this patch also serves as a good real-life example of a non-trivial binding 
outside of /chosen that we might want to whitelist in the future.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH v5 11/13] powerpc: Allow userspace to set device tree properties in kexec_file_load

2016-08-11 Thread Samuel Mendoza-Jonas
On Thu, 2016-08-11 at 20:08 -0300, Thiago Jung Bauermann wrote:
> Implement the arch_kexec_verify_buffer hook to verify that a device
> tree blob passed by userspace via kexec_file_load contains only nodes
> and properties from a whitelist.
> 
> In elf64_load we merge those properties into the device tree that
> will be passed to the next kernel.
> 
> Suggested-by: Michael Ellerman 
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  arch/powerpc/include/asm/kexec.h   |   1 +
>  arch/powerpc/kernel/kexec_elf_64.c |   9 ++
>  arch/powerpc/kernel/machine_kexec_64.c | 242 
> +
>  3 files changed, 252 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h 
> b/arch/powerpc/include/asm/kexec.h
> index f263cc867891..31bc64e07c8f 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -99,6 +99,7 @@ int setup_purgatory(struct kimage *image, const void 
> *slave_code,
>  int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
>   unsigned long initrd_len, const char *cmdline);
>  bool find_debug_console(const void *fdt, int chosen_node);
> +int merge_partial_dtb(void *to, const void *from);
>  #endif /* CONFIG_KEXEC_FILE */
>  
>  #else /* !CONFIG_KEXEC */
> diff --git a/arch/powerpc/kernel/kexec_elf_64.c 
> b/arch/powerpc/kernel/kexec_elf_64.c
> index 49cba9509464..1b902ad66e2a 100644
> --- a/arch/powerpc/kernel/kexec_elf_64.c
> +++ b/arch/powerpc/kernel/kexec_elf_64.c
> @@ -210,6 +210,15 @@ void *elf64_load(struct kimage *image, char *kernel_buf,
> goto out;
> }
>  
> +   /* Add nodes and properties from the DTB passed by userspace. */
> +   if (image->dtb_buf) {
> +   ret = merge_partial_dtb(fdt, image->dtb_buf);
> +   if (ret) {
> +   pr_err("Error merging partial device tree.\n");
> +   goto out;
> +   }
> +   }
> +
> ret = setup_new_fdt(fdt, initrd_load_addr, initrd_len, cmdline);
> if (ret)
> goto out;
> diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
> b/arch/powerpc/kernel/machine_kexec_64.c
> index 527f98efe651..a484a6346146 100644
> --- a/arch/powerpc/kernel/machine_kexec_64.c
> +++ b/arch/powerpc/kernel/machine_kexec_64.c
> @@ -35,6 +35,7 @@
>  #include 
>  
>  #define SLAVE_CODE_SIZE256
> +#define MAX_DT_PATH512
>  
>  #ifdef CONFIG_KEXEC_FILE
>  static struct kexec_file_ops *kexec_file_loaders[] = {
> @@ -908,4 +909,245 @@ bool find_debug_console(const void *fdt, int 
> chosen_node)
> return false;
>  }
>  
> +/**
> + * struct allowed_node - a node in the whitelist and its allowed properties.
> + * @name:  node name or full node path
> + * @properties:NULL-terminated array of names or name=value 
> pairs
> + *
> + * If name starts with /, then the node has to be at the specified path in
> + * the device tree (including unit addresses for all nodes in the path).
> + * If it doesn't, then the node can be anywhere in the device tree.
> + *
> + * An entry in properties can specify a string value that the property must
> + * have by using the "name=value" format. If the entry ends with =, it means
> + * that the property must be empty.
> + */
> +static struct allowed_node {
> +   const char *name;
> +   const char *properties[9];
> +} allowed_nodes[] = {
> +   {
> +   .name = "/chosen",
> +   .properties = {
> +   "stdout-path",
> +   "linux,stdout-path",
> +   NULL,
> +   }
> +   },
> +   {
> +   .name = "vga",
> +   .properties = {
> +   "device_type=display",
> +   "assigned-addresses",
> +   "width",
> +   "height",
> +   "depth",
> +   "little-endian=",
> +   "linux,opened=",
> +   "linux,boot-display=",
> +   NULL,
> +   }
> +   },
> +};

Hi Thiago,

As much as this solves problems for *me*, I suspect adding 'vga' here
might be the subject of some discussion. Having /chosen whitelisted makes
sense on its own, but 'vga' and its properties are very specific without
much explanation.

If everyone's happy to have it there, cool! If not, I have the majority
of a patch that handles the original reason for these property updates
separately in the kernel rather than from userspace. If needed I'll clean
it up and we can handle it that way.

Cheers,
Sam
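
To make the three whitelist entry forms from the kernel-doc comment concrete ("name", "name=value" and "name="), here is a rough sketch of how a single property could be checked against such a list. This is not the code from the patch: entry_allows() and property_allowed() are hypothetical helpers, and the property value is assumed to be a plain string whose length excludes the terminator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: check one property against
 * one whitelist entry of the form "name", "name=value" or "name=".
 * val_len == 0 means an empty property.
 */
static bool entry_allows(const char *entry, const char *prop_name,
			 const char *val, size_t val_len)
{
	const char *eq = strchr(entry, '=');
	size_t name_len = eq ? (size_t)(eq - entry) : strlen(entry);

	if (strlen(prop_name) != name_len ||
	    strncmp(entry, prop_name, name_len) != 0)
		return false;

	if (!eq)		/* "name": any value is accepted */
		return true;

	if (eq[1] == '\0')	/* "name=": property must be empty */
		return val_len == 0;

	/* "name=value": value must match exactly */
	return val_len == strlen(eq + 1) &&
	       memcmp(val, eq + 1, val_len) == 0;
}

/* Walk a NULL-terminated list of whitelist entries. */
static bool property_allowed(const char *const *entries, const char *prop_name,
			     const char *val, size_t val_len)
{
	for (; *entries; entries++)
		if (entry_allows(*entries, prop_name, val, val_len))
			return true;
	return false;
}
```

Under these assumptions "device_type" is accepted only with the value "display", "width" is accepted with any value, and "little-endian" is accepted only when empty.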


Re: [PATCH] soc: fsl/qe: fix Oops on CPM1 (and likely CPM2)

2016-08-11 Thread Scott Wood
On Mon, 2016-08-08 at 18:08 +0200, Christophe Leroy wrote:
> Commit 0e6e01ff694ee ("CPM/QE: use genalloc to manage CPM/QE muram")
> has changed the way muram is managed.
> genalloc uses kmalloc(), hence requires the SLAB to be up and running.
> 
> On powerpc 8xx, cpm_reset() is called early during startup.
> cpm_reset() then calls cpm_muram_init() before SLAB is available,
> hence the following Oops.
> 
> cpm_reset() cannot be called during initcalls because the CPM is
> needed for console
> 
> This patch splits cpm_muram_init() in two parts. The first part,
> related to mappings, is kept as cpm_muram_init()
> The second part is named cpm_muram_pool_init() and is called
> the first time cpm_muram_alloc() is used

Why do you need to split it, versus calling the full cpm_muram_init() on
demand?

-Scott
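
For illustration, the split described in the quoted commit message (early mapping setup, pool setup deferred to the first allocation) could be modelled in user space roughly as follows. This is only a sketch: every toy_* name is hypothetical, the pool is a trivial bump allocator rather than genalloc, and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are the kernel's. */
static unsigned char toy_muram[256];	/* stands in for the mapped muram */
static unsigned char *toy_vbase;	/* set by the early mapping step */
static size_t toy_next;			/* trivial bump allocator cursor */
static bool toy_pool_ready;

/* Early step: only records the mapping, needs no allocator. */
static void toy_muram_init(void)
{
	toy_vbase = toy_muram;
}

/* Late step: would need kmalloc() in the kernel, so it is deferred. */
static void toy_muram_pool_init(void)
{
	toy_next = 0;
	toy_pool_ready = true;
}

/* Address helper works as soon as the mapping exists, pool or not. */
static void *toy_muram_addr(size_t offset)
{
	return toy_vbase + offset;
}

/* First allocation triggers pool setup on demand. Returns offset, or -1. */
static long toy_muram_alloc(size_t size)
{
	long off;

	if (!toy_pool_ready)
		toy_muram_pool_init();
	if (size > sizeof(toy_muram) - toy_next)
		return -1;
	off = (long)toy_next;
	toy_next += size;
	return off;
}
```

The point of the model is that toy_muram_addr() is usable before any allocation has happened, which a single on-demand init would only allow if every accessor triggered it.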



[PATCH v2 06/20] selftests/powerpc: Check for VSX preservation across userspace preemption

2016-08-11 Thread Cyril Bur
Ensure the kernel switches VSX registers correctly. VSX registers are
all volatile, and although the kernel currently preserves VSX across
syscalls, it doesn't have to. Test that the VSX regs remain the same
across interrupts and timeslice expiry.

Signed-off-by: Cyril Bur 
---
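
As a reading aid for the assembly in vsx_asm.S below, the control flow of the worker (load known values, atomically decrement threads_starting, then keep checking while running is non-zero) can be sketched in C11. This is a hypothetical model only: toy_preempt_worker() is not part of the patch, and a plain int array stands in for the VSX registers, which is exactly what the real test cannot do because the compiler would be free to clobber them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/*
 * Hypothetical C model of the worker loop; the real test does this in
 * assembly so that the compiler cannot disturb the VSX registers.
 * Returns 0 if the data never changed, non-zero on the first mismatch.
 */
static int toy_preempt_worker(const int *expected, int *live, size_t n,
			      atomic_int *threads_starting,
			      atomic_int *running)
{
	/* "Load" the known values, standing in for load_vsx. */
	memcpy(live, expected, n * sizeof(*live));

	/* Signal the parent that this worker is set up. */
	atomic_fetch_sub(threads_starting, 1);

	/* Check at least once, then keep going while the parent says so. */
	do {
		if (memcmp(live, expected, n * sizeof(*live)) != 0)
			return 1;
	} while (atomic_load(running));

	return 0;
}
```

The atomic_fetch_sub() call plays the role of the lwarx/stwcx. retry loop, and the do/while mirrors the order in the assembly: compare first, test running afterwards.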
 tools/testing/selftests/powerpc/math/Makefile  |   4 +-
 tools/testing/selftests/powerpc/math/vsx_asm.S |  61 +
 tools/testing/selftests/powerpc/math/vsx_preempt.c | 147 +
 tools/testing/selftests/powerpc/vsx_asm.h  |  71 ++
 4 files changed, 282 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/math/vsx_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/vsx_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/vsx_asm.h

diff --git a/tools/testing/selftests/powerpc/math/Makefile 
b/tools/testing/selftests/powerpc/math/Makefile
index 5b88875..aa6598b 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt 
vmx_signal
+TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt 
vmx_signal vsx_preempt
 
 all: $(TEST_PROGS)
 
@@ -13,6 +13,8 @@ vmx_syscall: vmx_asm.S
 vmx_preempt: vmx_asm.S
 vmx_signal: vmx_asm.S
 
+vsx_preempt: vsx_asm.S
+
 include ../../lib.mk
 
 clean:
diff --git a/tools/testing/selftests/powerpc/math/vsx_asm.S 
b/tools/testing/selftests/powerpc/math/vsx_asm.S
new file mode 100644
index 000..a110dd8
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vsx_asm.S
@@ -0,0 +1,61 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "../basic_asm.h"
+#include "../vsx_asm.h"
+
+#long check_vsx(vector int *r3);
+#This function wraps storing VSX regs to the end of an array and a
+#call to a comparison function in C which boils down to a memcmp()
+FUNC_START(check_vsx)
+   PUSH_BASIC_STACK(32)
+   std r3,STACK_FRAME_PARAM(0)(sp)
+   addi r3, r3, 16 * 12 #Second half of array
+   bl store_vsx
+   ld r3,STACK_FRAME_PARAM(0)(sp)
+   bl vsx_memcmp
+   POP_BASIC_STACK(32)
+   blr
+FUNC_END(check_vsx)
+
+# int preempt_vsx(vector int *varray, int *threads_starting,
+# int *running);
+# On starting will (atomically) decrement threads_starting as a signal
+# that the VSX regs have been loaded with varray. Will proceed to check the
+# validity of the VSX registers while running is not zero.
+FUNC_START(preempt_vsx)
+   PUSH_BASIC_STACK(512)
+   std r3,STACK_FRAME_PARAM(0)(sp) # vector int *varray
+   std r4,STACK_FRAME_PARAM(1)(sp) # int *threads_starting
+   std r5,STACK_FRAME_PARAM(2)(sp) # int *running
+
+   bl load_vsx
+   nop
+
+   sync
+   # Atomic DEC
+   ld r3,STACK_FRAME_PARAM(1)(sp)
+1: lwarx r4,0,r3
+   addi r4,r4,-1
+   stwcx. r4,0,r3
+   bne- 1b
+
+2: ld r3,STACK_FRAME_PARAM(0)(sp)
+   bl check_vsx
+   nop
+   cmpdi r3,0
+   bne 3f
+   ld r4,STACK_FRAME_PARAM(2)(sp)
+   ld r5,0(r4)
+   cmpwi r5,0
+   bne 2b
+
+3: POP_BASIC_STACK(512)
+   blr
+FUNC_END(preempt_vsx)
diff --git a/tools/testing/selftests/powerpc/math/vsx_preempt.c 
b/tools/testing/selftests/powerpc/math/vsx_preempt.c
new file mode 100644
index 000..6387f03
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vsx_preempt.c
@@ -0,0 +1,147 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the VSX registers change across preemption.
+ * There is no way to be sure preemption happened so this test just
+ * uses many threads and a long wait. As such, a successful test
+ * doesn't mean much but a failure is bad.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+
+/* Time to wait for workers to get preempted (seconds) */
+#define PREEMPT_TIME 20
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+/*
+ * Ensure there is twice the number of non-volatile VSX regs!
+ * check_vsx() is going to use the other half as space to put the live
+ * registers before calling vsx_memcmp()
+ */
+__thread vector int varray[24] = {
+   {1, 2, 3, 4 }, {5, 6, 7, 8 }, {9, 10,11,12},
+   {13,14,15,16}, {17,18,19,20}, {21,22,23,24},
+   {25,26,27,28}, {29,

[PATCH v2 01/20] selftests/powerpc: Compile selftests against headers without AT_HWCAP2

2016-08-11 Thread Cyril Bur
It might be nice to compile selftests against older kernels and
headers which may not have AT_HWCAP2.

Signed-off-by: Cyril Bur 
---
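
The same compile-time fallback can be tried outside the selftest tree. The sketch below assumes glibc's getauxval() from <sys/auxv.h> in place of the selftests' get_auxv_entry() helper, and the toy_ prefixed names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>	/* getauxval(); also provides AT_* where known */

static inline bool toy_have_hwcap(unsigned long ftr)
{
	return (getauxval(AT_HWCAP) & ftr) == ftr;
}

#ifdef AT_HWCAP2
static inline bool toy_have_hwcap2(unsigned long ftr2)
{
	return (getauxval(AT_HWCAP2) & ftr2) == ftr2;
}
#else
/* Headers too old to know AT_HWCAP2: report every feature as absent. */
static inline bool toy_have_hwcap2(unsigned long ftr2)
{
	(void)ftr2;
	return false;
}
#endif
```

One consequence of the fallback is that callers need no #ifdef of their own: a test that requires an HWCAP2 feature simply sees it as missing when built against old headers.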
 tools/testing/selftests/powerpc/utils.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/testing/selftests/powerpc/utils.h 
b/tools/testing/selftests/powerpc/utils.h
index fbd33e5..ecd11b5 100644
--- a/tools/testing/selftests/powerpc/utils.h
+++ b/tools/testing/selftests/powerpc/utils.h
@@ -32,10 +32,17 @@ static inline bool have_hwcap(unsigned long ftr)
return ((unsigned long)get_auxv_entry(AT_HWCAP) & ftr) == ftr;
 }
 
+#ifdef AT_HWCAP2
 static inline bool have_hwcap2(unsigned long ftr2)
 {
return ((unsigned long)get_auxv_entry(AT_HWCAP2) & ftr2) == ftr2;
 }
+#else
+static inline bool have_hwcap2(unsigned long ftr2)
+{
+   return false;
+}
+#endif
 
 /* Yes, this is evil */
 #define FAIL_IF(x) \
-- 
2.9.2



[PATCH v2 19/20] powerpc: tm: Rename transct_(*) to ck(\1)_state

2016-08-11 Thread Cyril Bur
Make the structures being used for checkpointed state named
consistently with the pt_regs/ckpt_regs.

Signed-off-by: Cyril Bur 
---
 arch/powerpc/include/asm/processor.h |  8 ++---
 arch/powerpc/kernel/asm-offsets.c| 12 
 arch/powerpc/kernel/fpu.S|  2 +-
 arch/powerpc/kernel/process.c|  4 +--
 arch/powerpc/kernel/ptrace.c | 46 +--
 arch/powerpc/kernel/signal.h |  8 ++---
 arch/powerpc/kernel/signal_32.c  | 60 ++--
 arch/powerpc/kernel/signal_64.c  | 32 +--
 arch/powerpc/kernel/tm.S | 12 
 arch/powerpc/kernel/vector.S |  4 +--
 10 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index feab2ce..b3e0cfc 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -147,7 +147,7 @@ typedef struct {
 } mm_segment_t;
 
 #define TS_FPR(i) fp_state.fpr[i][TS_FPROFFSET]
-#define TS_TRANS_FPR(i) transact_fp.fpr[i][TS_FPROFFSET]
+#define TS_CKFPR(i) ckfp_state.fpr[i][TS_FPROFFSET]
 
 /* FP and VSX 0-31 register set */
 struct thread_fp_state {
@@ -275,9 +275,9 @@ struct thread_struct {
 *
 * These are analogous to how ckpt_regs and pt_regs work
 */
-   struct thread_fp_state transact_fp;
-   struct thread_vr_state transact_vr;
-   unsigned long   transact_vrsave;
+   struct thread_fp_state ckfp_state; /* Checkpointed FP state */
+   struct thread_vr_state ckvr_state; /* Checkpointed VR state */
+   unsigned long   ckvrsave; /* Checkpointed VRSAVE */
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
void*   kvm_shadow_vcpu; /* KVM internal data */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index b89d14c..dd0fc33 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -142,12 +142,12 @@ int main(void)
DEFINE(THREAD_TM_PPR, offsetof(struct thread_struct, tm_ppr));
DEFINE(THREAD_TM_DSCR, offsetof(struct thread_struct, tm_dscr));
DEFINE(PT_CKPT_REGS, offsetof(struct thread_struct, ckpt_regs));
-   DEFINE(THREAD_TRANSACT_VRSTATE, offsetof(struct thread_struct,
-transact_vr));
-   DEFINE(THREAD_TRANSACT_VRSAVE, offsetof(struct thread_struct,
-   transact_vrsave));
-   DEFINE(THREAD_TRANSACT_FPSTATE, offsetof(struct thread_struct,
-transact_fp));
+   DEFINE(THREAD_CKVRSTATE, offsetof(struct thread_struct,
+ckvr_state));
+   DEFINE(THREAD_CKVRSAVE, offsetof(struct thread_struct,
+   ckvrsave));
+   DEFINE(THREAD_CKFPSTATE, offsetof(struct thread_struct,
+ckfp_state));
/* Local pt_regs on stack for Transactional Memory funcs. */
DEFINE(TM_FRAME_SIZE, STACK_FRAME_OVERHEAD +
   sizeof(struct pt_regs) + 16);
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index 15da2b5..181c187 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -68,7 +68,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
SYNC
MTMSRD(r5)
 
-   addir7,r3,THREAD_TRANSACT_FPSTATE
+   addir7,r3,THREAD_CKFPSTATE
lfd fr0,FPSTATE_FPSCR(r7)
MTFSF_L(fr0)
REST_32FPVSRS(0, R4, R7)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 6836570..5216e04 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -860,8 +860,8 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
 *
 * In switching we need to maintain a 2nd register state as
 * oldtask->thread.ckpt_regs.  We tm_reclaim(oldproc); this saves the
-* checkpointed (tbegin) state in ckpt_regs and saves the transactional
-* (current) FPRs into oldtask->thread.transact_fpr[].
+* checkpointed (tbegin) state in ckpt_regs, ckfp_state and
+* ckvr_state
 *
 * We also context switch (save) TFHAR/TEXASR/TFIAR in here.
 */
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index ed53712..1e85d6b 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -384,7 +384,7 @@ static int gpr_set(struct task_struct *target, const struct 
user_regset *regset,
 
 /*
  * Regardless of transactions, 'fp_state' holds the current running
- * value of all FPR registers and 'transact_fp' holds the last checkpointed
+ * value of all FPR registers and 'ckfp_state' holds the last checkpointed
  * value of all FPR registers for the current transaction.
  *
  * Userspace interface buffer layout:
@@ 

[PATCH v2 20/20] powerpc: Remove do_load_up_transact_{fpu,altivec}

2016-08-11 Thread Cyril Bur
Previous rework of TM code leaves these functions unused

Signed-off-by: Cyril Bur 
---
 arch/powerpc/include/asm/tm.h |  5 -
 arch/powerpc/kernel/fpu.S | 26 --
 arch/powerpc/kernel/vector.S  | 25 -
 3 files changed, 56 deletions(-)

diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index c22d704..82e06ca 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -9,11 +9,6 @@
 
 #ifndef __ASSEMBLY__
 
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-extern void do_load_up_transact_fpu(struct thread_struct *thread);
-extern void do_load_up_transact_altivec(struct thread_struct *thread);
-#endif
-
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
   unsigned long orig_msr, uint8_t cause);
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index 181c187..08d14b0 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -50,32 +50,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX);  
\
 #define REST_32FPVSRS(n,c,base) __REST_32FPVSRS(n,__REG_##c,__REG_##base)
 #define SAVE_32FPVSRS(n,c,base) __SAVE_32FPVSRS(n,__REG_##c,__REG_##base)
 
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-/* void do_load_up_transact_fpu(struct thread_struct *thread)
- *
- * This is similar to load_up_fpu but for the transactional version of the FP
- * register set.  It doesn't mess with the task MSR or valid flags.
- * Furthermore, we don't do lazy FP with TM currently.
- */
-_GLOBAL(do_load_up_transact_fpu)
-   mfmsr   r6
-   ori r5,r6,MSR_FP
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
-   orisr5,r5,MSR_VSX@h
-END_FTR_SECTION_IFSET(CPU_FTR_VSX)
-#endif
-   SYNC
-   MTMSRD(r5)
-
-   addir7,r3,THREAD_CKFPSTATE
-   lfd fr0,FPSTATE_FPSCR(r7)
-   MTFSF_L(fr0)
-   REST_32FPVSRS(0, R4, R7)
-
-   blr
-#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
-
 /*
  * Load state from memory into FP registers including FPSCR.
  * Assumes the caller has enabled FP in the MSR.
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 7dc4021..bc85bdf 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -7,31 +7,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-/* void do_load_up_transact_altivec(struct thread_struct *thread)
- *
- * This is similar to load_up_altivec but for the transactional version of the
- * vector regs.  It doesn't mess with the task MSR or valid flags.
- * Furthermore, VEC laziness is not supported with TM currently.
- */
-_GLOBAL(do_load_up_transact_altivec)
-   mfmsr   r6
-   orisr5,r6,MSR_VEC@h
-   MTMSRD(r5)
-   isync
-
-   li  r4,1
-   stw r4,THREAD_USED_VR(r3)
-
-   li  r10,THREAD_CKVRSTATE+VRSTATE_VSCR
-   lvx v0,r10,r3
-   mtvscr  v0
-   addir10,r3,THREAD_CKVRSTATE
-   REST_32VRS(0,r4,r10)
-
-   blr
-#endif
-
 /*
  * Load state from memory into VMX registers including VSCR.
  * Assumes the caller has enabled VMX in the MSR.
-- 
2.9.2



[PATCH v2 18/20] powerpc: tm: Always use fp_state and vr_state to store live registers

2016-08-11 Thread Cyril Bur
There is currently an inconsistency as to how the entire CPU register
state is saved and restored when a thread uses transactional memory
(TM).

Using transactional memory results in the CPU having duplicated
(almost all) of its register state. This duplication results in a set
of registers which can be considered 'live', those being currently
modified by the instructions being executed and another set that is
frozen at a point in time.

On context switch, both sets of state have to be saved and (later)
restored. These two states are often called a variety of different
things. Common terms for the state which only exists in hardware after
a thread has entered a transaction (performed a TBEGIN instruction)
are 'transactional' or 'speculative'.

Between a TBEGIN and a TEND or TABORT (or an event that causes the
hardware to abort), regardless of the use of TSUSPEND the
transactional state can be referred to as the live state.

The second state is often referred to as the 'checkpointed' state
and is a duplication of the live state when the TBEGIN instruction is
executed. This state is kept in the hardware and will be rolled back
to on transaction failure.

Currently all the registers stored in pt_regs are ALWAYS the live
registers, that is, when a thread has transactional registers their
values are stored in pt_regs and the checkpointed state is in
ckpt_regs. A strange opposite is true for fp_state. When a thread is
non transactional fp_state holds the live registers. When a thread
has initiated a transaction fp_state holds the checkpointed state and
transact_fp becomes the structure which holds the live state (at this
point it is a transactional state). The same is true for vr_state.

This method creates confusion as to where the live state is. In some
circumstances it requires extra work to determine where to put the
live state, and it prevents the use of common functions designed
(probably before TM) to save the live state.

With this patch pt_regs, fp_state and vr_state all represent the
same thing and the other structures [pending rename] are for
checkpointed state.

Signed-off-by: Cyril Bur 
---
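
To illustrate the convention this patch settles on (fp_state always holds the live registers, a separate structure holds the checkpoint), here is a deliberately simplified software model. It is hypothetical throughout: the toy_* names are not kernel identifiers, and real hardware keeps the checkpoint internally until a reclaim rather than in memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; field names only loosely echo thread_struct. */
struct toy_fp_state {
	double fpr[4];
};

struct toy_thread {
	struct toy_fp_state fp_state;	/* always the live registers */
	struct toy_fp_state ck_state;	/* checkpoint taken at "tbegin" */
	bool in_transaction;
};

/* Entering a transaction snapshots the live state. */
static void toy_tbegin(struct toy_thread *t)
{
	t->ck_state = t->fp_state;
	t->in_transaction = true;
}

/* A successful end simply discards the checkpoint. */
static void toy_tend(struct toy_thread *t)
{
	t->in_transaction = false;
}

/* An abort rolls the live state back to the checkpoint. */
static void toy_tabort(struct toy_thread *t)
{
	if (!t->in_transaction)
		return;
	t->fp_state = t->ck_state;
	t->in_transaction = false;
}
```

In this model, code that saves or inspects the live registers never has to ask whether a transaction is active; it always reads fp_state, which is the simplification the commit message argues for.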
 arch/powerpc/include/asm/processor.h |   7 +-
 arch/powerpc/kernel/process.c|  63 +++-
 arch/powerpc/kernel/ptrace.c | 278 +--
 arch/powerpc/kernel/signal_32.c  |  50 +++
 arch/powerpc/kernel/signal_64.c  |  53 +++
 arch/powerpc/kernel/tm.S |  94 ++--
 arch/powerpc/kernel/traps.c  |  12 +-
 7 files changed, 196 insertions(+), 361 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 68e3bf5..feab2ce 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -267,16 +267,13 @@ struct thread_struct {
unsigned long   tm_dscr;
 
/*
-* Transactional FP and VSX 0-31 register set.
-* NOTE: the sense of these is the opposite of the integer ckpt_regs!
+* Checkpointed FP and VSX 0-31 register set.
 *
 * When a transaction is active/signalled/scheduled etc., *regs is the
 * most recent set of/speculated GPRs with ckpt_regs being the older
 * checkpointed regs to which we roll back if transaction aborts.
 *
-* However, fpr[] is the checkpointed 'base state' of FP regs, and
-* transact_fpr[] is the new set of transactional values.
-* VRs work the same way.
+* These are analogous to how ckpt_regs and pt_regs work
 */
struct thread_fp_state transact_fp;
struct thread_vr_state transact_vr;
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 0cfbc89..6836570 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -808,26 +808,14 @@ static inline bool hw_brk_match(struct arch_hw_breakpoint 
*a,
 static void tm_reclaim_thread(struct thread_struct *thr,
  struct thread_info *ti, uint8_t cause)
 {
-   unsigned long msr_diff = 0;
+   unsigned long msr_diff = thr->regs->msr;
 
-   /*
-* If FP/VSX registers have been already saved to the
-* thread_struct, move them to the transact_fp array.
-* We clear the TIF_RESTORE_TM bit since after the reclaim
-* the thread will no longer be transactional.
-*/
if (test_ti_thread_flag(ti, TIF_RESTORE_TM)) {
-   msr_diff = thr->ckpt_regs.msr & ~thr->regs->msr;
-   if (msr_diff & MSR_FP)
-   memcpy(&thr->transact_fp, &thr->fp_state,
-  sizeof(struct thread_fp_state));
-   if (msr_diff & MSR_VEC)
-   memcpy(&thr->transact_vr, &thr->vr_state,
-  sizeof(struct thread_vr_state));
+   msr_diff = (thr->ckpt_regs.msr & ~thr->regs->msr)
+   & (MSR_FP | MSR_VEC | MSR_VSX | MSR_FE0 | MSR_FE1);
+
  

[PATCH v2 17/20] selftests/powerpc: Add checks for transactional VSXs in signal contexts

2016-08-11 Thread Cyril Bur
If a thread receives a signal while transactional the kernel creates a
second context to show the transactional state of the process. This
test loads some known values and waits for a signal and confirms that
the expected values are in the signal context.

Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/tm/Makefile|   2 +-
 .../powerpc/tm/tm-signal-context-chk-vsx.c | 125 +
 2 files changed, 126 insertions(+), 1 deletion(-)
 create mode 100644 
tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c

diff --git a/tools/testing/selftests/powerpc/tm/Makefile 
b/tools/testing/selftests/powerpc/tm/Makefile
index 06c44aa..9d53f8b 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -1,5 +1,5 @@
 SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr 
tm-signal-context-chk-fpu \
-   tm-signal-context-chk-vmx
+   tm-signal-context-chk-vmx tm-signal-context-chk-vsx
 
 TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
tm-vmxcopy tm-fork tm-tar tm-tmspr $(SIGNAL_CONTEXT_CHK_TESTS)
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c 
b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c
new file mode 100644
index 000..b99c3d8
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c
@@ -0,0 +1,125 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ *
+ * Test the kernel's signal frame code.
+ *
+ * The kernel sets up two sets of ucontexts if the signal was to be
+ * delivered while the thread was in a transaction.
+ * Expected behaviour is that the checkpointed state is in the user
+ * context passed to the signal handler. The speculated state can be
+ * accessed with the uc_link pointer.
+ *
+ * The rationale for this is that if TM unaware code (which linked
+ * against TM libs) installs a signal handler it will not know of the
+ * speculative nature of the 'live' registers and may infer the wrong
+ * thing.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "utils.h"
+#include "tm.h"
+
+#define MAX_ATTEMPT 50
+
+#define NV_VSX_REGS 12
+
+long tm_signal_self_context_load(pid_t pid, long *gprs, double *fps, vector int *vms, vector int *vss);
+
+static sig_atomic_t fail;
+
+vector int vss[] = {
+   {1, 2, 3, 4 },{5, 6, 7, 8 },{9, 10,11,12},
+   {13,14,15,16},{17,18,19,20},{21,22,23,24},
+   {25,26,27,28},{29,30,31,32},{33,34,35,36},
+   {37,38,39,40},{41,42,43,44},{45,46,47,48},
+   {-1, -2, -3, -4 },{-5, -6, -7, -8 },{-9, -10,-11,-12},
+   {-13,-14,-15,-16},{-17,-18,-19,-20},{-21,-22,-23,-24},
+   {-25,-26,-27,-28},{-29,-30,-31,-32},{-33,-34,-35,-36},
+   {-37,-38,-39,-40},{-41,-42,-43,-44},{-45,-46,-47,-48}
+};
+
+static void signal_usr1(int signum, siginfo_t *info, void *uc)
+{
+   int i;
+   uint8_t vsc[sizeof(vector int)];
+   uint8_t vst[sizeof(vector int)];
+   ucontext_t *ucp = uc;
+   ucontext_t *tm_ucp = ucp->uc_link;
+
+   /*
+* The other half of the VSX regs will be after v_regs.
+*
+* In short, vmx_reserve array holds everything. v_regs is a 16
+* byte aligned pointer at the start of vmx_reserve (vmx_reserve
+* may or may not be 16 aligned) where the v_regs structure exists.
+* (Half of) the VSX registers are directly after v_regs, so the
+* easiest way to find them is as below.
+*/
+   long *vsx_ptr = (long *)(ucp->uc_mcontext.v_regs + 1);
+   long *tm_vsx_ptr = (long *)(tm_ucp->uc_mcontext.v_regs + 1);
+   for (i = 0; i < NV_VSX_REGS && !fail; i++) {
+   memcpy(vsc, &ucp->uc_mcontext.fp_regs[i + 20], 8);
+   memcpy(vsc + 8, &vsx_ptr[20 + i], 8);
+   fail = memcmp(vsc, &vss[i], sizeof(vector int));
+   memcpy(vst, &tm_ucp->uc_mcontext.fp_regs[i + 20], 8);
+   memcpy(vst + 8, &tm_vsx_ptr[20 + i], 8);
+   fail |= memcmp(vst, &vss[i + NV_VSX_REGS], sizeof(vector int));
+
+   if (fail) {
+   int j;
+
+   fprintf(stderr, "Failed on %d vsx 0x", i);
+   for (j = 0; j < 16; j++)
+   fprintf(stderr, "%02x", vsc[j]);
+   fprintf(stderr, " vs 0x");
+   for (j = 0; j < 16; j++)
+   fprintf(stderr, "%02x", vst[j]);
+   fprintf(stderr, "\n");
+   }
+   }
+}
+
+static int tm_signal_context_chk()
+{
+   struct sigaction act;
+   int i;
+   long rc;
+   pid_t pid = getpid();
+
+   SKIP_IF(!have

[PATCH v2 16/20] selftests/powerpc: Add checks for transactional VMXs in signal contexts

2016-08-11 Thread Cyril Bur
If a thread receives a signal while transactional the kernel creates a
second context to show the transactional state of the process. This
test loads some known values and waits for a signal and confirms that
the expected values are in the signal context.

Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/tm/Makefile|   3 +-
 .../powerpc/tm/tm-signal-context-chk-vmx.c | 110 +
 2 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c

diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index 103648f..06c44aa 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -1,4 +1,5 @@
-SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu
+SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu \
+   tm-signal-context-chk-vmx
 
 TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
tm-vmxcopy tm-fork tm-tar tm-tmspr $(SIGNAL_CONTEXT_CHK_TESTS)
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c
new file mode 100644
index 000..f0ee55f
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c
@@ -0,0 +1,110 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ *
+ * Test the kernel's signal frame code.
+ *
+ * The kernel sets up two sets of ucontexts if the signal was to be
+ * delivered while the thread was in a transaction.
+ * Expected behaviour is that the checkpointed state is in the user
+ * context passed to the signal handler. The speculated state can be
+ * accessed with the uc_link pointer.
+ *
+ * The rationale for this is that if TM unaware code (which linked
+ * against TM libs) installs a signal handler it will not know of the
+ * speculative nature of the 'live' registers and may infer the wrong
+ * thing.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "utils.h"
+#include "tm.h"
+
+#define MAX_ATTEMPT 50
+
+#define NV_VMX_REGS 12
+
+long tm_signal_self_context_load(pid_t pid, long *gprs, double *fps, vector int *vms, vector int *vss);
+
+static sig_atomic_t fail;
+
+vector int vms[] = {
+   {1, 2, 3, 4 },{5, 6, 7, 8 },{9, 10,11,12},
+   {13,14,15,16},{17,18,19,20},{21,22,23,24},
+   {25,26,27,28},{29,30,31,32},{33,34,35,36},
+   {37,38,39,40},{41,42,43,44},{45,46,47,48},
+   {-1, -2, -3, -4}, {-5, -6, -7, -8}, {-9, -10,-11,-12},
+   {-13,-14,-15,-16},{-17,-18,-19,-20},{-21,-22,-23,-24},
+   {-25,-26,-27,-28},{-29,-30,-31,-32},{-33,-34,-35,-36},
+   {-37,-38,-39,-40},{-41,-42,-43,-44},{-45,-46,-47,-48}
+};
+
+static void signal_usr1(int signum, siginfo_t *info, void *uc)
+{
+   int i;
+   ucontext_t *ucp = uc;
+   ucontext_t *tm_ucp = ucp->uc_link;
+
+   for (i = 0; i < NV_VMX_REGS && !fail; i++) {
+   fail = memcmp(ucp->uc_mcontext.v_regs->vrregs[i + 20],
+   &vms[i], sizeof(vector int));
+   fail |= memcmp(tm_ucp->uc_mcontext.v_regs->vrregs[i + 20],
+   &vms[i + NV_VMX_REGS], sizeof (vector int));
+
+   if (fail) {
+   int j;
+
+   fprintf(stderr, "Failed on %d vmx 0x", i);
+   for (j = 0; j < 4; j++)
+   fprintf(stderr, "%04x", ucp->uc_mcontext.v_regs->vrregs[i + 20][j]);
+   fprintf(stderr, " vs 0x");
+   for (j = 0 ; j < 4; j++)
+   fprintf(stderr, "%04x", tm_ucp->uc_mcontext.v_regs->vrregs[i + 20][j]);
+   fprintf(stderr, "\n");
+   }
+   }
+}
+
+static int tm_signal_context_chk()
+{
+   struct sigaction act;
+   int i;
+   long rc;
+   pid_t pid = getpid();
+
+   SKIP_IF(!have_htm());
+
+   act.sa_sigaction = signal_usr1;
+   sigemptyset(&act.sa_mask);
+   act.sa_flags = SA_SIGINFO;
+   if (sigaction(SIGUSR1, &act, NULL) < 0) {
+   perror("sigaction sigusr1");
+   exit(1);
+   }
+
+   i = 0;
+   while (i < MAX_ATTEMPT && !fail) {
+   rc = tm_signal_self_context_load(pid, NULL, NULL, vms, NULL);
+   FAIL_IF(rc != pid);
+   i++;
+   }
+
+   return fail;
+}
+
+int main(void)
+{
+   return test_harness(tm_signal_context_chk, "tm_signal_context_chk_vmx");
+}
-- 
2.9.2



[PATCH v2 15/20] selftests/powerpc: Add checks for transactional FPUs in signal contexts

2016-08-11 Thread Cyril Bur
If a thread receives a signal while transactional the kernel creates a
second context to show the transactional state of the process. This
test loads some known values and waits for a signal and confirms that
the expected values are in the signal context.

Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/tm/Makefile|  2 +-
 .../powerpc/tm/tm-signal-context-chk-fpu.c | 92 ++
 2 files changed, 93 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c

diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index 2b6fe8f..103648f 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -1,4 +1,4 @@
-SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr
+SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu
 
 TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
tm-vmxcopy tm-fork tm-tar tm-tmspr $(SIGNAL_CONTEXT_CHK_TESTS)
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c
new file mode 100644
index 000..c760deb
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ *
+ * Test the kernel's signal frame code.
+ *
+ * The kernel sets up two sets of ucontexts if the signal was to be
+ * delivered while the thread was in a transaction.
+ * Expected behaviour is that the checkpointed state is in the user
+ * context passed to the signal handler. The speculated state can be
+ * accessed with the uc_link pointer.
+ *
+ * The rationale for this is that if TM unaware code (which linked
+ * against TM libs) installs a signal handler it will not know of the
+ * speculative nature of the 'live' registers and may infer the wrong
+ * thing.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "utils.h"
+#include "tm.h"
+
+#define MAX_ATTEMPT 50
+
+#define NV_FPU_REGS 18
+
+long tm_signal_self_context_load(pid_t pid, long *gprs, double *fps, vector int *vms, vector int *vss);
+
+/* Be sure there are 2x as many as there are NV FPU regs (2x18) */
+static double fps[] = {
+1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
+   -1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,-17,-18
+};
+
+static sig_atomic_t fail;
+
+static void signal_usr1(int signum, siginfo_t *info, void *uc)
+{
+   int i;
+   ucontext_t *ucp = uc;
+   ucontext_t *tm_ucp = ucp->uc_link;
+
+   for (i = 0; i < NV_FPU_REGS && !fail; i++) {
+   fail = (ucp->uc_mcontext.fp_regs[i + 14] != fps[i]);
+   fail |= (tm_ucp->uc_mcontext.fp_regs[i + 14] != fps[i + NV_FPU_REGS]);
+   if (fail)
+   printf("Failed on %d FP %g or %g\n", i, ucp->uc_mcontext.fp_regs[i + 14], tm_ucp->uc_mcontext.fp_regs[i + 14]);
+   }
+}
+
+static int tm_signal_context_chk_fpu()
+{
+   struct sigaction act;
+   int i;
+   long rc;
+   pid_t pid = getpid();
+
+   SKIP_IF(!have_htm());
+
+   act.sa_sigaction = signal_usr1;
+   sigemptyset(&act.sa_mask);
+   act.sa_flags = SA_SIGINFO;
+   if (sigaction(SIGUSR1, &act, NULL) < 0) {
+   perror("sigaction sigusr1");
+   exit(1);
+   }
+
+   i = 0;
+   while (i < MAX_ATTEMPT && !fail) {
+   rc = tm_signal_self_context_load(pid, NULL, fps, NULL, NULL);
+   FAIL_IF(rc != pid);
+   i++;
+   }
+
+   return fail;
+}
+
+int main(void)
+{
+   return test_harness(tm_signal_context_chk_fpu, "tm_signal_context_chk_fpu");
+}
-- 
2.9.2
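
The expected-value arrays in these tests hold two halves: the first half is
compared against the context passed to the handler (checkpointed state), the
second half against the context reached through uc_link (speculative state).
A minimal sketch of that comparison, written against plain arrays so it can
be read without a TM-capable machine; first_mismatch and NV_REGS are
illustrative names, not part of the series.

```c
#include <assert.h>
#include <stddef.h>

#define NV_REGS 18	/* assumed count, mirroring NV_FPU_REGS in the patch */

/*
 * expected[0 .. NV_REGS-1] is compared against the checkpointed values,
 * expected[NV_REGS .. 2*NV_REGS-1] against the speculative values.
 * Returns the index of the first mismatch, or -1 if everything matches.
 */
int first_mismatch(const double *ckpt, const double *spec,
		   const double *expected)
{
	for (size_t i = 0; i < NV_REGS; i++) {
		if (ckpt[i] != expected[i] ||
		    spec[i] != expected[i + NV_REGS])
			return (int)i;
	}
	return -1;
}
```

This is why the comment in the patch insists on having twice as many values
as there are non-volatile registers.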



[PATCH v2 14/20] selftests/powerpc: Add checks for transactional GPRs in signal contexts

2016-08-11 Thread Cyril Bur
If a thread receives a signal while transactional the kernel creates a
second context to show the transactional state of the process. This
test loads some known values and waits for a signal and confirms that
the expected values are in the signal context.

Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/tm/Makefile|   7 +-
 .../powerpc/tm/tm-signal-context-chk-gpr.c |  90 
 tools/testing/selftests/powerpc/tm/tm-signal.S | 114 +
 3 files changed, 210 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal.S

diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index 9d301d7..2b6fe8f 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -1,5 +1,7 @@
+SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr
+
 TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
-   tm-vmxcopy tm-fork tm-tar tm-tmspr tm-exec tm-execed
+   tm-vmxcopy tm-fork tm-tar tm-tmspr $(SIGNAL_CONTEXT_CHK_TESTS)
 
 all: $(TEST_PROGS)
 
@@ -11,6 +13,9 @@ tm-syscall: tm-syscall-asm.S
 tm-syscall: CFLAGS += -I../../../../../usr/include
 tm-tmspr: CFLAGS += -pthread
 
+$(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
+$(SIGNAL_CONTEXT_CHK_TESTS): CFLAGS += -mhtm -m64
+
 include ../../lib.mk
 
 clean:
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c
new file mode 100644
index 000..df91330
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c
@@ -0,0 +1,90 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ *
+ * Test the kernel's signal frame code.
+ *
+ * The kernel sets up two sets of ucontexts if the signal was to be
+ * delivered while the thread was in a transaction.
+ * Expected behaviour is that the checkpointed state is in the user
+ * context passed to the signal handler. The speculated state can be
+ * accessed with the uc_link pointer.
+ *
+ * The rationale for this is that if TM unaware code (which linked
+ * against TM libs) installs a signal handler it will not know of the
+ * speculative nature of the 'live' registers and may infer the wrong
+ * thing.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "utils.h"
+#include "tm.h"
+
+#define MAX_ATTEMPT 50
+
+#define NV_GPR_REGS 18
+
+long tm_signal_self_context_load(pid_t pid, long *gprs, double *fps, vector int *vms, vector int *vss);
+
+static sig_atomic_t fail;
+
+static long gps[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
+   -1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,-17,-18};
+
+static void signal_usr1(int signum, siginfo_t *info, void *uc)
+{
+   int i;
+   ucontext_t *ucp = uc;
+   ucontext_t *tm_ucp = ucp->uc_link;
+
+   for (i = 0; i < NV_GPR_REGS && !fail; i++) {
+   fail = (ucp->uc_mcontext.gp_regs[i + 14] != gps[i]);
+   fail |= (tm_ucp->uc_mcontext.gp_regs[i + 14] != gps[i + NV_GPR_REGS]);
+   if (fail)
+   printf("Failed on %d GPR %lu or %lu\n", i,
+   ucp->uc_mcontext.gp_regs[i + 14], tm_ucp->uc_mcontext.gp_regs[i + 14]);
+   }
+}
+
+static int tm_signal_context_chk_gpr()
+{
+   struct sigaction act;
+   int i;
+   long rc;
+   pid_t pid = getpid();
+
+   SKIP_IF(!have_htm());
+
+   act.sa_sigaction = signal_usr1;
+   sigemptyset(&act.sa_mask);
+   act.sa_flags = SA_SIGINFO;
+   if (sigaction(SIGUSR1, &act, NULL) < 0) {
+   perror("sigaction sigusr1");
+   exit(1);
+   }
+
+   i = 0;
+   while (i < MAX_ATTEMPT && !fail) {
+   rc = tm_signal_self_context_load(pid, gps, NULL, NULL, NULL);
+   FAIL_IF(rc != pid);
+   i++;
+   }
+
+   return fail;
+}
+
+int main(void)
+{
+   return test_harness(tm_signal_context_chk_gpr, "tm_signal_context_chk_gpr");
+}
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal.S b/tools/testing/selftests/powerpc/tm/tm-signal.S
new file mode 100644
index 000..4e13e8b
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal.S
@@ -0,0 +1,114 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of 

[PATCH v2 13/20] selftests/powerpc: Check that signals always get delivered

2016-08-11 Thread Cyril Bur
Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/Makefile   |   1 +
 tools/testing/selftests/powerpc/signal/Makefile|  12 +++
 tools/testing/selftests/powerpc/signal/signal.S|  50 ++
 tools/testing/selftests/powerpc/signal/signal.c| 111 +
 tools/testing/selftests/powerpc/signal/signal_tm.c | 110 
 5 files changed, 284 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/signal/Makefile
 create mode 100644 tools/testing/selftests/powerpc/signal/signal.S
 create mode 100644 tools/testing/selftests/powerpc/signal/signal.c
 create mode 100644 tools/testing/selftests/powerpc/signal/signal_tm.c

diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
index 3c40c9d..96a8593 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -19,6 +19,7 @@ SUB_DIRS = alignment  \
   dscr \
   mm   \
   pmu  \
+  signal   \
   primitives   \
   stringloops  \
   switch_endian\
diff --git a/tools/testing/selftests/powerpc/signal/Makefile b/tools/testing/selftests/powerpc/signal/Makefile
new file mode 100644
index 000..97944cf
--- /dev/null
+++ b/tools/testing/selftests/powerpc/signal/Makefile
@@ -0,0 +1,12 @@
+TEST_PROGS := signal signal_tm
+
+all: $(TEST_PROGS)
+
+$(TEST_PROGS): ../harness.c ../utils.c signal.S
+
+signal_tm: CFLAGS += -mhtm
+
+include ../../lib.mk
+
+clean:
+   rm -f $(TEST_PROGS) *.o
diff --git a/tools/testing/selftests/powerpc/signal/signal.S b/tools/testing/selftests/powerpc/signal/signal.S
new file mode 100644
index 000..7043d52
--- /dev/null
+++ b/tools/testing/selftests/powerpc/signal/signal.S
@@ -0,0 +1,50 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "../basic_asm.h"
+
+/* long signal_self(pid_t pid, int sig); */
+FUNC_START(signal_self)
+   li  r0,37 /* sys_kill */
+   /* r3 already has our pid in it */
+   /* r4 already has signal type in it */
+   sc
+   bc  4,3,1f
+   subfze  r3,r3
+1: blr
+FUNC_END(signal_self)
+
+/* long tm_signal_self(pid_t pid, int sig, int *ret); */
+FUNC_START(tm_signal_self)
+   PUSH_BASIC_STACK(8)
+   std r5,STACK_FRAME_PARAM(0)(sp) /* ret */
+   tbegin.
+   beq 1f
+   tsuspend.
+   li  r0,37 /* sys_kill */
+   /* r3 already has our pid in it */
+   /* r4 already has signal type in it */
+   sc
+   ld  r5,STACK_FRAME_PARAM(0)(sp) /* ret */
+   bc  4,3,2f
+   subfze  r3,r3
+2: std r3,0(r5)
+   tabort. 0
+   tresume. /* Be nice to some cleanup, jumps back to tbegin then to 1: */
+   /*
+* Transaction should be properly doomed and we should never get
+* here
+*/
+   li  r3,1
+   POP_BASIC_STACK(8)
+   blr
+1: li  r3,0
+   POP_BASIC_STACK(8)
+   blr
+FUNC_END(tm_signal_self)
diff --git a/tools/testing/selftests/powerpc/signal/signal.c b/tools/testing/selftests/powerpc/signal/signal.c
new file mode 100644
index 000..e7dedd2
--- /dev/null
+++ b/tools/testing/selftests/powerpc/signal/signal.c
@@ -0,0 +1,111 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * A signal sent to oneself should always be delivered.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "utils.h"
+
+#define MAX_ATTEMPT 50
+#define TIMEOUT 5
+
+extern long signal_self(pid_t pid, int sig);
+
+static sig_atomic_t signaled;
+static sig_atomic_t fail;
+
+static void signal_handler(int sig)
+{
+   if (sig == SIGUSR1)
+   signaled = 1;
+   else
+   fail = 1;
+}
+
+static int test_signal()
+{
+   int i;
+   struct sigaction act;
+   pid_t ppid = getpid();
+   pid_t pid;
+
+   act.sa_handler = signal_handler;
+   act.sa_flags = 0;
+   sigemptyset(&act.sa_mask);
+   if (sigaction(SIGUSR1, &act, NULL) < 0) {
+   perror("sigaction SIGUSR1");
+   exit(1);
+   }
+   if (sigaction(SIGALRM, &act, NULL) < 0) {
+   perror("sigaction SIGALRM");
+   exit(1);
+   }
+
+   /* Don't do this for MAX_ATTEMPT, it's simply too long */
+   for (i = 0; i < 1000; i++) {
+   pid = fo

[PATCH v2 12/20] selftests/powerpc: Add TM tcheck helpers in C

2016-08-11 Thread Cyril Bur
Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/tm/tm.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/tools/testing/selftests/powerpc/tm/tm.h b/tools/testing/selftests/powerpc/tm/tm.h
index 60318ba..2c8da74 100644
--- a/tools/testing/selftests/powerpc/tm/tm.h
+++ b/tools/testing/selftests/powerpc/tm/tm.h
@@ -52,4 +52,31 @@ static inline bool failure_is_nesting(void)
return (__builtin_get_texasru() & 0x40);
 }
 
+static inline int tcheck(void)
+{
+   long cr;
+   asm volatile ("tcheck 0; mfcr %0" : "=r"(cr) : : "cr0");
+   return (cr >> 28) & 0xf;
+}
+
+static inline bool tcheck_doomed(void)
+{
+   return tcheck() & 8;
+}
+
+static inline bool tcheck_active(void)
+{
+   return tcheck() & 4;
+}
+
+static inline bool tcheck_suspended(void)
+{
+   return tcheck() & 2;
+}
+
+static inline bool tcheck_transactional(void)
+{
+   return tcheck() & 6;
+}
+
 #endif /* _SELFTESTS_POWERPC_TM_TM_H */
-- 
2.9.2



[PATCH v2 09/20] selftests/powerpc: Introduce GPR asm helper header file

2016-08-11 Thread Cyril Bur
Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/gpr_asm.h | 96 +++
 1 file changed, 96 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/gpr_asm.h

diff --git a/tools/testing/selftests/powerpc/gpr_asm.h b/tools/testing/selftests/powerpc/gpr_asm.h
new file mode 100644
index 000..f6f3885
--- /dev/null
+++ b/tools/testing/selftests/powerpc/gpr_asm.h
@@ -0,0 +1,96 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _SELFTESTS_POWERPC_GPR_ASM_H
+#define _SELFTESTS_POWERPC_GPR_ASM_H
+
+#include "basic_asm.h"
+
+#define __PUSH_NVREGS(top_pos); \
+   std r31,(top_pos)(%r1); \
+   std r30,(top_pos - 8)(%r1); \
+   std r29,(top_pos - 16)(%r1); \
+   std r28,(top_pos - 24)(%r1); \
+   std r27,(top_pos - 32)(%r1); \
+   std r26,(top_pos - 40)(%r1); \
+   std r25,(top_pos - 48)(%r1); \
+   std r24,(top_pos - 56)(%r1); \
+   std r23,(top_pos - 64)(%r1); \
+   std r22,(top_pos - 72)(%r1); \
+   std r21,(top_pos - 80)(%r1); \
+   std r20,(top_pos - 88)(%r1); \
+   std r19,(top_pos - 96)(%r1); \
+   std r18,(top_pos - 104)(%r1); \
+   std r17,(top_pos - 112)(%r1); \
+   std r16,(top_pos - 120)(%r1); \
+   std r15,(top_pos - 128)(%r1); \
+   std r14,(top_pos - 136)(%r1)
+
+#define __POP_NVREGS(top_pos); \
+   ld r31,(top_pos)(%r1); \
+   ld r30,(top_pos - 8)(%r1); \
+   ld r29,(top_pos - 16)(%r1); \
+   ld r28,(top_pos - 24)(%r1); \
+   ld r27,(top_pos - 32)(%r1); \
+   ld r26,(top_pos - 40)(%r1); \
+   ld r25,(top_pos - 48)(%r1); \
+   ld r24,(top_pos - 56)(%r1); \
+   ld r23,(top_pos - 64)(%r1); \
+   ld r22,(top_pos - 72)(%r1); \
+   ld r21,(top_pos - 80)(%r1); \
+   ld r20,(top_pos - 88)(%r1); \
+   ld r19,(top_pos - 96)(%r1); \
+   ld r18,(top_pos - 104)(%r1); \
+   ld r17,(top_pos - 112)(%r1); \
+   ld r16,(top_pos - 120)(%r1); \
+   ld r15,(top_pos - 128)(%r1); \
+   ld r14,(top_pos - 136)(%r1)
+
+#define PUSH_NVREGS(stack_size) \
+   __PUSH_NVREGS(stack_size + STACK_FRAME_MIN_SIZE)
+
+/* 18 NV FPU REGS */
+#define PUSH_NVREGS_BELOW_FPU(stack_size) \
+   __PUSH_NVREGS(stack_size + STACK_FRAME_MIN_SIZE - (18 * 8))
+
+#define POP_NVREGS(stack_size) \
+   __POP_NVREGS(stack_size + STACK_FRAME_MIN_SIZE)
+
+/* 18 NV FPU REGS */
+#define POP_NVREGS_BELOW_FPU(stack_size) \
+   __POP_NVREGS(stack_size + STACK_FRAME_MIN_SIZE - (18 * 8))
+
+/*
+ * Careful calling this, it will 'clobber' NVGPRs (by design)
+ * Don't call this from C
+ */
+FUNC_START(load_gpr)
+   ld  r14,0(r3)
+   ld  r15,8(r3)
+   ld  r16,16(r3)
+   ld  r17,24(r3)
+   ld  r18,32(r3)
+   ld  r19,40(r3)
+   ld  r20,48(r3)
+   ld  r21,56(r3)
+   ld  r22,64(r3)
+   ld  r23,72(r3)
+   ld  r24,80(r3)
+   ld  r25,88(r3)
+   ld  r26,96(r3)
+   ld  r27,104(r3)
+   ld  r28,112(r3)
+   ld  r29,120(r3)
+   ld  r30,128(r3)
+   ld  r31,136(r3)
+   blr
+FUNC_END(load_gpr)
+
+
+#endif /* _SELFTESTS_POWERPC_GPR_ASM_H */
-- 
2.9.2



[PATCH v2 10/20] selftests/powerpc: Add transactional memory defines

2016-08-11 Thread Cyril Bur
Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/basic_asm.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/powerpc/basic_asm.h b/tools/testing/selftests/powerpc/basic_asm.h
index 3349a07..5131059 100644
--- a/tools/testing/selftests/powerpc/basic_asm.h
+++ b/tools/testing/selftests/powerpc/basic_asm.h
@@ -4,6 +4,10 @@
 #include 
 #include 
 
+#define TBEGIN .long 0x7C00051D
+#define TSUSPEND .long 0x7C0005DD
+#define TRESUME .long 0x7C2005DD
+
 #define LOAD_REG_IMMEDIATE(reg,expr) \
lis reg,(expr)@highest; \
ori reg,reg,(expr)@higher;  \
-- 
2.9.2



[PATCH v2 11/20] selftests/powerpc: Allow tests to extend their kill timeout

2016-08-11 Thread Cyril Bur
Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/harness.c | 9 +++--
 tools/testing/selftests/powerpc/utils.h   | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/powerpc/harness.c b/tools/testing/selftests/powerpc/harness.c
index 52f9be7..248a820 100644
--- a/tools/testing/selftests/powerpc/harness.c
+++ b/tools/testing/selftests/powerpc/harness.c
@@ -19,9 +19,9 @@
 #include "subunit.h"
 #include "utils.h"
 
-#define TIMEOUT 120
 #define KILL_TIMEOUT   5
 
+static uint64_t timeout = 120;
 
 int run_test(int (test_function)(void), char *name)
 {
@@ -44,7 +44,7 @@ int run_test(int (test_function)(void), char *name)
setpgid(pid, pid);
 
/* Wake us up in timeout seconds */
-   alarm(TIMEOUT);
+   alarm(timeout);
terminated = false;
 
 wait:
@@ -94,6 +94,11 @@ static struct sigaction alarm_action = {
.sa_handler = alarm_handler,
 };
 
+void test_harness_set_timeout(uint64_t time)
+{
+   timeout = time;
+}
+
 int test_harness(int (test_function)(void), char *name)
 {
int rc;
diff --git a/tools/testing/selftests/powerpc/utils.h b/tools/testing/selftests/powerpc/utils.h
index ecd11b5..53405e8 100644
--- a/tools/testing/selftests/powerpc/utils.h
+++ b/tools/testing/selftests/powerpc/utils.h
@@ -22,7 +22,7 @@ typedef uint32_t u32;
 typedef uint16_t u16;
 typedef uint8_t u8;
 
-
+void test_harness_set_timeout(uint64_t time);
 int test_harness(int (test_function)(void), char *name);
 extern void *get_auxv_entry(int type);
 int pick_online_cpu(void);
-- 
2.9.2



[PATCH v2 08/20] selftests/powerpc: Move VMX stack frame macros to header file

2016-08-11 Thread Cyril Bur
Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/math/vmx_asm.S | 85 +-
 tools/testing/selftests/powerpc/vmx_asm.h  | 98 ++
 2 files changed, 99 insertions(+), 84 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/vmx_asm.h

diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S
index 1b8c248..fd74da4 100644
--- a/tools/testing/selftests/powerpc/math/vmx_asm.S
+++ b/tools/testing/selftests/powerpc/math/vmx_asm.S
@@ -8,90 +8,7 @@
  */
 
 #include "../basic_asm.h"
-
-# POS MUST BE 16 ALIGNED!
-#define PUSH_VMX(pos,reg) \
-   li  reg,pos; \
-   stvx v20,reg,sp; \
-   addi reg,reg,16; \
-   stvx v21,reg,sp; \
-   addi reg,reg,16; \
-   stvx v22,reg,sp; \
-   addi reg,reg,16; \
-   stvx v23,reg,sp; \
-   addi reg,reg,16; \
-   stvx v24,reg,sp; \
-   addi reg,reg,16; \
-   stvx v25,reg,sp; \
-   addi reg,reg,16; \
-   stvx v26,reg,sp; \
-   addi reg,reg,16; \
-   stvx v27,reg,sp; \
-   addi reg,reg,16; \
-   stvx v28,reg,sp; \
-   addi reg,reg,16; \
-   stvx v29,reg,sp; \
-   addi reg,reg,16; \
-   stvx v30,reg,sp; \
-   addi reg,reg,16; \
-   stvx v31,reg,sp;
-
-# POS MUST BE 16 ALIGNED!
-#define POP_VMX(pos,reg) \
-   li  reg,pos; \
-   lvx v20,reg,sp; \
-   addi reg,reg,16; \
-   lvx v21,reg,sp; \
-   addi reg,reg,16; \
-   lvx v22,reg,sp; \
-   addi reg,reg,16; \
-   lvx v23,reg,sp; \
-   addi reg,reg,16; \
-   lvx v24,reg,sp; \
-   addi reg,reg,16; \
-   lvx v25,reg,sp; \
-   addi reg,reg,16; \
-   lvx v26,reg,sp; \
-   addi reg,reg,16; \
-   lvx v27,reg,sp; \
-   addi reg,reg,16; \
-   lvx v28,reg,sp; \
-   addi reg,reg,16; \
-   lvx v29,reg,sp; \
-   addi reg,reg,16; \
-   lvx v30,reg,sp; \
-   addi reg,reg,16; \
-   lvx v31,reg,sp;
-
-# Carefull this will 'clobber' vmx (by design)
-# Don't call this from C
-FUNC_START(load_vmx)
-   li  r5,0
-   lvx v20,r5,r3
-   addi r5,r5,16
-   lvx v21,r5,r3
-   addi r5,r5,16
-   lvx v22,r5,r3
-   addi r5,r5,16
-   lvx v23,r5,r3
-   addi r5,r5,16
-   lvx v24,r5,r3
-   addi r5,r5,16
-   lvx v25,r5,r3
-   addi r5,r5,16
-   lvx v26,r5,r3
-   addi r5,r5,16
-   lvx v27,r5,r3
-   addi r5,r5,16
-   lvx v28,r5,r3
-   addi r5,r5,16
-   lvx v29,r5,r3
-   addi r5,r5,16
-   lvx v30,r5,r3
-   addi r5,r5,16
-   lvx v31,r5,r3
-   blr
-FUNC_END(load_vmx)
+#include "../vmx_asm.h"
 
 # Should be safe from C, only touches r4, r5 and v0,v1,v2
 FUNC_START(check_vmx)
diff --git a/tools/testing/selftests/powerpc/vmx_asm.h b/tools/testing/selftests/powerpc/vmx_asm.h
new file mode 100644
index 000..461845dd
--- /dev/null
+++ b/tools/testing/selftests/powerpc/vmx_asm.h
@@ -0,0 +1,98 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "basic_asm.h"
+
+/* POS MUST BE 16 ALIGNED! */
+#define PUSH_VMX(pos,reg) \
+   li  reg,pos; \
+   stvx v20,reg,%r1; \
+   addi reg,reg,16; \
+   stvx v21,reg,%r1; \
+   addi reg,reg,16; \
+   stvx v22,reg,%r1; \
+   addi reg,reg,16; \
+   stvx v23,reg,%r1; \
+   addi reg,reg,16; \
+   stvx v24,reg,%r1; \
+   addi reg,reg,16; \
+   stvx v25,reg,%r1; \
+   addi reg,reg,16; \
+   stvx v26,reg,%r1; \
+   addi reg,reg,16; \
+   stvx v27,reg,%r1; \
+   addi reg,reg,16; \
+   stvx v28,reg,%r1; \
+   addi reg,reg,16; \
+   stvx v29,reg,%r1; \
+   addi reg,reg,16; \
+   stvx v30,reg,%r1; \
+   addi reg,reg,16; \
+   stvx v31,reg,%r1;
+
+/* POS MUST BE 16 ALIGNED! */
+#define POP_VMX(pos,reg) \
+   li  reg,pos; \
+   lvx v20,reg,%r1; \
+   addi reg,reg,16; \
+   lvx v21,reg,%r1; \
+   addi reg,reg,16; \
+   lvx v22,reg,%r1; \
+   addi reg,reg,16; \
+   lvx v23,reg,%r1; \
+   addi reg,reg,16; \
+   lvx v24,reg,%r1; \
+   addi reg,reg,16; \
+   lvx v25,reg,%r1; \
+   addi reg,reg,16; \
+   lvx v26,reg,%r1; \
+   addi reg,reg,16; \
+   lvx v27,reg,%r1; \
+   addi reg,reg,16; \
+   lvx v28,reg,%r1; \
+   addi reg,reg,16; \
+   lvx v29,reg

[PATCH v2 04/20] powerpc: Return the new MSR from msr_check_and_set()

2016-08-11 Thread Cyril Bur
mfmsr() is a fairly expensive call, and callers of msr_check_and_set()
may want to make decisions based on bits in the MSR that it did not
change but whose value they do not know.

Returning the new MSR from msr_check_and_set() lets such callers avoid
a second call to mfmsr().

Signed-off-by: Cyril Bur 
---
 arch/powerpc/include/asm/reg.h | 2 +-
 arch/powerpc/kernel/process.c  | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index f69f40f..0a3dde9 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1247,7 +1247,7 @@ static inline void mtmsr_isync(unsigned long val)
 : "memory")
 #endif
 
-extern void msr_check_and_set(unsigned long bits);
+extern unsigned long msr_check_and_set(unsigned long bits);
 extern bool strict_msr_control;
 extern void __msr_check_and_clear(unsigned long bits);
 static inline void msr_check_and_clear(unsigned long bits)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 79f0615..216cf05 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -104,7 +104,7 @@ static int __init enable_strict_msr_control(char *str)
 }
 early_param("ppc_strict_facility_enable", enable_strict_msr_control);
 
-void msr_check_and_set(unsigned long bits)
+unsigned long msr_check_and_set(unsigned long bits)
 {
unsigned long oldmsr = mfmsr();
unsigned long newmsr;
@@ -118,6 +118,8 @@ void msr_check_and_set(unsigned long bits)
 
if (oldmsr != newmsr)
mtmsr_isync(newmsr);
+
+   return newmsr;
 }
 
 void __msr_check_and_clear(unsigned long bits)
-- 
2.9.2
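
For readers following along, here is a minimal userspace sketch of the
idea, not the kernel code itself. The MSR is modelled by a plain variable,
the bit values are only illustrative, the real function's extra VSX
handling is left out, and vec_already_enabled() is a hypothetical caller
invented for this example. It shows a caller testing a bit it did not set
using only the returned value.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values for this sketch only. */
#define MSR_FP  (1ULL << 13)
#define MSR_VEC (1ULL << 25)

static uint64_t fake_msr;         /* stands in for the hardware MSR */
static unsigned int mfmsr_calls;  /* counts simulated MSR reads */

static uint64_t mfmsr(void)
{
	mfmsr_calls++;
	return fake_msr;
}

static void mtmsr_isync(uint64_t val)
{
	fake_msr = val;
}

/* Same shape as the patched function: set the bits, return the result. */
static uint64_t msr_check_and_set(uint64_t bits)
{
	uint64_t oldmsr = mfmsr();
	uint64_t newmsr = oldmsr | bits;

	assert(bits != 0);
	if (oldmsr != newmsr)
		mtmsr_isync(newmsr);

	return newmsr;
}

/* Hypothetical caller: inspects MSR_VEC without reading the MSR again. */
static int vec_already_enabled(void)
{
	uint64_t msr = msr_check_and_set(MSR_FP);

	return (msr & MSR_VEC) != 0;
}
```

With the old void prototype the caller above would have needed its own
mfmsr() after the call.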



[PATCH v2 07/20] selftests/powerpc: Rework FPU stack placement macros and move to header file

2016-08-11 Thread Cyril Bur
The FPU regs are placed at the top of the stack frame. Currently the
position is expected to be passed to the macro. The macros should now
be passed the stack frame size, and from there they calculate where to
put the regs, which makes them simpler to use.

Also move them to a header file so they can be used in a different area
of the powerpc selftests.
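
The offset arithmetic done by the reworked macros can be written out in C
as a sanity check. This is only a sketch: fpu_save_offset() is a
hypothetical helper, and the STACK_FRAME_MIN_SIZE value below is an
assumption made for the example (the real value comes from basic_asm.h and
depends on the ABI).

```c
#include <assert.h>

/* Assumed value for illustration; the real one comes from basic_asm.h. */
#define STACK_FRAME_MIN_SIZE 32

/*
 * Offset from %r1 at which PUSH_FPU(stack_size) stores register fN,
 * for the non-volatile range f14..f31. f31 sits highest and each lower
 * register is 8 bytes below the previous one.
 */
static long fpu_save_offset(long stack_size, int n)
{
	assert(n >= 14 && n <= 31);
	return stack_size + STACK_FRAME_MIN_SIZE - 8L * (31 - n);
}
```

For f14 this gives stack_size + STACK_FRAME_MIN_SIZE - 136, matching the
last line of each macro.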

Signed-off-by: Cyril Bur 
---
 tools/testing/selftests/powerpc/fpu_asm.h  | 81 ++
 tools/testing/selftests/powerpc/math/fpu_asm.S | 73 ++-
 2 files changed, 86 insertions(+), 68 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/fpu_asm.h

diff --git a/tools/testing/selftests/powerpc/fpu_asm.h b/tools/testing/selftests/powerpc/fpu_asm.h
new file mode 100644
index 000..24061b8
--- /dev/null
+++ b/tools/testing/selftests/powerpc/fpu_asm.h
@@ -0,0 +1,81 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _SELFTESTS_POWERPC_FPU_ASM_H
+#define _SELFTESTS_POWERPC_FPU_ASM_H
+#include "basic_asm.h"
+
+#define PUSH_FPU(stack_size) \
+   stfd f31,(stack_size + STACK_FRAME_MIN_SIZE)(%r1); \
+   stfd f30,(stack_size + STACK_FRAME_MIN_SIZE - 8)(%r1); \
+   stfd f29,(stack_size + STACK_FRAME_MIN_SIZE - 16)(%r1); \
+   stfd f28,(stack_size + STACK_FRAME_MIN_SIZE - 24)(%r1); \
+   stfd f27,(stack_size + STACK_FRAME_MIN_SIZE - 32)(%r1); \
+   stfd f26,(stack_size + STACK_FRAME_MIN_SIZE - 40)(%r1); \
+   stfd f25,(stack_size + STACK_FRAME_MIN_SIZE - 48)(%r1); \
+   stfd f24,(stack_size + STACK_FRAME_MIN_SIZE - 56)(%r1); \
+   stfd f23,(stack_size + STACK_FRAME_MIN_SIZE - 64)(%r1); \
+   stfd f22,(stack_size + STACK_FRAME_MIN_SIZE - 72)(%r1); \
+   stfd f21,(stack_size + STACK_FRAME_MIN_SIZE - 80)(%r1); \
+   stfd f20,(stack_size + STACK_FRAME_MIN_SIZE - 88)(%r1); \
+   stfd f19,(stack_size + STACK_FRAME_MIN_SIZE - 96)(%r1); \
+   stfd f18,(stack_size + STACK_FRAME_MIN_SIZE - 104)(%r1); \
+   stfd f17,(stack_size + STACK_FRAME_MIN_SIZE - 112)(%r1); \
+   stfd f16,(stack_size + STACK_FRAME_MIN_SIZE - 120)(%r1); \
+   stfd f15,(stack_size + STACK_FRAME_MIN_SIZE - 128)(%r1); \
+   stfd f14,(stack_size + STACK_FRAME_MIN_SIZE - 136)(%r1);
+
+#define POP_FPU(stack_size) \
+   lfd f31,(stack_size + STACK_FRAME_MIN_SIZE)(%r1); \
+   lfd f30,(stack_size + STACK_FRAME_MIN_SIZE - 8)(%r1); \
+   lfd f29,(stack_size + STACK_FRAME_MIN_SIZE - 16)(%r1); \
+   lfd f28,(stack_size + STACK_FRAME_MIN_SIZE - 24)(%r1); \
+   lfd f27,(stack_size + STACK_FRAME_MIN_SIZE - 32)(%r1); \
+   lfd f26,(stack_size + STACK_FRAME_MIN_SIZE - 40)(%r1); \
+   lfd f25,(stack_size + STACK_FRAME_MIN_SIZE - 48)(%r1); \
+   lfd f24,(stack_size + STACK_FRAME_MIN_SIZE - 56)(%r1); \
+   lfd f23,(stack_size + STACK_FRAME_MIN_SIZE - 64)(%r1); \
+   lfd f22,(stack_size + STACK_FRAME_MIN_SIZE - 72)(%r1); \
+   lfd f21,(stack_size + STACK_FRAME_MIN_SIZE - 80)(%r1); \
+   lfd f20,(stack_size + STACK_FRAME_MIN_SIZE - 88)(%r1); \
+   lfd f19,(stack_size + STACK_FRAME_MIN_SIZE - 96)(%r1); \
+   lfd f18,(stack_size + STACK_FRAME_MIN_SIZE - 104)(%r1); \
+   lfd f17,(stack_size + STACK_FRAME_MIN_SIZE - 112)(%r1); \
+   lfd f16,(stack_size + STACK_FRAME_MIN_SIZE - 120)(%r1); \
+   lfd f15,(stack_size + STACK_FRAME_MIN_SIZE - 128)(%r1); \
+   lfd f14,(stack_size + STACK_FRAME_MIN_SIZE - 136)(%r1);
+
+/*
+ * Careful calling this, it will 'clobber' fpu (by design)
+ * Don't call this from C
+ */
+FUNC_START(load_fpu)
+   lfd f14,0(r3)
+   lfd f15,8(r3)
+   lfd f16,16(r3)
+   lfd f17,24(r3)
+   lfd f18,32(r3)
+   lfd f19,40(r3)
+   lfd f20,48(r3)
+   lfd f21,56(r3)
+   lfd f22,64(r3)
+   lfd f23,72(r3)
+   lfd f24,80(r3)
+   lfd f25,88(r3)
+   lfd f26,96(r3)
+   lfd f27,104(r3)
+   lfd f28,112(r3)
+   lfd f29,120(r3)
+   lfd f30,128(r3)
+   lfd f31,136(r3)
+   blr
+FUNC_END(load_fpu)
+
+#endif /* _SELFTESTS_POWERPC_FPU_ASM_H */
+
diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
index f3711d8..241f067 100644
--- a/tools/testing/selftests/powerpc/math/fpu_asm.S
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -8,70 +8,7 @@
  */
 
 #include "../basic_asm.h"
-
-#define PUSH_FPU(pos) \
-   stfd f14,pos(sp); \
-   stfd f15,pos+8(sp); \
-   stfd f16,pos+16(sp); \
-   stfd f17,pos+24(sp); \
-   stfd f18,pos+32(sp)

[PATCH v2 05/20] powerpc: Never giveup a reclaimed thread when enabling kernel {fp, altivec, vsx}

2016-08-11 Thread Cyril Bur
After a thread is reclaimed from its active or suspended transactional
state, the checkpointed state exists on the CPU. This state (along with
the live/transactional state) has been saved in its entirety by the
reclaiming process.

There exists a sequence of events that would cause the kernel to call
one of enable_kernel_fp(), enable_kernel_altivec() or
enable_kernel_vsx() after a thread has been reclaimed. These functions
save away any user state on the CPU so that the kernel can use the
registers. Not only is this saving away unnecessary at this point, it
is actually incorrect. It causes a save of the checkpointed state to
the live structures within the thread struct thus destroying the true
live state for that thread.

Signed-off-by: Cyril Bur 
---
 arch/powerpc/kernel/process.c | 39 ---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 216cf05..0cfbc89 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -198,12 +198,23 @@ EXPORT_SYMBOL_GPL(flush_fp_to_thread);
 
 void enable_kernel_fp(void)
 {
+   unsigned long cpumsr;
+
WARN_ON(preemptible());
 
-   msr_check_and_set(MSR_FP);
+   cpumsr = msr_check_and_set(MSR_FP);
 
if (current->thread.regs && (current->thread.regs->msr & MSR_FP)) {
check_if_tm_restore_required(current);
+   /*
+* If a thread has already been reclaimed then the
+* checkpointed registers are on the CPU but have definitely
+* been saved by the reclaim code. Don't need to and *cannot*
+* giveup as this would save  to the 'live' structure not the
+* checkpointed structure.
+*/
+   if(!MSR_TM_ACTIVE(cpumsr) && MSR_TM_ACTIVE(current->thread.regs->msr))
+   return;
__giveup_fpu(current);
}
 }
@@ -250,12 +261,23 @@ EXPORT_SYMBOL(giveup_altivec);
 
 void enable_kernel_altivec(void)
 {
+   unsigned long cpumsr;
+
WARN_ON(preemptible());
 
-   msr_check_and_set(MSR_VEC);
+   cpumsr = msr_check_and_set(MSR_VEC);
 
if (current->thread.regs && (current->thread.regs->msr & MSR_VEC)) {
check_if_tm_restore_required(current);
+   /*
+* If a thread has already been reclaimed then the
+* checkpointed registers are on the CPU but have definitely
+* been saved by the reclaim code. Don't need to and *cannot*
+* giveup as this would save  to the 'live' structure not the
+* checkpointed structure.
+*/
+   if(!MSR_TM_ACTIVE(cpumsr) && MSR_TM_ACTIVE(current->thread.regs->msr))
+   return;
__giveup_altivec(current);
}
 }
@@ -324,12 +346,23 @@ static void save_vsx(struct task_struct *tsk)
 
 void enable_kernel_vsx(void)
 {
+   unsigned long cpumsr;
+
WARN_ON(preemptible());
 
-   msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
+   cpumsr = msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
 
if (current->thread.regs && (current->thread.regs->msr & MSR_VSX)) {
check_if_tm_restore_required(current);
+   /*
+* If a thread has already been reclaimed then the
+* checkpointed registers are on the CPU but have definitely
+* been saved by the reclaim code. Don't need to and *cannot*
+* giveup as this would save  to the 'live' structure not the
+* checkpointed structure.
+*/
+   if(!MSR_TM_ACTIVE(cpumsr) && MSR_TM_ACTIVE(current->thread.regs->msr))
+   return;
if (current->thread.regs->msr & MSR_FP)
__giveup_fpu(current);
if (current->thread.regs->msr & MSR_VEC)
-- 
2.9.2
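
The condition added to the three enable_kernel_*() functions can be
modelled in isolation. The following is a sketch under stated assumptions,
not the kernel implementation: the bit positions are chosen for the
example, and thread_was_reclaimed() and should_giveup() are hypothetical
helper names. A thread counts as reclaimed when the CPU's MSR shows no
transaction but the thread's saved MSR still does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumptions made for this sketch. */
#define MSR_FP      (1ULL << 13)
#define MSR_TS_S    (1ULL << 33)  /* transaction suspended */
#define MSR_TS_T    (1ULL << 34)  /* transactional */
#define MSR_TS_MASK (MSR_TS_S | MSR_TS_T)
#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0)

/* CPU is no longer in a transaction, but the thread's saved MSR is. */
static bool thread_was_reclaimed(uint64_t cpumsr, uint64_t threadmsr)
{
	return !MSR_TM_ACTIVE(cpumsr) && MSR_TM_ACTIVE(threadmsr);
}

/*
 * Decide whether user state for 'facility' should be saved before the
 * kernel uses the registers. Saving after a reclaim would overwrite the
 * live structures with checkpointed values, so it is skipped.
 */
static bool should_giveup(uint64_t cpumsr, uint64_t threadmsr,
			  uint64_t facility)
{
	assert(facility != 0);
	if (!(threadmsr & facility))
		return false;
	return !thread_was_reclaimed(cpumsr, threadmsr);
}
```

This is also why the previous patch makes msr_check_and_set() return the
MSR: the CPU-side value is needed for the comparison.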



[PATCH v2 03/20] powerpc: Add check_if_tm_restore_required() to giveup_all()

2016-08-11 Thread Cyril Bur
giveup_all() causes FPU/VMX/VSX facilities to be disabled in a
thread's MSR. If this thread was transactional, this should be recorded
as reclaiming/recheckpointing code will need to know.

Fixes: c208505 ("powerpc: create giveup_all()")
Signed-off-by: Cyril Bur 
---
 arch/powerpc/kernel/process.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a5cdef9..79f0615 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -439,6 +439,7 @@ void giveup_all(struct task_struct *tsk)
return;
 
msr_check_and_set(msr_all_available);
+   check_if_tm_restore_required(tsk);
 
 #ifdef CONFIG_PPC_FPU
if (usermsr & MSR_FP)
-- 
2.9.2



[PATCH v2 02/20] powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use

2016-08-11 Thread Cyril Bur
Comment from arch/powerpc/kernel/process.c:967:
 If userspace is inside a transaction (whether active or
 suspended) and FP/VMX/VSX instructions have ever been enabled
 inside that transaction, then we have to keep them enabled
 and keep the FP/VMX/VSX state loaded while ever the transaction
 continues.  The reason is that if we didn't, and subsequently
 got a FP/VMX/VSX unavailable interrupt inside a transaction,
 we don't know whether it's the same transaction, and thus we
 don't know which of the checkpointed state and the transactional
 state to use.

restore_math(), restore_fp() and restore_altivec() currently may not
restore the registers. This does not appear to be more serious than a
performance penalty. If the math registers aren't restored, the
userspace thread will still be run with the facility disabled.
Userspace will not be able to read invalid values. On the first access
it will take a facility unavailable exception and the kernel will
detect an active transaction, at which point it will abort the
transaction. There is the possibility for a pathological case
preventing any progress by transactions, however, transactions
are never guaranteed to make progress.

Fixes: 70fe3d9 ("powerpc: Restore FPU/VEC/VSX if previously used")
Signed-off-by: Cyril Bur 
---
 arch/powerpc/kernel/process.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 58ccf86..a5cdef9 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -208,7 +208,7 @@ void enable_kernel_fp(void)
 EXPORT_SYMBOL(enable_kernel_fp);
 
 static int restore_fp(struct task_struct *tsk) {
-   if (tsk->thread.load_fp) {
+   if (tsk->thread.load_fp || MSR_TM_ACTIVE(tsk->thread.regs->msr)) {
load_fp_state(&current->thread.fp_state);
current->thread.load_fp++;
return 1;
@@ -278,7 +278,8 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
 
 static int restore_altivec(struct task_struct *tsk)
 {
-   if (cpu_has_feature(CPU_FTR_ALTIVEC) && tsk->thread.load_vec) {
+   if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
+   (tsk->thread.load_vec || MSR_TM_ACTIVE(tsk->thread.regs->msr))) {
load_vr_state(&tsk->thread.vr_state);
tsk->thread.used_vr = 1;
tsk->thread.load_vec++;
@@ -464,7 +465,8 @@ void restore_math(struct pt_regs *regs)
 {
unsigned long msr;
 
-   if (!current->thread.load_fp && !loadvec(current->thread))
+   if (!MSR_TM_ACTIVE(regs->msr) &&
+   !current->thread.load_fp && !loadvec(current->thread))
return;
 
msr = regs->msr;
-- 
2.9.2
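
The new early-return test in restore_math() reduces to a small predicate.
The sketch below models it in userspace; struct thread_model and
restore_math_can_skip() are hypothetical names used only here, and the
bit positions are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions are assumptions made for this sketch. */
#define MSR_TS_S    (1ULL << 33)  /* transaction suspended */
#define MSR_TS_T    (1ULL << 34)  /* transactional */
#define MSR_TS_MASK (MSR_TS_S | MSR_TS_T)
#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0)

/* Hypothetical stand-in for the load counters kept in the thread struct. */
struct thread_model {
	uint8_t load_fp;
	uint8_t load_vec;
};

/*
 * True when restore_math() may return without loading anything. With the
 * patch applied, an active or suspended transaction always forces the
 * restore, whatever the load counters say.
 */
static bool restore_math_can_skip(uint64_t regs_msr,
				  const struct thread_model *t)
{
	assert(t != NULL);
	return !MSR_TM_ACTIVE(regs_msr) && !t->load_fp && !t->load_vec;
}
```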



[PATCH v2 00/20] Consistent TM structures

2016-08-11 Thread Cyril Bur
Hello,

This series has grown considerably from v1.

Similarities with v1 include:
- Selftests are all the same, they have simply been split into several
  patches with comments from MPE and Daniel Axtens incorporated. It
  is possible some things slipped through the cracks selftest wise as
  the focus has been on the final three patches.
- The final three patches have been reworked following extra testing
  and from review by Simon Guo.

Differences include:
- Patches 2-5 are fixes for existing problems found in the course of
  verifying the final three patches. In the case of "powerpc: Never
  giveup a reclaimed thread when enabling kernel {fp, altivec, vsx}" it
  has proven difficult to narrow down when the bug was introduced. It
  does not exist in 3.8 when TM was introduced but does exist in 4.4. I
  was unable to boot 3.13 (or 3.12) in an attempt to further bisect.
- As ptrace code was merged between v1 and v2, work was needed there
  to make it fit in with the final three patches.

The overall aim of this series may have gotten lost along the way:
the final three patches are the goal.

Cyril Bur (20):
  selftests/powerpc: Compile selftests against headers without AT_HWCAP2
  powerpc: Always restore FPU/VEC/VSX if hardware transactional memory
in use
  powerpc: Add check_if_tm_restore_required() to giveup_all()
  powerpc: Return the new MSR from msr_check_and_set()
  powerpc: Never giveup a reclaimed thread when enabling kernel {fp,
altivec, vsx}
  selftests/powerpc: Check for VSX preservation across userspace
preemption
  selftests/powerpc: Rework FPU stack placement macros and move to
header file
  selftests/powerpc: Move VMX stack frame macros to header file
  selftests/powerpc: Introduce GPR asm helper header file
  selftests/powerpc: Add transactional memory defines
  selftests/powerpc: Allow tests to extend their kill timeout
  selftests/powerpc: Add TM tcheck helpers in C
  selftests/powerpc: Check that signals always get delivered
  selftests/powerpc: Add checks for transactional GPRs in signal
contexts
  selftests/powerpc: Add checks for transactional FPUs in signal
contexts
  selftests/powerpc: Add checks for transactional VMXs in signal
contexts
  selftests/powerpc: Add checks for transactional VSXs in signal
contexts
  powerpc: tm: Always use fp_state and vr_state to store live registers
  powerpc: tm: Rename transct_(*) to ck(\1)_state
  powerpc: Remove do_load_up_transact_{fpu,altivec}

 arch/powerpc/include/asm/processor.h   |  15 +-
 arch/powerpc/include/asm/reg.h |   2 +-
 arch/powerpc/include/asm/tm.h  |   5 -
 arch/powerpc/kernel/asm-offsets.c  |  12 +-
 arch/powerpc/kernel/fpu.S  |  26 --
 arch/powerpc/kernel/process.c  | 119 +
 arch/powerpc/kernel/ptrace.c   | 278 +
 arch/powerpc/kernel/signal.h   |   8 +-
 arch/powerpc/kernel/signal_32.c|  84 +++
 arch/powerpc/kernel/signal_64.c|  59 ++---
 arch/powerpc/kernel/tm.S   |  94 +++
 arch/powerpc/kernel/traps.c|  12 +-
 arch/powerpc/kernel/vector.S   |  25 --
 tools/testing/selftests/powerpc/Makefile   |   1 +
 tools/testing/selftests/powerpc/basic_asm.h|   4 +
 tools/testing/selftests/powerpc/fpu_asm.h  |  81 ++
 tools/testing/selftests/powerpc/gpr_asm.h  |  96 +++
 tools/testing/selftests/powerpc/harness.c  |   9 +-
 tools/testing/selftests/powerpc/math/Makefile  |   4 +-
 tools/testing/selftests/powerpc/math/fpu_asm.S |  73 +-
 tools/testing/selftests/powerpc/math/vmx_asm.S |  85 +--
 tools/testing/selftests/powerpc/math/vsx_asm.S |  61 +
 tools/testing/selftests/powerpc/math/vsx_preempt.c | 147 +++
 tools/testing/selftests/powerpc/signal/Makefile|  12 +
 tools/testing/selftests/powerpc/signal/signal.S|  50 
 tools/testing/selftests/powerpc/signal/signal.c| 111 
 tools/testing/selftests/powerpc/signal/signal_tm.c | 110 
 tools/testing/selftests/powerpc/tm/Makefile|   8 +-
 .../powerpc/tm/tm-signal-context-chk-fpu.c |  92 +++
 .../powerpc/tm/tm-signal-context-chk-gpr.c |  90 +++
 .../powerpc/tm/tm-signal-context-chk-vmx.c | 110 
 .../powerpc/tm/tm-signal-context-chk-vsx.c | 125 +
 tools/testing/selftests/powerpc/tm/tm-signal.S | 114 +
 tools/testing/selftests/powerpc/tm/tm.h|  27 ++
 tools/testing/selftests/powerpc/utils.h|   9 +-
 tools/testing/selftests/powerpc/vmx_asm.h  |  98 
 tools/testing/selftests/powerpc/vsx_asm.h  |  71 ++
 37 files changed, 1709 insertions(+), 618 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/fpu_asm.h
 create mode 100644 tools/t

[PATCH v5 12/13] powerpc: Add purgatory for kexec_file_load implementation.

2016-08-11 Thread Thiago Jung Bauermann
This purgatory implementation comes from kexec-tools, almost unchanged.

The only changes are that the sha256_regions global variable was
renamed to sha_regions to match what kexec_file_load expects, and that
the sha256.c file from x86's purgatory is used to avoid adding yet
another SHA-256 implementation.

Also, some formatting warnings found by checkpatch.pl were fixed.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/Makefile |   4 +
 arch/powerpc/purgatory/.gitignore |   2 +
 arch/powerpc/purgatory/Makefile   |  36 +++
 arch/powerpc/purgatory/console-ppc64.c|  38 +++
 arch/powerpc/purgatory/crashdump-ppc64.h  |  42 
 arch/powerpc/purgatory/crashdump_backup.c |  36 +++
 arch/powerpc/purgatory/crtsavres.S|   5 +
 arch/powerpc/purgatory/hvCall.S   |  27 +
 arch/powerpc/purgatory/hvCall.h   |   8 ++
 arch/powerpc/purgatory/kexec-sha256.h |  11 ++
 arch/powerpc/purgatory/ppc64_asm.h|  20 
 arch/powerpc/purgatory/printf.c   | 164 ++
 arch/powerpc/purgatory/purgatory-ppc64.c  |  41 
 arch/powerpc/purgatory/purgatory-ppc64.h  |   6 ++
 arch/powerpc/purgatory/purgatory.c|  62 +++
 arch/powerpc/purgatory/purgatory.h|  11 ++
 arch/powerpc/purgatory/sha256.c   |   6 ++
 arch/powerpc/purgatory/sha256.h   |   1 +
 arch/powerpc/purgatory/string.S   |   1 +
 arch/powerpc/purgatory/v2wrap.S   | 134 
 20 files changed, 655 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index ca254546cd05..beb928ff6b77 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -254,6 +254,7 @@ core-y  += arch/powerpc/kernel/ \
 core-$(CONFIG_XMON) += arch/powerpc/xmon/
 core-$(CONFIG_KVM) += arch/powerpc/kvm/
 core-$(CONFIG_PERF_EVENTS) += arch/powerpc/perf/
+core-$(CONFIG_KEXEC_FILE)  += arch/powerpc/purgatory/
 
 drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/
 
@@ -375,6 +376,9 @@ archclean:
$(Q)$(MAKE) $(clean)=$(boot)
 
 archprepare: checkbin
+ifeq ($(CONFIG_KEXEC_FILE),y)
+   $(Q)$(MAKE) $(build)=arch/powerpc/purgatory arch/powerpc/purgatory/kexec-purgatory.c
+endif
 
 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
 # to stdout and these checks are run even on install targets.
diff --git a/arch/powerpc/purgatory/.gitignore b/arch/powerpc/purgatory/.gitignore
new file mode 100644
index ..e9e66f178a6d
--- /dev/null
+++ b/arch/powerpc/purgatory/.gitignore
@@ -0,0 +1,2 @@
+kexec-purgatory.c
+purgatory.ro
diff --git a/arch/powerpc/purgatory/Makefile b/arch/powerpc/purgatory/Makefile
new file mode 100644
index ..63daf95e5703
--- /dev/null
+++ b/arch/powerpc/purgatory/Makefile
@@ -0,0 +1,36 @@
+purgatory-y := purgatory.o printf.o string.o v2wrap.o hvCall.o \
+   purgatory-ppc64.o console-ppc64.o crashdump_backup.o \
+   crtsavres.o sha256.o
+
+targets += $(purgatory-y)
+PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
+
+LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostartfiles \
+   -nostdlib -nodefaultlibs
+targets += purgatory.ro
+
+# Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That
+# in turn leaves some undefined symbols like __fentry__ in purgatory and not
+# sure how to relocate those. Like kexec-tools, use custom flags.
+
+KBUILD_CFLAGS := -Wall -Wstrict-prototypes -fno-strict-aliasing \
+   -fno-zero-initialized-in-bss -fno-builtin -ffreestanding \
+   -fno-PIC -fno-PIE -fno-stack-protector  -fno-exceptions \
+   -msoft-float -MD -Os
+KBUILD_CFLAGS += -m$(CONFIG_WORD_SIZE)
+
+$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
+   $(call if_changed,ld)
+
+targets += kexec-purgatory.c
+
+CMD_BIN2C = $(objtree)/scripts/basic/bin2c
+quiet_cmd_bin2c = BIN2C   $@
+  cmd_bin2c = $(CMD_BIN2C) kexec_purgatory < $< > $@
+
+$(obj)/kexec-purgatory.c: $(obj)/purgatory.ro FORCE
+   $(call if_changed,bin2c)
+   @:
+
+
+obj-$(CONFIG_KEXEC_FILE)   += kexec-purgatory.o
diff --git a/arch/powerpc/purgatory/console-ppc64.c b/arch/powerpc/purgatory/console-ppc64.c
new file mode 100644
index ..3d07be0b5d08
--- /dev/null
+++ b/arch/powerpc/purgatory/console-ppc64.c
@@ -0,0 +1,38 @@
+/*
+ * kexec: Linux boots Linux
+ *
+ * Created by: Mohan Kumar M (mo...@in.ibm.com)
+ *
+ * Copyright (C) IBM Corporation, 2005. All rights reserved
+ *
+ * Code taken from kexec-tools.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warrant

[PATCH v5 13/13] powerpc: Enable CONFIG_KEXEC_FILE in powerpc server defconfigs.

2016-08-11 Thread Thiago Jung Bauermann
Enable CONFIG_KEXEC_FILE in powernv_defconfig, ppc64_defconfig and
pseries_defconfig.

It depends on CONFIG_CRYPTO_SHA256=y, so add that as well.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/configs/powernv_defconfig | 2 ++
 arch/powerpc/configs/ppc64_defconfig   | 2 ++
 arch/powerpc/configs/pseries_defconfig | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index dce352e9153b..319e1fb7b0c9 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -47,6 +47,7 @@ CONFIG_BINFMT_MISC=m
 CONFIG_PPC_TRANSACTIONAL_MEM=y
 CONFIG_HOTPLUG_CPU=y
 CONFIG_KEXEC=y
+CONFIG_KEXEC_FILE=y
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_NUMA=y
 CONFIG_MEMORY_HOTPLUG=y
@@ -298,6 +299,7 @@ CONFIG_CRYPTO_CCM=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_HMAC=y
 CONFIG_CRYPTO_MICHAEL_MIC=m
+CONFIG_CRYPTO_SHA256=y
 CONFIG_CRYPTO_TGR192=m
 CONFIG_CRYPTO_WP512=m
 CONFIG_CRYPTO_ANUBIS=m
diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index 0a8d250cb97e..a0355ccc7f55 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -44,6 +44,7 @@ CONFIG_HZ_100=y
 CONFIG_BINFMT_MISC=m
 CONFIG_PPC_TRANSACTIONAL_MEM=y
 CONFIG_KEXEC=y
+CONFIG_KEXEC_FILE=y
 CONFIG_CRASH_DUMP=y
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_MEMORY_HOTREMOVE=y
@@ -333,6 +334,7 @@ CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_HMAC=y
 CONFIG_CRYPTO_MICHAEL_MIC=m
+CONFIG_CRYPTO_SHA256=y
 CONFIG_CRYPTO_TGR192=m
 CONFIG_CRYPTO_WP512=m
 CONFIG_CRYPTO_ANUBIS=m
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index 654aeffc57ef..23af4a72930e 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -50,6 +50,7 @@ CONFIG_HZ_100=y
 CONFIG_BINFMT_MISC=m
 CONFIG_PPC_TRANSACTIONAL_MEM=y
 CONFIG_KEXEC=y
+CONFIG_KEXEC_FILE=y
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_MEMORY_HOTPLUG=y
 CONFIG_MEMORY_HOTREMOVE=y
@@ -300,6 +301,7 @@ CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_HMAC=y
 CONFIG_CRYPTO_MICHAEL_MIC=m
+CONFIG_CRYPTO_SHA256=y
 CONFIG_CRYPTO_TGR192=m
 CONFIG_CRYPTO_WP512=m
 CONFIG_CRYPTO_ANUBIS=m
-- 
1.9.1



[PATCH v5 11/13] powerpc: Allow userspace to set device tree properties in kexec_file_load

2016-08-11 Thread Thiago Jung Bauermann
Implement the arch_kexec_verify_buffer hook to verify that a device
tree blob passed by userspace via kexec_file_load contains only nodes
and properties from a whitelist.

In elf64_load we merge those properties into the device tree that
will be passed to the next kernel.
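
As an illustration of the whitelist matching described in the kernel-doc
comment further down in this patch, here is a self-contained sketch. It is
not the patch's verify_properties() code; property_allowed() is a
hypothetical helper, written on the assumption that a whitelist entry is
one of "name" (any value), "name=value" (exact NUL-terminated string
value) or "name=" (property must be empty).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * entry:   whitelist entry: "name", "name=value" or "name="
 * name:    property name found in the device tree
 * val:     property value bytes (may be NULL when val_len is 0)
 * val_len: length of the property value in bytes
 */
static bool property_allowed(const char *entry, const char *name,
			     const void *val, size_t val_len)
{
	const char *eq, *want;
	size_t name_len;

	assert(entry != NULL && name != NULL);

	eq = strchr(entry, '=');
	name_len = eq ? (size_t)(eq - entry) : strlen(entry);

	if (strlen(name) != name_len || strncmp(entry, name, name_len) != 0)
		return false;
	if (eq == NULL)
		return true;            /* bare name: any value accepted */

	want = eq + 1;
	if (*want == '\0')
		return val_len == 0;    /* "name=": property must be empty */

	/* "name=value": value must be exactly that string plus its NUL */
	return val_len == strlen(want) + 1 && memcmp(val, want, val_len) == 0;
}
```

A caller would try each entry of a node's properties array in turn and
reject the blob if no entry matches.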

Suggested-by: Michael Ellerman 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/kexec.h   |   1 +
 arch/powerpc/kernel/kexec_elf_64.c |   9 ++
 arch/powerpc/kernel/machine_kexec_64.c | 242 +
 3 files changed, 252 insertions(+)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index f263cc867891..31bc64e07c8f 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -99,6 +99,7 @@ int setup_purgatory(struct kimage *image, const void *slave_code,
 int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
  unsigned long initrd_len, const char *cmdline);
 bool find_debug_console(const void *fdt, int chosen_node);
+int merge_partial_dtb(void *to, const void *from);
 #endif /* CONFIG_KEXEC_FILE */
 
 #else /* !CONFIG_KEXEC */
diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec_elf_64.c
index 49cba9509464..1b902ad66e2a 100644
--- a/arch/powerpc/kernel/kexec_elf_64.c
+++ b/arch/powerpc/kernel/kexec_elf_64.c
@@ -210,6 +210,15 @@ void *elf64_load(struct kimage *image, char *kernel_buf,
goto out;
}
 
+   /* Add nodes and properties from the DTB passed by userspace. */
+   if (image->dtb_buf) {
+   ret = merge_partial_dtb(fdt, image->dtb_buf);
+   if (ret) {
+   pr_err("Error merging partial device tree.\n");
+   goto out;
+   }
+   }
+
ret = setup_new_fdt(fdt, initrd_load_addr, initrd_len, cmdline);
if (ret)
goto out;
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 527f98efe651..a484a6346146 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -35,6 +35,7 @@
 #include 
 
 #define SLAVE_CODE_SIZE 256
+#define MAX_DT_PATH 512
 
 #ifdef CONFIG_KEXEC_FILE
 static struct kexec_file_ops *kexec_file_loaders[] = {
@@ -908,4 +909,245 @@ bool find_debug_console(const void *fdt, int chosen_node)
return false;
 }
 
+/**
+ * struct allowed_node - a node in the whitelist and its allowed properties.
+ * @name:  node name or full node path
+ * @properties: NULL-terminated array of names or name=value pairs
+ *
+ * If name starts with /, then the node has to be at the specified path in
+ * the device tree (including unit addresses for all nodes in the path).
+ * If it doesn't, then the node can be anywhere in the device tree.
+ *
+ * An entry in properties can specify a string value that the property must
+ * have by using the "name=value" format. If the entry ends with =, it means
+ * that the property must be empty.
+ */
+static struct allowed_node {
+   const char *name;
+   const char *properties[9];
+} allowed_nodes[] = {
+   {
+   .name = "/chosen",
+   .properties = {
+   "stdout-path",
+   "linux,stdout-path",
+   NULL,
+   }
+   },
+   {
+   .name = "vga",
+   .properties = {
+   "device_type=display",
+   "assigned-addresses",
+   "width",
+   "height",
+   "depth",
+   "little-endian=",
+   "linux,opened=",
+   "linux,boot-display=",
+   NULL,
+   }
+   },
+};
+
+/**
+ * verify_properties() - verify that all properties in a node are allowed
+ * @properties: Array of allowed properties in the node.
+ * @fdt:   Device tree blob.
+ * @node:  Offset to node being verified.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+static int verify_properties(const char *properties[], const void *fdt, int node)
+{
+   int prop;
+
+   for (prop = fdt_first_property_offset(fdt, node); prop >= 0;
+prop = fdt_next_property_offset(fdt, prop)) {
+   const char *prop_name;
+   const void *prop_val;
+   int i;
+
+   prop_val = fdt_getprop_by_offset(fdt, prop, &prop_name, NULL);
+   if (prop_val == NULL) {
+   pr_debug("Error reading device tree.\n");
+   return -EINVAL;
+   }
+
+   for (i = 0; properties[i] != NULL; i++) {
+   size_t len;
+   const char *allowed_prop = properties[i];
+
+   len = strchrnul(allowed_prop, '=') - allowed_prop;
+  

[PATCH v5 10/13] powerpc: Add support for loading ELF kernels with kexec_file_load.

2016-08-11 Thread Thiago Jung Bauermann
This uses all the infrastructure built up by the previous patches
in the series to load an ELF vmlinux file and an initrd. It uses the
flattened device tree at initial_boot_params as a base and adjusts memory
reservations and its /chosen node for the next kernel.
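
The acceptance checks that build_elf_exec_info() in the diff below applies
can be summarised in a standalone sketch. This is a simplified model, not
the patch's code: struct phdr_model and elf_usable_for_kexec() are
hypothetical, and only the header fields the checks look at are
represented. The numeric constants are the standard ELF values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard ELF constants, defined locally to keep the sketch portable. */
#define ET_EXEC   2
#define ET_DYN    3
#define PT_LOAD   1
#define PT_INTERP 3

/* Hypothetical cut-down program header: only the type is needed here. */
struct phdr_model {
	uint32_t p_type;
};

/*
 * Returns 1 if the image looks loadable by kexec, 0 otherwise:
 *  - type must be ET_EXEC or ET_DYN,
 *  - there must be program headers,
 *  - none of them may request an interpreter.
 */
static int elf_usable_for_kexec(uint16_t e_type,
				const struct phdr_model *phdrs, size_t phnum)
{
	size_t i;

	if (e_type != ET_EXEC && e_type != ET_DYN)
		return 0;
	if (phdrs == NULL || phnum == 0)
		return 0;
	for (i = 0; i < phnum; i++) {
		if (phdrs[i].p_type == PT_INTERP)
			return 0;
	}
	return 1;
}
```

The PT_INTERP test is what keeps ordinary dynamically linked executables
from being accepted as kernels.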

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/kexec_elf_64.h |  10 ++
 arch/powerpc/kernel/Makefile|   1 +
 arch/powerpc/kernel/kexec_elf_64.c  | 284 
 arch/powerpc/kernel/machine_kexec_64.c  |   5 +-
 4 files changed, 299 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kexec_elf_64.h b/arch/powerpc/include/asm/kexec_elf_64.h
new file mode 100644
index ..30da6bc0ccf8
--- /dev/null
+++ b/arch/powerpc/include/asm/kexec_elf_64.h
@@ -0,0 +1,10 @@
+#ifndef __POWERPC_KEXEC_ELF_64_H__
+#define __POWERPC_KEXEC_ELF_64_H__
+
+#ifdef CONFIG_KEXEC_FILE
+
+extern struct kexec_file_ops kexec_elf64_ops;
+
+#endif /* CONFIG_KEXEC_FILE */
+
+#endif /* __POWERPC_KEXEC_ELF_64_H__ */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index ce18a985bcfc..d149f5ebac90 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -109,6 +109,7 @@ obj-$(CONFIG_PCI)   += pci_$(CONFIG_WORD_SIZE).o $(pci64-y) \
 obj-$(CONFIG_PCI_MSI)  += msi.o
 obj-$(CONFIG_KEXEC) += machine_kexec.o crash.o \
                        machine_kexec_$(CONFIG_WORD_SIZE).o
+obj-$(CONFIG_KEXEC_FILE)   += kexec_elf_$(CONFIG_WORD_SIZE).o
 obj-$(CONFIG_AUDIT) += audit.o
 obj64-$(CONFIG_AUDIT)  += compat_audit.o
 
diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec_elf_64.c
new file mode 100644
index ..49cba9509464
--- /dev/null
+++ b/arch/powerpc/kernel/kexec_elf_64.c
@@ -0,0 +1,284 @@
+/*
+ * Load ELF vmlinux file for the kexec_file_load syscall.
+ *
+ * Copyright (C) 2004  Adam Litke (a...@us.ibm.com)
+ * Copyright (C) 2004  IBM Corp.
+ * Copyright (C) 2005  R Sharada (shar...@in.ibm.com)
+ * Copyright (C) 2006  Mohan Kumar M (mo...@in.ibm.com)
+ * Copyright (C) 2016  IBM Corporation
+ *
+ * Based on kexec-tools' kexec-elf-exec.c and kexec-elf-ppc64.c.
+ * Heavily modified for the kernel by
+ * Thiago Jung Bauermann .
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) "kexec_elf: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+extern size_t kexec_purgatory_size;
+
+#define PURGATORY_STACK_SIZE   (16 * 1024)
+
+/**
+ * build_elf_exec_info - read ELF executable and check that we can use it
+ */
+static int build_elf_exec_info(const char *buf, size_t len, struct elfhdr *ehdr,
+  struct elf_info *elf_info)
+{
+   int i;
+   int ret;
+
+   ret = elf_read_from_buffer(buf, len, ehdr, elf_info);
+   if (ret)
+   return ret;
+
+   /* Big endian vmlinux has type ET_DYN. */
+   if (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN) {
+   pr_err("Not an ELF executable.\n");
+   goto error;
+   } else if (!elf_info->proghdrs) {
+   pr_err("No ELF program header.\n");
+   goto error;
+   }
+
+   for (i = 0; i < ehdr->e_phnum; i++) {
+   /*
+* Kexec does not support loading interpreters.
+* In addition this check keeps us from attempting
+* to kexec ordinary executables.
+*/
+   if (elf_info->proghdrs[i].p_type == PT_INTERP) {
+   pr_err("Requires an ELF interpreter.\n");
+   goto error;
+   }
+   }
+
+   return 0;
+error:
+   elf_free_info(elf_info);
+   return -ENOEXEC;
+}
+
+static int elf64_probe(const char *buf, unsigned long len)
+{
+   struct elfhdr ehdr;
+   struct elf_info elf_info;
+   int ret;
+
+   ret = build_elf_exec_info(buf, len, &ehdr, &elf_info);
+   if (ret)
+   return ret;
+
+   elf_free_info(&elf_info);
+
+   return elf_check_arch(&ehdr) ? 0 : -ENOEXEC;
+}
+
+/**
+ * elf_exec_load - load ELF executable image
+ * @lowest_load_addr:  On return, will be the address where the first PT_LOAD
+ * section will be loaded in memory.
+ *
+ * Return:
+ * 0 on success, negative value on failure.
+ */
+static int elf_exec_load(struct kimage *image, struct elfhdr *ehdr,
+struct elf_info *elf_info,
+un

[PATCH v5 09/13] powerpc: Add code to work with device trees in kexec_file_load.

2016-08-11 Thread Thiago Jung Bauermann
kexec_file_load needs to set up the device tree that will be used
by the next kernel and check whether it provides a console
that can be used by the purgatory.
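
One detail of the device tree update in the diff below is how the old initrd reservation is recognised: kexec reserves the exact initrd size, while firmware may reserve a multiple of the page size, so both are accepted. A minimal userspace sketch of that comparison follows. The names sketch_round_up, sketch_is_initrd_rsv and SKETCH_PAGE_SIZE are made up for illustration, and the 4096-byte page size is an assumption (the kernel uses its configured PAGE_SIZE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch only; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: does the reservation [rsv_start, rsv_start + rsv_size)
 * describe the initrd given by linux,initrd-start / linux,initrd-end?
 * Both the exact size and the page-rounded size are accepted.
 */
static bool sketch_is_initrd_rsv(uint64_t rsv_start, uint64_t rsv_size,
				 uint64_t initrd_start, uint64_t initrd_end)
{
	uint64_t size = initrd_end - initrd_start;
	uint64_t size_pg = sketch_round_up(size, SKETCH_PAGE_SIZE);

	return rsv_start == initrd_start &&
	       (rsv_size == size || rsv_size == size_pg);
}
```

A reservation that starts at the initrd but has some other size is left alone, which matches the conservative behaviour of the loop in the patch.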

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/kexec.h   |   3 +
 arch/powerpc/kernel/machine_kexec_64.c | 222 +
 2 files changed, 225 insertions(+)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 83b81b7bdca1..f263cc867891 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -96,6 +96,9 @@ int setup_purgatory(struct kimage *image, const void *slave_code,
const void *fdt, unsigned long kernel_load_addr,
unsigned long fdt_load_addr, unsigned long stack_top,
int debug);
+int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
+ unsigned long initrd_len, const char *cmdline);
+bool find_debug_console(const void *fdt, int chosen_node);
 #endif /* CONFIG_KEXEC_FILE */
 
 #else /* !CONFIG_KEXEC */
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 1e678dc5096a..897b724ea9fd 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -683,4 +683,226 @@ int setup_purgatory(struct kimage *image, const void *slave_code,
return 0;
 }
 
+/*
+ * setup_new_fdt() - modify /chosen and memory reservation for the next kernel
+ * @fdt:
+ * @initrd_load_addr:  Address where the next initrd will be loaded.
+ * @initrd_len: Size of the next initrd, or 0 if there will be none.
+ * @cmdline:   Command line for the next kernel, or NULL if there will
+ * be none.
+ *
+ * Return: 0 on success, or negative errno on error.
+ */
+int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
+ unsigned long initrd_len, const char *cmdline)
+{
+   uint64_t oldfdt_addr;
+   int i, ret, chosen_node;
+   const void *prop;
+
+   /* Remove memory reservation for the current device tree. */
+   oldfdt_addr = __pa(initial_boot_params);
+   for (i = 0; i < fdt_num_mem_rsv(fdt); i++) {
+   uint64_t rsv_start, rsv_size;
+
+   ret = fdt_get_mem_rsv(fdt, i, &rsv_start, &rsv_size);
+   if (ret) {
+   pr_err("Malformed device tree.\n");
+   return -EINVAL;
+   }
+
+   if (rsv_start == oldfdt_addr &&
+   rsv_size == fdt_totalsize(initial_boot_params)) {
+   ret = fdt_del_mem_rsv(fdt, i);
+   if (ret) {
+   pr_err("Error deleting fdt reservation.\n");
+   return -EINVAL;
+   }
+
+   pr_debug("Removed old device tree reservation.\n");
+   break;
+   }
+   }
+
+   chosen_node = fdt_path_offset(fdt, "/chosen");
+   if (chosen_node == -FDT_ERR_NOTFOUND) {
+   chosen_node = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"),
+ "chosen");
+   if (chosen_node < 0) {
+   pr_err("Error creating /chosen.\n");
+   return -EINVAL;
+   }
+   } else if (chosen_node < 0) {
+   pr_err("Malformed device tree: error reading /chosen.\n");
+   return -EINVAL;
+   }
+
+   /* Did we boot using an initrd? */
+   prop = fdt_getprop(fdt, chosen_node, "linux,initrd-start", NULL);
+   if (prop) {
+   uint64_t tmp_start, tmp_end, tmp_size, tmp_sizepg;
+
+   tmp_start = fdt64_to_cpu(*((const fdt64_t *) prop));
+
+   prop = fdt_getprop(fdt, chosen_node, "linux,initrd-end", NULL);
+   if (!prop) {
+   pr_err("Malformed device tree.\n");
+   return -EINVAL;
+   }
+   tmp_end = fdt64_to_cpu(*((const fdt64_t *) prop));
+
+   /*
+* kexec reserves exact initrd size, while firmware may
+* reserve a multiple of PAGE_SIZE, so check for both.
+*/
+   tmp_size = tmp_end - tmp_start;
+   tmp_sizepg = round_up(tmp_size, PAGE_SIZE);
+
+   /* Remove memory reservation for the current initrd. */
+   for (i = 0; i < fdt_num_mem_rsv(fdt); i++) {
+   uint64_t rsv_start, rsv_size;
+
+   ret = fdt_get_mem_rsv(fdt, i, &rsv_start, &rsv_size);
+   if (ret) {
+   pr_err("Malformed device tree.\n");
+   return -EINVAL;
+   }
+
+   if (rsv_start == tmp_start &&
+   (rsv_size == tmp_size || rsv_size == tmp_sizepg)) {
+   

[PATCH v5 07/13] powerpc: Add functions to read ELF files of any endianness.

2016-08-11 Thread Thiago Jung Bauermann
A little endian kernel might need to kexec a big endian kernel (the
opposite is less likely but could happen as well), so we can't just cast
the buffer with the binary to ELF structs and use them as is done
elsewhere.

This patch adds functions which do byte-swapping as necessary when
populating the ELF structs. These functions will be used in the next
patch in the series.
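
To illustrate the idea outside the kernel, here is a minimal userspace sketch of decoding an ELF field according to the file's declared byte order rather than the host's. It is not the patch's implementation: the patch converts already-loaded values with le*_to_cpu/be*_to_cpu, whereas this sketch assembles the value byte by byte. The names sketch_elf_read, SKETCH_ELFDATA2LSB and SKETCH_ELFDATA2MSB are hypothetical; the numeric values 1 and 2 are the EI_DATA encodings from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values of e_ident[EI_DATA] as defined by the ELF specification. */
#define SKETCH_ELFDATA2LSB 1	/* little endian file */
#define SKETCH_ELFDATA2MSB 2	/* big endian file */

/*
 * Hypothetical helper: decode a 2-, 4- or 8-byte field stored at p
 * using the byte order the ELF file declares, independent of the
 * endianness of the machine running this code.
 */
static uint64_t sketch_elf_read(const unsigned char *p, size_t size,
				int ei_data)
{
	uint64_t v = 0;
	size_t i;

	assert(size == 2 || size == 4 || size == 8);
	assert(ei_data == SKETCH_ELFDATA2LSB || ei_data == SKETCH_ELFDATA2MSB);

	/* Consume bytes from most significant to least significant. */
	for (i = 0; i < size; i++) {
		size_t idx = (ei_data == SKETCH_ELFDATA2LSB) ? size - 1 - i : i;

		v = (v << 8) | p[idx];
	}

	return v;
}
```

The same four bytes 12 34 56 78 decode to 0x12345678 for a big endian file and to 0x78563412 for a little endian one, whichever host runs the code.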

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/elf_util.h |  19 ++
 arch/powerpc/kernel/Makefile|   2 +-
 arch/powerpc/kernel/elf_util.c  | 476 
 3 files changed, 496 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
index 3405eeabe542..18703d56eabd 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -20,6 +20,14 @@
 #include 
 
 struct elf_info {
+   /*
+* Where the ELF binary contents are kept.
+* Memory managed by the user of the struct.
+*/
+   const char *buffer;
+
+   const struct elfhdr *ehdr;
+   const struct elf_phdr *proghdrs;
struct elf_shdr *sechdrs;
 
/* Index of stubs section. */
@@ -63,6 +71,17 @@ static inline unsigned long my_r2(const struct elf_info *elf_info)
return elf_info->sechdrs[elf_info->toc_section].sh_addr + 0x8000;
 }
 
+static inline bool elf_is_elf_file(const struct elfhdr *ehdr)
+{
+   return memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0;
+}
+
+int elf_read_from_buffer(const char *buf, size_t len, struct elfhdr *ehdr,
+struct elf_info *elf_info);
+void elf_init_elf_info(const struct elfhdr *ehdr, struct elf_shdr *sechdrs,
+  struct elf_info *elf_info);
+void elf_free_info(struct elf_info *elf_info);
+
 int elf64_apply_relocate_add(const struct elf_info *elf_info,
 const char *strtab, const Elf64_Rela *rela,
 unsigned int num_rela, void *syms_base,
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index e38aace0a6e7..6159ec6ac032 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -124,7 +124,7 @@ obj-y   += iomap.o
 endif
 
 ifeq ($(CONFIG_MODULES)$(CONFIG_WORD_SIZE),y64)
-obj-y  += elf_util_64.o
+obj-y  += elf_util.o elf_util_64.o
 endif
 
 obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM)  += tm.o
diff --git a/arch/powerpc/kernel/elf_util.c b/arch/powerpc/kernel/elf_util.c
new file mode 100644
index ..1df4a116ad90
--- /dev/null
+++ b/arch/powerpc/kernel/elf_util.c
@@ -0,0 +1,476 @@
+/*
+ * Utility functions to work with ELF files.
+ *
+ * Copyright (C) 2016, IBM Corporation
+ *
+ * Based on kexec-tools' kexec-elf.c. Heavily modified for the
+ * kernel by Thiago Jung Bauermann .
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+
+#if ELF_CLASS == ELFCLASS32
+#define elf_addr_to_cpu elf32_to_cpu
+
+#ifndef Elf_Rel
+#define Elf_Rel Elf32_Rel
+#endif /* Elf_Rel */
+#else /* ELF_CLASS == ELFCLASS32 */
+#define elf_addr_to_cpu elf64_to_cpu
+
+#ifndef Elf_Rel
+#define Elf_Rel Elf64_Rel
+#endif /* Elf_Rel */
+
+static uint64_t elf64_to_cpu(const struct elfhdr *ehdr, uint64_t value)
+{
+   if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+   value = le64_to_cpu(value);
+   else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+   value = be64_to_cpu(value);
+
+   return value;
+}
+#endif /* ELF_CLASS == ELFCLASS32 */
+
+static uint16_t elf16_to_cpu(const struct elfhdr *ehdr, uint16_t value)
+{
+   if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+   value = le16_to_cpu(value);
+   else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+   value = be16_to_cpu(value);
+
+   return value;
+}
+
+static uint32_t elf32_to_cpu(const struct elfhdr *ehdr, uint32_t value)
+{
+   if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+   value = le32_to_cpu(value);
+   else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+   value = be32_to_cpu(value);
+
+   return value;
+}
+
+/**
+ * elf_is_ehdr_sane - check that it is safe to use the ELF header
+ * @buf_len:   size of the buffer in which the ELF file is loaded.
+ */
+static bool elf_is_ehdr_sane(const struct elfhdr *ehdr, size_t buf_len)
+{
+   if (ehdr->e_phnum > 0 && ehdr->e_phentsize != sizeof(struc

[PATCH v5 08/13] powerpc: Implement kexec_file_load.

2016-08-11 Thread Thiago Jung Bauermann
arch_kexec_walk_mem and arch_kexec_apply_relocations_add are used by
generic kexec code, while setup_purgatory is powerpc-specific and sets
runtime variables needed by the powerpc purgatory implementation.

Signed-off-by: Josh Sklar 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/Kconfig   |  13 ++
 arch/powerpc/include/asm/kexec.h   |   7 +
 arch/powerpc/include/asm/systbl.h  |   1 +
 arch/powerpc/include/asm/unistd.h  |   2 +-
 arch/powerpc/include/uapi/asm/unistd.h |   1 +
 arch/powerpc/kernel/Makefile   |   4 +-
 arch/powerpc/kernel/machine_kexec_64.c | 252 +
 7 files changed, 278 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ec4047e170a0..ff362ca60d1b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -459,6 +459,19 @@ config KEXEC
  interface is strongly in flux, so no good recommendation can be
  made.
 
+config KEXEC_FILE
+   bool "kexec file based system call"
+   select KEXEC_CORE
+   select BUILD_BIN2C
+   depends on PPC64
+   depends on CRYPTO=y
+   depends on CRYPTO_SHA256=y
+   help
+ This is a new version of the kexec system call. This call is
+ file based and takes in file descriptors as system call arguments
+ for kernel and initramfs as opposed to a list of segments as is the
+ case for the older kexec call.
+
 config RELOCATABLE
bool "Build a relocatable kernel"
depends on (PPC64 && !COMPILE_TEST) || (FLATMEM && (44x || FSL_BOOKE))
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index a46f5f45570c..83b81b7bdca1 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -91,6 +91,13 @@ static inline bool kdump_in_progress(void)
return crashing_cpu >= 0;
 }
 
+#ifdef CONFIG_KEXEC_FILE
+int setup_purgatory(struct kimage *image, const void *slave_code,
+   const void *fdt, unsigned long kernel_load_addr,
+   unsigned long fdt_load_addr, unsigned long stack_top,
+   int debug);
+#endif /* CONFIG_KEXEC_FILE */
+
 #else /* !CONFIG_KEXEC */
 static inline void crash_kexec_secondary(struct pt_regs *regs) { }
 
diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 2fc5d4db503c..4b369d83fe9c 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -386,3 +386,4 @@ SYSCALL(mlock2)
 SYSCALL(copy_file_range)
 COMPAT_SYS_SPU(preadv2)
 COMPAT_SYS_SPU(pwritev2)
+SYSCALL(kexec_file_load)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index cf12c580f6b2..a01e97d3f305 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include 
 
 
-#define NR_syscalls 382
+#define NR_syscalls 383
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index e9f5f41aa55a..2f26335a3c42 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -392,5 +392,6 @@
 #define __NR_copy_file_range   379
 #define __NR_preadv2   380
 #define __NR_pwritev2  381
+#define __NR_kexec_file_load   382
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 6159ec6ac032..ce18a985bcfc 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -123,9 +123,11 @@ ifneq ($(CONFIG_PPC_INDIRECT_PIO),y)
 obj-y  += iomap.o
 endif
 
-ifeq ($(CONFIG_MODULES)$(CONFIG_WORD_SIZE),y64)
+ifneq ($(CONFIG_MODULES)$(CONFIG_KEXEC_FILE),)
+ifeq ($(CONFIG_WORD_SIZE),64)
 obj-y  += elf_util.o elf_util_64.o
 endif
+endif
 
 obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM)  += tm.o
 
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 4c780a342282..1e678dc5096a 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -31,6 +33,12 @@
 #include 
 #include 
 
+#define SLAVE_CODE_SIZE 256
+
+#ifdef CONFIG_KEXEC_FILE
+static struct kexec_file_ops *kexec_file_loaders[] = { };
+#endif
+
 #ifdef CONFIG_PPC_BOOK3E
 int default_machine_kexec_prepare(struct kimage *image)
 {
@@ -432,3 +440,247 @@ static int __init export_htab_values(void)
 }
 late_initcall(export_htab_values);
 #endif /* CONFIG_PPC_STD_MMU_64 */
+
+#ifdef CONFIG_KEXEC_FILE
+int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
+ unsigned long buf_len)
+{
+   int i, ret = -ENOEXEC;
+   struct kexec_file_ops *fops;
+
+   /* We don't support crash kernels yet. */
+   if (image->type =

[PATCH v5 06/13] powerpc: Adapt elf64_apply_relocate_add for kexec_file_load.

2016-08-11 Thread Thiago Jung Bauermann
Extend elf64_apply_relocate_add to support relative symbols. This is
necessary because there is a difference between how the module loading
mechanism and the kexec purgatory loading code use Elf64_Sym.st_value
at relocation time: the former changes st_value to point to the absolute
memory address before relocating the module, while the latter does that
adjustment during relocation of the purgatory.

Also, add a check_symbols argument so that the kexec code can be stricter
about undefined symbols.

Finally, add relocation types used by the purgatory.
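
As a rough illustration of the relative-symbol handling described above, the following userspace sketch computes the value a relocation would use. It is a simplification under stated assumptions: struct sketch_sym and sketch_symbol_value are hypothetical stand-ins for Elf64_Sym and the logic inside elf64_apply_relocate_add, and section load addresses are passed as a plain array rather than section headers. SKETCH_SHN_ABS uses the SHN_ABS value from the ELF specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SHN_ABS from the ELF specification: symbol value is absolute. */
#define SKETCH_SHN_ABS 0xfff1

/* Hypothetical cut-down stand-in for Elf64_Sym. */
struct sketch_sym {
	uint64_t st_value;
	uint16_t st_shndx;
};

/*
 * Compute st_value + sec_base + addend.
 *
 * With relative_symbols set, st_value is an offset into the symbol's
 * section, so the section's load address is added. Otherwise st_value
 * is taken to be an absolute address already (the module loader case).
 *
 * Returns 0 on success, -1 if the section index is out of range.
 */
static int sketch_symbol_value(const struct sketch_sym *sym,
			       const uint64_t *section_addrs,
			       uint16_t num_sections, int64_t addend,
			       bool relative_symbols, uint64_t *value)
{
	uint64_t sec_base = 0;

	assert(sym != NULL && value != NULL);

	if (relative_symbols && sym->st_shndx != SKETCH_SHN_ABS) {
		if (sym->st_shndx >= num_sections)
			return -1;
		sec_base = section_addrs[sym->st_shndx];
	}

	*value = sym->st_value + sec_base + (uint64_t)addend;
	return 0;
}
```

Keeping the adjustment inside the relocation step means the symbol table itself is never rewritten, which is the difference from the module loader that the commit message points out.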

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/elf_util.h |  1 +
 arch/powerpc/kernel/elf_util_64.c   | 84 -
 arch/powerpc/kernel/module_64.c |  5 ++-
 3 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
index a012ba03282d..3405eeabe542 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -67,6 +67,7 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info,
 const char *strtab, const Elf64_Rela *rela,
 unsigned int num_rela, void *syms_base,
 void *loc_base, Elf64_Addr addr_base,
+bool relative_symbols, bool check_symbols,
 const char *obj_name);
 
 #endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/kernel/elf_util_64.c b/arch/powerpc/kernel/elf_util_64.c
index 8e5d400ac9f2..80f209a42abd 100644
--- a/arch/powerpc/kernel/elf_util_64.c
+++ b/arch/powerpc/kernel/elf_util_64.c
@@ -74,6 +74,8 @@ static void squash_toc_save_inst(const char *name, unsigned long addr) { }
  * @syms_base: Contents of the associated symbol table.
  * @loc_base:  Contents of the section to which relocations apply.
  * @addr_base: The address where the section will be loaded in memory.
+ * @relative_symbols:  Are the symbols' st_value members relative?
+ * @check_symbols: Fail if an unexpected symbol is found?
  * @obj_name:  The name of the ELF binary, for information messages.
  *
  * Applies RELA relocations to an ELF file already at its final location
@@ -84,11 +86,13 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info,
 const char *strtab, const Elf64_Rela *rela,
 unsigned int num_rela, void *syms_base,
 void *loc_base, Elf64_Addr addr_base,
+bool relative_symbols, bool check_symbols,
 const char *obj_name)
 {
unsigned int i;
unsigned long *location;
unsigned long address;
+   unsigned long sec_base;
unsigned long value;
const char *name;
Elf64_Sym *sym;
@@ -121,8 +125,36 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info,
   name, (unsigned long)sym->st_value,
   (long)rela[i].r_addend);
 
+   if (check_symbols) {
+   /*
+* TOC symbols appear as undefined but should be
+* resolved as well, so allow them to be processed.
+*/
+   if (sym->st_shndx == SHN_UNDEF &&
+   strcmp(name, ".TOC.") != 0) {
+   pr_err("Undefined symbol: %s\n", name);
+   return -ENOEXEC;
+   } else if (sym->st_shndx == SHN_COMMON) {
+   pr_err("Symbol '%s' in common section.\n", name);
+   return -ENOEXEC;
+   }
+   }
+
+   if (relative_symbols && sym->st_shndx != SHN_ABS) {
+   if (sym->st_shndx >= elf_info->ehdr->e_shnum) {
+   pr_err("Invalid section %d for symbol %s\n",
+  sym->st_shndx, name);
+   return -ENOEXEC;
+   } else {
+   struct elf_shdr *sechdrs = elf_info->sechdrs;
+
+   sec_base = sechdrs[sym->st_shndx].sh_addr;
+   }
+   } else
+   sec_base = 0;
+
/* `Everything is relative'. */
-   value = sym->st_value + rela[i].r_addend;
+   value = sym->st_value + sec_base + rela[i].r_addend;
 
switch (ELF64_R_TYPE(rela[i].r_info)) {
case R_PPC64_ADDR32:
@@ -135,6 +167,10 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info,
*(unsigned long *)location = value;
break;
 
+   case R_PPC64_REL32:
+   *(uint32_t *)location = value - 
(uint32_t)(uint64_t

[PATCH v5 05/13] powerpc: Generalize elf64_apply_relocate_add.

2016-08-11 Thread Thiago Jung Bauermann
When apply_relocate_add is called, modules are already loaded at their
final location in memory so Elf64_Shdr.sh_addr can be used for accessing
the section contents as well as the base address for relocations.

This is not the case for kexec's purgatory, because it will only be
copied to its final location right before being executed. Therefore,
it needs to be relocated while it is still in a temporary buffer. In
this case, Elf64_Shdr.sh_addr can't be used to access the sections'
contents.

This patch allows elf64_apply_relocate_add to be used when the ELF
binary is not yet at its final location by adding an addr_base argument
to specify the address at which the section will be loaded, and rela,
loc_base and syms_base to point to the sections' contents.
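
The distinction between where a relocation is written and where the code will eventually run can be sketched in userspace as follows. This is an illustration only, assuming a 32-bit PC-relative field; sketch_apply_rel32 is a hypothetical name and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: patch a 32-bit PC-relative field.
 *
 * loc_base  - where the section contents currently are (temporary buffer)
 * addr_base - address the section will have when it finally runs
 * r_offset  - offset of the field within the section
 * value     - final address of the relocation target (symbol + addend)
 */
static void sketch_apply_rel32(unsigned char *loc_base, uint64_t addr_base,
			       uint64_t r_offset, uint64_t value)
{
	unsigned char *location;
	uint64_t address;
	uint32_t field;

	assert(loc_base != NULL);

	/* The bytes to modify live in the temporary buffer... */
	location = loc_base + r_offset;
	/* ...but the distance is measured from the final address. */
	address = addr_base + r_offset;

	field = (uint32_t)(value - address);
	memcpy(location, &field, sizeof(field));
}
```

When the binary is already at its final location, as with modules, loc_base and addr_base refer to the same memory and the two computations coincide.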

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/elf_util.h |  6 ++--
 arch/powerpc/kernel/elf_util_64.c   | 63 +
 arch/powerpc/kernel/module_64.c | 17 --
 3 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
index 37372559fe62..a012ba03282d 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -64,7 +64,9 @@ static inline unsigned long my_r2(const struct elf_info *elf_info)
 }
 
 int elf64_apply_relocate_add(const struct elf_info *elf_info,
-const char *strtab, unsigned int symindex,
-unsigned int relsec, const char *obj_name);
+const char *strtab, const Elf64_Rela *rela,
+unsigned int num_rela, void *syms_base,
+void *loc_base, Elf64_Addr addr_base,
+const char *obj_name);
 
 #endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/kernel/elf_util_64.c b/arch/powerpc/kernel/elf_util_64.c
index decad2c34f38..8e5d400ac9f2 100644
--- a/arch/powerpc/kernel/elf_util_64.c
+++ b/arch/powerpc/kernel/elf_util_64.c
@@ -69,33 +69,56 @@ static void squash_toc_save_inst(const char *name, unsigned long addr) { }
  * elf64_apply_relocate_add - apply 64 bit RELA relocations
  * @elf_info:  Support information for the ELF binary being relocated.
  * @strtab:String table for the associated symbol table.
- * @symindex:  Section header index for the associated symbol table.
- * @relsec:Section header index for the relocations to apply.
+ * @rela:  Contents of the section with the relocations to apply.
+ * @num_rela:  Number of relocation entries in the section.
+ * @syms_base: Contents of the associated symbol table.
+ * @loc_base:  Contents of the section to which relocations apply.
+ * @addr_base: The address where the section will be loaded in memory.
  * @obj_name:  The name of the ELF binary, for information messages.
+ *
+ * Applies RELA relocations to an ELF file already at its final location
+ * in memory (in which case loc_base == addr_base), or still in a temporary
+ * buffer.
  */
 int elf64_apply_relocate_add(const struct elf_info *elf_info,
-const char *strtab, unsigned int symindex,
-unsigned int relsec, const char *obj_name)
+const char *strtab, const Elf64_Rela *rela,
+unsigned int num_rela, void *syms_base,
+void *loc_base, Elf64_Addr addr_base,
+const char *obj_name)
 {
unsigned int i;
-   Elf64_Shdr *sechdrs = elf_info->sechdrs;
-   Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr;
-   Elf64_Sym *sym;
unsigned long *location;
+   unsigned long address;
unsigned long value;
+   const char *name;
+   Elf64_Sym *sym;
+
+   for (i = 0; i < num_rela; i++) {
+   /*
+* rela[i].r_offset contains the byte offset from the beginning
+* of section to the storage unit affected.
+*
+* This is the location to update in the temporary buffer where
+* the section is currently loaded. The section will finally
+* be loaded to a different address later, pointed to by
+* addr_base.
+*/
+   location = loc_base + rela[i].r_offset;
+
+   /* Final address of the location. */
+   address = addr_base + rela[i].r_offset;
 
+   /* This is the symbol the relocation is referring to. */
+   sym = (Elf64_Sym *) syms_base + ELF64_R_SYM(rela[i].r_info);
 
-   for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rela); i++) {
-   /* This is where to make the change */
-   location = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
-   + rela[i].r_offset;
-   /* This i

[PATCH v5 04/13] powerpc: Factor out relocation code from module_64.c to elf_util_64.c.

2016-08-11 Thread Thiago Jung Bauermann
The kexec_file_load system call needs to relocate the purgatory, so
factor out the module relocation code so that it can be shared.

This patch's purpose is to move the ELF relocation logic from
apply_relocate_add to elf_util_64.c with as few changes as
possible. The following changes were needed:

To avoid having module-specific code in a general purpose utility
function, struct elf_info was created to contain the information
needed for ELF binaries manipulation.

my_r2, stub_for_addr and create_stub were changed to use it instead of
having to receive a struct module, since they are called from
elf64_apply_relocate_add.

local_entry_offset and squash_toc_save_inst were only used by
apply_relocate_add, so they were moved to elf_util_64.c as well.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/elf_util.h |  70 
 arch/powerpc/include/asm/module.h   |  14 +-
 arch/powerpc/kernel/Makefile|   4 +
 arch/powerpc/kernel/elf_util_64.c   | 269 +++
 arch/powerpc/kernel/module_64.c | 312 
 5 files changed, 386 insertions(+), 283 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
new file mode 100644
index ..37372559fe62
--- /dev/null
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -0,0 +1,70 @@
+/*
+ * Utility functions to work with ELF files.
+ *
+ * Copyright (C) 2016, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ASM_POWERPC_ELF_UTIL_H
+#define _ASM_POWERPC_ELF_UTIL_H
+
+#include 
+
+struct elf_info {
+   struct elf_shdr *sechdrs;
+
+   /* Index of stubs section. */
+   unsigned int stubs_section;
+   /* Index of TOC section. */
+   unsigned int toc_section;
+};
+
+#ifdef __powerpc64__
+#ifdef PPC64_ELF_ABI_v2
+
+/* An address is simply the address of the function. */
+typedef unsigned long func_desc_t;
+#else
+
+/* An address is address of the OPD entry, which contains address of fn. */
+typedef struct ppc64_opd_entry func_desc_t;
+#endif /* PPC64_ELF_ABI_v2 */
+
+/* Like PPC32, we need little trampolines to do > 24-bit jumps (into
+   the kernel itself).  But on PPC64, these need to be used for every
+   jump, actually, to reset r2 (TOC+0x8000). */
+struct ppc64_stub_entry
+{
+   /* 28 byte jump instruction sequence (7 instructions). We only
+* need 6 instructions on ABIv2 but we always allocate 7 so
+* we don't have to modify the trampoline load instruction. */
+   u32 jump[7];
+   /* Used by ftrace to identify stubs */
+   u32 magic;
+   /* Data for the above code */
+   func_desc_t funcdata;
+};
+#endif
+
+/* r2 is the TOC pointer: it actually points 0x8000 into the TOC (this
+   gives the value maximum span in an instruction which uses a signed
+   offset) */
+static inline unsigned long my_r2(const struct elf_info *elf_info)
+{
+   return elf_info->sechdrs[elf_info->toc_section].sh_addr + 0x8000;
+}
+
+int elf64_apply_relocate_add(const struct elf_info *elf_info,
+const char *strtab, unsigned int symindex,
+unsigned int relsec, const char *obj_name);
+
+#endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
index cd4ffd86765f..f2073115d518 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -12,7 +12,14 @@
 #include 
 #include 
 #include 
+#include 
 
+/* Both low and high 16 bits are added as SIGNED additions, so if low
+   16 bits has high bit set, high 16 bits must be adjusted.  These
+   macros do that (stolen from binutils). */
+#define PPC_LO(v) ((v) & 0xffff)
+#define PPC_HI(v) (((v) >> 16) & 0xffff)
+#define PPC_HA(v) PPC_HI ((v) + 0x8000)
 
 #ifndef __powerpc64__
 /*
@@ -33,8 +40,7 @@ struct ppc_plt_entry {
 
 struct mod_arch_specific {
 #ifdef __powerpc64__
-   unsigned int stubs_section; /* Index of stubs section in module */
-   unsigned int toc_section;   /* What section is the TOC? */
+   struct elf_info elf_info;
bool toc_fixed; /* Have we fixed up .TOC.? */
 #ifdef CONFIG_DYNAMIC_FTRACE
unsigned long toc;
@@ -90,6 +96,10 @@ static inline int module_finalize_ftrace(struct module *mod, const Elf_Shdr *sec
 }
 #endif
 
+unsigned long stub_for_addr(const struct elf_info *elf_info, unsigned long addr,
+   const char *obj_name);
+int restore_r2(u32 *instructi

[PATCH v5 02/13] kexec_file: Change kexec_add_buffer to take kexec_buf as argument.

2016-08-11 Thread Thiago Jung Bauermann
Adapt all callers to the new function prototype.

In addition, change the type of kexec_buf.buffer from char * to void *.
There is no particular reason for it to be a char *, and the change
allows us to get rid of 3 existing casts to char * in the code.
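
The calling convention this patch moves to, one parameter struct filled in by the caller with the result returned through a field, can be sketched in userspace like this. struct sketch_buf and sketch_add_buffer are hypothetical stand-ins, not the kernel's struct kexec_buf or kexec_add_buffer, and the placement logic is a toy that only aligns buf_min upwards (top_down is carried but ignored).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kexec_buf; field names mirror the diff. */
struct sketch_buf {
	const void *buffer;		/* contents to place; void * needs no casts */
	unsigned long bufsz;		/* size of the contents */
	unsigned long memsz;		/* size of the destination region */
	unsigned long buf_align;	/* required alignment, power of two */
	unsigned long buf_min;		/* lowest acceptable address */
	unsigned long buf_max;		/* highest acceptable address */
	bool top_down;			/* search direction (unused in this toy) */
	unsigned long mem;		/* output: chosen load address */
};

/* Toy placement: pick the first aligned address at or above buf_min. */
static int sketch_add_buffer(struct sketch_buf *kbuf)
{
	unsigned long start;

	assert(kbuf != NULL);

	if (kbuf->buf_align == 0 || (kbuf->buf_align & (kbuf->buf_align - 1)))
		return -1;	/* alignment must be a power of two */
	if (kbuf->buf_min > ULONG_MAX - (kbuf->buf_align - 1))
		return -1;	/* rounding up would overflow */

	start = (kbuf->buf_min + kbuf->buf_align - 1) & ~(kbuf->buf_align - 1);
	if (start > kbuf->buf_max || kbuf->memsz > kbuf->buf_max - start)
		return -1;	/* region does not fit below buf_max */

	kbuf->mem = start;
	return 0;
}
```

A caller initialises the fixed fields once with designated initialisers, then updates buffer, bufsz, memsz and buf_align before each call and reads mem afterwards, which is the pattern crash_load_segments follows in the diff below.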

Signed-off-by: Thiago Jung Bauermann 
Acked-by: Dave Young 
---
 arch/x86/kernel/crash.c   | 37 
 arch/x86/kernel/kexec-bzimage64.c | 48 +++--
 include/linux/kexec.h |  8 +---
 kernel/kexec_file.c   | 88 ++-
 4 files changed, 87 insertions(+), 94 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9616cf76940c..38a1cdf6aa05 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -615,9 +615,9 @@ static int determine_backup_region(u64 start, u64 end, void *arg)
 
 int crash_load_segments(struct kimage *image)
 {
-   unsigned long src_start, src_sz, elf_sz;
-   void *elf_addr;
int ret;
+   struct kexec_buf kbuf = { .image = image, .buf_min = 0,
+ .buf_max = ULONG_MAX, .top_down = false };
 
/*
 * Determine and load a segment for backup area. First 640K RAM
@@ -631,43 +631,44 @@ int crash_load_segments(struct kimage *image)
if (ret < 0)
return ret;
 
-   src_start = image->arch.backup_src_start;
-   src_sz = image->arch.backup_src_sz;
-
/* Add backup segment. */
-   if (src_sz) {
+   if (image->arch.backup_src_sz) {
+   kbuf.buffer = &crash_zero_bytes;
+   kbuf.bufsz = sizeof(crash_zero_bytes);
+   kbuf.memsz = image->arch.backup_src_sz;
+   kbuf.buf_align = PAGE_SIZE;
/*
 * Ideally there is no source for backup segment. This is
 * copied in purgatory after crash. Just add a zero filled
 * segment for now to make sure checksum logic works fine.
 */
-   ret = kexec_add_buffer(image, (char *)&crash_zero_bytes,
-  sizeof(crash_zero_bytes), src_sz,
-  PAGE_SIZE, 0, -1, 0,
-  &image->arch.backup_load_addr);
+   ret = kexec_add_buffer(&kbuf);
if (ret)
return ret;
+   image->arch.backup_load_addr = kbuf.mem;
pr_debug("Loaded backup region at 0x%lx backup_start=0x%lx memsz=0x%lx\n",
-image->arch.backup_load_addr, src_start, src_sz);
+image->arch.backup_load_addr,
+image->arch.backup_src_start, kbuf.memsz);
}
 
/* Prepare elf headers and add a segment */
-   ret = prepare_elf_headers(image, &elf_addr, &elf_sz);
+   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
if (ret)
return ret;
 
-   image->arch.elf_headers = elf_addr;
-   image->arch.elf_headers_sz = elf_sz;
+   image->arch.elf_headers = kbuf.buffer;
+   image->arch.elf_headers_sz = kbuf.bufsz;
 
-   ret = kexec_add_buffer(image, (char *)elf_addr, elf_sz, elf_sz,
-   ELF_CORE_HEADER_ALIGN, 0, -1, 0,
-   &image->arch.elf_load_addr);
+   kbuf.memsz = kbuf.bufsz;
+   kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
+   ret = kexec_add_buffer(&kbuf);
if (ret) {
vfree((void *)image->arch.elf_headers);
return ret;
}
+   image->arch.elf_load_addr = kbuf.mem;
pr_debug("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
-image->arch.elf_load_addr, elf_sz, elf_sz);
+image->arch.elf_load_addr, kbuf.bufsz, kbuf.bufsz);
 
return ret;
 }
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index f2356bda2b05..4b3a75329fb6 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -331,17 +331,17 @@ static void *bzImage64_load(struct kimage *image, char *kernel,
 
struct setup_header *header;
int setup_sects, kern16_size, ret = 0;
-   unsigned long setup_header_size, params_cmdline_sz, params_misc_sz;
+   unsigned long setup_header_size, params_cmdline_sz;
struct boot_params *params;
unsigned long bootparam_load_addr, kernel_load_addr, initrd_load_addr;
unsigned long purgatory_load_addr;
-   unsigned long kernel_bufsz, kernel_memsz, kernel_align;
-   char *kernel_buf;
struct bzimage64_data *ldata;
struct kexec_entry64_regs regs64;
void *stack;
unsigned int setup_hdr_offset = offsetof(struct boot_params, hdr);
unsigned int efi_map_offset, efi_map_sz, efi_setup_data_offset;
+   struct kexec_buf kbuf = { .image = image, .buf_max = ULONG_MAX,
+ .top_down = true };

[PATCH v5 03/13] kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.

2016-08-11 Thread Thiago Jung Bauermann
kexec_locate_mem_hole will be used by the PowerPC kexec_file_load
implementation to find free memory for the purgatory stack.
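
The return-value convention involved (the walker's callback returns 1 once a hole is found, and the wrapper maps that to 0 or -EADDRNOTAVAIL) can be sketched in userspace as follows. All names here are hypothetical; the range table stands in for whatever the architecture's memory walker iterates over, and range ends are treated as inclusive.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free memory range, end inclusive. */
struct sketch_range {
	uint64_t start, end;
};

/* Hypothetical search request: wanted size in, found address out. */
struct sketch_request {
	uint64_t size;
	uint64_t found;
};

/* Callback: return 1 to stop the walk once a large enough range is seen. */
static int sketch_hole_cb(uint64_t start, uint64_t end, void *arg)
{
	struct sketch_request *req = arg;

	if (end - start + 1 < req->size)
		return 0;	/* too small, keep walking */

	req->found = start;
	return 1;
}

/* Walk the ranges until the callback returns non-zero. */
static int sketch_walk_mem(const struct sketch_range *ranges, size_t n,
			   void *arg, int (*func)(uint64_t, uint64_t, void *))
{
	int ret = 0;
	size_t i;

	assert(func != NULL);

	for (i = 0; i < n && !ret; i++)
		ret = func(ranges[i].start, ranges[i].end, arg);

	return ret;
}

/* Map "callback found a hole" (1) to 0 and anything else to an errno. */
static int sketch_locate_mem_hole(const struct sketch_range *ranges, size_t n,
				  struct sketch_request *req)
{
	int ret = sketch_walk_mem(ranges, n, req, sketch_hole_cb);

	return ret == 1 ? 0 : -EADDRNOTAVAIL;
}
```

Hiding the 1-means-found convention behind a wrapper lets callers use the usual "non-zero is an error" check, as kexec_add_buffer does after this patch.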

Signed-off-by: Thiago Jung Bauermann 
Acked-by: Dave Young 
---
 include/linux/kexec.h |  1 +
 kernel/kexec_file.c   | 25 -
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 28bc9f335d0d..ceccc5856aab 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -176,6 +176,7 @@ struct kexec_buf {
 int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
   int (*func)(u64, u64, void *));
 extern int kexec_add_buffer(struct kexec_buf *kbuf);
+int kexec_locate_mem_hole(struct kexec_buf *kbuf);
 int __weak arch_kexec_verify_buffer(enum kexec_file_type type, const void *buf,
unsigned long size);
 #endif /* CONFIG_KEXEC_FILE */
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 58818264ad0e..772cb491715e 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -524,6 +524,23 @@ int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
 }
 
 /**
+ * kexec_locate_mem_hole - find free memory for the purgatory or the next kernel
+ * @kbuf:  Parameters for the memory search.
+ *
+ * On success, kbuf->mem will have the start address of the memory region found.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int kexec_locate_mem_hole(struct kexec_buf *kbuf)
+{
+   int ret;
+
+   ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
+
+   return ret == 1 ? 0 : -EADDRNOTAVAIL;
+}
+
+/**
  * kexec_add_buffer - place a buffer in a kexec segment
  * @kbuf:  Buffer contents and memory parameters.
  *
@@ -563,11 +580,9 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);
 
/* Walk the RAM ranges and allocate a suitable range for the buffer */
-   ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
-   if (ret != 1) {
-   /* A suitable memory range could not be found for buffer */
-   return -EADDRNOTAVAIL;
-   }
+   ret = kexec_locate_mem_hole(kbuf);
+   if (ret)
+   return ret;
 
/* Found a suitable memory range */
ksegment = &kbuf->image->segment[kbuf->image->nr_segments];
-- 
1.9.1



[PATCH v5 01/13] kexec_file: Allow arch-specific memory walking for kexec_add_buffer

2016-08-11 Thread Thiago Jung Bauermann
Allow architectures to specify a different memory walking function for
kexec_add_buffer. x86 uses iomem to track reserved memory ranges, but
PowerPC uses the memblock subsystem.

Signed-off-by: Thiago Jung Bauermann 
Acked-by: Dave Young 
---
 include/linux/kexec.h   | 26 ++
 kernel/kexec_file.c | 30 ++
 kernel/kexec_internal.h | 16 
 3 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 29202935055d..5ffd0011395c 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -149,6 +149,32 @@ struct kexec_file_ops {
 #endif
 };
 
+/**
+ * struct kexec_buf - parameters for finding a place for a buffer in memory
+ * @image: kexec image in which memory to search.
+ * @buffer:Contents which will be copied to the allocated memory.
+ * @bufsz: Size of @buffer.
+ * @mem:   On return will have address of the buffer in memory.
+ * @memsz: Size for the buffer in memory.
+ * @buf_align: Minimum alignment needed.
+ * @buf_min:   The buffer can't be placed below this address.
+ * @buf_max:   The buffer can't be placed above this address.
+ * @top_down:  Allocate from top of memory.
+ */
+struct kexec_buf {
+   struct kimage *image;
+   char *buffer;
+   unsigned long bufsz;
+   unsigned long mem;
+   unsigned long memsz;
+   unsigned long buf_align;
+   unsigned long buf_min;
+   unsigned long buf_max;
+   bool top_down;
+};
+
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+  int (*func)(u64, u64, void *));
 int __weak arch_kexec_verify_buffer(enum kexec_file_type type, const void *buf,
unsigned long size);
 #endif /* CONFIG_KEXEC_FILE */
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index c32d1d65bb77..e63fd4592e20 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -502,6 +502,27 @@ static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
return locate_mem_hole_bottom_up(start, end, kbuf);
 }
 
+/**
+ * arch_kexec_walk_mem - call func(data) on free memory regions
+ * @kbuf:  Context info for the search. Also passed to @func.
+ * @func:  Function to call for each memory region.
+ *
+ * Return: The memory walk will stop when func returns a non-zero value
+ * and that value will be returned. If all free regions are visited without
+ * func returning non-zero, then zero will be returned.
+ */
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+  int (*func)(u64, u64, void *))
+{
+   if (kbuf->image->type == KEXEC_TYPE_CRASH)
+   return walk_iomem_res_desc(crashk_res.desc,
+  IORESOURCE_SYSTEM_RAM | 
IORESOURCE_BUSY,
+  crashk_res.start, crashk_res.end,
+  kbuf, func);
+   else
+   return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
+}
+
 /*
  * Helper function for placing a buffer in a kexec segment. This assumes
  * that kexec_mutex is held.
@@ -548,14 +569,7 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
kbuf->top_down = top_down;
 
/* Walk the RAM ranges and allocate a suitable range for the buffer */
-   if (image->type == KEXEC_TYPE_CRASH)
-   ret = walk_iomem_res_desc(crashk_res.desc,
-   IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
-   crashk_res.start, crashk_res.end, kbuf,
-   locate_mem_hole_callback);
-   else
-   ret = walk_system_ram_res(0, -1, kbuf,
- locate_mem_hole_callback);
+   ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
if (ret != 1) {
/* A suitable memory range could not be found for buffer */
return -EADDRNOTAVAIL;
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 0a52315d9c62..4cef7e4706b0 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -20,22 +20,6 @@ struct kexec_sha_region {
unsigned long len;
 };
 
-/*
- * Keeps track of buffer parameters as provided by caller for requesting
- * memory placement of buffer.
- */
-struct kexec_buf {
-   struct kimage *image;
-   char *buffer;
-   unsigned long bufsz;
-   unsigned long mem;
-   unsigned long memsz;
-   unsigned long buf_align;
-   unsigned long buf_min;
-   unsigned long buf_max;
-   bool top_down;  /* allocate from top of memory hole */
-};
-
 void kimage_file_post_load_cleanup(struct kimage *image);
 #else /* CONFIG_KEXEC_FILE */
 static inline void kimage_file_post_load_cleanup(struct kimage *image) { }
-- 
1.9.1



[PATCH v5 00/13] kexec_file_load implementation for PowerPC

2016-08-11 Thread Thiago Jung Bauermann
[ Andrew, since this series touches generic code, x86 and powerpc,
  Michael Ellerman and Dave Young think it should go via your tree. ]

The main differences in this version are (more detailed changelog at
the end of this email):

- The code which is not specific to loading ELF format kernels was
  moved from kexec_elf_64.c to machine_kexec_64.c.

- There is a new patch implementing support for receiving a device tree
  blob from userspace, checking it against a whitelist of allowed nodes
  and properties and copying it into the device tree for the next kernel.
  This is the only patch that depends on the "extend kexec_file_load
  system call" series. Everything else can be upstreamed independently
  of that series.

- Also, I realised that the patch "Add support for loading ELF kernels
  with kexec_file_load." was too big, so I moved some changes to other
  patches to facilitate review. Details of what went where are in the
  changelog.

Original cover letter:

This patch series implements the kexec_file_load system call on PowerPC.

This system call moves the reading of the kernel, initrd and the device tree
from the userspace kexec tool to the kernel. This is needed if you want to
do one or both of the following:

1. only allow loading of signed kernels.
2. "measure" (i.e., record the hashes of) the kernel, initrd, kernel
   command line and other boot inputs for the Integrity Measurement
   Architecture subsystem.

The above are the functions kexec already has built into kexec_file_load.
Yesterday I posted a set of patches which allows a third feature:

3. have IMA pass on its event log (where integrity measurements are
   registered) across kexec to the second kernel, so that the event
   history is preserved.

Because OpenPower uses an intermediary Linux instance as a boot loader
(skiroot), feature 1 is needed to implement secure boot for the platform,
while features 2 and 3 are needed to implement trusted boot.

This patch series starts by removing an x86 assumption from kexec_file:
kexec_add_buffer uses iomem to find reserved memory ranges, but PowerPC
uses the memblock subsystem.  A hook is added so that each arch can
specify how memory ranges can be found.

Also, the memory-walking logic in kexec_add_buffer is useful in this
implementation to find a free area for the purgatory's stack, so the
next patch moves that logic to kexec_locate_mem_hole.
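
The walk-and-callback convention described above can be modelled in plain
userspace C. This is only a sketch under stated assumptions: the region
table, struct search, walk_regions() and first_fit() are hypothetical
stand-ins for arch_kexec_walk_mem() and locate_mem_hole_callback(), not
the kernel code. What it keeps from the patches is the contract: the walk
stops at the first non-zero return from the callback, and the locate
helper maps a return of 1 to success and anything else to -EADDRNOTAVAIL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-memory region; 'end' is inclusive. */
struct region {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical, trimmed-down stand-in for struct kexec_buf. */
struct search {
	uint64_t memsz;	/* size wanted */
	uint64_t mem;	/* filled in on success */
};

typedef int (*walk_fn)(uint64_t start, uint64_t end, void *arg);

/* Visit each region; stop and propagate the first non-zero return. */
static int walk_regions(const struct region *r, size_t n, void *arg,
			walk_fn func)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = func(r[i].start, r[i].end, arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Callback: return 1 once a region large enough has been found. */
static int first_fit(uint64_t start, uint64_t end, void *arg)
{
	struct search *s = arg;

	if (end - start + 1 < s->memsz)
		return 0;	/* too small, keep walking */
	s->mem = start;
	return 1;
}

/* Same return mapping as kexec_locate_mem_hole() in the patch. */
static int locate_hole(const struct region *r, size_t n, struct search *s)
{
	return walk_regions(r, n, s, first_fit) == 1 ? 0 : -EADDRNOTAVAIL;
}
```

An architecture that tracks memory differently would only need to replace
the walker; the callback and the return mapping stay the same.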

The kexec_file_load system call needs to apply relocations to the
purgatory but adding code for that would duplicate functionality with
the module loading mechanism, which also needs to apply relocations to
the kernel modules.  Therefore, this patch series factors out the module
relocation code so that it can be shared.

One thing that is still missing is crashkernel support, which I intend
to submit shortly. For now, arch_kexec_kernel_image_probe rejects crash
kernels.

This code is based on kexec-tools, but with many modifications to adapt
it to the kernel environment and facilities. The exception is the
purgatory, which has only minimal changes.

Changes for v5:
- Rebased series on v4.8-rc1 + the extend kexec_file_load series.
- Patch "powerpc: Adapt elf64_apply_relocate_add for kexec_file_load."
  - New patch. These changes were previously in patch 10.
The code itself is unchanged from v4.
- Patch "powerpc: Implement kexec_file_load."
  - Moved arch_kexec_walk_mem, arch_kexec_apply_relocations_add and
setup_purgatory from patch 10 to this patch.
  - arch_kexec_apply_relocations_add is unchanged from v4.
  - Fixed off-by-one error in arch_kexec_walk_mem when passing range
to func.
  - Moved setup_purgatory from kexec_elf_64.c to machine_kexec_64.c,
and changed it to receive a pointer to the slave code directly
rather than a struct elf_info and getting the pointer from there.
- Patch "powerpc: Add code to work with device trees in kexec_file_load."
  - New patch. These changes were previously in patch 10.
  - find_debug_console moved from kexec_elf_64.c to machine_kexec_64.c.
The code is unchanged from v4.
  - setup_new_fdt is a new function factored out of elf64_load. The only
code change from v4 is to create /chosen if it doesn't exist yet.
- Patch "powerpc: Add support for loading ELF kernels with kexec_file_load."
  - This patch was too big, so moved some of its changes to other patches
to facilitate review.
  - Allow loading ELF file type ET_DYN, which is what the BE kernel uses.
  - The code adapting the device tree for booting the new kernel was moved
out of elf64_load to setup_new_fdt.
- Patch "powerpc: Allow userspace to set device tree properties in kexec_file_load"
  - New patch.
  - The code in this patch didn't exist in v4.
  - This is the only patch that depends on the extend kexec_file_load series.
- Patch "powerpc: Enable CONFIG_KEXEC_FILE in powerpc server defconfigs."
  - New patch.

Changes for v4:
- Rebased series on today's powerpc/next.
- Patch "kexec_file: Remove unused members from struct kexec_buf.

[PATCH v2 2/2] kexec: extend kexec_file_load system call

2016-08-11 Thread Thiago Jung Bauermann
From: AKASHI Takahiro 

Device tree blob must be passed to a second kernel on DTB-capable
archs, like powerpc and arm64, but the current kernel interface
lacks this support.

This patch extends kexec_file_load system call by adding an extra
argument to this syscall so that an arbitrary number of file descriptors
can be handed out from user space to the kernel.

long sys_kexec_file_load(int kernel_fd, int initrd_fd,
 unsigned long cmdline_len,
 const char __user *cmdline_ptr,
 unsigned long flags,
 const struct kexec_fdset __user *ufdset);

If KEXEC_FILE_EXTRA_FDS is set in the "flags" argument, the "ufdset"
argument points to the following struct buffer:

struct kexec_fdset {
int nr_fds;
struct kexec_file_fd fds[0];
}
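
As a rough illustration of how a caller might lay out that buffer, the
sketch below builds a one-entry set carrying a partial DTB. The type
definitions are local copies of the ones proposed in this patch (they are
not taken from any released header), fdset_with_dtb() is a hypothetical
helper, and fds[] is the C99 flexible-array spelling of the patch's
fds[0]. The syscall itself is deliberately not invoked.

```c
#include <stdlib.h>

/* Local mirror of the types proposed in this patch. */
enum kexec_file_type {
	KEXEC_FILE_TYPE_KERNEL,
	KEXEC_FILE_TYPE_INITRAMFS,
	KEXEC_FILE_TYPE_PARTIAL_DTB,
};

struct kexec_file_fd {
	enum kexec_file_type type;
	int fd;
};

struct kexec_fdset {
	int nr_fds;
	struct kexec_file_fd fds[];	/* C99 form of fds[0] */
};

/*
 * Hypothetical helper: allocate a set holding a single partial-DTB fd.
 * The header and the entries live in one contiguous allocation, which is
 * what a single user pointer to the set requires.
 */
static struct kexec_fdset *fdset_with_dtb(int dtb_fd)
{
	struct kexec_fdset *set;

	set = malloc(sizeof(*set) + sizeof(set->fds[0]));
	if (!set)
		return NULL;

	set->nr_fds = 1;
	set->fds[0].type = KEXEC_FILE_TYPE_PARTIAL_DTB;
	set->fds[0].fd = dtb_fd;
	return set;
}
```

The caller would then pass the returned pointer as "ufdset" together with
KEXEC_FILE_EXTRA_FDS in "flags", and free() it after the call.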

Signed-off-by: AKASHI Takahiro 
Signed-off-by: Thiago Jung Bauermann 
---
 include/linux/fs.h |  1 +
 include/linux/kexec.h  |  7 ++--
 include/linux/syscalls.h   |  4 ++-
 include/uapi/linux/kexec.h | 22 
 kernel/kexec_file.c| 83 ++
 5 files changed, 108 insertions(+), 9 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3523bf62f328..847d9c31f428 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2656,6 +2656,7 @@ extern int do_pipe_flags(int *, int);
id(MODULE, kernel-module)   \
id(KEXEC_IMAGE, kexec-image)\
id(KEXEC_INITRAMFS, kexec-initramfs)\
+   id(KEXEC_PARTIAL_DTB, kexec-partial-dtb)\
id(POLICY, security-policy) \
id(MAX_ID, )
 
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 4f85d284ed0b..29202935055d 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -148,7 +148,10 @@ struct kexec_file_ops {
kexec_verify_sig_t *verify_sig;
 #endif
 };
-#endif
+
+int __weak arch_kexec_verify_buffer(enum kexec_file_type type, const void *buf,
+   unsigned long size);
+#endif /* CONFIG_KEXEC_FILE */
 
 struct kimage {
kimage_entry_t head;
@@ -280,7 +283,7 @@ extern int kexec_load_disabled;
 
 /* List of defined/legal kexec file flags */
 #define KEXEC_FILE_FLAGS   (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \
-KEXEC_FILE_NO_INITRAMFS)
+KEXEC_FILE_NO_INITRAMFS | KEXEC_FILE_EXTRA_FDS)
 
 #define VMCOREINFO_BYTES   (4096)
 #define VMCOREINFO_NOTE_NAME   "VMCOREINFO"
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index d02239022bd0..fc072bdb74e3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -66,6 +66,7 @@ struct perf_event_attr;
 struct file_handle;
 struct sigaltstack;
 union bpf_attr;
+struct kexec_fdset;
 
 #include 
 #include 
@@ -321,7 +322,8 @@ asmlinkage long sys_kexec_load(unsigned long entry, unsigned long nr_segments,
 asmlinkage long sys_kexec_file_load(int kernel_fd, int initrd_fd,
unsigned long cmdline_len,
const char __user *cmdline_ptr,
-   unsigned long flags);
+   unsigned long flags,
+   const struct kexec_fdset __user *ufdset);
 
 asmlinkage long sys_exit(int error_code);
 asmlinkage long sys_exit_group(int error_code);
diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index aae5ebf2022b..6279be79efba 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -23,6 +23,28 @@
 #define KEXEC_FILE_UNLOAD  0x0001
 #define KEXEC_FILE_ON_CRASH0x0002
 #define KEXEC_FILE_NO_INITRAMFS0x0004
+#define KEXEC_FILE_EXTRA_FDS   0x0008
+
+enum kexec_file_type {
+   KEXEC_FILE_TYPE_KERNEL,
+   KEXEC_FILE_TYPE_INITRAMFS,
+
+   /*
+* Device Tree Blob containing just the nodes and properties that
+* the kexec_file_load caller wants to add or modify.
+*/
+   KEXEC_FILE_TYPE_PARTIAL_DTB,
+};
+
+struct kexec_file_fd {
+   enum kexec_file_type type;
+   int fd;
+};
+
+struct kexec_fdset {
+   int nr_fds;
+   struct kexec_file_fd fds[0];
+};
 
 /* These values match the ELF architecture values.
  * Unless there is a good reason that should continue to be the case.
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 113af2f219b9..d6803dd884e2 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -25,6 +25,9 @@
 #include 
 #include "kexec_internal.h"
 
+#define MAX_FDSET_SIZE (sizeof(struct kexec_fdset) + \
+   KEXEC_SEGMENT_MAX * sizeof(struct 
kexec_file_fd))
+
 /*
  * Declare these symbols weak so that if architecture provides a purgatory,
  * these will be overridden.
@@ -116,6 +119,22 @@ voi

[PATCH v2 1/2] kexec: add dtb info to struct kimage

2016-08-11 Thread Thiago Jung Bauermann
From: AKASHI Takahiro 

Device tree blob must be passed to a second kernel on DTB-capable
archs, like powerpc and arm64, but the current kernel interface
lacks this support.

This patch adds dtb buffer information to struct kimage.
When users don't specify dtb explicitly and the one used for the current
kernel can be re-used, this change will be good enough for implementing
kexec_file_load feature.

Signed-off-by: AKASHI Takahiro 
---
 include/linux/kexec.h | 3 +++
 kernel/kexec_file.c   | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d743baaa..4f85d284ed0b 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -192,6 +192,9 @@ struct kimage {
char *cmdline_buf;
unsigned long cmdline_buf_len;
 
+   void *dtb_buf;
+   unsigned long dtb_buf_len;
+
/* File operations provided by image loader */
struct kexec_file_ops *fops;
 
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 503bc2d348e5..113af2f219b9 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -92,6 +92,9 @@ void kimage_file_post_load_cleanup(struct kimage *image)
vfree(image->initrd_buf);
image->initrd_buf = NULL;
 
+   vfree(image->dtb_buf);
+   image->dtb_buf = NULL;
+
kfree(image->cmdline_buf);
image->cmdline_buf = NULL;
 
-- 
1.9.1



[PATCH v2 0/2] extend kexec_file_load system call

2016-08-11 Thread Thiago Jung Bauermann
This patch series is from AKASHI Takahiro. I will use it in my next
version of the kexec_file_load implementation for powerpc, so I am
rebasing it on top of v4.8-rc1.

I dropped the patch which adds __NR_kexec_file_load to
 for simplicity, since the powerpc patches already
add it to powerpc's . I don't know which approach is
better.

The first patch in this series is unchanged from v1.

The second patch is the same one I posted on July 26th. It has the
following changes from v1:

- Added the arch_kexec_verify_buffer hook, where each architecture can
  verify if the DTB is safe to load.
- Renamed KEXEC_FILE_TYPE_DTB to KEXEC_FILE_TYPE_PARTIAL_DTB.
- Limited max number of fds to KEXEC_SEGMENT_MAX.
- Changed to use fixed size buffer for fdset instead of allocating it.
- Changed to return -EINVAL if an unknown file type is found in fdset.
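
The checks listed above could look roughly like the following userspace
model. All names are hypothetical, the limit of 16 is assumed to match
KEXEC_SEGMENT_MAX, and returning -EINVAL for an out-of-range count is an
assumption of this sketch; only the -EINVAL for an unknown file type comes
from the changelog.

```c
#include <errno.h>

#define SKETCH_SEGMENT_MAX 16	/* assumed value of KEXEC_SEGMENT_MAX */

enum sketch_file_type {
	SKETCH_TYPE_KERNEL,
	SKETCH_TYPE_INITRAMFS,
	SKETCH_TYPE_PARTIAL_DTB,
};

struct sketch_file_fd {
	int type;
	int fd;
};

/* Fixed-size storage, so no allocation depends on a user-supplied count. */
struct sketch_fdset {
	int nr_fds;
	struct sketch_file_fd fds[SKETCH_SEGMENT_MAX];
};

static int sketch_check_fdset(const struct sketch_fdset *set)
{
	int i;

	/* Assumption: an out-of-range count is rejected with -EINVAL. */
	if (set->nr_fds < 0 || set->nr_fds > SKETCH_SEGMENT_MAX)
		return -EINVAL;

	for (i = 0; i < set->nr_fds; i++) {
		switch (set->fds[i].type) {
		case SKETCH_TYPE_KERNEL:
		case SKETCH_TYPE_INITRAMFS:
		case SKETCH_TYPE_PARTIAL_DTB:
			break;
		default:
			return -EINVAL;	/* unknown file type */
		}
	}
	return 0;
}
```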

I am also posting a new version of the kexec_file_load syscall
implementation for powerpc which uses the arch_kexec_verify_buffer hook
to enforce a whitelist of nodes and properties that userspace can pass to
the next kernel, as suggested by Michael Ellerman.

You can find it in a new patch in the powerpc series called
"powerpc: Allow userspace to set device tree properties in kexec_file_load"

Original cover letter:

Device tree blob must be passed to a second kernel on DTB-capable
archs, like powerpc and arm64, but the current kernel interface
lacks this support.

This patch extends kexec_file_load system call by adding an extra
argument to this syscall so that an arbitrary number of file descriptors
can be handed out from user space to the kernel.

See the background [1].

Please note that the new interface looks quite similar to the current
system call, but that does not necessarily mean that it provides
"binary compatibility."

[1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html

AKASHI Takahiro (1):
  kexec: add dtb info to struct kimage

Thiago Jung Bauermann (1):
  kexec: extend kexec_file_load system call

 include/linux/fs.h |  1 +
 include/linux/kexec.h  | 10 --
 include/linux/syscalls.h   |  4 ++-
 include/uapi/linux/kexec.h | 22 
 kernel/kexec_file.c| 86 ++
 5 files changed, 114 insertions(+), 9 deletions(-)

-- 
1.9.1



Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2

2016-08-11 Thread Segher Boessenkool
On Thu, Aug 11, 2016 at 11:34:37PM +0200, Gabriel Paubert wrote:
> On the other hand gcc did at the time a very poor job (quite an
> understatement) at bswapdi when compiling for 64 bit processors 
> (see the example).
> 
> But what do modern compilers generate for bswapdi these days? Do they
> still call the library or not?

Nope.

> After all, bswapdi on 32 bit processors only takes 6 instructions if the
> input and output registers don't overlap.

For this testcase:
===
typedef unsigned long long u64;
u64 bs(u64 x) { return __builtin_bswap64(x); }
===

we get with -m32:
===
bs:
mr 9,3
rotlwi 3,4,24
rlwimi 3,4,8,8,15
rlwimi 3,4,8,24,31
rotlwi 4,9,24
rlwimi 4,9,8,8,15
rlwimi 4,9,8,24,31
blr
===

and with -m64:
===
.L.bs:
srdi 10,3,32
mr 9,3
rotlwi 3,3,24
rotlwi 8,10,24
rlwimi 3,9,8,8,15
rlwimi 8,10,8,8,15
rlwimi 3,9,8,24,31
rlwimi 8,10,8,24,31
sldi 3,3,32
or 3,3,8
blr
===

Neither as tight as possible, but neither horrible either.
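
For readers who do not have the rotate-and-insert semantics in their head,
here is a small C model of the -m32 sequence above. It is a sketch, not
compiler output: rotl32(), ppc_mask(), rlwimi() and the two bswap helpers
are names made up for this illustration, and the mask helper assumes
mb <= me (no wrap-around masks). It also shows why rlwimi is special, as
noted earlier in the thread: the destination register is an input as well
as an output.

```c
#include <stdint.h>

/* Rotate left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
	n &= 31;
	return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Bits mb..me set, PowerPC numbering (bit 0 is the MSB); needs mb <= me. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
	return (0xffffffffu >> mb) & (0xffffffffu << (31 - me));
}

/*
 * rlwimi ra,rs,sh,mb,me: rotate rs left by sh, then insert the masked
 * bits into ra. Bits of ra outside the mask are preserved, so the old
 * value of ra matters.
 */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh,
		       unsigned mb, unsigned me)
{
	uint32_t m = ppc_mask(mb, me);

	return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* One 32-bit half: rotlwi followed by two rlwimi, as in the listing. */
static uint32_t bswap32_ppc(uint32_t x)
{
	uint32_t r = rotl32(x, 24);	/* rotlwi  r,x,24      */

	r = rlwimi(r, x, 8, 8, 15);	/* rlwimi  r,x,8,8,15  */
	r = rlwimi(r, x, 8, 24, 31);	/* rlwimi  r,x,8,24,31 */
	return r;
}

/* r3 holds the high word and r4 the low word; the halves swap places. */
static uint64_t bswap64_ppc(uint64_t v)
{
	uint32_t hi = (uint32_t)(v >> 32);
	uint32_t lo = (uint32_t)v;

	return ((uint64_t)bswap32_ppc(lo) << 32) | bswap32_ppc(hi);
}
```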


Segher


Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2

2016-08-11 Thread Gabriel Paubert
On Wed, Aug 10, 2016 at 12:18:15PM +0200, Christophe Leroy wrote:
> 
> 
> Le 10/08/2016 à 10:56, Gabriel Paubert a écrit :
> >On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote:
> >>Signed-off-by: Christophe Leroy 
> >>---
> >> arch/powerpc/kernel/misc_32.S | 3 +--
> >> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >>diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
> >>index e025230..e18055c 100644
> >>--- a/arch/powerpc/kernel/misc_32.S
> >>+++ b/arch/powerpc/kernel/misc_32.S
> >>@@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2)
> >>rlwimi  r9,r4,24,0,7
> >>rlwimi  r10,r3,24,0,7
> >>rlwimi  r9,r4,24,16,23
> >>-   rlwimi  r10,r3,24,16,23
> >>+   rlwimi  r4,r3,24,16,23
> >>mr  r3,r9
> >>-   mr  r4,r10
> >>blr
> >>
> >
> >Hmmm, are you sure that it works? rlwimi is a bit special since the
> >first operand is both an input and an output of the instruction.
> >
> >
> 
> Oops, you are right ...

I just found this: 

http://hardwarebug.org/2010/01/14/beware-the-builtins/

The suggested bswapdi2 sequence needs only a single mr instruction; the
other one is absorbed in a rotlwi.

The scheduling looks poor, but it seems impossible to interleave the
operations between the two halves without adding another instruction,
and the routine is 8 instructions long, which happens to be exactly a
cache line on most 32 bit processors.

On the other hand gcc did at the time a very poor job (quite an
understatement) at bswapdi when compiling for 64 bit processors 
(see the example).

But what do modern compilers generate for bswapdi these days? Do they
still call the library or not?

After all, bswapdi on 32 bit processors only takes 6 instructions if the
input and output registers don't overlap.

Gabriel



Re: [PATCH 0/2] ibmvfc: FC-TAPE Support

2016-08-11 Thread Tyrel Datwyler
On 08/03/2016 02:36 PM, Tyrel Datwyler wrote:
> This patchset introduces optional FC-TAPE/FC Class 3 Error Recovery to the
> ibmvfc client driver.
> 
> Tyrel Datwyler (2):
>   ibmvfc: Set READ FCP_XFER_READY DISABLED bit in PRLI
>   ibmvfc: add FC Class 3 Error Recovery support
> 
>  drivers/scsi/ibmvscsi/ibmvfc.c | 11 +++
>  drivers/scsi/ibmvscsi/ibmvfc.h |  1 +
>  2 files changed, 12 insertions(+)
> 

ping?



[PATCH v4] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-11 Thread Mauricio Faria de Oliveira
This patch leverages 'struct pci_host_bridge' from the PCI subsystem
in order to free the pci_controller only after the last reference to
its devices is dropped (avoiding an oops in pcibios_release_device()
if the last reference is dropped after pcibios_free_controller()).

The patch relies on pci_host_bridge.release_fn() (and .release_data),
which is called automatically by the PCI subsystem when the root bus
is released (i.e., the last reference is dropped).  Those fields are
set via pci_set_host_bridge_release() (e.g. in the platform-specific
implementation of pcibios_root_bridge_prepare()).

It introduces the 'pcibios_free_controller_deferred()' .release_fn()
and it expects .release_data to hold a pointer to the pci_controller.

The function implicitly calls 'pcibios_free_controller()', so a user
must *NOT* explicitly call it if using the new _deferred() callback.

The functionality is enabled for pseries (although it isn't platform
specific, and may be used by cxl).

Details on not-so-elegant design choices:

 - Use 'pci_host_bridge.release_data' field as pointer to associated
   'struct pci_controller' so as *not* to call 'pci_bus_to_host(bridge->bus)'
   in pcibios_free_controller_deferred().

   That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL'
   (so, if the last reference is released after pci_remove_root_bus()
   runs, which eventually reaches pcibios_free_controller_deferred(),
   that would hit a null pointer dereference).

   The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks
   are interested in this fix.
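
The lifetime rule behind this design choice can be shown with a small
userspace model. Everything here is hypothetical (the sketch_* names, the
plain integer refcount, and the 'freed' flag standing in for kfree()); it
is not the PCI core. It illustrates two points from the text above: the
release callback runs only when the last reference is dropped, and it must
find the controller through release_data because the bus pointer has
already been cleared by then.

```c
#include <stddef.h>

/* Stand-in for struct pci_controller; 'freed' replaces a real kfree(). */
struct sketch_controller {
	int freed;
};

/* Stand-in for struct pci_host_bridge. */
struct sketch_bridge {
	int refcount;
	void *bus;	/* set to NULL when the root bus is removed */
	void (*release_fn)(struct sketch_bridge *bridge);
	void *release_data;
};

/* Immediate release. */
static void sketch_free_controller(struct sketch_controller *phb)
{
	phb->freed = 1;
}

/* Deferred release: a wrapper that relies only on release_data. */
static void sketch_free_controller_deferred(struct sketch_bridge *bridge)
{
	sketch_free_controller(bridge->release_data);
}

/* Drop one reference; the callback fires when the count reaches zero. */
static void sketch_put(struct sketch_bridge *bridge)
{
	if (--bridge->refcount == 0 && bridge->release_fn)
		bridge->release_fn(bridge);
}

/* Root bus removal clears the bus pointer and drops its own reference. */
static void sketch_remove_root_bus(struct sketch_bridge *bridge)
{
	bridge->bus = NULL;
	sketch_put(bridge);
}
```

With one extra reference held (as in test-case #1), removing the root bus
leaves the controller alive; it is released only by the final put.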

Test-case #1 (hold references)

  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.0
  <...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...>

  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.1
  <...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...>

  # cat >/dev/sdaa & pid1=$!
  # cat >/dev/sdab & pid2=$!

  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
  Validating PHB DLPAR capability...yes.
  [  594.306719] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
  [  594.306738] pci_hp_remove_devices:Removing 0021:01:00.0...
  ...
  [  598.236381] pci_hp_remove_devices:Removing 0021:01:00.1...
  ...
  [  611.972077] pci_bus 0021:01: busn_res: [bus 01-ff] is released
  [  611.972140] rpadlpar_io: slot PHB 33 removed

  # kill -9 $pid1
  # kill -9 $pid2
  [  632.918088] pcibios_free_controller_deferred: domain 33, dynamic 1

Test-case #2 (don't hold references)

  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
  Validating PHB DLPAR capability...yes.
  [  916.357363] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
  [  916.357386] pci_hp_remove_devices:Removing 0021:01:00.0...
  ...
  [  920.566527] pci_hp_remove_devices:Removing 0021:01:00.1...
  ...
  [  933.955873] pci_bus 0021:01: busn_res: [bus 01-ff] is released
  [  933.955977] pcibios_free_controller_deferred: domain 33, dynamic 1
  [  933.955999] rpadlpar_io: slot PHB 33 removed

Suggested-By: Gavin Shan 
Signed-off-by: Mauricio Faria de Oliveira 
---
Changelog:
 - v4: improve usability/design/documentation:
   - rename function to pcibios_free_controller_deferred()
   - from function call pcibios_free_controller()
   - no more struct pci_controller.bridge field
   thanks: Gavin Shan, Andrew Donnellan
 - v3: different approach: struct pci_host_bridge.release_fn()
 - v2: different approach: struct pci_controller.refcount 

 arch/powerpc/include/asm/pci-bridge.h  |  1 +
 arch/powerpc/kernel/pci-common.c   | 36 ++
 arch/powerpc/platforms/pseries/pci.c   |  4 
 arch/powerpc/platforms/pseries/pci_dlpar.c |  7 --
 4 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b5e88e4..c0309c5 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -301,6 +301,7 @@ extern void pci_process_bridge_OF_ranges(struct pci_controller *hose,
 /* Allocate & free a PCI host bridge structure */
 extern struct pci_controller *pcibios_alloc_controller(struct device_node *dev);
 extern void pcibios_free_controller(struct pci_controller *phb);
+extern void pcibios_free_controller_deferred(struct pci_host_bridge *bridge);
 
 #ifdef CONFIG_PCI
 extern int pcibios_vaddr_is_ioport(void __iomem *address);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index a5c0153..8c48a78 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -151,6 +151,42 @@ void pcibios_free_controller(struct pci_controller *phb)
 EXPORT_SYMBOL_GPL(pcibios_free_controller);
 
 /*
+ * This function is used to call pcibios_free_controller()
+ * in a deferred manner: a callback from the PCI subsystem.
+ *
+ * _*DO NOT*_ call pcibios_free_controller() explicitly if
+ * this is used (or it may access an invalid *phb pointer).
+ *
+ * The callback occurs when all re

Re: [PATCH v3] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-11 Thread Mauricio Faria de Oliveira

Hi Gavin,

tl;dr: thanks for the comments & suggestions; I'll submit v4.

On 08/11/2016 03:40 AM, Gavin Shan wrote: [added some line breaks]

> It seems the user has two options here:
>
> (1) Setup bridge's release_fn() and call
> pcibios_free_controller() explicitly;


I think the v3 design was non-intuitive on that point -- it does not
seem right for a user to use both options:

if release_fn() is set and is called before pcibios_free_controller()
(normal case w/ DLPAR/PCI hotplug/cxl, as buses/devices are supposed
to be removed before the controller is released) the latter will use
an invalid 'phb' pointer. (what Andrew reported)

In that scenario, it's not even possible for pcibios_free_controller()
to try to detect if release_fn() was already run or not, as the only
information it has is the 'phb' pointer, which may be invalid.

So, I believe the elegant way out of this is your suggestion to have
"immediate or deferred release" and make the user *choose* either one.

Obviously, let's make this explicit to the user -- w/ rename & comments.

> (2) Call pcibios_free_controller() without
> a valid bridge's release_fn() initialized.


Ok, that looks legitimate for those using immediate release (default).

i.e., once a user decides to use deferred release, it's understood
that pcibios_free_controller() should not be called.

> I think we can provide better interface
> to users: what we do in pcibios_free_controller() and
> pcibios_host_bridge_release() should be (almost) same.
> pcibios_host_bridge_release() can be a wrapper of
> pcibios_free_controller().


Right; I implemented only kfree() in pcibios_host_bridge_release()
because I was focused on when it runs *after* pcibios_free_controller();
but it turns out that if it runs *before*, phb becomes an invalid pointer.

So, you're right -- both functions are expected to have the same effect
(slightly different code), that is all of what pcibios_free_controller()
does.  The only difference should be the timing. (good point on wrapper)

> With this, the users have two options: (1) Rely on bridge's
> release_fn() to free the PCI controller; (2) Call
> pcibios_free_controller() as we're doing currently. Those two options
> corresponds to immediately or deferred releasing.


Looks very good.  I'll submit a v4 like this:
-rename pcibios_host_bridge_release()/pcibios_free_controller_deferred()
-add comments about using _either_ one or another
-pcibios_free_controller_deferred() calls pcibios_free_controller().

--
Mauricio Faria de Oliveira
IBM Linux Technology Center



Re: [PATCH 1/4] dt-bindings: add doc for ibm,hotplug-aperture

2016-08-11 Thread Reza Arbab

On Thu, Aug 11, 2016 at 02:39:23PM +1000, Stewart Smith wrote:
> Forgive me for being absent on the whole discussion here, but is this
> an OPAL specific binding? If so, shouldn't the docs also appear in the
> skiboot tree?


Good question. I guess it's not necessarily OPAL-specific, even though 
OPAL may initially be the only implementor of the binding.


Would it be more appropriate to move the file up a directory, directly 
under Documentation/devicetree/bindings/powerpc? I hesitated at that 
because the binding is tied to "ibm,associativity".


--
Reza Arbab



Re: [PATCH v3] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

2016-08-11 Thread Mauricio Faria de Oliveira

On 08/11/2016 02:01 AM, Andrew Donnellan wrote:

> In cxl, we currently call:
>
> pci_remove_root_bus(phb->bus);
> pcibios_free_controller(phb);
>
> which appears to break with this patch after I wire up
> pci_set_host_bridge_release() in cxl, as phb can be freed before we call
> pcibios_free_controller().


Ugh; you're right. I believe the user is expected to use either one way
or another, but now I see it's not that intuitive -- a design fault.

I'll address this w/ the other review/suggestion by Gavin; replying to it.


> Missing a '---' here :)
>
>> Changelog:


Ok, thanks!


--
Mauricio Faria de Oliveira
IBM Linux Technology Center



[PATCH] mm: Initialize per_cpu_nodestats for hotadded pgdats

2016-08-11 Thread Reza Arbab
The following oops occurs after a pgdat is hotadded:

[   86.839956] Unable to handle kernel paging request for data at address 0x00c30001
[   86.840132] Faulting instruction address: 0xc022f8f4
[   86.840328] Oops: Kernel access of bad area, sig: 11 [#1]
[   86.840468] SMP NR_CPUS=2048 NUMA pSeries
[   86.840612] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 
ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp 
llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter 
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter 
nls_utf8 isofs sg virtio_balloon uio_pdrv_genirq uio ip_tables xfs libcrc32c 
sr_mod cdrom sd_mod virtio_net ibmvscsi scsi_transport_srp virtio_pci 
virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
[   86.842955] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW 
4.8.0-rc1-device #110
[   86.843140] task: c0ef3080 task.stack: c0f6c000
[   86.843323] NIP: c022f8f4 LR: c022f948 CTR: 
[   86.843595] REGS: c0f6fa50 TRAP: 0300   Tainted: GW 
(4.8.0-rc1-device)
[   86.843889] MSR: 80010280b033   
CR: 84002028  XER: 2000
[   86.844624] CFAR: d1d2013c DAR: 00c30001 DSISR: 4000 
SOFTE: 0
GPR00: c022f948 c0f6fcd0 c0f71400 0001
GPR04: 0100   00c3
GPR08:  0001 00c3 
GPR12: 2200 c130 c0faefb4 c0faefa8
GPR16: c0f6c000 c0f6c080 c0bf15b0 c0f6c080
GPR20: c0bf4928  0003 c0bf4968
GPR24: c000ffed   c0f6fd58
GPR28: 0001 0001 c0f6fcf0 c000ffed9c08
[   86.847747] NIP [c022f8f4] refresh_cpu_vm_stats+0x1a4/0x2f0
[   86.847897] LR [c022f948] refresh_cpu_vm_stats+0x1f8/0x2f0
[   86.848060] Call Trace:
[   86.848183] [c0f6fcd0] [c022f948] 
refresh_cpu_vm_stats+0x1f8/0x2f0 (unreliable)

Add per_cpu_nodestats initialization to the hotplug codepath.

Signed-off-by: Reza Arbab 
---
 mm/memory_hotplug.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3894b65..41266dc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1219,6 +1219,7 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 
start)
 
/* init node's zones as empty zones, we don't have any present pages.*/
free_area_init_node(nid, zones_size, start_pfn, zholes_size);
+   pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat);
 
/*
 * The node we allocated has no zone fallback lists. For avoiding
@@ -1249,6 +1250,7 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 
start)
 static void rollback_node_hotadd(int nid, pg_data_t *pgdat)
 {
arch_refresh_nodedata(nid, NULL);
+   free_percpu(pgdat->per_cpu_nodestats);
arch_free_nodedata(pgdat);
return;
 }
-- 
1.8.3.1



Re: mm: Initialise per_cpu_nodestats for all online pgdats at boot

2016-08-11 Thread Reza Arbab

On Thu, Aug 11, 2016 at 10:28:08AM +0100, Mel Gorman wrote:

Fix looks ok. Can you add a proper changelog to it including an example
oops or do you need me to do it?


Sure, no problem. Patch to follow.

--
Reza Arbab



Re: [TESTING] kbuild: link drivers subdirectories separately

2016-08-11 Thread Arnd Bergmann
On Thursday, August 11, 2016 3:49:03 PM CEST Arnd Bergmann wrote:
> @@ -137,7 +134,8 @@ obj-$(CONFIG_PPC_PS3)   += ps3/
>  obj-$(CONFIG_OF)   += of/
>  obj-$(CONFIG_SSB)  += ssb/
>  obj-$(CONFIG_BCMA) += bcma/
> -obj-y  += vhost/
> +obj-$(CONFIG_VHOST_RING)   += vhost/
> +obj-$(CONFIG_VHOST)+= vhost/
>  obj-$(CONFIG_VLYNQ)+= vlynq/
>  obj-$(CONFIG_STAGING)  += staging/
>  obj-y  += platform/
> 

This hunk should have been the other way round to apply and work correctly,
I mixed up the number of reverts I had on my tree before it.

Arnd



[pasemi] Internal CompactFlash (CF) card device not recognised after the powerpc-4.8-1 merge

2016-08-11 Thread Christian Zigotzky

Hi All,

I was able to patch the RC1 with the Nemo and PHB-numbering patch. 
Additionally I added some printks in the file pata_of_platform.c. I 
wanted to know what values the following variables have:


ctl_res = io_res;
io_res.start += 0x800;
ctl_res.start = ctl_res.start + 0x80e;
io_res.end = ctl_res.start-1;

It compiled without any problems but unfortunately I didn't see any 
printk outputs of these variables. The output of pata_of_platform is 
missing too. I see this output in the dmesg of the kernel 4.7 but I 
don't see it in the dmesg of the kernel 4.8.
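
For reference, the values those four assignments should produce can be worked out by hand. The sketch below is a userspace model only: `struct res` is a stand-in for the kernel's struct resource, `electra_cf_adjust()` is a hypothetical name, and any base address passed in is made up for illustration rather than taken from the board.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's struct resource; only the two
 * fields touched by the snippet above are modelled here. */
struct res {
	uint64_t start;
	uint64_t end;
};

/*
 * Replays the "electra-cf" adjustment. 'io' is the primary window as
 * read from the device tree. On return 'io' covers the task registers
 * and 'ctl' starts at the alternate status register.
 */
static void electra_cf_adjust(struct res *io, struct res *ctl)
{
	assert(io && ctl);

	*ctl = *io;                        /* ctl_res = io_res; */
	io->start += 0x800;                /* task regs at +0x800 */
	ctl->start = ctl->start + 0x80e;   /* alt status at +0x80e */
	io->end = ctl->start - 1;          /* io window ends just before ctl */
}
```

With a window starting at a base B, this leaves io_res at B+0x800 .. B+0x80d and ctl_res starting at B+0x80e, while ctl_res.end keeps the end of the original window because it is never adjusted.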


I have the feeling that pata_of_platform doesn't work anymore. Maybe 
this is the reason why the CF card doesn't work anymore.


Maybe this is the problem: 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/arch/powerpc/platforms/pasemi/setup.c?id=bad60e6f259a01cf9f29a1ef8d435ab6c60b2de9


Do you have any hints for me?

Cheers,

Christian

On 05 August 2016 at 11:42 PM, Darren Stevens wrote:

Hello Nicholas

On 06/08/2016, Nicholas Piggin wrote:


Hi Christian,

On 05 August 2016 at 1:41 PM, Christian Zigotzky wrote:

Hi All,



The internal PASEMI CompactFlash (CF) card device doesn't work anymore
after the powerpc-4.8-1 merge. That means the code for the internal CF
card device in the Nemo patch doesn't work after the first PowerPC
merge. The CompactFlash (CF) card slot is wired to the CPU local bus.
It is typically used to hold the Linux kernel. I know it isn't good to
use our own patch for that but I think it is a good time to integrate
the PASEMI internal CompactFlash (CF) card device to the official
kernel. What do you think? I am not a programmer so I can't integrate
the source code for the internal CF card device. But maybe you can
take the patch and integrate it.



We use the following patch for the kernel 4.7:



diff -rupN a/drivers/ata/pata_of_platform.c b/drivers/ata/pata_of_platform.c
--- a/drivers/ata/pata_of_platform.c   2016-08-05 09:58:41.410569036 +0200
+++ b/drivers/ata/pata_of_platform.c   2016-08-05 09:59:54.41424 +0200
@@ -41,14 +41,36 @@ static int pata_of_platform_probe(struct
        return -EINVAL;
    }
 
-   ret = of_address_to_resource(dn, 1, &ctl_res);
-   if (ret) {
-       dev_err(&ofdev->dev, "can't get CTL address from "
-           "device tree\n");
-       return -EINVAL;
+   if (of_device_is_compatible(dn, "electra-ide")) {
+       /* Altstatus is really at offset 0x3f6 from the primary window
+        * on electra-ide. Adjust ctl_res and io_res accordingly.
+        */
+       ctl_res = io_res;
+       ctl_res.start = ctl_res.start+0x3f6;
+       io_res.end = ctl_res.start-1;
+
+#ifdef CONFIG_PPC_PASEMI_SB600
+   } else if (of_device_is_compatible(dn, "electra-cf")) {
+       /* Task regs are at 0x800, with alt status @ 0x80e in the primary window
+        * on electra-cf. Adjust ctl_res and io_res accordingly.
+        */
+       ctl_res = io_res;
+       io_res.start += 0x800;
+       ctl_res.start = ctl_res.start + 0x80e;
+       io_res.end = ctl_res.start-1;
+#endif
+   } else {
+       ret = of_address_to_resource(dn, 1, &ctl_res);
+       if (ret) {
+           dev_err(&ofdev->dev, "can't get CTL address from "
+               "device tree\n");
+           return -EINVAL;
+       }
    }
 
    irq_res = platform_get_resource(ofdev, IORESOURCE_IRQ, 0);
+   if (irq_res)
+       irq_res->flags = 0;
 
    prop = of_get_property(dn, "reg-shift", NULL);
    if (prop)
@@ -65,6 +87,11 @@ static int pata_of_platform_probe(struct
        dev_info(&ofdev->dev, "pio-mode unspecified, assuming PIO0\n");
    }
 
+#ifdef CONFIG_PPC_PASEMI_SB600
+   irq_res = 0;    // force irq off (doesn't seem to work)
+#endif
+
+
    pio_mask = 1 << pio_mode;
    pio_mask |= (1 << pio_mode) - 1;
 
@@ -74,7 +101,11 @@ static int pata_of_platform_probe(struct
 
 static struct of_device_id pata_of_platform_match[] = {
    { .compatible = "ata-generic", },
-   { },
+   { .compatible = "electra-ide", },
+#ifdef CONFIG_PPC_PASEMI_SB600
+   { .compatible = "electra-cf",},
+#endif
+   {},
 };
 MODULE_DEVICE_TABLE(of, pata_of_platform_match);



dmesg with the kernel 4.7:



zcat /var/log/dmesg.1.gz | grep -i ata7



 [2.939788] ata7: PATA max PIO0 no IRQ, using PIO polling mmio cmd 0xf800 ctl 0xf80e
 [3.099186] ata7.00: CFA: SanDisk SDCFB-256, HDX 2.33, max PIO4
 [3.099191] ata7.00: 501760 sectors, multi 0: LBA
 [3.099199] ata7.00: configured for PIO



The dmesg of the latest Git kernel doesn't have any output of our
internal CF card device

[TESTING] kbuild: link drivers subdirectories separately

2016-08-11 Thread Arnd Bergmann
On ARM, relative branches between functions can not span more than 32MB,
which limits the size of an ELF section. In the final link, the linker
will introduce trampolines that perform long calls to avoid the limit,
and during a recursive link, trampolines are added within the section.

However, this does not work for cross-section branches when the source
section is already larger than 32MB because there is no longer space
to put the trampoline.

We are unable to build an allyesconfig kernel on ARM because the
.text section in drivers/built-in.o has that problem.

This patch avoids it by linking drivers/*/built-in.o directly into
vmlinux.o, rather than first linking them into drivers/built-in.o.

Signed-off-by: Arnd Bergmann 
---
This patch gets allyesconfig to work for me on ARM. We have previously
decided that this is too ugly, but you can use it for comparing the
link times.

diff --git a/Makefile b/Makefile
index 2eae4bab0d9b..091ca3a3015b 100644
--- a/Makefile
+++ b/Makefile
@@ -557,13 +557,6 @@ scripts: scripts_basic include/config/auto.conf 
include/config/tristate.conf \
 asm-generic gcc-plugins
$(Q)$(MAKE) $(build)=$(@)
 
-# Objects we will link into vmlinux / subdirs we need to visit
-init-y := init/
-drivers-y  := drivers/ sound/ firmware/
-net-y  := net/
-libs-y := lib/
-core-y := usr/
-virt-y := virt/
 endif # KBUILD_EXTMOD
 
 ifeq ($(dot-config),1)
@@ -584,6 +577,20 @@ $(KCONFIG_CONFIG) include/config/auto.conf.cmd: ;
 # we execute the config step to be sure to catch updated Kconfig files
 include/config/%.conf: $(KCONFIG_CONFIG) include/config/auto.conf.cmd
$(Q)$(MAKE) -f $(srctree)/Makefile silentoldconfig
+
+# Objects we will link into vmlinux / subdirs we need to visit
+init-y := init/
+net-y  := net/
+libs-y := lib/
+core-y := usr/
+virt-y := virt/
+
+# split out objects from drivers to avoid recursively linking large .o files
+include drivers/Makefile
+drivers-y  := $(addprefix drivers/,$(obj-y) $(obj-m))
+drivers-y  += sound/ firmware/
+obj-y  :=
+
 else
 # external modules needs include/generated/autoconf.h and 
include/config/auto.conf
 # but do not care if they are up-to-date. Use auto.conf to trigger the test
diff --git a/drivers/Makefile b/drivers/Makefile
index 9cfa547d67ce..38848742db1f 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -95,10 +95,7 @@ obj-$(CONFIG_ATA_OVER_ETH)   += block/aoe/
 obj-$(CONFIG_PARIDE)   += block/paride/
 obj-$(CONFIG_TC)   += tc/
 obj-$(CONFIG_UWB)  += uwb/
-obj-$(CONFIG_USB_PHY)  += usb/
-obj-$(CONFIG_USB)  += usb/
-obj-$(CONFIG_PCI)  += usb/
-obj-$(CONFIG_USB_GADGET)   += usb/
+obj-y  += usb/
 obj-$(CONFIG_SERIO)+= input/serio/
 obj-$(CONFIG_GAMEPORT) += input/gameport/
 obj-$(CONFIG_INPUT)+= input/
@@ -137,7 +134,8 @@ obj-$(CONFIG_PPC_PS3)   += ps3/
 obj-$(CONFIG_OF)   += of/
 obj-$(CONFIG_SSB)  += ssb/
 obj-$(CONFIG_BCMA) += bcma/
-obj-y  += vhost/
+obj-$(CONFIG_VHOST_RING)   += vhost/
+obj-$(CONFIG_VHOST)+= vhost/
 obj-$(CONFIG_VLYNQ)+= vlynq/
 obj-$(CONFIG_STAGING)  += staging/
 obj-y  += platform/
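
To make the 32MB limit from the changelog concrete, here is a small userspace sketch of the range check for an ARM-mode B/BL instruction. It assumes the classic A32 encoding (signed 24-bit word offset, relative to the branch address plus 8); the helper name `arm_branch_in_range()` is made up for illustration and is not taken from the linker or the kernel, and Thumb encodings have different ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARM-mode B/BL encode a signed 24-bit word offset, i.e. a byte
 * offset in [-2^25, 2^25 - 4] relative to the branch address + 8. */
#define ARM_BRANCH_MIN (-(INT64_C(1) << 25))
#define ARM_BRANCH_MAX ((INT64_C(1) << 25) - 4)

/* Returns true when a direct branch at 'branch_addr' can reach 'target'
 * without a trampoline (veneer). Both addresses must be word aligned. */
static bool arm_branch_in_range(uint32_t branch_addr, uint32_t target)
{
	int64_t offset = (int64_t)target - ((int64_t)branch_addr + 8);

	assert((branch_addr & 3) == 0 && (target & 3) == 0);
	return offset >= ARM_BRANCH_MIN && offset <= ARM_BRANCH_MAX;
}
```

When a call out of a section fails this check, the linker has to place a trampoline within reach of the branch, which is the step that stops working once the containing section itself exceeds the limit.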



Re: [PATCH] powerpc: populate the default bus with machine_arch_initcall

2016-08-11 Thread Rob Herring
On Thu, Aug 11, 2016 at 6:09 AM, Kevin Hao  wrote:
> With the commit 44a7185c2ae6 ("of/platform: Add common method to
> populate default bus"), a default function is introduced to populate
> the default bus and this function is invoked at the arch_initcall_sync
> level. This will override the arch specific population of the default bus,
> which runs at a lower level than arch_initcall_sync. Since not all
> powerpc specific buses are added to the of_default_bus_match_table[],
> some powerpc specific buses are not probed. Fix this by
> using an earlier initcall.
>
> Signed-off-by: Kevin Hao 
> ---
> Of course we can adjust the powerpc arch codes to use the
> of_platform_default_populate_init(), but it has high risk to break
> other boards given the complicated powerpc specific buses. So I would
> like just to fix the broken boards in the current release, and cook
> a patch to change to of_platform_default_populate_init() for linux-next.

The patch that broke things was sitting in -next for some time and no
one reported anything. Are all these boards broken?

I'm fine to just disable the default call for PPC instead if there's
some chance this does not fix some boards. There could be some other
initcall ordering dependencies.

>
> Only boot test on a mpc8315erdb board.

Curious, what would it take to remove the of_platform_bus_probe and
use the default here? We can add additional bus compatibles to match.
The difference between of_platform_bus_probe and
of_platform_bus_populate is the former will match root nodes with no
compatible string. Most platforms should not need that behavior and it
would be nice to know which ones.

Rob


Re: powerpc allyesconfig / allmodconfig linux-next next-20160729 - next-20160729 build failures

2016-08-11 Thread Nicholas Piggin
On Thu, 11 Aug 2016 15:04:00 +0200
Arnd Bergmann  wrote:

> On Thursday, August 11, 2016 10:43:20 PM CEST Nicholas Piggin wrote:
> > On Wed, 03 Aug 2016 22:13:28 +0200

> > Final ld time
> > inclink
> > real0m0.378s
> > user0m0.304s
> > sys 0m0.076s
> > 
> > thinarc
> > real0m0.894s
> > user0m0.684s
> > sys 0m0.200s  
> 
> This also still seems fine.
> 
> > For both cases final link gets slower with thin archives. I guess there is 
> > some
> > per-file overhead but I thought with --whole-archive it should not be that 
> > much
> > slower. Still, overall time for main ar/ld phases comes out about the same 
> > in
> > the end so I don't think it's too much problem. Unless ARM blows up 
> > significantly
> > worse with a bigger config.  
> 
> Unfortunately I think it does. I haven't tried your latest series yet,
> but I think the total time for removing built-in.o and relinking went
> up from around 4 minutes (already way too much) to 18 minutes for me.
> 
> > Linking with thin archives takes significantly more time in bfd hash lookup 
> > code.
> > I haven't dug much further yet.  
> 
> Can you try the ARM allyesconfig with thin archives? I'll follow up with two
> patches: one to get ARM to link without thin archives, and one that I used
> to get --gc-sections to work.

Okay send them over, I'll try digging into it. There is not much kbuild
code to maintain so we don't have to switch every arch. It would be nice
to though.

Thanks,
Nick


[PATCH v2] powerpc: move hmi.c to arch/powerpc/kvm/

2016-08-11 Thread Paolo Bonzini
hmi.c functions are unused unless sibling_subcore_state is nonzero, and
that in turn happens only if KVM is in use.  So move the code to
arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
rather than CONFIG_PPC_BOOK3S_64.  The sibling_subcore_state is also
included in struct paca_struct only if KVM is supported by the kernel.

Cc: Daniel Axtens 
Cc: Michael Ellerman 
Cc: Mahesh Salgaonkar 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-...@vger.kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Paolo Bonzini 
---
v1->v2: use CONFIG_KVM_BOOK3S_HV_POSSIBLE, not
CONFIG_KVM_BOOK3S_64_HANDLER.  The former implies
the latter, but the reverse is not true.

 arch/powerpc/include/asm/hmi.h |  2 +-
 arch/powerpc/include/asm/paca.h| 12 +++-
 arch/powerpc/kernel/Makefile   |  2 +-
 arch/powerpc/kvm/Makefile  |  1 +
 arch/powerpc/{kernel/hmi.c => kvm/book3s_hv_hmi.c} |  0
 5 files changed, 10 insertions(+), 7 deletions(-)
 rename arch/powerpc/{kernel/hmi.c => kvm/book3s_hv_hmi.c} (100%)

diff --git a/arch/powerpc/include/asm/hmi.h b/arch/powerpc/include/asm/hmi.h
index 88b4901ac4ee..85b7a1a21e22 100644
--- a/arch/powerpc/include/asm/hmi.h
+++ b/arch/powerpc/include/asm/hmi.h
@@ -21,7 +21,7 @@
 #ifndef __ASM_PPC64_HMI_H__
 #define __ASM_PPC64_HMI_H__
 
-#ifdef CONFIG_PPC_BOOK3S_64
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 
 #defineCORE_TB_RESYNC_REQ_BIT  63
 #define MAX_SUBCORE_PER_CORE   4
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 148303e7771f..6a6792bb39fb 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -183,11 +183,6 @@ struct paca_struct {
 */
u16 in_mce;
u8 hmi_event_available;  /* HMI event is available */
-   /*
-* Bitmap for sibling subcore status. See kvm/book3s_hv_ras.c for
-* more details
-*/
-   struct sibling_subcore_state *sibling_subcore_state;
 #endif
 
/* Stuff for accurate time accounting */
@@ -202,6 +197,13 @@ struct paca_struct {
struct kvmppc_book3s_shadow_vcpu shadow_vcpu;
 #endif
struct kvmppc_host_state kvm_hstate;
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   /*
+* Bitmap for sibling subcore status. See kvm/book3s_hv_ras.c for
+* more details
+*/
+   struct sibling_subcore_state *sibling_subcore_state;
+#endif
 #endif
 };
 
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index b2027a5cf508..fe4c075bcf50 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -41,7 +41,7 @@ obj-$(CONFIG_VDSO32)  += vdso32/
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o
-obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o hmi.o
+obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o
 obj-$(CONFIG_PPC_BOOK3E_64)+= exceptions-64e.o idle_book3e.o
 obj-$(CONFIG_PPC64)+= vdso64/
 obj-$(CONFIG_ALTIVEC)  += vecemu.o
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 1f9e5529e692..855d4b95d752 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -78,6 +78,7 @@ kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
 
 ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
+   book3s_hv_hmi.o \
book3s_hv_rmhandlers.o \
book3s_hv_rm_mmu.o \
book3s_hv_ras.o \
diff --git a/arch/powerpc/kernel/hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c
similarity index 100%
rename from arch/powerpc/kernel/hmi.c
rename to arch/powerpc/kvm/book3s_hv_hmi.c
-- 
1.8.3.1



Re: powerpc allyesconfig / allmodconfig linux-next next-20160729 - next-20160729 build failures

2016-08-11 Thread Arnd Bergmann
On Thursday, August 11, 2016 10:43:20 PM CEST Nicholas Piggin wrote:
> On Wed, 03 Aug 2016 22:13:28 +0200
> Arnd Bergmann  wrote:
> 
> > On Wednesday, August 3, 2016 2:44:29 PM CEST Segher Boessenkool wrote:
> > > Hi Arnd,
> > > 
> > > On Wed, Aug 03, 2016 at 08:52:48PM +0200, Arnd Bergmann wrote:  
> > > > From my first look, it seems that all of lib/*.o is now getting linked
> > > > into vmlinux, while we traditionally leave out everything from lib/
> > > > that is not referenced.
> > > > 
> > > > I also see a noticeable overhead in link time, the numbers are for
> > > > a cache-hot rebuild after a successful allyesconfig build, using a
> > > > 24-way Opteron@2.5Ghz, just relinking vmlinux:
> > > > 
> > > > $ time make skj30 vmlinux # before
> > > > real2m8.092s
> > > > user3m41.008s
> > > > sys 0m48.172s
> > > > 
> > > > $ time make skj30 vmlinux # after
> > > > real4m10.189s
> > > > user5m43.804s
> > > > sys 0m52.988s  
> > > 
> > > Is it better when using rcT instead of rcsT?  
> > 
> > It seems to be noticeably better for the clean rebuild case, though
> > not as good as the original:
> > 
> > real3m34.015s
> > user5m7.104s
> > sys 0m49.172s
> > 
> > I've also tried now with my own patch applied as well (linking
> > each drivers/*/built-in.o into vmlinux rather than having them
> > linked into drivers/built-in.o first), but that makes no
> > difference.
> 
> I just want to come back to this, because I've submitted the thin
> archives kbuild patch, I wanted to make sure we're doing okay on
> ARM/ARM64. I cross compiled with my laptop.
> 
> For ARM64 allyesconfig:
> 
> After building then removing all built-in.o then rebuilding vmlinux:
> inclink
> time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
> real1m18.977s
> user2m14.512s
> sys 0m29.704s
> 
> thinarc
> time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
> real1m18.433s
> user2m6.128s
> sys 0m28.372s
> 
> 
> Final ld time
> inclink
> real0m4.005s
> user0m3.464s
> sys 0m0.536s
> 
> thinarc
> real0m5.841s
> user0m4.916s
> sys 0m0.916s
> 
> 
> Build directory size is of course much better (3953MB vs 5519MB).

Ok, looks great. Some downsides and some upsides here, but overall
I think this is a win.

> 
> For ARM, defconfig
> 
> After building then removing all built-in.o then rebuilding vmlinux:
> inclink
> real  0m19.593s
> user  0m22.372s
> sys   0m6.428s
> 
> thinarc
> real  0m18.919s
> user  0m21.924s
> sys   0m6.400s
> 
> 
> Final ld time
> inclink
> real  0m0.378s
> user  0m0.304s
> sys   0m0.076s
> 
> thinarc
> real0m0.894s
> user0m0.684s
> sys 0m0.200s

This also still seems fine.

> For both cases final link gets slower with thin archives. I guess there is 
> some
> per-file overhead but I thought with --whole-archive it should not be that 
> much
> slower. Still, overall time for main ar/ld phases comes out about the same in
> the end so I don't think it's too much problem. Unless ARM blows up 
> significantly
> worse with a bigger config.

Unfortunately I think it does. I haven't tried your latest series yet,
but I think the total time for removing built-in.o and relinking went
up from around 4 minutes (already way too much) to 18 minutes for me.

> Linking with thin archives takes significantly more time in bfd hash lookup 
> code.
> I haven't dug much further yet.

Can you try the ARM allyesconfig with thin archives? I'll follow up with two
patches: one to get ARM to link without thin archives, and one that I used
to get --gc-sections to work.

Arnd


Re: powerpc allyesconfig / allmodconfig linux-next next-20160729 - next-20160729 build failures

2016-08-11 Thread Nicholas Piggin
On Wed, 03 Aug 2016 22:13:28 +0200
Arnd Bergmann  wrote:

> On Wednesday, August 3, 2016 2:44:29 PM CEST Segher Boessenkool wrote:
> > Hi Arnd,
> > 
> > On Wed, Aug 03, 2016 at 08:52:48PM +0200, Arnd Bergmann wrote:  
> > > From my first look, it seems that all of lib/*.o is now getting linked
> > > into vmlinux, while we traditionally leave out everything from lib/
> > > that is not referenced.
> > > 
> > > I also see a noticeable overhead in link time, the numbers are for
> > > a cache-hot rebuild after a successful allyesconfig build, using a
> > > 24-way Opteron@2.5Ghz, just relinking vmlinux:
> > > 
> > > $ time make skj30 vmlinux # before
> > > real  2m8.092s
> > > user  3m41.008s
> > > sys   0m48.172s
> > > 
> > > $ time make skj30 vmlinux # after
> > > real  4m10.189s
> > > user  5m43.804s
> > > sys   0m52.988s  
> > 
> > Is it better when using rcT instead of rcsT?  
> 
> It seems to be noticeably better for the clean rebuild case, though
> not as good as the original:
> 
> real  3m34.015s
> user  5m7.104s
> sys   0m49.172s
> 
> I've also tried now with my own patch applied as well (linking
> each drivers/*/built-in.o into vmlinux rather than having them
> linked into drivers/built-in.o first), but that makes no
> difference.

I just want to come back to this, because I've submitted the thin
archives kbuild patch, I wanted to make sure we're doing okay on
ARM/ARM64. I cross compiled with my laptop.

For ARM64 allyesconfig:

After building then removing all built-in.o then rebuilding vmlinux:
inclink
time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
real1m18.977s
user2m14.512s
sys 0m29.704s

thinarc
time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
real1m18.433s
user2m6.128s
sys 0m28.372s


Final ld time
inclink
real0m4.005s
user0m3.464s
sys 0m0.536s

thinarc
real0m5.841s
user0m4.916s
sys 0m0.916s


Build directory size is of course much better (3953MB vs 5519MB).


For ARM, defconfig

After building then removing all built-in.o then rebuilding vmlinux:
inclink
real0m19.593s
user0m22.372s
sys 0m6.428s

thinarc
real0m18.919s
user0m21.924s
sys 0m6.400s


Final ld time
inclink
real0m0.378s
user0m0.304s
sys 0m0.076s

thinarc
real0m0.894s
user0m0.684s
sys 0m0.200s

For both cases final link gets slower with thin archives. I guess there is some
per-file overhead but I thought with --whole-archive it should not be that much
slower. Still, overall time for main ar/ld phases comes out about the same in
the end so I don't think it's too much problem. Unless ARM blows up 
significantly
worse with a bigger config.

Linking with thin archives takes significantly more time in bfd hash lookup 
code.
I haven't dug much further yet.

Thanks,
Nick


Re: [PATCH] perf/core: Fix the mask in perf_output_sample_regs

2016-08-11 Thread Peter Zijlstra

Sorry, found it in my inbox while clearing out backlog..

On Sun, Jul 03, 2016 at 11:31:58PM +0530, Madhavan Srinivasan wrote:
> When decoding the perf_regs mask in perf_output_sample_regs(),
> we loop through the mask using find_first_bit and find_next_bit functions.
> While the existing code works fine in most cases,
> the logic is broken for 32bit kernel (Big Endian).
> When reading u64 mask using (u32 *)(&val)[0], find_*_bit() assumes it gets
> lower 32bits of u64 but instead gets upper 32bits which is wrong.
> Proposed fix is to swap the words of the u64 to handle this case.

> This is _not_ endianness swap.

But it looks an awful lot like it..

> +++ b/kernel/events/core.c
> @@ -5205,8 +5205,10 @@ perf_output_sample_regs(struct perf_output_handle 
> *handle,
>   struct pt_regs *regs, u64 mask)
>  {
>   int bit;
> + DECLARE_BITMAP(_mask, 64);
>  
> - for_each_set_bit(bit, (const unsigned long *) &mask,
> + bitmap_from_u64(_mask, mask);
> + for_each_set_bit(bit, _mask,
>sizeof(mask) * BITS_PER_BYTE) {
>   u64 val;

> +++ b/lib/bitmap.c

> +void bitmap_from_u64(unsigned long *dst, u64 mask)
> +{
> + dst[0] = mask & ULONG_MAX;
> +
> + if (sizeof(mask) > sizeof(unsigned long))
> + dst[1] = mask >> 32;
> +}
> +EXPORT_SYMBOL(bitmap_from_u64);

Looks small enough for an inline.

Alternatively you can go all the way and add bitmap_from_u64array(), but
that seems massive overkill.

Tedious stuff.. I can't come up with anything prettier :/
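
As an illustration of the inline variant suggested above, a userspace rendering of the quoted helper might look like the sketch below. It is not the final kernel code: the function body follows the quoted patch, while `bitmap_test()` is a hypothetical stand-in for the kernel's test_bit() and the destination is assumed to be an array of two unsigned longs so it is large enough on both 32-bit and 64-bit ABIs.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Copy a u64 mask into a bitmap of unsigned longs so that bit N of the
 * mask is bit N of the bitmap, whatever the word size or byte order.
 * Casting &mask to (unsigned long *) instead would put the upper 32
 * bits in word 0 on a 32-bit big-endian kernel.
 */
static inline void bitmap_from_u64(unsigned long *dst, uint64_t mask)
{
	assert(dst);

	dst[0] = (unsigned long)(mask & ULONG_MAX);

	/* Only taken when unsigned long is 32 bits wide. */
	if (sizeof(mask) > sizeof(unsigned long))
		dst[1] = (unsigned long)(mask >> 32);
}

/* Hypothetical helper mirroring what test_bit() does on such a bitmap. */
static inline int bitmap_test(const unsigned long *map, unsigned int bit)
{
	const unsigned int bits_per_long = sizeof(unsigned long) * CHAR_BIT;

	return (int)((map[bit / bits_per_long] >> (bit % bits_per_long)) & 1UL);
}
```

The point of going through explicit shifts rather than a pointer cast is that the word order of the result no longer depends on how the u64 is laid out in memory.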


Re: [PATCH] powerpc: sysdev: cpm: fix gpio save_regs functions

2016-08-11 Thread Linus Walleij
On Thu, Aug 11, 2016 at 10:50 AM, Christophe Leroy
 wrote:

> of_mm_gpiochip_add_data() calls mm_gc->save_regs() before
> setting the data. Therefore ->save_regs() cannot use
> gpiochip_get_data()
>
> [0.275940] Unable to handle kernel paging request for data at address 
> 0x0130
> [0.283120] Faulting instruction address: 0xc01b44cc
> [0.288175] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.293343] PREEMPT CMPC885
> [0.296141] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-g65124df-dirty 
> #68
> [0.304131] task: c6074000 ti: c608 task.ti: c608
> [0.309459] NIP: c01b44cc LR: c0011720 CTR: c0011708
> [0.314372] REGS: c6081d90 TRAP: 0300   Not tainted  (4.7.0-g65124df-dirty)
> [0.322267] MSR: 9032   CR: 2428  XER: 2000
> [0.328813] DAR: 0130 DSISR: c000
> GPR00: c01b6d0c c6081e40 c6074000 c6017000 c9028000 c601d028 c6081dd8 
> GPR08: c601d028   0001 2444  c0002790 
> GPR16:       c05643b0 0083
> GPR24: c04a1a6c c056 c04a8308 c04c6480 c0012498 c6017000 c7ffcc78 c6017000
> [0.360806] NIP [c01b44cc] gpiochip_get_data+0x4/0xc
> [0.365684] LR [c0011720] cpm1_gpio16_save_regs+0x18/0x44
> [0.370972] Call Trace:
> [0.373451] [c6081e50] [c01b6d0c] of_mm_gpiochip_add_data+0x70/0xdc
> [0.379624] [c6081e70] [c00124c0] cpm_init_par_io+0x28/0x118
> [0.385238] [c6081e80] [c04a8ac0] do_one_initcall+0xb0/0x17c
> [0.390819] [c6081ef0] [c04a8cbc] kernel_init_freeable+0x130/0x1dc
> [0.396924] [c6081f30] [c00027a4] kernel_init+0x14/0x110
> [0.402177] [c6081f40] [c000b424] ret_from_kernel_thread+0x5c/0x64
> [0.408233] Instruction dump:
> [0.411168] 4182fafc 3f80c040 48234c6d 3bc0fff0 3b9c5ed0 4bfffaf4 81290020 
> 712a0004
> [0.418825] 4182fb34 48234c51 4bfffb2c 81230004 <80690130> 4e800020 
> 7c0802a6 9421ffe0
> [0.426763] ---[ end trace fe4113ee21d72ffa ]---
>
> fixes: e65078f1f3490 ("powerpc: sysdev: cpm1: use gpiochip data pointer")
> fixes: a14a2d484b386 ("powerpc: cpm_common: use gpiochip data pointer")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 

Reviewed-by: Linus Walleij 

Sorry for screwing stuff up :(

Yours,
Linus Walleij
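
The ordering problem described in the changelog can be modelled in userspace. The patch body is not quoted in this message, so the sketch below only shows one plausible way for a ->save_regs() callback to avoid the data pointer: recovering the wrapper structure from the embedded member with container_of(). All structure and function names with a "model" suffix are hypothetical, and the macro is a userspace stand-in for the kernel one.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel macro of the same name. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down models of the structures involved. */
struct mm_gpiochip_model {
	void *data;   /* stands in for the gpiochip data pointer, set late */
	void (*save_regs)(struct mm_gpiochip_model *mm_gc);
};

struct cpm_chip_model {
	struct mm_gpiochip_model mm_gc;   /* embedded, not pointed to */
	unsigned int shadow;              /* cached register value */
};

static void model_save_regs(struct mm_gpiochip_model *mm_gc)
{
	/* Recover the wrapper from the embedded member; mm_gc->data is
	 * still NULL at this point and must not be dereferenced. */
	struct cpm_chip_model *chip =
		container_of(mm_gc, struct cpm_chip_model, mm_gc);

	chip->shadow = 0xa5;   /* placeholder for reading a hardware register */
}

/* Mirrors the ordering described above: save_regs() first, data afterwards. */
static void model_add_data(struct mm_gpiochip_model *mm_gc, void *data)
{
	assert(mm_gc);

	if (mm_gc->save_regs)
		mm_gc->save_regs(mm_gc);
	mm_gc->data = data;
}
```

Because the callback never touches the data pointer, it works regardless of whether the core has stored it yet.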


Re: [PATCH 0/7] ima: carry the measurement list across kexec

2016-08-11 Thread Mimi Zohar
On Thu, 2016-08-11 at 17:38 +1000, Balbir Singh wrote:
> 
> On 09/08/16 22:36, Mimi Zohar wrote:
> > On Tue, 2016-08-09 at 15:19 +1000, Balbir Singh wrote:
> >>
> >> On 04/08/16 22:24, Mimi Zohar wrote:
> >>> The TPM PCRs are only reset on a hard reboot.  In order to validate a
> >>> TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
> >>> of the running kernel must be saved and then restored on the subsequent
> >>> boot.
> >>>
> >>> The existing securityfs binary_runtime_measurements file conveniently
> >>> provides a serialized format of the IMA measurement list. This patch
> >>> set serializes the measurement list in this format and restores it.
> >>>
> >>> This patch set pre-req's Thiago Bauermann's "kexec_file: Add buffer
> >>> hand-over for the next kernel" patch set* for actually carrying the
> >>> serialized measurement list across the kexec.
> >>>
> >>> Mimi
> >>>
> >>
> >> Hi, Mimi
> >>
> >> I am trying to convince myself of the security of the solution. I asked
> >> Thiago as well, but maybe I am lagging behind in understanding.
> >>
> >> We trust the kernel to hand over PCR values of the old kernel (which
> >> cannot be validated) to the IMA subsystem in the new kernel for storage.
> >> I guess the idea is for ima_add_boot_aggregate to do the right thing?
> >> How do we validate what the old kernel is giving us? Why do we care for
> >> the old measurement list? Is it still of significance in the new kernel?
> >>
> > 
> > Hi Balbir,
> > 
> > To validate the hardware TPM PCR values requires walking the measurement
> > list simulating the TPM extend operation.  The resulting values should
> > match the hardware TPM PCRs.
> > 
> > In the case of a soft reboot, the TPM PCRs are not reset to 0, so all
> > the measurements of the running system, including those from previous
> > soft reboots, need to be included in the measurement list.   Without
> > these measurements, the simulated PCR values will not match the hardware
> > TPM PCR values.  Thus the need for this patch set.
> > 
> > Measurements can not be added/removed/changed in the measurement list
> > without it being detectable.
> > 
> 
> Thanks Mimi
> 
> I think that makes sense
> 
> So effectively we do
> 
> first kernel boot -> 
> second kernel boot -> 
> 
> and so on

No, the running system doesn't verify the measurement list against the
PCRs, before saving and carrying it across kexec. If the system has been
compromised, it can't be trusted to verify itself.  Verifying the
measurement list needs to be done by a trusted third party.  The system
just carries the measurement list(s) across kexec.
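
The replay a third-party verifier performs can be sketched as follows. This is a toy model only: `toy_mix()` is a trivial arithmetic stand-in for the TPM's real hash and has none of its security properties, digests are shrunk to 64-bit integers, and all names are hypothetical. It merely shows why the list from before the kexec has to be carried along for the simulated value to match a PCR that was not reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for H(old PCR || digest). NOT cryptographic; a real
 * verifier would use the hash algorithm of the TPM bank in question. */
static uint64_t toy_mix(uint64_t pcr, uint64_t digest)
{
	return pcr * 31 + digest + 1;
}

/* Extend: the new PCR value depends on the old one and the new digest. */
static uint64_t pcr_extend(uint64_t pcr, uint64_t digest)
{
	return toy_mix(pcr, digest);
}

/* Walk a measurement list starting from 'pcr', extending once per entry. */
static uint64_t replay_from(uint64_t pcr, const uint64_t *list, size_t n)
{
	assert(list || n == 0);

	for (size_t i = 0; i < n; i++)
		pcr = pcr_extend(pcr, list[i]);
	return pcr;
}
```

A verifier starts from a zeroed PCR and replays the list it was given. If the first kernel measured two entries and the kexec'd kernel a third, only the combined list reproduces the hardware value, since the hardware PCR kept its state across the soft reboot.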

Mimi



Re: powerpc/Makefile: Use cflags-y/aflags-y for setting endian options

2016-08-11 Thread Michael Ellerman
On Tue, 2016-09-08 at 12:43:46 UTC, Michael Ellerman wrote:
> When we introduced the little endian support, we added the endian flags
> to CC directly using override. I don't know the history of why we did
> that, I suspect no one does.
> 
> Although this mostly works, it has one bug, which is that CROSS32CC
> doesn't get -mbig-endian. That means when the compiler is little endian
> by default and the user is building big endian, vdso32 is incorrectly
> compiled as little endian and the kernel fails to build.
> 
> Instead we can add the endian flags to cflags-y/aflags-y, and then
> append those to KBUILD_CFLAGS/KBUILD_AFLAGS.
> 
> This has the advantage of being 1) less ugly, 2) the documented way of
> adding flags in the arch Makefile and 3) it fixes building vdso32 with a
> LE toolchain.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/164af597ce945751e2dcd53d0a

cheers


Re: selftests/powerpc: Specify we expect to build with std=gnu99

2016-08-11 Thread Michael Ellerman
On Fri, 2016-29-07 at 10:48:09 UTC, Michael Ellerman wrote:
> We have some tests that assume we're using std=gnu99, which is fine on
> most compilers, but some old compilers use a different default.
> 
> So make it explicit that we want to use std=gnu99.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/ca49e64f0cb1368fc666a53b16

cheers


Re: powerpc: Update obsolete comment in setup_32.c about early_init()

2016-08-11 Thread Michael Ellerman
On Wed, 2016-10-08 at 07:32:38 UTC, Benjamin Herrenschmidt wrote:
> We don't identify the machine type anymore...
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/f9cc1d1f808dbdfd56978259d2

cheers


Re: powerpc: rebuild vdsos correctly

2016-08-11 Thread Michael Ellerman
On Mon, 2016-08-08 at 09:35:43 UTC, Nicholas Piggin wrote:
> When using if_changed, we need to add FORCE to dependencies, otherwise
> we don't get command line change checking amongst other things. This
> has resulted in vdsos not being rebuilt when switching between big and
> little endian.
> 
> Signed-off-by: Nicholas Piggin 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/b9a4a0d02c5b8d9a1397c11d74

cheers
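
The effect of the missing FORCE can be modelled in a few lines. This is a simplified model of the behaviour described in the commit message, not the kbuild implementation; the struct and function names are hypothetical.

```c
#include <string.h>

struct target_state {
	int exists;		/* target file is present */
	int prereq_newer;	/* some prerequisite is newer than the target */
	const char *saved_cmd;	/* command line recorded at the last build */
};

/* The check made once the recipe runs: stale target or changed command. */
int if_changed_rebuilds(const struct target_state *t, const char *cmd)
{
	return !t->exists || t->prereq_newer || strcmp(t->saved_cmd, cmd) != 0;
}

/* Without FORCE, make never runs the recipe for an up-to-date target. */
int rule_rebuilds(const struct target_state *t, const char *cmd, int has_force)
{
	if (!has_force && t->exists && !t->prereq_newer)
		return 0;	/* recipe skipped, command never compared */
	return if_changed_rebuilds(t, cmd);
}
```

With a vdso built using big endian flags and no newer prerequisites, switching the command line to little endian triggers a rebuild only when FORCE is among the dependencies.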


Re: powerpc: Fix crash during static key init on ppc32

2016-08-11 Thread Michael Ellerman
On Wed, 2016-10-08 at 07:27:34 UTC, Benjamin Herrenschmidt wrote:
> We cannot do those initializations from apply_feature_fixups() as
> this function runs in a very restricted environment in 32-bit where
> the kernel isn't running at its linked address and the PTRRELOC()
> macro must be used for any global access.
> 
> Instead, split them into a separate setup_feature_keys() function
> which is called in a more suitable spot on ppc32.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/97f6e0cc35026a2a09147a6da6

cheers


Re: powerpc: Print the kernel load address at the end of prom_init

2016-08-11 Thread Michael Ellerman
On Wed, 2016-10-08 at 07:29:29 UTC, Benjamin Herrenschmidt wrote:
> This makes it easier to debug crashes that happen very early before
> the kernel takes over Open Firmware by allowing us to relate the OF
> reported crashing addresses to offsets within the kernel.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/7d70c63c7132eb95e428e94524

cheers
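
Once the load address is known, relating a crash address reported by OF to the kernel image is a subtraction plus a range check. A minimal sketch follows; the function name and every address in the usage note are invented for illustration.

```c
#include <stdint.h>

/*
 * Returns 1 and stores the offset if addr lies inside the image
 * [load_base, load_base + image_size); returns 0 otherwise.
 */
int kernel_offset(uint64_t addr, uint64_t load_base, uint64_t image_size,
		  uint64_t *offset)
{
	if (addr < load_base || addr - load_base >= image_size)
		return 0;
	*offset = addr - load_base;
	return 1;
}
```

For example, with a hypothetical load address of 0x01400000, a crash at 0x01404a10 maps to offset 0x4a10, which can then be looked up in the kernel's symbol map.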


[PATCH] powerpc: populate the default bus with machine_arch_initcall

2016-08-11 Thread Kevin Hao
With commit 44a7185c2ae6 ("of/platform: Add common method to
populate default bus"), a default function was introduced to populate
the default bus, and this function is invoked at the arch_initcall_sync
level. This overrides the arch specific population of the default bus,
which runs at a later initcall level than arch_initcall_sync. Since not
all powerpc specific buses are added to of_default_bus_match_table[],
some powerpc specific buses are not probed. Fix this by using an
earlier initcall level.

Signed-off-by: Kevin Hao 
---
Of course we could adjust the powerpc arch code to use
of_platform_default_populate_init(), but that carries a high risk of
breaking other boards given the complicated powerpc specific buses. So I
would like to just fix the broken boards in the current release, and
prepare a patch to switch to of_platform_default_populate_init() for
linux-next.

Only boot tested on an mpc8315erdb board.
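
To make the ordering argument concrete, here is a small userspace model of the initcall levels involved. It is only a sketch: the enum lists a subset of the levels in the order the kernel runs them, and the helper name is hypothetical.

```c
/* Initcall levels in the order the kernel runs them (subset). */
enum initcall_level {
	LEVEL_CORE,		/* core_initcall */
	LEVEL_POSTCORE,		/* postcore_initcall */
	LEVEL_ARCH,		/* arch_initcall */
	LEVEL_ARCH_SYNC,	/* arch_initcall_sync: default bus populate */
	LEVEL_SUBSYS,		/* subsys_initcall */
	LEVEL_FS,		/* fs_initcall */
	LEVEL_DEVICE,		/* device_initcall */
	LEVEL_LATE,		/* late_initcall */
};

/*
 * Returns 1 if a board probe hook registered at board_level runs before
 * the default populate at arch_initcall_sync, 0 if it runs after it.
 */
int board_populates_first(enum initcall_level board_level)
{
	return board_level < LEVEL_ARCH_SYNC;
}
```

A hook at the device level runs after the default populate, while the same hook at the arch level runs before it, which is the reason for moving the board probes from machine_device_initcall to machine_arch_initcall.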

 arch/powerpc/platforms/40x/ep405.c               | 2 +-
 arch/powerpc/platforms/40x/ppc40x_simple.c       | 2 +-
 arch/powerpc/platforms/40x/virtex.c              | 2 +-
 arch/powerpc/platforms/40x/walnut.c              | 2 +-
 arch/powerpc/platforms/44x/canyonlands.c         | 2 +-
 arch/powerpc/platforms/44x/ebony.c               | 2 +-
 arch/powerpc/platforms/44x/iss4xx.c              | 2 +-
 arch/powerpc/platforms/44x/ppc44x_simple.c       | 2 +-
 arch/powerpc/platforms/44x/ppc476.c              | 2 +-
 arch/powerpc/platforms/44x/sam440ep.c            | 2 +-
 arch/powerpc/platforms/44x/virtex.c              | 2 +-
 arch/powerpc/platforms/44x/warp.c                | 2 +-
 arch/powerpc/platforms/82xx/ep8248e.c            | 2 +-
 arch/powerpc/platforms/82xx/km82xx.c             | 2 +-
 arch/powerpc/platforms/82xx/mpc8272_ads.c        | 2 +-
 arch/powerpc/platforms/82xx/pq2fads.c            | 2 +-
 arch/powerpc/platforms/83xx/mpc831x_rdb.c        | 2 +-
 arch/powerpc/platforms/83xx/mpc834x_itx.c        | 2 +-
 arch/powerpc/platforms/85xx/ppa8548.c            | 2 +-
 arch/powerpc/platforms/8xx/adder875.c            | 2 +-
 arch/powerpc/platforms/8xx/ep88xc.c              | 2 +-
 arch/powerpc/platforms/8xx/mpc86xads_setup.c     | 2 +-
 arch/powerpc/platforms/8xx/mpc885ads_setup.c     | 2 +-
 arch/powerpc/platforms/8xx/tqm8xx_setup.c        | 2 +-
 arch/powerpc/platforms/cell/setup.c              | 2 +-
 arch/powerpc/platforms/embedded6xx/gamecube.c    | 2 +-
 arch/powerpc/platforms/embedded6xx/linkstation.c | 2 +-
 arch/powerpc/platforms/embedded6xx/mvme5100.c    | 2 +-
 arch/powerpc/platforms/embedded6xx/storcenter.c  | 2 +-
 arch/powerpc/platforms/embedded6xx/wii.c         | 2 +-
 arch/powerpc/platforms/pasemi/setup.c            | 2 +-
 31 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/platforms/40x/ep405.c b/arch/powerpc/platforms/40x/ep405.c
index 1c8aec6e9bb7..1328cb38e5d7 100644
--- a/arch/powerpc/platforms/40x/ep405.c
+++ b/arch/powerpc/platforms/40x/ep405.c
@@ -62,7 +62,7 @@ static int __init ep405_device_probe(void)
 
 	return 0;
 }
-machine_device_initcall(ep405, ep405_device_probe);
+machine_arch_initcall(ep405, ep405_device_probe);
 
 static void __init ep405_init_bcsr(void)
 {
diff --git a/arch/powerpc/platforms/40x/ppc40x_simple.c b/arch/powerpc/platforms/40x/ppc40x_simple.c
index 2a050007bbae..50dce54e6b3b 100644
--- a/arch/powerpc/platforms/40x/ppc40x_simple.c
+++ b/arch/powerpc/platforms/40x/ppc40x_simple.c
@@ -39,7 +39,7 @@ static int __init ppc40x_device_probe(void)
 
 	return 0;
 }
-machine_device_initcall(ppc40x_simple, ppc40x_device_probe);
+machine_arch_initcall(ppc40x_simple, ppc40x_device_probe);
 
 /* This is the list of boards that can be supported by this simple
  * platform code.  This does _not_ mean the boards are compatible,
diff --git a/arch/powerpc/platforms/40x/virtex.c b/arch/powerpc/platforms/40x/virtex.c
index 91a08ea758a8..d262696b3cbc 100644
--- a/arch/powerpc/platforms/40x/virtex.c
+++ b/arch/powerpc/platforms/40x/virtex.c
@@ -33,7 +33,7 @@ static int __init virtex_device_probe(void)
 
 	return 0;
 }
-machine_device_initcall(virtex, virtex_device_probe);
+machine_arch_initcall(virtex, virtex_device_probe);
 
 static int __init virtex_probe(void)
 {
diff --git a/arch/powerpc/platforms/40x/walnut.c b/arch/powerpc/platforms/40x/walnut.c
index e5797815e2f1..9a9c0bccba47 100644
--- a/arch/powerpc/platforms/40x/walnut.c
+++ b/arch/powerpc/platforms/40x/walnut.c
@@ -42,7 +42,7 @@ static int __init walnut_device_probe(void)
 
 	return 0;
 }
-machine_device_initcall(walnut, walnut_device_probe);
+machine_arch_initcall(walnut, walnut_device_probe);
 
 static int __init walnut_probe(void)
 {
diff --git a/arch/powerpc/platforms/44x/canyonlands.c b/arch/powerpc/platforms/44x/canyonlands.c
index 157f4ce46386..681fa66ff194 100644
--- a/arch/powerpc/platforms/44x/canyonlands.c
+++ b/arch/powerpc/platforms/44x/canyonlands.c
@@ -47,7 +47,7 @@ static int __init ppc460ex_device_probe(void)
 
 	return 0;
 }
-machine_device_
