Re: [PATCH] soc: fsl/qe: fix Oops on CPM1 (and likely CPM2)
On 12/08/2016 at 01:29, Scott Wood wrote: On Mon, 2016-08-08 at 18:08 +0200, Christophe Leroy wrote: Commit 0e6e01ff694ee ("CPM/QE: use genalloc to manage CPM/QE muram") has changed the way muram is managed. genalloc uses kmalloc(), hence requires the SLAB to be up and running. On powerpc 8xx, cpm_reset() is called early during startup. cpm_reset() then calls cpm_muram_init() before SLAB is available, hence the following Oops. cpm_reset() cannot be called during initcalls because the CPM is needed for the console. This patch splits cpm_muram_init() in two parts. The first part, related to mappings, is kept as cpm_muram_init(). The second part is named cpm_muram_pool_init() and is called the first time cpm_muram_alloc() is used. Why do you need to split it, versus calling the full cpm_muram_init() on demand? There are drivers, for instance the i2c-cpm driver, that call cpm_muram_addr() before calling cpm_muram_alloc(). Therefore, we need muram_vbase and muram_pbase set. So if we want to keep a single function, it means we also have to call it on demand from cpm_muram_addr(), cpm_muram_offset() and cpm_muram_dma(). Is that what you recommend? Christophe
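The split being discussed can be illustrated by a minimal user-space sketch of the deferred-pool-init pattern: the mapping part runs early (safe before the SLAB is up), while the genalloc-backed pool is created on demand, the first time an allocation is requested. All symbols below (`cpm_muram_*_sketch`, the bump allocator standing in for genalloc) are stand-ins, not the real kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool pool_ready;      /* set once the pool exists */
static size_t pool_used;     /* trivial bump allocator standing in for genalloc */

/* Early init: only the "mapping" part, safe before the allocator is up.
 * The kernel code would set muram_vbase/muram_pbase here. */
static void cpm_muram_init_sketch(void)
{
}

/* Deferred part: the kernel code would call gen_pool_create()/
 * gen_pool_add() here, which require kmalloc() to work. */
static int cpm_muram_pool_init_sketch(void)
{
	pool_ready = true;
	return 0;
}

/* Allocation triggers pool creation on first use. */
static long cpm_muram_alloc_sketch(size_t size)
{
	if (!pool_ready && cpm_muram_pool_init_sketch())
		return -1;           /* pool creation failed */
	long off = (long)pool_used;  /* pretend offset into muram */
	pool_used += size;
	return off;
}
```

With this shape, cpm_muram_addr() and friends only need muram_vbase/muram_pbase from the early part and never force pool creation, which is the distinction Christophe raises above.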
Re: [PATCH kernel 14/15] vfio/spapr_tce: Export container API for external users
On 12/08/16 15:46, David Gibson wrote: > On Wed, Aug 10, 2016 at 10:46:30AM -0600, Alex Williamson wrote: >> On Wed, 10 Aug 2016 15:37:17 +1000 >> Alexey Kardashevskiy wrote: >> >>> On 09/08/16 22:16, Alex Williamson wrote: On Tue, 9 Aug 2016 15:19:39 +1000 Alexey Kardashevskiy wrote: > On 09/08/16 02:43, Alex Williamson wrote: >> On Wed, 3 Aug 2016 18:40:55 +1000 >> Alexey Kardashevskiy wrote: >> >>> This exports helpers which are needed to keep a VFIO container in >>> memory while there are external users such as KVM. >>> >>> Signed-off-by: Alexey Kardashevskiy >>> --- >>> drivers/vfio/vfio.c | 30 ++ >>> drivers/vfio/vfio_iommu_spapr_tce.c | 16 +++- >>> include/linux/vfio.h| 6 ++ >>> 3 files changed, 51 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c >>> index d1d70e0..baf6a9c 100644 >>> --- a/drivers/vfio/vfio.c >>> +++ b/drivers/vfio/vfio.c >>> @@ -1729,6 +1729,36 @@ long vfio_external_check_extension(struct >>> vfio_group *group, unsigned long arg) >>> EXPORT_SYMBOL_GPL(vfio_external_check_extension); >>> >>> /** >>> + * External user API for containers, exported by symbols to be linked >>> + * dynamically. 
>>> + * >>> + */ >>> +struct vfio_container *vfio_container_get_ext(struct file *filep) >>> +{ >>> + struct vfio_container *container = filep->private_data; >>> + >>> + if (filep->f_op != &vfio_fops) >>> + return ERR_PTR(-EINVAL); >>> + >>> + vfio_container_get(container); >>> + >>> + return container; >>> +} >>> +EXPORT_SYMBOL_GPL(vfio_container_get_ext); >>> + >>> +void vfio_container_put_ext(struct vfio_container *container) >>> +{ >>> + vfio_container_put(container); >>> +} >>> +EXPORT_SYMBOL_GPL(vfio_container_put_ext); >>> + >>> +void *vfio_container_get_iommu_data_ext(struct vfio_container >>> *container) >>> +{ >>> + return container->iommu_data; >>> +} >>> +EXPORT_SYMBOL_GPL(vfio_container_get_iommu_data_ext); >>> + >>> +/** >>> * Sub-module support >>> */ >>> /* >>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c >>> b/drivers/vfio/vfio_iommu_spapr_tce.c >>> index 3594ad3..fceea3d 100644 >>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c >>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c >>> @@ -1331,6 +1331,21 @@ const struct vfio_iommu_driver_ops >>> tce_iommu_driver_ops = { >>> .detach_group = tce_iommu_detach_group, >>> }; >>> >>> +struct iommu_table *vfio_container_spapr_tce_table_get_ext(void >>> *iommu_data, >>> + u64 offset) >>> +{ >>> + struct tce_container *container = iommu_data; >>> + struct iommu_table *tbl = NULL; >>> + >>> + if (tce_iommu_find_table(container, offset, &tbl) < 0) >>> + return NULL; >>> + >>> + iommu_table_get(tbl); >>> + >>> + return tbl; >>> +} >>> +EXPORT_SYMBOL_GPL(vfio_container_spapr_tce_table_get_ext); >>> + >>> static int __init tce_iommu_init(void) >>> { >>> return vfio_register_iommu_driver(&tce_iommu_driver_ops); >>> @@ -1348,4 +1363,3 @@ MODULE_VERSION(DRIVER_VERSION); >>> MODULE_LICENSE("GPL v2"); >>> MODULE_AUTHOR(DRIVER_AUTHOR); >>> MODULE_DESCRIPTION(DRIVER_DESC); >>> - >>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h >>> index 0ecae0b..1c2138a 100644 >>> --- a/include/linux/vfio.h >>> +++ 
b/include/linux/vfio.h >>> @@ -91,6 +91,12 @@ extern void vfio_group_put_external_user(struct >>> vfio_group *group); >>> extern int vfio_external_user_iommu_id(struct vfio_group *group); >>> extern long vfio_external_check_extension(struct vfio_group *group, >>> unsigned long arg); >>> +extern struct vfio_container *vfio_container_get_ext(struct file >>> *filep); >>> +extern void vfio_container_put_ext(struct vfio_container *container); >>> +extern void *vfio_container_get_iommu_data_ext( >>> + struct vfio_container *container); >>> +extern struct iommu_table *vfio_container_spapr_tce_table_get_ext( >>> + void *iommu_data, u64 offset); >>> >>> /* >>> * Sub-module helpers >> >> >> I think you need to take a closer look of the lifecycle of a container, >> having a reference means the container itself won't go away, but only >> having a group set within that container holds the actual IOMMU >> references. container->iommu_data is going to be NULL once the >> groups are lost. Thanks, > > > Container own
Re: [PATCH v4] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
On 12/08/16 06:25, Mauricio Faria de Oliveira wrote: This patch leverages 'struct pci_host_bridge' from the PCI subsystem in order to free the pci_controller only after the last reference to its devices is dropped (avoiding an oops in pcibios_release_device() if the last reference is dropped after pcibios_free_controller()). The patch relies on pci_host_bridge.release_fn() (and .release_data), which is called automatically by the PCI subsystem when the root bus is released (i.e., the last reference is dropped). Those fields are set via pci_set_host_bridge_release() (e.g. in the platform-specific implementation of pcibios_root_bridge_prepare()). It introduces the 'pcibios_free_controller_deferred()' .release_fn() and it expects .release_data to hold a pointer to the pci_controller. The function implicitly calls 'pcibios_free_controller()', so a user must *NOT* explicitly call it if using the new _deferred() callback. The functionality is enabled for pseries (although it isn't platform specific, and may be used by cxl). Details on not-so-elegant design choices: - Use the 'pci_host_bridge.release_data' field as a pointer to the associated 'struct pci_controller' so as *not* to use 'pci_bus_to_host(bridge->bus)' in pcibios_free_controller_deferred(). That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL' (so, if the last reference is released after pci_remove_root_bus() runs, which eventually reaches pcibios_free_controller_deferred(), that would hit a null pointer dereference). The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks are interested in this fix. Test-case #1 (hold references) # ls -ld /sys/block/sd* | grep -m1 0021:01:00.0 <...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...> # ls -ld /sys/block/sd* | grep -m1 0021:01:00.1 <...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...> # cat >/dev/sdaa & pid1=$! # cat >/dev/sdab & pid2=$! # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r Validating PHB DLPAR capability...yes. 
[ 594.306719] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01 [ 594.306738] pci_hp_remove_devices:Removing 0021:01:00.0... ... [ 598.236381] pci_hp_remove_devices:Removing 0021:01:00.1... ... [ 611.972077] pci_bus 0021:01: busn_res: [bus 01-ff] is released [ 611.972140] rpadlpar_io: slot PHB 33 removed # kill -9 $pid1 # kill -9 $pid2 [ 632.918088] pcibios_free_controller_deferred: domain 33, dynamic 1 Test-case #2 (don't hold references) # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r Validating PHB DLPAR capability...yes. [ 916.357363] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01 [ 916.357386] pci_hp_remove_devices:Removing 0021:01:00.0... ... [ 920.566527] pci_hp_remove_devices:Removing 0021:01:00.1... ... [ 933.955873] pci_bus 0021:01: busn_res: [bus 01-ff] is released [ 933.955977] pcibios_free_controller_deferred: domain 33, dynamic 1 [ 933.955999] rpadlpar_io: slot PHB 33 removed Suggested-By: Gavin Shan Signed-off-by: Mauricio Faria de Oliveira Reviewed-by: Andrew Donnellan Tested-by: Andrew Donnellan # cxl Does this justify a Cc: stable? -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited
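The refcount/release ordering the patch relies on can be modelled with a small user-space sketch: the controller is freed from a release callback that fires only when the last reference to the root bus goes away, not at the point where pcibios_free_controller() would have been called. All types and function names below are stand-ins for the kernel's, not the real VFIO/PCI internals.

```c
#include <assert.h>
#include <stdlib.h>

struct fake_controller { int domain; };

struct fake_bridge {
	int refcount;
	void (*release_fn)(struct fake_bridge *);
	void *release_data;          /* -> struct fake_controller */
};

static int freed_domain = -1;    /* records which PHB was freed, and when */

/* Stand-in for pcibios_free_controller_deferred(). */
static void release_controller(struct fake_bridge *bridge)
{
	struct fake_controller *phb = bridge->release_data;
	freed_domain = phb->domain;
	free(phb);
}

/* Stand-in for pci_set_host_bridge_release(). */
static void set_release(struct fake_bridge *b,
			void (*fn)(struct fake_bridge *), void *data)
{
	b->release_fn = fn;
	b->release_data = data;
}

static void bridge_get(struct fake_bridge *b)
{
	b->refcount++;
}

/* Release fires only when the final reference is dropped. */
static void bridge_put(struct fake_bridge *b)
{
	if (--b->refcount == 0 && b->release_fn)
		b->release_fn(b);
}
```

This mirrors test-case #1 above: while a device (the `cat` processes) still holds a reference, dropping the controller's own reference frees nothing; the free happens at the final put.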
Re: [PATCH v4] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
On 12/08/16 15:54, Gavin Shan wrote: It might be nicer for users to implement their own pcibios_free_controller_deferred(), meaning pSeries needs its own implementation for now. The reason is that more user-specific (pSeries) objects can be released together with the PHB. However, I'm still fine if this comment isn't addressed. That's probably not a bad idea, though from a cxl perspective I'm fine with using the current version. -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited
Re: [PATCH v4] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
On Thu, Aug 11, 2016 at 05:25:40PM -0300, Mauricio Faria de Oliveira wrote: >This patch leverages 'struct pci_host_bridge' from the PCI subsystem >in order to free the pci_controller only after the last reference to >its devices is dropped (avoiding an oops in pcibios_release_device() >if the last reference is dropped after pcibios_free_controller()). > >The patch relies on pci_host_bridge.release_fn() (and .release_data), >which is called automatically by the PCI subsystem when the root bus >is released (i.e., the last reference is dropped). Those fields are >set via pci_set_host_bridge_release() (e.g. in the platform-specific >implementation of pcibios_root_bridge_prepare()). > >It introduces the 'pcibios_free_controller_deferred()' .release_fn() >and it expects .release_data to hold a pointer to the pci_controller. > >The function implictly calls 'pcibios_free_controller()', so an user >must *NOT* explicitly call it if using the new _deferred() callback. > >The functionality is enabled for pseries (although it isn't platform >specific, and may be used by cxl). > >Details on not-so-elegant design choices: > > - Use 'pci_host_bridge.release_data' field as pointer to associated > 'struct pci_controller' so *not* to 'pci_bus_to_host(bridge->bus)' > in pcibios_free_controller_deferred(). > > That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL' > (so, if the last reference is released after pci_remove_root_bus() > runs, which eventually reaches pcibios_free_controller_deferred(), > that would hit a null pointer dereference). > > The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks > are interested in this fix. > >Test-case #1 (hold references) > > # ls -ld /sys/block/sd* | grep -m1 0021:01:00.0 > <...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...> > > # ls -ld /sys/block/sd* | grep -m1 0021:01:00.1 > <...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...> > > # cat >/dev/sdaa & pid1=$! > # cat >/dev/sdab & pid2=$! 
> > # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r > Validating PHB DLPAR capability...yes. > [ 594.306719] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01 > [ 594.306738] pci_hp_remove_devices:Removing 0021:01:00.0... > ... > [ 598.236381] pci_hp_remove_devices:Removing 0021:01:00.1... > ... > [ 611.972077] pci_bus 0021:01: busn_res: [bus 01-ff] is released > [ 611.972140] rpadlpar_io: slot PHB 33 removed > > # kill -9 $pid1 > # kill -9 $pid2 > [ 632.918088] pcibios_free_controller_deferred: domain 33, dynamic 1 > >Test-case #2 (don't hold references) > > # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r > Validating PHB DLPAR capability...yes. > [ 916.357363] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01 > [ 916.357386] pci_hp_remove_devices:Removing 0021:01:00.0... > ... > [ 920.566527] pci_hp_remove_devices:Removing 0021:01:00.1... > ... > [ 933.955873] pci_bus 0021:01: busn_res: [bus 01-ff] is released > [ 933.955977] pcibios_free_controller_deferred: domain 33, dynamic 1 > [ 933.955999] rpadlpar_io: slot PHB 33 removed > >Suggested-By: Gavin Shan >Signed-off-by: Mauricio Faria de Oliveira I don't have more obvious comments except below one nitpicky: Reviewed-by: Gavin Shan >--- >Changelog: > - v4: improve usability/design/documentation: > - rename function to pcibios_free_controller_deferred() > - from function call pcibios_free_controller() > - no more struct pci_controller.bridge field > thanks: Gavin Shan, Andrew Donnellan > - v3: different approach: struct pci_host_bridge.release_fn() > - v2: different approach: struct pci_controller.refcount > > arch/powerpc/include/asm/pci-bridge.h | 1 + > arch/powerpc/kernel/pci-common.c | 36 ++ > arch/powerpc/platforms/pseries/pci.c | 4 > arch/powerpc/platforms/pseries/pci_dlpar.c | 7 -- > 4 files changed, 46 insertions(+), 2 deletions(-) > >diff --git a/arch/powerpc/include/asm/pci-bridge.h >b/arch/powerpc/include/asm/pci-bridge.h >index b5e88e4..c0309c5 100644 >--- a/arch/powerpc/include/asm/pci-bridge.h 
>+++ b/arch/powerpc/include/asm/pci-bridge.h >@@ -301,6 +301,7 @@ extern void pci_process_bridge_OF_ranges(struct >pci_controller *hose, > /* Allocate & free a PCI host bridge structure */ > extern struct pci_controller *pcibios_alloc_controller(struct device_node > *dev); > extern void pcibios_free_controller(struct pci_controller *phb); >+extern void pcibios_free_controller_deferred(struct pci_host_bridge *bridge); > > #ifdef CONFIG_PCI > extern int pcibios_vaddr_is_ioport(void __iomem *address); >diff --git a/arch/powerpc/kernel/pci-common.c >b/arch/powerpc/kernel/pci-common.c >index a5c0153..8c48a78 100644 >--- a/arch/powerpc/kernel/pci-common.c >+++ b/arch/powerpc/kernel/pci-common.c >@@ -151,6 +151,42 @@ void pcibios_free_controller(struct pci_controller *phb) > EXPORT_SYMBOL_GPL(pcibios_free_controller); > > /* >+ *
Re: [PATCH] powerpc: populate the default bus with machine_arch_initcall
On Fri, Aug 12, 2016 at 02:39:32PM +1000, Michael Ellerman wrote: > Kevin Hao writes: > > > With the commit 44a7185c2ae6 ("of/platform: Add common method to > > populate default bus"), a default function is introduced to populate > > the default bus and this function is invoked at the arch_initcall_sync > > level. This will override the arch specific population of default bus > > which run at a lower level than arch_initcall_sync. Since not all > > powerpc specific buses are added to the of_default_bus_match_table[], > > this causes some powerpc specific bus are not probed. Fix this by > > using a more preceding initcall. > > > > Signed-off-by: Kevin Hao > > --- > > Of course we can adjust the powerpc arch codes to use the > > of_platform_default_populate_init(), but it has high risk to break > > other boards given the complicated powerpc specific buses. So I would > > like just to fix the broken boards in the current release, and cook > > a patch to change to of_platform_default_populate_init() for linux-next. > > > > Only boot test on a mpc8315erdb board. 
> > > > arch/powerpc/platforms/40x/ep405.c | 2 +- > > arch/powerpc/platforms/40x/ppc40x_simple.c | 2 +- > > arch/powerpc/platforms/40x/virtex.c | 2 +- > > arch/powerpc/platforms/40x/walnut.c | 2 +- > > arch/powerpc/platforms/44x/canyonlands.c | 2 +- > > arch/powerpc/platforms/44x/ebony.c | 2 +- > > arch/powerpc/platforms/44x/iss4xx.c | 2 +- > > arch/powerpc/platforms/44x/ppc44x_simple.c | 2 +- > > arch/powerpc/platforms/44x/ppc476.c | 2 +- > > arch/powerpc/platforms/44x/sam440ep.c| 2 +- > > arch/powerpc/platforms/44x/virtex.c | 2 +- > > arch/powerpc/platforms/44x/warp.c| 2 +- > > arch/powerpc/platforms/82xx/ep8248e.c| 2 +- > > arch/powerpc/platforms/82xx/km82xx.c | 2 +- > > arch/powerpc/platforms/82xx/mpc8272_ads.c| 2 +- > > arch/powerpc/platforms/82xx/pq2fads.c| 2 +- > > arch/powerpc/platforms/83xx/mpc831x_rdb.c| 2 +- > > arch/powerpc/platforms/83xx/mpc834x_itx.c| 2 +- > > arch/powerpc/platforms/85xx/ppa8548.c| 2 +- > > arch/powerpc/platforms/8xx/adder875.c| 2 +- > > arch/powerpc/platforms/8xx/ep88xc.c | 2 +- > > arch/powerpc/platforms/8xx/mpc86xads_setup.c | 2 +- > > arch/powerpc/platforms/8xx/mpc885ads_setup.c | 2 +- > > arch/powerpc/platforms/8xx/tqm8xx_setup.c| 2 +- > > arch/powerpc/platforms/cell/setup.c | 2 +- > > arch/powerpc/platforms/embedded6xx/gamecube.c| 2 +- > > arch/powerpc/platforms/embedded6xx/linkstation.c | 2 +- > > arch/powerpc/platforms/embedded6xx/mvme5100.c| 2 +- > > arch/powerpc/platforms/embedded6xx/storcenter.c | 2 +- > > arch/powerpc/platforms/embedded6xx/wii.c | 2 +- > > arch/powerpc/platforms/pasemi/setup.c| 2 +- > > That's not a very minimal fix. > > Every one of those initcall changes could be introducing a bug, by > changing the order vs other init calls. > > Can we just go back to the old behaviour on ppc? Sure. How about this one? 
From 4362b4cdd8a6198df4cc46c628473f0d44e03fa8 Mon Sep 17 00:00:00 2001 From: Kevin Hao Date: Fri, 12 Aug 2016 13:30:03 +0800 Subject: [PATCH v2] of/platform: disable the of_platform_default_populate_init() for all the ppc boards With the commit 44a7185c2ae6 ("of/platform: Add common method to populate default bus"), a default function is introduced to populate the default bus and this function is invoked at the arch_initcall_sync level. But a lot of ppc boards use machine_device_initcall() to populate the default bus. This means that the default populate function has higher priority and would override the arch specific population of the bus. The side effect is that some arch specific buses are not probed, which then causes various malfunctions due to the missing devices. Since it is very possible to introduce bugs if we simply change the initcall level for all these boards (about 30+), this patch just disables the default function for all the ppc boards. Signed-off-by: Kevin Hao --- drivers/of/platform.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/of/platform.c b/drivers/of/platform.c index 8aa197691074..f39ccd5aa701 100644 --- a/drivers/of/platform.c +++ b/drivers/of/platform.c @@ -497,6 +497,7 @@ int of_platform_default_populate(struct device_node *root, } EXPORT_SYMBOL_GPL(of_platform_default_populate); +#ifndef CONFIG_PPC static int __init of_platform_default_populate_init(void) { struct device_node *node; @@ -521,6 +522,7 @@ static int __init of_platform_default_populate_init(void) return 0; } arch_initcall_sync(of_platform_default_populate_init); +#endif static int of_platform_device_destroy(struct device *dev, void *data) { -- 2.8.1 Thanks, Kevin signature.asc Description: PGP signature
Re: [PATCH kernel 14/15] vfio/spapr_tce: Export container API for external users
On Wed, Aug 03, 2016 at 06:40:55PM +1000, Alexey Kardashevskiy wrote: > This exports helpers which are needed to keep a VFIO container in > memory while there are external users such as KVM. > > Signed-off-by: Alexey Kardashevskiy I'll address Alex W's broader concerns in a different mail. But there are some more superficial problems with this as well. > --- > drivers/vfio/vfio.c | 30 ++ > drivers/vfio/vfio_iommu_spapr_tce.c | 16 +++- > include/linux/vfio.h| 6 ++ > 3 files changed, 51 insertions(+), 1 deletion(-) > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > index d1d70e0..baf6a9c 100644 > --- a/drivers/vfio/vfio.c > +++ b/drivers/vfio/vfio.c > @@ -1729,6 +1729,36 @@ long vfio_external_check_extension(struct vfio_group > *group, unsigned long arg) > EXPORT_SYMBOL_GPL(vfio_external_check_extension); > > /** > + * External user API for containers, exported by symbols to be linked > + * dynamically. > + * > + */ > +struct vfio_container *vfio_container_get_ext(struct file *filep) > +{ > + struct vfio_container *container = filep->private_data; > + > + if (filep->f_op != &vfio_fops) > + return ERR_PTR(-EINVAL); > + > + vfio_container_get(container); > + > + return container; > +} > +EXPORT_SYMBOL_GPL(vfio_container_get_ext); > + > +void vfio_container_put_ext(struct vfio_container *container) > +{ > + vfio_container_put(container); > +} > +EXPORT_SYMBOL_GPL(vfio_container_put_ext); > + > +void *vfio_container_get_iommu_data_ext(struct vfio_container *container) > +{ > + return container->iommu_data; > +} > +EXPORT_SYMBOL_GPL(vfio_container_get_iommu_data_ext); > + > +/** > * Sub-module support > */ > /* > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c > b/drivers/vfio/vfio_iommu_spapr_tce.c > index 3594ad3..fceea3d 100644 > --- a/drivers/vfio/vfio_iommu_spapr_tce.c > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c > @@ -1331,6 +1331,21 @@ const struct vfio_iommu_driver_ops > tce_iommu_driver_ops = { > .detach_group = tce_iommu_detach_group, > }; > > +struct 
iommu_table *vfio_container_spapr_tce_table_get_ext(void *iommu_data, > + u64 offset) I really dislike this name. I was confused for a while why this existed on top of vfio_container_get_ext(), the names are so similar. Making it take a void * is also really nasty since that void * has to be something specific. It would be better to have this take a vfio_container *, verify that the container really does have an spapr_tce backend, then lookup the tce_container and the actual IOMMU tables within. That might also let you drop vfio_container_get_iommu_data_ext() entirely. > +{ > + struct tce_container *container = iommu_data; > + struct iommu_table *tbl = NULL; > + > + if (tce_iommu_find_table(container, offset, &tbl) < 0) > + return NULL; > + > + iommu_table_get(tbl); > + > + return tbl; > +} > +EXPORT_SYMBOL_GPL(vfio_container_spapr_tce_table_get_ext); > + > static int __init tce_iommu_init(void) > { > return vfio_register_iommu_driver(&tce_iommu_driver_ops); > @@ -1348,4 +1363,3 @@ MODULE_VERSION(DRIVER_VERSION); > MODULE_LICENSE("GPL v2"); > MODULE_AUTHOR(DRIVER_AUTHOR); > MODULE_DESCRIPTION(DRIVER_DESC); > - > diff --git a/include/linux/vfio.h b/include/linux/vfio.h > index 0ecae0b..1c2138a 100644 > --- a/include/linux/vfio.h > +++ b/include/linux/vfio.h > @@ -91,6 +91,12 @@ extern void vfio_group_put_external_user(struct vfio_group > *group); > extern int vfio_external_user_iommu_id(struct vfio_group *group); > extern long vfio_external_check_extension(struct vfio_group *group, > unsigned long arg); > +extern struct vfio_container *vfio_container_get_ext(struct file *filep); > +extern void vfio_container_put_ext(struct vfio_container *container); > +extern void *vfio_container_get_iommu_data_ext( > + struct vfio_container *container); > +extern struct iommu_table *vfio_container_spapr_tce_table_get_ext( > + void *iommu_data, u64 offset); > > /* > * Sub-module helpers -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | 
minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
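David's suggestion above — take a `struct vfio_container *` directly, verify the container really has an spapr_tce backend, and only then reach into the driver-private data — can be sketched in user space as follows. All types, the ops-pointer comparison used as the backend check, and the single-table container are stand-ins for illustration, not the real VFIO internals.

```c
#include <assert.h>
#include <stddef.h>

struct iommu_table { int id; };

struct tce_container {
	struct iommu_table *tbl;     /* pretend single-table container */
};

struct iommu_driver_ops { const char *name; };

static const struct iommu_driver_ops tce_iommu_driver_ops = { "spapr_tce" };
static const struct iommu_driver_ops type1_iommu_driver_ops = { "type1" };

struct vfio_container {
	const struct iommu_driver_ops *ops;
	void *iommu_data;            /* -> struct tce_container when spapr_tce */
};

/* Returns the table only when the backend check passes, so callers never
 * touch the void *iommu_data themselves. */
static struct iommu_table *
container_spapr_tce_table_get(struct vfio_container *c)
{
	if (c->ops != &tce_iommu_driver_ops)
		return NULL;         /* not an spapr_tce-backed container */
	struct tce_container *tce = c->iommu_data;
	return tce->tbl;             /* real code would also take a reference */
}
```

With this shape, the separate `vfio_container_get_iommu_data_ext()` accessor indeed becomes unnecessary, which is the simplification David points at.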
Re: [PATCH kernel 14/15] vfio/spapr_tce: Export container API for external users
On Wed, Aug 10, 2016 at 10:46:30AM -0600, Alex Williamson wrote: > On Wed, 10 Aug 2016 15:37:17 +1000 > Alexey Kardashevskiy wrote: > > > On 09/08/16 22:16, Alex Williamson wrote: > > > On Tue, 9 Aug 2016 15:19:39 +1000 > > > Alexey Kardashevskiy wrote: > > > > > >> On 09/08/16 02:43, Alex Williamson wrote: > > >>> On Wed, 3 Aug 2016 18:40:55 +1000 > > >>> Alexey Kardashevskiy wrote: > > >>> > > This exports helpers which are needed to keep a VFIO container in > > memory while there are external users such as KVM. > > > > Signed-off-by: Alexey Kardashevskiy > > --- > > drivers/vfio/vfio.c | 30 > > ++ > > drivers/vfio/vfio_iommu_spapr_tce.c | 16 +++- > > include/linux/vfio.h| 6 ++ > > 3 files changed, 51 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > > index d1d70e0..baf6a9c 100644 > > --- a/drivers/vfio/vfio.c > > +++ b/drivers/vfio/vfio.c > > @@ -1729,6 +1729,36 @@ long vfio_external_check_extension(struct > > vfio_group *group, unsigned long arg) > > EXPORT_SYMBOL_GPL(vfio_external_check_extension); > > > > /** > > + * External user API for containers, exported by symbols to be linked > > + * dynamically. 
> > + * > > + */ > > +struct vfio_container *vfio_container_get_ext(struct file *filep) > > +{ > > + struct vfio_container *container = filep->private_data; > > + > > + if (filep->f_op != &vfio_fops) > > + return ERR_PTR(-EINVAL); > > + > > + vfio_container_get(container); > > + > > + return container; > > +} > > +EXPORT_SYMBOL_GPL(vfio_container_get_ext); > > + > > +void vfio_container_put_ext(struct vfio_container *container) > > +{ > > + vfio_container_put(container); > > +} > > +EXPORT_SYMBOL_GPL(vfio_container_put_ext); > > + > > +void *vfio_container_get_iommu_data_ext(struct vfio_container > > *container) > > +{ > > + return container->iommu_data; > > +} > > +EXPORT_SYMBOL_GPL(vfio_container_get_iommu_data_ext); > > + > > +/** > > * Sub-module support > > */ > > /* > > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c > > b/drivers/vfio/vfio_iommu_spapr_tce.c > > index 3594ad3..fceea3d 100644 > > --- a/drivers/vfio/vfio_iommu_spapr_tce.c > > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c > > @@ -1331,6 +1331,21 @@ const struct vfio_iommu_driver_ops > > tce_iommu_driver_ops = { > > .detach_group = tce_iommu_detach_group, > > }; > > > > +struct iommu_table *vfio_container_spapr_tce_table_get_ext(void > > *iommu_data, > > + u64 offset) > > +{ > > + struct tce_container *container = iommu_data; > > + struct iommu_table *tbl = NULL; > > + > > + if (tce_iommu_find_table(container, offset, &tbl) < 0) > > + return NULL; > > + > > + iommu_table_get(tbl); > > + > > + return tbl; > > +} > > +EXPORT_SYMBOL_GPL(vfio_container_spapr_tce_table_get_ext); > > + > > static int __init tce_iommu_init(void) > > { > > return vfio_register_iommu_driver(&tce_iommu_driver_ops); > > @@ -1348,4 +1363,3 @@ MODULE_VERSION(DRIVER_VERSION); > > MODULE_LICENSE("GPL v2"); > > MODULE_AUTHOR(DRIVER_AUTHOR); > > MODULE_DESCRIPTION(DRIVER_DESC); > > - > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h > > index 0ecae0b..1c2138a 100644 > > --- a/include/linux/vfio.h > > +++ 
b/include/linux/vfio.h > > @@ -91,6 +91,12 @@ extern void vfio_group_put_external_user(struct > > vfio_group *group); > > extern int vfio_external_user_iommu_id(struct vfio_group *group); > > extern long vfio_external_check_extension(struct vfio_group *group, > > unsigned long arg); > > +extern struct vfio_container *vfio_container_get_ext(struct file > > *filep); > > +extern void vfio_container_put_ext(struct vfio_container *container); > > +extern void *vfio_container_get_iommu_data_ext( > > + struct vfio_container *container); > > +extern struct iommu_table *vfio_container_spapr_tce_table_get_ext( > > + void *iommu_data, u64 offset); > > > > /* > > * Sub-module helpers > > >>> > > >>> > > >>> I think you need to take a closer look of the lifecycle of a container, > > >>> having a reference means the container itself won't go away, but only > > >>> having a group set within that container holds the actual IOMMU > > >>> references. container->iommu_data is going to be NULL once the >
[PATCH v4 4/5] PCI: Add a new option for resource_alignment to reassign alignment
When using the resource_alignment kernel parameter, the current implementation reassigns the alignment by changing resources' size, which can potentially break some drivers. For example, a driver may use the size to locate some register whose length is related to the size. This patch adds a new option "noresize" for the parameter to solve this problem. Signed-off-by: Yongji Xie --- Documentation/kernel-parameters.txt |9 ++--- drivers/pci/pci.c | 37 +-- 2 files changed, 33 insertions(+), 13 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 46c030a..c64e439 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3023,15 +3023,18 @@ bytes respectively. Such letter suffixes can also be entirely omitted. window. The default value is 64 megabytes. resource_alignment= Format: - [<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...] - [<order of align>@]pci:<vendor>:<device>\ - [:<subvendor>:<subdevice>][; ...] + [<order of align>@][noresize@][<domain>:] + <bus>:<slot>.<func>[; ...] + [<order of align>@][noresize@]pci:<vendor>:<device> + [:<subvendor>:<subdevice>][; ...] Specifies alignment and device to reassign aligned memory resources. If <order of align> is not specified, PAGE_SIZE is used as alignment. PCI-PCI bridge can be specified, if resource windows need to be expanded. + noresize: Don't change the resources' sizes when + reassigning alignment. ecrc= Enable/disable PCIe ECRC (transaction layer end-to-end CRC checking). bios: Use BIOS/firmware settings. This is the diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index caa0894..d895be7 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4946,11 +4946,13 @@ static DEFINE_SPINLOCK(resource_alignment_lock); /** * pci_specified_resource_alignment - get resource alignment specified by user. * @dev: the PCI device to get + * @resize: whether or not to change resources' size when reassigning alignment * * RETURNS: Resource alignment if it is specified. * Zero if it is not specified. 
*/ -static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev) +static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, + bool *resize) { int seg, bus, slot, func, align_order, count; unsigned short vendor, device, subsystem_vendor, subsystem_device; @@ -4974,6 +4976,13 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev) } else { align_order = -1; } + + if (!strncmp(p, "noresize@", 9)) { + *resize = false; + p += 9; + } else + *resize = true; + if (strncmp(p, "pci:", 4) == 0) { /* PCI vendor/device (subvendor/subdevice) ids are specified */ p += 4; @@ -5045,6 +5054,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev) { int i; struct resource *r; + bool resize = true; resource_size_t align, size; /* @@ -5057,7 +5067,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev) return; /* check if specified PCI is target device to reassign */ - align = pci_specified_resource_alignment(dev); + align = pci_specified_resource_alignment(dev, &resize); if (!align) return; @@ -5080,15 +5090,22 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev) } size = resource_size(r); - if (size < align) { - size = align; - dev_info(&dev->dev, - "Rounding up size of resource #%d to %#llx.\n", - i, (unsigned long long)size); + if (resize) { + if (size < align) { + size = align; + dev_info(&dev->dev, + "Rounding up size of resource #%d to %#llx.\n", + i, (unsigned long long)size); + } + r->flags |= IORESOURCE_UNSET; + r->end = size - 1; + r->start = 0; + } else { + r->flags &= ~IORESOURCE_SIZEALIGN; +
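As an aside (not part of the patch), a kernel command line using the new option might look like this — the exact spelling is inferred from the parsing code and documentation hunk above (align-order prefix first, then the `noresize@` marker, then the device spec), so treat it as a hypothetical example:

```shell
# Request 2^16 (64K) alignment for device 0001:02:02.0 while keeping
# the resources' original sizes (hypothetical device address):
#   pci=resource_alignment=16@noresize@0001:02:02.0
```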
[PATCH v4 5/5] PCI: Add a macro to set default alignment for all PCI devices
When vfio passes through a PCI device whose MMIO BARs are smaller than PAGE_SIZE, the guest will not handle the MMIO accesses to those BARs directly, which leads to MMIO emulation in the host. This is because vfio will not allow passing through one BAR's MMIO page when it may be shared with other BARs. Otherwise, there would be a backdoor that one guest could use to access BARs of another guest. This patch adds a macro to set a default alignment for all PCI devices. Platforms that easily hit this issue because of their 64K page size, such as the PowerNV platform, can then solve it by defining this macro as PAGE_SIZE. Signed-off-by: Yongji Xie --- arch/powerpc/include/asm/pci.h |4 drivers/pci/pci.c |4 2 files changed, 8 insertions(+) diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h index e9bd6cf..5e31bc2 100644 --- a/arch/powerpc/include/asm/pci.h +++ b/arch/powerpc/include/asm/pci.h @@ -28,6 +28,10 @@ #define PCIBIOS_MIN_IO 0x1000 #define PCIBIOS_MIN_MEM0x1000 +#ifdef CONFIG_PPC_POWERNV +#define PCIBIOS_DEFAULT_ALIGNMENT PAGE_SIZE +#endif + struct pci_dev; /* Values for the `which' argument to sys_pciconfig_iobase syscall. */ diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index d895be7..feae59e 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4959,6 +4959,10 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, resource_size_t align = 0; char *p; +#ifdef PCIBIOS_DEFAULT_ALIGNMENT + align = PCIBIOS_DEFAULT_ALIGNMENT; + *resize = false; +#endif spin_lock(&resource_alignment_lock); p = resource_alignment_param; if (pci_has_flag(PCI_PROBE_ONLY)) { -- 1.7.9.5
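The page-sharing problem the patch describes can be sketched with a little address arithmetic: with a 64K page size, two 4K BARs placed back-to-back land in the same page, so mapping one into a guest would also expose the other; aligning each BAR to PAGE_SIZE removes the overlap. The helper names and addresses below are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_64K 0x10000ULL

/* Round an address down to its containing page boundary. */
static uint64_t page_of(uint64_t addr, uint64_t page)
{
	return addr & ~(page - 1);
}

/* Do two BARs [a, a+sz) and [b, b+sz) touch a common page? */
static int bars_share_page(uint64_t a, uint64_t b, uint64_t sz,
			   uint64_t page)
{
	return page_of(a, page) == page_of(b, page) ||
	       page_of(a + sz - 1, page) == page_of(b, page) ||
	       page_of(a, page) == page_of(b + sz - 1, page);
}
```

For example, two 4K BARs at 0x100000 and 0x101000 share a 64K page, while the same BARs at 0x100000 and 0x110000 (each PAGE_SIZE-aligned) do not — which is exactly why PCIBIOS_DEFAULT_ALIGNMENT = PAGE_SIZE helps on PowerNV.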
[PATCH v4 3/5] PCI: Do not disable memory decoding in pci_reassigndev_resource_alignment()
We should not disable memory decoding when reassigning alignment in pci_reassigndev_resource_alignment(). It is unnecessary and has side effects. For example, we found it breaks this kind of P2P bridge: 0001:02:02.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev aa) It may also break PCI devices that have the mmio_always_on bit set. Besides, disabling memory decoding is not expected by some fixup functions such as fixup_vga(). fixup_vga() reads PCI_COMMAND_MEMORY to determine whether a device has been initialized by the firmware. With memory decoding disabled, the firmware-initialized adapter may not be selected as the default VGA device when more than one graphics adapter is present. Signed-off-by: Yongji Xie --- drivers/pci/pci.c |8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index b8357d7..caa0894 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5046,7 +5046,6 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev) int i; struct resource *r; resource_size_t align, size; - u16 command; /* * VF BARs are RO zero according to SR-IOV spec 3.4.1.11. Their @@ -5069,12 +5068,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev) return; } - dev_info(&dev->dev, - "Disabling memory decoding and releasing memory resources.\n"); - pci_read_config_word(dev, PCI_COMMAND, &command); - command &= ~PCI_COMMAND_MEMORY; - pci_write_config_word(dev, PCI_COMMAND, command); - + dev_info(&dev->dev, "Releasing memory resources.\n"); for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) { r = &dev->resource[i]; if (!(r->flags & IORESOURCE_MEM)) -- 1.7.9.5
[PATCH v4 2/5] PCI: Ignore enforced alignment to VF BARs
VF BARs are read-only zero according to the SR-IOV spec, so the normal way of allocating resources (writing the BARs) does not apply to VFs. VF resources are allocated when the SR-IOV capability is enabled. We should therefore never try to reassign alignment after VFs are enabled: it is meaningless and would release already-allocated resources, leading to a bug. Signed-off-by: Yongji Xie --- drivers/pci/pci.c |9 + 1 file changed, 9 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 2d85a96..b8357d7 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5048,6 +5048,15 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev) resource_size_t align, size; u16 command; + /* +* VF BARs are RO zero according to SR-IOV spec 3.4.1.11. Their +* resources would be allocated when we enable them and not be +* re-allocated any more. So we should never try to reassign +* VF's alignment here. +*/ + if (dev->is_virtfn) + return; + /* check if specified PCI is target device to reassign */ align = pci_specified_resource_alignment(dev); if (!align) -- 1.7.9.5
[PATCH v4 0/5] PCI: Introduce a way to enforce all MMIO BARs not to share PAGE_SIZE
This series introduces a way for the PCI resource allocator to force MMIO BARs not to share a PAGE_SIZE page. This matters to the VFIO driver: for security reasons, the current VFIO implementation disallows mmap of sub-page (size < PAGE_SIZE) MMIO BARs that may share a page with other BARs. As a result, MMIO accesses to these BARs must be handled by QEMU emulation rather than in the guest, which costs performance. Our solution reuses the existing code path of the resource_alignment kernel parameter and adds a macro to set a default alignment for it. The macro can then be defined by default on archs that easily hit the performance issue because of their 64K page size. In this series, patches 1-3 fix bugs in the use of resource_alignment; patch 4 adds a new option for resource_alignment that uses IORESOURCE_STARTALIGN to specify the alignment of PCI BARs; patch 5 adds a macro to set the default alignment of all MMIO BARs. Changelog v4: - Rebased against v4.8-rc1 - Drop one irrelevant patch - Drop the patch adding a wildcard to resource_alignment to enforce the alignment of all MMIO BARs to be at least PAGE_SIZE - Change the format of the "noresize" option of resource_alignment - Code style improvements Changelog v3: - Ignore enforced alignment to fixed BARs - Fix issue of disabling memory decoding when reassigning the alignment - Only enable default alignment on the PowerNV platform Changelog v2: - Ignore enforced alignment to VF BARs in pci_reassigndev_resource_alignment() Yongji Xie (5): PCI: Ignore enforced alignment when kernel uses existing firmware setup PCI: Ignore enforced alignment to VF BARs PCI: Do not disable memory decoding in pci_reassigndev_resource_alignment() PCI: Add a new option for resource_alignment to reassign alignment PCI: Add a macro to set default alignment for all PCI devices Documentation/kernel-parameters.txt |9 +++-- arch/powerpc/include/asm/pci.h |4 ++ drivers/pci/pci.c | 71 ++- 3 files changed, 64 insertions(+),
20 deletions(-) -- 1.7.9.5
[PATCH v4 1/5] PCI: Ignore enforced alignment when kernel uses existing firmware setup
The PCI resource allocator uses the firmware setup and does not try to reassign resources when PCI_PROBE_ONLY or IORESOURCE_PCI_FIXED is set. The alignment enforced in pci_reassigndev_resource_alignment() should be ignored in this case; otherwise, some PCI devices' resources would be released here and never re-allocated. Signed-off-by: Yongji Xie --- drivers/pci/pci.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index aab9d51..2d85a96 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4959,6 +4959,13 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev) spin_lock(&resource_alignment_lock); p = resource_alignment_param; + if (pci_has_flag(PCI_PROBE_ONLY)) { + if (*p) + pr_info_once("PCI: resource_alignment ignored with PCI_PROBE_ONLY\n"); + spin_unlock(&resource_alignment_lock); + return 0; + } + while (*p) { count = 0; if (sscanf(p, "%d%n", &align_order, &count) == 1 && @@ -5063,6 +5070,12 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev) r = &dev->resource[i]; if (!(r->flags & IORESOURCE_MEM)) continue; + if (r->flags & IORESOURCE_PCI_FIXED) { + dev_info(&dev->dev, "No alignment for fixed BAR%d: %pR\n", + i, r); + continue; + } + size = resource_size(r); if (size < align) { size = align; -- 1.7.9.5
[PATCH 1/1] pci: host: pci-layerscape: add missing of_node_put after calling of_parse_phandle
of_node_put() needs to be called once the device node obtained from of_parse_phandle() is no longer in use. Cc: Minghuan Lian Cc: Mingkai Hu Cc: Roy Zang Signed-off-by: Peter Chen --- drivers/pci/host/pci-layerscape.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pci/host/pci-layerscape.c b/drivers/pci/host/pci-layerscape.c index 114ba81..573b996 100644 --- a/drivers/pci/host/pci-layerscape.c +++ b/drivers/pci/host/pci-layerscape.c @@ -173,6 +173,8 @@ static int ls_pcie_msi_host_init(struct pcie_port *pp, return -EINVAL; } + of_node_put(msi_node); + return 0; } -- 1.9.1
Re: [PATCH kernel 05/15] powerpc/iommu: Stop using @current in mm_iommu_xxx
On 12/08/16 12:57, David Gibson wrote: > On Wed, Aug 03, 2016 at 06:40:46PM +1000, Alexey Kardashevskiy wrote: >> In some situations the userspace memory context may live longer than >> the userspace process itself so if we need to do proper memory context >> cleanup, we better cache @mm and use it later when the process is gone >> (@current or @current->mm are NULL). >> >> This changes mm_iommu_xxx API to receive mm_struct instead of using one >> from @current. >> >> This is needed by the following patch to do proper cleanup in time. >> This depends on "powerpc/powernv/ioda: Fix endianness when reading TCEs" >> to do proper cleanup via tce_iommu_clear() patch. >> >> To keep API consistent, this replaces mm_context_t with mm_struct; >> we stick to mm_struct as mm_iommu_adjust_locked_vm() helper needs >> access to &mm->mmap_sem. >> >> This should cause no behavioral change. >> >> Signed-off-by: Alexey Kardashevskiy >> --- >> arch/powerpc/include/asm/mmu_context.h | 20 +++-- >> arch/powerpc/kernel/setup-common.c | 2 +- >> arch/powerpc/mm/mmu_context_book3s64.c | 4 +-- >> arch/powerpc/mm/mmu_context_iommu.c| 54 >> ++ >> drivers/vfio/vfio_iommu_spapr_tce.c| 41 -- >> 5 files changed, 62 insertions(+), 59 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/mmu_context.h >> b/arch/powerpc/include/asm/mmu_context.h >> index 9d2cd0c..b85cc7b 100644 >> --- a/arch/powerpc/include/asm/mmu_context.h >> +++ b/arch/powerpc/include/asm/mmu_context.h >> @@ -18,16 +18,18 @@ extern void destroy_context(struct mm_struct *mm); >> #ifdef CONFIG_SPAPR_TCE_IOMMU >> struct mm_iommu_table_group_mem_t; >> >> -extern bool mm_iommu_preregistered(void); >> -extern long mm_iommu_get(unsigned long ua, unsigned long entries, >> +extern bool mm_iommu_preregistered(struct mm_struct *mm); >> +extern long mm_iommu_get(struct mm_struct *mm, >> +unsigned long ua, unsigned long entries, >> struct mm_iommu_table_group_mem_t **pmem); >> -extern long mm_iommu_put(struct mm_iommu_table_group_mem_t 
*mem); >> -extern void mm_iommu_init(mm_context_t *ctx); >> -extern void mm_iommu_cleanup(mm_context_t *ctx); >> -extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua, >> -unsigned long size); >> -extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua, >> -unsigned long entries); >> +extern long mm_iommu_put(struct mm_struct *mm, >> +struct mm_iommu_table_group_mem_t *mem); >> +extern void mm_iommu_init(struct mm_struct *mm); >> +extern void mm_iommu_cleanup(struct mm_struct *mm); >> +extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct >> *mm, >> +unsigned long ua, unsigned long size); >> +extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct >> *mm, >> +unsigned long ua, unsigned long entries); >> extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem, >> unsigned long ua, unsigned long *hpa); >> extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem); >> diff --git a/arch/powerpc/kernel/setup-common.c >> b/arch/powerpc/kernel/setup-common.c >> index 714b4ba..e90b68a 100644 >> --- a/arch/powerpc/kernel/setup-common.c >> +++ b/arch/powerpc/kernel/setup-common.c >> @@ -905,7 +905,7 @@ void __init setup_arch(char **cmdline_p) >> init_mm.context.pte_frag = NULL; >> #endif >> #ifdef CONFIG_SPAPR_TCE_IOMMU >> -mm_iommu_init(&init_mm.context); >> +mm_iommu_init(&init_mm); >> #endif >> irqstack_early_init(); >> exc_lvl_early_init(); >> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c >> b/arch/powerpc/mm/mmu_context_book3s64.c >> index b114f8b..ad82735 100644 >> --- a/arch/powerpc/mm/mmu_context_book3s64.c >> +++ b/arch/powerpc/mm/mmu_context_book3s64.c >> @@ -115,7 +115,7 @@ int init_new_context(struct task_struct *tsk, struct >> mm_struct *mm) >> mm->context.pte_frag = NULL; >> #endif >> #ifdef CONFIG_SPAPR_TCE_IOMMU >> -mm_iommu_init(&mm->context); >> +mm_iommu_init(mm); >> #endif >> return 0; >> } >> @@ -160,7 +160,7 @@ static inline void 
destroy_pagetable_page(struct >> mm_struct *mm) >> void destroy_context(struct mm_struct *mm) >> { >> #ifdef CONFIG_SPAPR_TCE_IOMMU >> -mm_iommu_cleanup(&mm->context); >> +mm_iommu_cleanup(mm); >> #endif >> >> #ifdef CONFIG_PPC_ICSWX >> diff --git a/arch/powerpc/mm/mmu_context_iommu.c >> b/arch/powerpc/mm/mmu_context_iommu.c >> index da6a216..ee6685b 100644 >> --- a/arch/powerpc/mm/mmu_context_iommu.c >> +++ b/arch/powerpc/mm/mmu_context_iommu.c >> @@ -53,7 +53,7 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm, >> } >> >> pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n", >> -current->pid, >> +current ? current->pid : 0, >> incr ? '+' : '-', >>
Re: [PATCH kernel 13/15] KVM: PPC: Pass kvm* to kvmppc_find_table()
On Wed, Aug 03, 2016 at 06:40:54PM +1000, Alexey Kardashevskiy wrote: > The guest view TCE tables are per KVM anyway (not per VCPU) so pass kvm* > there. This will be used in the following patches where we will be > attaching VFIO containers to LIOBNs via ioctl() to KVM (rather than > to VCPU). > > Signed-off-by: Alexey Kardashevskiy Reviewed-by: David Gibson > --- > arch/powerpc/include/asm/kvm_ppc.h | 2 +- > arch/powerpc/kvm/book3s_64_vio.c| 7 --- > arch/powerpc/kvm/book3s_64_vio_hv.c | 13 +++-- > 3 files changed, 12 insertions(+), 10 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_ppc.h > b/arch/powerpc/include/asm/kvm_ppc.h > index 2544eda..7f1abe9 100644 > --- a/arch/powerpc/include/asm/kvm_ppc.h > +++ b/arch/powerpc/include/asm/kvm_ppc.h > @@ -167,7 +167,7 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu); > extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, > struct kvm_create_spapr_tce_64 *args); > extern struct kvmppc_spapr_tce_table *kvmppc_find_table( > - struct kvm_vcpu *vcpu, unsigned long liobn); > + struct kvm *kvm, unsigned long liobn); > extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt, > unsigned long ioba, unsigned long npages); > extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt, > diff --git a/arch/powerpc/kvm/book3s_64_vio.c > b/arch/powerpc/kvm/book3s_64_vio.c > index c379ff5..15df8ae 100644 > --- a/arch/powerpc/kvm/book3s_64_vio.c > +++ b/arch/powerpc/kvm/book3s_64_vio.c > @@ -212,12 +212,13 @@ fail: > long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, > unsigned long ioba, unsigned long tce) > { > - struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn); > + struct kvmppc_spapr_tce_table *stt; > long ret; > > /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */ > /* liobn, ioba, tce); */ > > + stt = kvmppc_find_table(vcpu->kvm, liobn); > if (!stt) > return H_TOO_HARD; > > @@ -245,7 +246,7 @@ long kvmppc_h_put_tce_indirect(struct 
kvm_vcpu *vcpu, > u64 __user *tces; > u64 tce; > > - stt = kvmppc_find_table(vcpu, liobn); > + stt = kvmppc_find_table(vcpu->kvm, liobn); > if (!stt) > return H_TOO_HARD; > > @@ -299,7 +300,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu, > struct kvmppc_spapr_tce_table *stt; > long i, ret; > > - stt = kvmppc_find_table(vcpu, liobn); > + stt = kvmppc_find_table(vcpu->kvm, liobn); > if (!stt) > return H_TOO_HARD; > > diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c > b/arch/powerpc/kvm/book3s_64_vio_hv.c > index a3be4bd..8a6834e 100644 > --- a/arch/powerpc/kvm/book3s_64_vio_hv.c > +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c > @@ -49,10 +49,9 @@ > * WARNING: This will be called in real or virtual mode on HV KVM and virtual > * mode on PR KVM > */ > -struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu, > +struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm *kvm, > unsigned long liobn) > { > - struct kvm *kvm = vcpu->kvm; > struct kvmppc_spapr_tce_table *stt; > > list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list) > @@ -194,12 +193,13 @@ static struct mm_iommu_table_group_mem_t > *kvmppc_rm_iommu_lookup( > long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, > unsigned long ioba, unsigned long tce) > { > - struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn); > + struct kvmppc_spapr_tce_table *stt; > long ret; > > /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */ > /* liobn, ioba, tce); */ > > + stt = kvmppc_find_table(vcpu->kvm, liobn); > if (!stt) > return H_TOO_HARD; > > @@ -252,7 +252,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu, > unsigned long tces, entry, ua = 0; > unsigned long *rmap = NULL; > > - stt = kvmppc_find_table(vcpu, liobn); > + stt = kvmppc_find_table(vcpu->kvm, liobn); > if (!stt) > return H_TOO_HARD; > > @@ -335,7 +335,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu, > struct kvmppc_spapr_tce_table *stt; > long i, ret; > > - stt = 
kvmppc_find_table(vcpu, liobn); > + stt = kvmppc_find_table(vcpu->kvm, liobn); > if (!stt) > return H_TOO_HARD; > > @@ -356,12 +356,13 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu, > long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn, > unsigned long ioba) > { > - struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn); > + struct kvmppc_spapr_tce_table *stt; > long ret; > unsigned long idx; > struct page *page; > u64
Re: [PATCH kernel 11/15] powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange()
On Wed, Aug 03, 2016 at 06:40:52PM +1000, Alexey Kardashevskiy wrote: > In real mode, TCE tables are invalidated using special > cache-inhibited store instructions which are not available in > virtual mode > > This defines and implements exchange_rm() callback. This does not > define set_rm/clear_rm/flush_rm callbacks as there is no user for those - > exchange/exchange_rm are only to be used by KVM for VFIO. > > The exchange_rm callback is defined for IODA1/IODA2 powernv platforms. > > This replaces list_for_each_entry_rcu with its lockless version as > from now on pnv_pci_ioda2_tce_invalidate() can be called in > the real mode too. > > Signed-off-by: Alexey Kardashevskiy > --- > arch/powerpc/include/asm/iommu.h | 7 +++ > arch/powerpc/kernel/iommu.c | 23 +++ > arch/powerpc/platforms/powernv/pci-ioda.c | 26 +- > 3 files changed, 55 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/include/asm/iommu.h > b/arch/powerpc/include/asm/iommu.h > index cd4df44..a13d207 100644 > --- a/arch/powerpc/include/asm/iommu.h > +++ b/arch/powerpc/include/asm/iommu.h > @@ -64,6 +64,11 @@ struct iommu_table_ops { > long index, > unsigned long *hpa, > enum dma_data_direction *direction); > + /* Real mode */ > + int (*exchange_rm)(struct iommu_table *tbl, > + long index, > + unsigned long *hpa, > + enum dma_data_direction *direction); > #endif > void (*clear)(struct iommu_table *tbl, > long index, long npages); > @@ -209,6 +214,8 @@ extern void iommu_del_device(struct device *dev); > extern int __init tce_iommu_bus_notifier_init(void); > extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry, > unsigned long *hpa, enum dma_data_direction *direction); > +extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry, > + unsigned long *hpa, enum dma_data_direction *direction); > #else > static inline void iommu_register_group(struct iommu_table_group > *table_group, > int pci_domain_number, > diff --git a/arch/powerpc/kernel/iommu.c 
b/arch/powerpc/kernel/iommu.c > index a8f017a..65b2dac 100644 > --- a/arch/powerpc/kernel/iommu.c > +++ b/arch/powerpc/kernel/iommu.c > @@ -1020,6 +1020,29 @@ long iommu_tce_xchg(struct iommu_table *tbl, unsigned > long entry, > } > EXPORT_SYMBOL_GPL(iommu_tce_xchg); > > +long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry, > + unsigned long *hpa, enum dma_data_direction *direction) > +{ > + long ret; > + > + ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction); > + > + if (!ret && ((*direction == DMA_FROM_DEVICE) || > + (*direction == DMA_BIDIRECTIONAL))) { > + struct page *pg = realmode_pfn_to_page(*hpa >> PAGE_SHIFT); > + > + if (likely(pg)) { > + SetPageDirty(pg); > + } else { Isn't there a race here, if someone else updates this TCE entry between your initial exchange and the rollback exchange below? > + tbl->it_ops->exchange_rm(tbl, entry, hpa, direction); > + ret = -EFAULT; > + } > + } > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(iommu_tce_xchg_rm); > + > int iommu_take_ownership(struct iommu_table *tbl) > { > unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c > b/arch/powerpc/platforms/powernv/pci-ioda.c > index c04afd2..a0b5ea6 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -1827,6 +1827,17 @@ static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, > long index, > > return ret; > } > + > +static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index, > + unsigned long *hpa, enum dma_data_direction *direction) > +{ > + long ret = pnv_tce_xchg(tbl, index, hpa, direction); > + > + if (!ret) > + pnv_pci_p7ioc_tce_invalidate(tbl, index, 1, true); > + > + return ret; > +} > #endif > > static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index, > @@ -1841,6 +1852,7 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = { > .set = pnv_ioda1_tce_build, > #ifdef CONFIG_IOMMU_API > .exchange = 
pnv_ioda1_tce_xchg, > + .exchange_rm = pnv_ioda1_tce_xchg_rm, > #endif > .clear = pnv_ioda1_tce_free, > .get = pnv_tce_get, > @@ -1915,7 +1927,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct > iommu_table *tbl, > { > struct iommu_table_group_link *tgl; > > - list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) { > + list_for_each_entry_lockless(tgl, &tbl->it_group_list, next) { So.. IIUC, previously this had a bool rm parameter, bu
Re: [PATCH kernel 12/15] KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently
On Wed, Aug 03, 2016 at 06:40:53PM +1000, Alexey Kardashevskiy wrote: > It does not make much sense to have KVM in book3s-64 and > not to have IOMMU bits for PCI pass through support as it costs little > and allows VFIO to function on book3s KVM. > > Having IOMMU_API always enabled makes it unnecessary to have a lot of > "#ifdef IOMMU_API" in arch/powerpc/kvm/book3s_64_vio*. With those > ifdef's we could have only user space emulated devices accelerated > (but not VFIO) which do not seem to be very useful. > > Signed-off-by: Alexey Kardashevskiy > --- > arch/powerpc/kvm/Kconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig > index b7c494b..63b60a8 100644 > --- a/arch/powerpc/kvm/Kconfig > +++ b/arch/powerpc/kvm/Kconfig > @@ -65,6 +65,7 @@ config KVM_BOOK3S_64 > select KVM > select KVM_BOOK3S_PR_POSSIBLE if !KVM_BOOK3S_HV_POSSIBLE > select KVM_VFIO if VFIO > + select SPAPR_TCE_IOMMU if IOMMU_SUPPORT > ---help--- > Support running unmodified book3s_64 and book3s_32 guest kernels > in virtual machines on book3s_64 host processors. I don't quite see how this change accomplishes the stated goal. AFAICT even with this change you can still turn off IOMMU_SUPPORT, which will break the IOMMU for VFIO passthrough, but not IOMMU acceleration for emulated devices (since that requires no interaction with the hardware IOMMU). -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH kernel 09/15] powerpc/mmu: Add real mode support for IOMMU preregistered memory
On Wed, Aug 03, 2016 at 06:40:50PM +1000, Alexey Kardashevskiy wrote: > This makes mm_iommu_lookup() able to work in realmode by replacing > list_for_each_entry_rcu() (which can do debug stuff which can fail in > real mode) with list_for_each_entry_lockless(). > > This adds realmode version of mm_iommu_ua_to_hpa() which adds > explicit vmalloc'd-to-linear address conversion. > Unlike mm_iommu_ua_to_hpa(), mm_iommu_ua_to_hpa_rm() can fail. > > This changes mm_iommu_preregistered() to receive @mm as in real mode > @current does not always have a correct pointer. > > This adds realmode version of mm_iommu_lookup() which receives @mm > (for the same reason as for mm_iommu_preregistered()) and uses > lockless version of list_for_each_entry_rcu(). > > Signed-off-by: Alexey Kardashevskiy Reviewed-by: David Gibson > --- > arch/powerpc/include/asm/mmu_context.h | 4 > arch/powerpc/mm/mmu_context_iommu.c| 39 > ++ > 2 files changed, 43 insertions(+) > > diff --git a/arch/powerpc/include/asm/mmu_context.h > b/arch/powerpc/include/asm/mmu_context.h > index a4c4ed5..939030c 100644 > --- a/arch/powerpc/include/asm/mmu_context.h > +++ b/arch/powerpc/include/asm/mmu_context.h > @@ -27,10 +27,14 @@ extern long mm_iommu_put(struct mm_struct *mm, > extern void mm_iommu_init(struct mm_struct *mm); > extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct > *mm, > unsigned long ua, unsigned long size); > +extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm( > + struct mm_struct *mm, unsigned long ua, unsigned long size); > extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm, > unsigned long ua, unsigned long entries); > extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem, > unsigned long ua, unsigned long *hpa); > +extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem, > + unsigned long ua, unsigned long *hpa); > extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem); > extern void 
mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem); > #endif > diff --git a/arch/powerpc/mm/mmu_context_iommu.c > b/arch/powerpc/mm/mmu_context_iommu.c > index 10f01fe..36a906c 100644 > --- a/arch/powerpc/mm/mmu_context_iommu.c > +++ b/arch/powerpc/mm/mmu_context_iommu.c > @@ -242,6 +242,25 @@ struct mm_iommu_table_group_mem_t > *mm_iommu_lookup(struct mm_struct *mm, > } > EXPORT_SYMBOL_GPL(mm_iommu_lookup); > > +struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(struct mm_struct *mm, > + unsigned long ua, unsigned long size) > +{ > + struct mm_iommu_table_group_mem_t *mem, *ret = NULL; > + > + list_for_each_entry_lockless(mem, &mm->context.iommu_group_mem_list, > + next) { > + if ((mem->ua <= ua) && > + (ua + size <= mem->ua + > + (mem->entries << PAGE_SHIFT))) { > + ret = mem; > + break; > + } > + } > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(mm_iommu_lookup_rm); > + > struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm, > unsigned long ua, unsigned long entries) > { > @@ -273,6 +292,26 @@ long mm_iommu_ua_to_hpa(struct > mm_iommu_table_group_mem_t *mem, > } > EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa); > > +long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem, > + unsigned long ua, unsigned long *hpa) > +{ > + const long entry = (ua - mem->ua) >> PAGE_SHIFT; > + void *va = &mem->hpas[entry]; > + unsigned long *ra; > + > + if (entry >= mem->entries) > + return -EFAULT; > + > + ra = (void *) vmalloc_to_phys(va); > + if (!ra) > + return -EFAULT; > + > + *hpa = *ra | (ua & ~PAGE_MASK); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa_rm); > + > long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem) > { > if (atomic64_inc_not_zero(&mem->mapped)) -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH kernel 10/15] KVM: PPC: Use preregistered memory API to access TCE list
On Wed, Aug 03, 2016 at 06:40:51PM +1000, Alexey Kardashevskiy wrote: > VFIO on sPAPR already implements guest memory pre-registration > when the entire guest RAM gets pinned. This can be used to translate > the physical address of a guest page containing the TCE list > from H_PUT_TCE_INDIRECT. > > This makes use of the pre-registered memory API to access TCE list > pages in order to avoid unnecessary locking on the KVM memory > reverse map as we know that all of guest memory is pinned and > we have a flat array mapping GPA to HPA which makes it simpler and > quicker to index into that array (even with looking up the > kernel page tables in vmalloc_to_phys) than it is to find the memslot, > lock the rmap entry, look up the user page tables, and unlock the rmap > entry. Note that the rmap pointer is initialized to NULL where declared > (not in this patch). > > Signed-off-by: Alexey Kardashevskiy > --- > Changes: > v2: > * updated the commit log with Paul's comment > --- > arch/powerpc/kvm/book3s_64_vio_hv.c | 65 > - > 1 file changed, 49 insertions(+), 16 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c > b/arch/powerpc/kvm/book3s_64_vio_hv.c > index d461c44..a3be4bd 100644 > --- a/arch/powerpc/kvm/book3s_64_vio_hv.c > +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c > @@ -180,6 +180,17 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa, > EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua); > > #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > +static inline bool kvmppc_preregistered(struct kvm_vcpu *vcpu) > +{ > + return mm_iommu_preregistered(vcpu->kvm->mm); > +} > + > +static struct mm_iommu_table_group_mem_t *kvmppc_rm_iommu_lookup( > + struct kvm_vcpu *vcpu, unsigned long ua, unsigned long size) > +{ > + return mm_iommu_lookup_rm(vcpu->kvm->mm, ua, size); > +} > + > long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, > unsigned long ioba, unsigned long tce) > { > @@ -260,23 +271,44 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu, > if (ret
!= H_SUCCESS) > return ret; > > - if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap)) > - return H_TOO_HARD; > + if (kvmppc_preregistered(vcpu)) { > + /* > + * We get here if guest memory was pre-registered which > + * is normally VFIO case and gpa->hpa translation does not > + * depend on hpt. > + */ > + struct mm_iommu_table_group_mem_t *mem; > > - rmap = (void *) vmalloc_to_phys(rmap); > + if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) > + return H_TOO_HARD; Wouldn't it be clearer to put the gpa->ua lookup outside the if? You'd have to throw away the rmap you get in the prereg case, but it shouldn't be harmful, should it? > > - /* > - * Synchronize with the MMU notifier callbacks in > - * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.). > - * While we have the rmap lock, code running on other CPUs > - * cannot finish unmapping the host real page that backs > - * this guest real page, so we are OK to access the host > - * real page. > - */ > - lock_rmap(rmap); > - if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) { > - ret = H_TOO_HARD; > - goto unlock_exit; > + mem = kvmppc_rm_iommu_lookup(vcpu, ua, IOMMU_PAGE_SIZE_4K); > + if (!mem || mm_iommu_ua_to_hpa_rm(mem, ua, &tces)) > + return H_TOO_HARD; This doesn't fall back to the rmap approach if it can't locate the page in question in the prereg map. IIUC that means that this will now work less well than previously if you have a userspace which preregisters some memory, but not all of guest RAM. I'm not sure if we care about that, since no such userspace currently exists. > + } else { > + /* > + * This is emulated devices case. This is a bit misleading - this case will only be triggered if there are *no* prereg-ed VFIO devices. The case above can be used even for emulated devices, if there happen to also be VFIO devices present which have preregistered guest RAM. > + * We do not require memory to be preregistered in this case > + * so lock rmap and do __find_linux_pte_or_hugepte(). 
> + */ > + if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap)) > + return H_TOO_HARD; > + > + rmap = (void *) vmalloc_to_phys(rmap); > + > + /* > + * Synchronize with the MMU notifier callbacks in > + * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.). > + * While we have the rmap lock, code running on other CPUs > + * cannot finish unmapping the host real page that backs > + * this guest real page, so we are OK to access the host > + * real page. > + */
Re: [PATCH] powerpc: populate the default bus with machine_arch_initcall
Kevin Hao writes: > With the commit 44a7185c2ae6 ("of/platform: Add common method to > populate default bus"), a default function is introduced to populate > the default bus and this function is invoked at the arch_initcall_sync > level. This overrides the arch-specific population of the default bus, > which runs at a lower level than arch_initcall_sync. Since not all > powerpc-specific buses are added to of_default_bus_match_table[], > some powerpc-specific buses are no longer probed. Fix this by > using an earlier initcall. > > Signed-off-by: Kevin Hao > --- > Of course we could adjust the powerpc arch code to use > of_platform_default_populate_init(), but that carries a high risk of > breaking other boards given the complicated powerpc-specific buses. So I would > like to just fix the broken boards in the current release, and cook > a patch to change to of_platform_default_populate_init() for linux-next. > > Only boot-tested on an mpc8315erdb board. > > arch/powerpc/platforms/40x/ep405.c | 2 +- > arch/powerpc/platforms/40x/ppc40x_simple.c | 2 +- > arch/powerpc/platforms/40x/virtex.c | 2 +- > arch/powerpc/platforms/40x/walnut.c | 2 +- > arch/powerpc/platforms/44x/canyonlands.c | 2 +- > arch/powerpc/platforms/44x/ebony.c | 2 +- > arch/powerpc/platforms/44x/iss4xx.c | 2 +- > arch/powerpc/platforms/44x/ppc44x_simple.c | 2 +- > arch/powerpc/platforms/44x/ppc476.c | 2 +- > arch/powerpc/platforms/44x/sam440ep.c| 2 +- > arch/powerpc/platforms/44x/virtex.c | 2 +- > arch/powerpc/platforms/44x/warp.c| 2 +- > arch/powerpc/platforms/82xx/ep8248e.c| 2 +- > arch/powerpc/platforms/82xx/km82xx.c | 2 +- > arch/powerpc/platforms/82xx/mpc8272_ads.c| 2 +- > arch/powerpc/platforms/82xx/pq2fads.c| 2 +- > arch/powerpc/platforms/83xx/mpc831x_rdb.c| 2 +- > arch/powerpc/platforms/83xx/mpc834x_itx.c| 2 +- > arch/powerpc/platforms/85xx/ppa8548.c| 2 +- > arch/powerpc/platforms/8xx/adder875.c| 2 +- > arch/powerpc/platforms/8xx/ep88xc.c | 2 +- > arch/powerpc/platforms/8xx/mpc86xads_setup.c |
2 +- > arch/powerpc/platforms/8xx/mpc885ads_setup.c | 2 +- > arch/powerpc/platforms/8xx/tqm8xx_setup.c| 2 +- > arch/powerpc/platforms/cell/setup.c | 2 +- > arch/powerpc/platforms/embedded6xx/gamecube.c| 2 +- > arch/powerpc/platforms/embedded6xx/linkstation.c | 2 +- > arch/powerpc/platforms/embedded6xx/mvme5100.c| 2 +- > arch/powerpc/platforms/embedded6xx/storcenter.c | 2 +- > arch/powerpc/platforms/embedded6xx/wii.c | 2 +- > arch/powerpc/platforms/pasemi/setup.c| 2 +- That's not a very minimal fix. Every one of those initcall changes could be introducing a bug, by changing the order vs other init calls. Can we just go back to the old behaviour on ppc? cheers
Re: [PATCH kernel 08/15] powerpc/vfio_spapr_tce: Add reference counting to iommu_table
On Wed, Aug 03, 2016 at 06:40:49PM +1000, Alexey Kardashevskiy wrote:
> So far iommu_table objects were only used in virtual mode and had
> a single owner. We are going to change this by implementing in-kernel
> acceleration of DMA mapping requests, including real mode.
>
> This adds a kref to iommu_table and defines new helpers to update it.
> This replaces iommu_free_table() with iommu_table_put() and makes
> iommu_free_table() static. iommu_table_get() is not used in this patch
> but will be in the following one.
>
> While we are here, this removes the @node_name parameter as it has never been
> really useful on powernv, and carrying it for the pseries platform code to
> iommu_free_table() seems to be quite useless too.
>
> This should cause no behavioral change.
>
> Signed-off-by: Alexey Kardashevskiy

Reviewed-by: David Gibson

> ---
> arch/powerpc/include/asm/iommu.h | 5 +++--
> arch/powerpc/kernel/iommu.c | 24 +++-
> arch/powerpc/kernel/vio.c | 2 +-
> arch/powerpc/platforms/powernv/pci-ioda.c | 14 +++---
> arch/powerpc/platforms/powernv/pci.c | 1 +
> arch/powerpc/platforms/pseries/iommu.c | 3 ++-
> drivers/vfio/vfio_iommu_spapr_tce.c | 2 +-
> 7 files changed, 34 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/iommu.h
> b/arch/powerpc/include/asm/iommu.h
> index f49a72a..cd4df44 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -114,6 +114,7 @@ struct iommu_table {
> 	struct list_head it_group_list;	/* List of iommu_table_group_link */
> 	unsigned long *it_userspace;	/* userspace view of the table */
> 	struct iommu_table_ops *it_ops;
> +	struct kref it_kref;
> };
>
> #define IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry) \
> @@ -146,8 +147,8 @@ static inline void *get_iommu_table_base(struct device
> *dev)
>
> extern int dma_iommu_dma_supported(struct device *dev, u64 mask);
>
> -/* Frees table for an individual device node */
> -extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
+extern void iommu_table_get(struct iommu_table *tbl); > +extern void iommu_table_put(struct iommu_table *tbl); > > /* Initializes an iommu_table based in values set in the passed-in > * structure > diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c > index 13263b0..a8f017a 100644 > --- a/arch/powerpc/kernel/iommu.c > +++ b/arch/powerpc/kernel/iommu.c > @@ -710,13 +710,13 @@ struct iommu_table *iommu_init_table(struct iommu_table > *tbl, int nid) > return tbl; > } > > -void iommu_free_table(struct iommu_table *tbl, const char *node_name) > +static void iommu_table_free(struct kref *kref) > { > unsigned long bitmap_sz; > unsigned int order; > + struct iommu_table *tbl; > > - if (!tbl) > - return; > + tbl = container_of(kref, struct iommu_table, it_kref); > > if (tbl->it_ops->free) > tbl->it_ops->free(tbl); > @@ -735,7 +735,7 @@ void iommu_free_table(struct iommu_table *tbl, const char > *node_name) > > /* verify that table contains no entries */ > if (!bitmap_empty(tbl->it_map, tbl->it_size)) > - pr_warn("%s: Unexpected TCEs for %s\n", __func__, node_name); > + pr_warn("%s: Unexpected TCEs\n", __func__); > > /* calculate bitmap size in bytes */ > bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long); > @@ -747,7 +747,21 @@ void iommu_free_table(struct iommu_table *tbl, const > char *node_name) > /* free table */ > kfree(tbl); > } > -EXPORT_SYMBOL_GPL(iommu_free_table); > + > +void iommu_table_get(struct iommu_table *tbl) > +{ > + kref_get(&tbl->it_kref); > +} > +EXPORT_SYMBOL_GPL(iommu_table_get); > + > +void iommu_table_put(struct iommu_table *tbl) > +{ > + if (!tbl) > + return; > + > + kref_put(&tbl->it_kref, iommu_table_free); > +} > +EXPORT_SYMBOL_GPL(iommu_table_put); > > /* Creates TCEs for a user provided buffer. The user buffer must be > * contiguous real kernel storage (not vmalloc). 
The address passed here > diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c > index 8d7358f..188f452 100644 > --- a/arch/powerpc/kernel/vio.c > +++ b/arch/powerpc/kernel/vio.c > @@ -1318,7 +1318,7 @@ static void vio_dev_release(struct device *dev) > struct iommu_table *tbl = get_iommu_table_base(dev); > > if (tbl) > - iommu_free_table(tbl, of_node_full_name(dev->of_node)); > + iommu_table_put(tbl); > of_node_put(dev->of_node); > kfree(to_vio_dev(dev)); > } > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c > b/arch/powerpc/platforms/powernv/pci-ioda.c > index 74ab8382..c04afd2 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -1394,7 +1394,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct
Re: [PATCH kernel 06/15] powerpc/mm/iommu: Put pages on process exit
On Wed, Aug 03, 2016 at 06:40:47PM +1000, Alexey Kardashevskiy wrote:
> At the moment VFIO IOMMU SPAPR v2 driver pins all guest RAM pages when
> the userspace starts using VFIO.

This doesn't sound accurate. Isn't it userspace that decides what gets pinned, not the VFIO driver?

> When the userspace process finishes,
> all the pinned pages need to be put; this is done as a part of
> the userspace memory context (MM) destruction which happens on
> the very last mmdrop().
>
> This approach has a problem that a MM of the userspace process
> may live longer than the userspace process itself as kernel threads
> use userspace process MMs which were running on a CPU where
> the kernel thread was scheduled to. If this happens, the MM remains
> referenced until this exact kernel thread wakes up again
> and releases the very last reference to the MM; on an idle system this
> can take even hours.
>
> This references and caches MM once per container and adds tracking
> how many times each preregistered area was registered in
> a specific container. This way we do not depend on @current pointing to
> a valid task descriptor.

The handling of @current and refcounting the mm sounds more like it's describing the previous patch.

The description of counting how many times each prereg area is registered doesn't seem accurate, since you block multiple registrations with an EBUSY. Or else it's describing the 'used' counter in the lower-level mm_iommu_table_group_mem_t tracking, rather than anything changed by this patch.

> This changes the userspace interface to return EBUSY if memory is
> already registered (mm_iommu_get() used to increment the counter);
> however it should not have any practical effect as the only
> userspace tool available now does register memory area once per
> container anyway.
>
> As tce_iommu_register_pages/tce_iommu_unregister_pages are called
> under container->lock, this does not need additional locking.
> > Signed-off-by: Alexey Kardashevskiy > > # Conflicts: > # arch/powerpc/include/asm/mmu_context.h > # arch/powerpc/mm/mmu_context_book3s64.c > # arch/powerpc/mm/mmu_context_iommu.c Looks like some lines to be cleaned up in the message. > --- > arch/powerpc/include/asm/mmu_context.h | 1 - > arch/powerpc/mm/mmu_context_book3s64.c | 4 --- > arch/powerpc/mm/mmu_context_iommu.c| 11 --- > drivers/vfio/vfio_iommu_spapr_tce.c| 52 > +- > 4 files changed, 51 insertions(+), 17 deletions(-) > > diff --git a/arch/powerpc/include/asm/mmu_context.h > b/arch/powerpc/include/asm/mmu_context.h > index b85cc7b..a4c4ed5 100644 > --- a/arch/powerpc/include/asm/mmu_context.h > +++ b/arch/powerpc/include/asm/mmu_context.h > @@ -25,7 +25,6 @@ extern long mm_iommu_get(struct mm_struct *mm, > extern long mm_iommu_put(struct mm_struct *mm, > struct mm_iommu_table_group_mem_t *mem); > extern void mm_iommu_init(struct mm_struct *mm); > -extern void mm_iommu_cleanup(struct mm_struct *mm); > extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct > *mm, > unsigned long ua, unsigned long size); > extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm, > diff --git a/arch/powerpc/mm/mmu_context_book3s64.c > b/arch/powerpc/mm/mmu_context_book3s64.c > index ad82735..1a07969 100644 > --- a/arch/powerpc/mm/mmu_context_book3s64.c > +++ b/arch/powerpc/mm/mmu_context_book3s64.c > @@ -159,10 +159,6 @@ static inline void destroy_pagetable_page(struct > mm_struct *mm) > > void destroy_context(struct mm_struct *mm) > { > -#ifdef CONFIG_SPAPR_TCE_IOMMU > - mm_iommu_cleanup(mm); > -#endif > - > #ifdef CONFIG_PPC_ICSWX > drop_cop(mm->context.acop, mm); > kfree(mm->context.cop_lockp); > diff --git a/arch/powerpc/mm/mmu_context_iommu.c > b/arch/powerpc/mm/mmu_context_iommu.c > index ee6685b..10f01fe 100644 > --- a/arch/powerpc/mm/mmu_context_iommu.c > +++ b/arch/powerpc/mm/mmu_context_iommu.c > @@ -293,14 +293,3 @@ void mm_iommu_init(struct mm_struct *mm) > { > 
INIT_LIST_HEAD_RCU(&mm->context.iommu_group_mem_list); > } > - > -void mm_iommu_cleanup(struct mm_struct *mm) > -{ > - struct mm_iommu_table_group_mem_t *mem, *tmp; > - > - list_for_each_entry_safe(mem, tmp, &mm->context.iommu_group_mem_list, > - next) { > - list_del_rcu(&mem->next); > - mm_iommu_do_free(mem); > - } > -} > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c > b/drivers/vfio/vfio_iommu_spapr_tce.c > index 9752e77..40e71a0 100644 > --- a/drivers/vfio/vfio_iommu_spapr_tce.c > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c > @@ -89,6 +89,15 @@ struct tce_iommu_group { > }; > > /* > + * A container needs to remember which preregistered areas and how many times > + * it has referenced to do proper cleanup at the userspace process exit. > + */ > +struct tce_iommu_prereg { > + struct list_head next; > + struct mm_iommu_table_group_mem_t *mem
Re: [PATCH kernel 07/15] powerpc/iommu: Cleanup iommu_table disposal
On Wed, Aug 03, 2016 at 06:40:48PM +1000, Alexey Kardashevskiy wrote: > At the moment iommu_table could be disposed by either calling > iommu_table_free() directly or it_ops::free() which only implementation > for IODA2 calls iommu_table_free() anyway. > > As we are going to have reference counting on tables, we need an unified > way of disposing tables. > > This moves it_ops::free() call into iommu_free_table() and makes use > of the latter everywhere. The free() callback now handles only > platform-specific data. > > This should cause no behavioral change. > > Signed-off-by: Alexey Kardashevskiy Reviewed-by: David Gibson > --- > arch/powerpc/kernel/iommu.c | 4 > arch/powerpc/platforms/powernv/pci-ioda.c | 6 ++ > drivers/vfio/vfio_iommu_spapr_tce.c | 2 +- > 3 files changed, 7 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c > index a8e3490..13263b0 100644 > --- a/arch/powerpc/kernel/iommu.c > +++ b/arch/powerpc/kernel/iommu.c > @@ -718,6 +718,9 @@ void iommu_free_table(struct iommu_table *tbl, const char > *node_name) > if (!tbl) > return; > > + if (tbl->it_ops->free) > + tbl->it_ops->free(tbl); > + > if (!tbl->it_map) { > kfree(tbl); > return; > @@ -744,6 +747,7 @@ void iommu_free_table(struct iommu_table *tbl, const char > *node_name) > /* free table */ > kfree(tbl); > } > +EXPORT_SYMBOL_GPL(iommu_free_table); > > /* Creates TCEs for a user provided buffer. The user buffer must be > * contiguous real kernel storage (not vmalloc). 
The address passed here > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c > b/arch/powerpc/platforms/powernv/pci-ioda.c > index 59c7e7d..74ab8382 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -1394,7 +1394,6 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev > *dev, struct pnv_ioda_pe > iommu_group_put(pe->table_group.group); > BUG_ON(pe->table_group.group); > } > - pnv_pci_ioda2_table_free_pages(tbl); > iommu_free_table(tbl, of_node_full_name(dev->dev.of_node)); > } > > @@ -1987,7 +1986,6 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, > long index, > static void pnv_ioda2_table_free(struct iommu_table *tbl) > { > pnv_pci_ioda2_table_free_pages(tbl); > - iommu_free_table(tbl, "pnv"); > } > > static struct iommu_table_ops pnv_ioda2_iommu_ops = { > @@ -2313,7 +2311,7 @@ static long pnv_pci_ioda2_setup_default_config(struct > pnv_ioda_pe *pe) > if (rc) { > pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", > rc); > - pnv_ioda2_table_free(tbl); > + iommu_free_table(tbl, ""); > return rc; > } > > @@ -2399,7 +2397,7 @@ static void pnv_ioda2_take_ownership(struct > iommu_table_group *table_group) > > pnv_pci_ioda2_set_bypass(pe, false); > pnv_pci_ioda2_unset_window(&pe->table_group, 0); > - pnv_ioda2_table_free(tbl); > + iommu_free_table(tbl, "pnv"); > } > > static void pnv_ioda2_release_ownership(struct iommu_table_group > *table_group) > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c > b/drivers/vfio/vfio_iommu_spapr_tce.c > index 40e71a0..79f26c7 100644 > --- a/drivers/vfio/vfio_iommu_spapr_tce.c > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c > @@ -660,7 +660,7 @@ static void tce_iommu_free_table(struct iommu_table *tbl) > unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT; > > tce_iommu_userspace_view_free(tbl); > - tbl->it_ops->free(tbl); > + iommu_free_table(tbl, ""); > decrement_locked_vm(pages); > } > -- David Gibson| I'll have my music baroque, and my 
code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [PATCH kernel 05/15] powerpc/iommu: Stop using @current in mm_iommu_xxx
On Wed, Aug 03, 2016 at 06:40:46PM +1000, Alexey Kardashevskiy wrote: > In some situations the userspace memory context may live longer than > the userspace process itself so if we need to do proper memory context > cleanup, we better cache @mm and use it later when the process is gone > (@current or @current->mm are NULL). > > This changes mm_iommu_xxx API to receive mm_struct instead of using one > from @current. > > This is needed by the following patch to do proper cleanup in time. > This depends on "powerpc/powernv/ioda: Fix endianness when reading TCEs" > to do proper cleanup via tce_iommu_clear() patch. > > To keep API consistent, this replaces mm_context_t with mm_struct; > we stick to mm_struct as mm_iommu_adjust_locked_vm() helper needs > access to &mm->mmap_sem. > > This should cause no behavioral change. > > Signed-off-by: Alexey Kardashevskiy > --- > arch/powerpc/include/asm/mmu_context.h | 20 +++-- > arch/powerpc/kernel/setup-common.c | 2 +- > arch/powerpc/mm/mmu_context_book3s64.c | 4 +-- > arch/powerpc/mm/mmu_context_iommu.c| 54 > ++ > drivers/vfio/vfio_iommu_spapr_tce.c| 41 -- > 5 files changed, 62 insertions(+), 59 deletions(-) > > diff --git a/arch/powerpc/include/asm/mmu_context.h > b/arch/powerpc/include/asm/mmu_context.h > index 9d2cd0c..b85cc7b 100644 > --- a/arch/powerpc/include/asm/mmu_context.h > +++ b/arch/powerpc/include/asm/mmu_context.h > @@ -18,16 +18,18 @@ extern void destroy_context(struct mm_struct *mm); > #ifdef CONFIG_SPAPR_TCE_IOMMU > struct mm_iommu_table_group_mem_t; > > -extern bool mm_iommu_preregistered(void); > -extern long mm_iommu_get(unsigned long ua, unsigned long entries, > +extern bool mm_iommu_preregistered(struct mm_struct *mm); > +extern long mm_iommu_get(struct mm_struct *mm, > + unsigned long ua, unsigned long entries, > struct mm_iommu_table_group_mem_t **pmem); > -extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem); > -extern void mm_iommu_init(mm_context_t *ctx); > -extern void 
mm_iommu_cleanup(mm_context_t *ctx); > -extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua, > - unsigned long size); > -extern struct mm_iommu_table_group_mem_t *mm_iommu_find(unsigned long ua, > - unsigned long entries); > +extern long mm_iommu_put(struct mm_struct *mm, > + struct mm_iommu_table_group_mem_t *mem); > +extern void mm_iommu_init(struct mm_struct *mm); > +extern void mm_iommu_cleanup(struct mm_struct *mm); > +extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct > *mm, > + unsigned long ua, unsigned long size); > +extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm, > + unsigned long ua, unsigned long entries); > extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem, > unsigned long ua, unsigned long *hpa); > extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem); > diff --git a/arch/powerpc/kernel/setup-common.c > b/arch/powerpc/kernel/setup-common.c > index 714b4ba..e90b68a 100644 > --- a/arch/powerpc/kernel/setup-common.c > +++ b/arch/powerpc/kernel/setup-common.c > @@ -905,7 +905,7 @@ void __init setup_arch(char **cmdline_p) > init_mm.context.pte_frag = NULL; > #endif > #ifdef CONFIG_SPAPR_TCE_IOMMU > - mm_iommu_init(&init_mm.context); > + mm_iommu_init(&init_mm); > #endif > irqstack_early_init(); > exc_lvl_early_init(); > diff --git a/arch/powerpc/mm/mmu_context_book3s64.c > b/arch/powerpc/mm/mmu_context_book3s64.c > index b114f8b..ad82735 100644 > --- a/arch/powerpc/mm/mmu_context_book3s64.c > +++ b/arch/powerpc/mm/mmu_context_book3s64.c > @@ -115,7 +115,7 @@ int init_new_context(struct task_struct *tsk, struct > mm_struct *mm) > mm->context.pte_frag = NULL; > #endif > #ifdef CONFIG_SPAPR_TCE_IOMMU > - mm_iommu_init(&mm->context); > + mm_iommu_init(mm); > #endif > return 0; > } > @@ -160,7 +160,7 @@ static inline void destroy_pagetable_page(struct > mm_struct *mm) > void destroy_context(struct mm_struct *mm) > { > #ifdef CONFIG_SPAPR_TCE_IOMMU 
> - mm_iommu_cleanup(&mm->context); > + mm_iommu_cleanup(mm); > #endif > > #ifdef CONFIG_PPC_ICSWX > diff --git a/arch/powerpc/mm/mmu_context_iommu.c > b/arch/powerpc/mm/mmu_context_iommu.c > index da6a216..ee6685b 100644 > --- a/arch/powerpc/mm/mmu_context_iommu.c > +++ b/arch/powerpc/mm/mmu_context_iommu.c > @@ -53,7 +53,7 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm, > } > > pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n", > - current->pid, > + current ? current->pid : 0, > incr ? '+' : '-', > npages << PAGE_SHIFT, > mm->locked_vm << PAGE_SHIFT, > @@ -63,28 +63,22 @@ static long mm_iommu_adjus
Re: [PATCH 0/2] ibmvfc: FC-TAPE Support
> "Tyrel" == Tyrel Datwyler writes: Tyrel> On 08/03/2016 02:36 PM, Tyrel Datwyler wrote: >> This patchset introduces optional FC-TAPE/FC Class 3 Error Recovery >> to the ibmvfc client driver. >> >> Tyrel Datwyler (2): ibmvfc: Set READ FCP_XFER_READY DISABLED bit in >> PRLI ibmvfc: add FC Class 3 Error Recovery support >> >> drivers/scsi/ibmvscsi/ibmvfc.c | 11 +++ >> drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 12 >> insertions(+) >> Tyrel> ping? -ENOREVIEWS -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v2 02/20] powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use
Hi Cyril, [auto build test ERROR on powerpc/next] [also build test ERROR on v4.8-rc1 next-20160811] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Cyril-Bur/Consistent-TM-structures/20160812-075557 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc-sbc834x_defconfig (attached as .config) compiler: powerpc-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609 reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=powerpc All error/warnings (new ones prefixed by >>): In file included from arch/powerpc/include/asm/processor.h:13:0, from arch/powerpc/include/asm/thread_info.h:33, from include/linux/thread_info.h:54, from include/asm-generic/preempt.h:4, from ./arch/powerpc/include/generated/asm/preempt.h:1, from include/linux/preempt.h:59, from include/linux/spinlock.h:50, from include/linux/seqlock.h:35, from include/linux/time.h:5, from include/uapi/linux/timex.h:56, from include/linux/timex.h:56, from include/linux/sched.h:19, from arch/powerpc/kernel/process.c:18: arch/powerpc/kernel/process.c: In function 'restore_fp': >> arch/powerpc/include/asm/reg.h:64:23: error: left shift count >= width of >> type [-Werror=shift-count-overflow] #define __MASK(X) (1UL<<(X)) ^ >> arch/powerpc/include/asm/reg.h:116:18: note: in expansion of macro '__MASK' #define MSR_TS_T __MASK(MSR_TS_T_LG) /* Transaction Transactional */ ^ >> arch/powerpc/include/asm/reg.h:117:22: note: in expansion of macro 'MSR_TS_T' #define MSR_TS_MASK (MSR_TS_T | MSR_TS_S) /* Transaction State bits */ ^ >> arch/powerpc/include/asm/reg.h:118:34: note: in expansion of macro >> 'MSR_TS_MASK' #define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? 
*/ ^ >> arch/powerpc/kernel/process.c:211:29: note: in expansion of macro >> 'MSR_TM_ACTIVE' if (tsk->thread.load_fp || MSR_TM_ACTIVE(tsk->thread.regs->msr)) { ^ >> arch/powerpc/include/asm/reg.h:64:23: error: left shift count >= width of >> type [-Werror=shift-count-overflow] #define __MASK(X) (1UL<<(X)) ^ arch/powerpc/include/asm/reg.h:115:18: note: in expansion of macro '__MASK' #define MSR_TS_S __MASK(MSR_TS_S_LG) /* Transaction Suspended */ ^ >> arch/powerpc/include/asm/reg.h:117:33: note: in expansion of macro 'MSR_TS_S' #define MSR_TS_MASK (MSR_TS_T | MSR_TS_S) /* Transaction State bits */ ^ >> arch/powerpc/include/asm/reg.h:118:34: note: in expansion of macro >> 'MSR_TS_MASK' #define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */ ^ >> arch/powerpc/kernel/process.c:211:29: note: in expansion of macro >> 'MSR_TM_ACTIVE' if (tsk->thread.load_fp || MSR_TM_ACTIVE(tsk->thread.regs->msr)) { ^ arch/powerpc/kernel/process.c: In function 'restore_math': >> arch/powerpc/include/asm/reg.h:64:23: error: left shift count >= width of >> type [-Werror=shift-count-overflow] #define __MASK(X) (1UL<<(X)) ^ >> arch/powerpc/include/asm/reg.h:116:18: note: in expansion of macro '__MASK' #define MSR_TS_T __MASK(MSR_TS_T_LG) /* Transaction Transactional */ ^ >> arch/powerpc/include/asm/reg.h:117:22: note: in expansion of macro 'MSR_TS_T' #define MSR_TS_MASK (MSR_TS_T | MSR_TS_S) /* Transaction State bits */ ^ >> arch/powerpc/include/asm/reg.h:118:34: note: in expansion of macro >> 'MSR_TS_MASK' #define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */ ^ arch/powerpc/kernel/process.c:468:7: note: in expansion of macro 'MSR_TM_ACTIVE' if (!MSR_TM_ACTIVE(regs->msr) && ^ >> arch/powerpc/include/asm/reg.h:64:23: error: left shift count >= width of >> type [-Werror=shift-count-overflow] #define __MASK(X) (1UL<&l
Re: [PATCH] powerpc: populate the default bus with machine_arch_initcall
On Thu, Aug 11, 2016 at 08:17:52AM -0500, Rob Herring wrote:
> On Thu, Aug 11, 2016 at 6:09 AM, Kevin Hao wrote:
> > With the commit 44a7185c2ae6 ("of/platform: Add common method to
> > populate default bus"), a default function is introduced to populate
> > the default bus and this function is invoked at the arch_initcall_sync
> > level. This will override the arch specific population of the default
> > bus, which runs at a lower level than arch_initcall_sync. Since not all
> > powerpc specific buses are added to the of_default_bus_match_table[],
> > this causes some powerpc specific buses to not be probed. Fix this by
> > using a more preceding initcall.
> >
> > Signed-off-by: Kevin Hao
> > ---
> > Of course we can adjust the powerpc arch code to use
> > of_platform_default_populate_init(), but it has a high risk of breaking
> > other boards given the complicated powerpc specific buses. So I would
> > like to just fix the broken boards in the current release, and cook
> > a patch to change to of_platform_default_populate_init() for linux-next.
>
> The patch that broke things was sitting in -next for some time and no
> one reported anything. Are all these boards broken?

At least in theory. :-) The effect may differ depending on which devices are missed. For me, the Gianfar Ethernet on my mpc8315erdb board is malfunctioning because the MDIO bus is not probed.

> I'm fine to just disable the default call for PPC instead if there's
> some chance this does not fix some boards.

I have tried to cover all the invocations of of_platform_bus_probe() via machine_device_initcall(). Yes, I may have missed some boards. But won't we want to take this as a step towards using the default populate function, since it does remove some duplicated code?

> There could be some other
> initcall ordering dependencies.
>
> > Only boot tested on a mpc8315erdb board.
>
> Curious, what would it take to remove the of_platform_bus_probe and
> use the default here? We can add additional bus compatibles to match.
I thought about this. But the bus compatibles list seems a bit long, and it may cause some side effects on some boards due to all these additional buses. So that change seems a bit aggressive to me. It does seem like a feature for linux-next. The following is the list of bus compatibles which would need to be added to the default match table if we want to fix all the currently broken boards:

	{ .compatible = "fsl,ep8248e-bcsr", },
	{ .compatible = "fsl,pq2pro-localbus", },
	{ .compatible = "fsl,qe", },
	{ .compatible = "fsl,srio", },
	{ .compatible = "gianfar", },
	{ .compatible = "gpio-leds", },
	{ .compatible = "hawk-bridge", },
	{ .compatible = "ibm,ebc", },
	{ .compatible = "ibm,opb", },
	{ .compatible = "ibm,plb3", },
	{ .compatible = "ibm,plb4", },
	{ .compatible = "ibm,plb6", },
	{ .compatible = "nintendo,flipper", },
	{ .compatible = "nintendo,hollywood", },
	{ .compatible = "pasemi,localbus", },
	{ .compatible = "pasemi,sdc", },
	{ .compatible = "soc", },
	{ .compatible = "xlnx,compound", },
	{ .compatible = "xlnx,dcr-v29-1.00.a", },
	{ .compatible = "xlnx,opb-v20-1.10.c", },
	{ .compatible = "xlnx,plb-v34-1.01.a", },
	{ .compatible = "xlnx,plb-v34-1.02.a", },
	{ .compatible = "xlnx,plb-v46-1.00.a", },
	{ .compatible = "xlnx,plb-v46-1.02.a", },
	{ .name = "cpm", },
	{ .name = "localbus", },
	{ .name = "soc", },
	{ .type = "axon", },
	{ .type = "ebc", },
	{ .type = "opb", },
	{ .type = "plb4", },
	{ .type = "plb5", },
	{ .type = "qe", },
	{ .type = "soc", },
	{ .type = "spider", },

Of course I can choose to use the default function if all you guys think it is better. :-)

> The difference between of_platform_bus_probe and
> of_platform_bus_populate is the former will match root nodes with no
> compatible string. Most platforms should not need that behavior and it
> would be nice to know which ones.

I don't think this difference would cause any real side effect for these boards.

Thanks,
Kevin
Re: [PATCH] mm: Initialize per_cpu_nodestats for hotadded pgdats
On 12/08/16 02:04, Reza Arbab wrote: > The following oops occurs after a pgdat is hotadded: > > [ 86.839956] Unable to handle kernel paging request for data at address > 0x00c30001 > [ 86.840132] Faulting instruction address: 0xc022f8f4 > [ 86.840328] Oops: Kernel access of bad area, sig: 11 [#1] > [ 86.840468] SMP NR_CPUS=2048 NUMA pSeries > [ 86.840612] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 > ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp > llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 > nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter > ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat > nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter > nls_utf8 isofs sg virtio_balloon uio_pdrv_genirq uio ip_tables xfs libcrc32c > sr_mod cdrom sd_mod virtio_net ibmvscsi scsi_transport_srp virtio_pci > virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod > [ 86.842955] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW > 4.8.0-rc1-device #110 > [ 86.843140] task: c0ef3080 task.stack: c0f6c000 > [ 86.843323] NIP: c022f8f4 LR: c022f948 CTR: > > [ 86.843595] REGS: c0f6fa50 TRAP: 0300 Tainted: GW > (4.8.0-rc1-device) > [ 86.843889] MSR: 80010280b033 > CR: 84002028 XER: 2000 > [ 86.844624] CFAR: d1d2013c DAR: 00c30001 DSISR: 4000 > SOFTE: 0 > GPR00: c022f948 c0f6fcd0 c0f71400 0001 > GPR04: 0100 00c3 > GPR08: 0001 00c3 > GPR12: 2200 c130 c0faefb4 c0faefa8 > GPR16: c0f6c000 c0f6c080 c0bf15b0 c0f6c080 > GPR20: c0bf4928 0003 c0bf4968 > GPR24: c000ffed c0f6fd58 > GPR28: 0001 0001 c0f6fcf0 c000ffed9c08 > [ 86.847747] NIP [c022f8f4] refresh_cpu_vm_stats+0x1a4/0x2f0 > [ 86.847897] LR [c022f948] refresh_cpu_vm_stats+0x1f8/0x2f0 > [ 86.848060] Call Trace: > [ 86.848183] [c0f6fcd0] [c022f948] > refresh_cpu_vm_stats+0x1f8/0x2f0 (unreliable) > > Add per_cpu_nodestats initialization to the hotplug codepath. 
> > Signed-off-by: Reza Arbab > --- > mm/memory_hotplug.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 3894b65..41266dc 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1219,6 +1219,7 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 > start) > > /* init node's zones as empty zones, we don't have any present pages.*/ > free_area_init_node(nid, zones_size, start_pfn, zholes_size); > + pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat); > > /* >* The node we allocated has no zone fallback lists. For avoiding > @@ -1249,6 +1250,7 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 > start) > static void rollback_node_hotadd(int nid, pg_data_t *pgdat) > { > arch_refresh_nodedata(nid, NULL); > + free_percpu(pgdat->per_cpu_nodestats); > arch_free_nodedata(pgdat); > return; > } > I wonder if node_set_online() should do the allocation and offline should free. But that would be a larger change Balbir
Re: [PATCH 3/4] powerpc/mm: allow memory hotplug into a memoryless node
On 09/08/16 04:27, Reza Arbab wrote:
> Remove the check which prevents us from hotplugging into an empty node.
>
> Signed-off-by: Reza Arbab
> ---
> arch/powerpc/mm/numa.c | 13 +
> 1 file changed, 1 insertion(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 80d067d..bc70c4f 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1127,7 +1127,7 @@ static int hot_add_node_scn_to_nid(unsigned long scn_addr)
> int hot_add_scn_to_nid(unsigned long scn_addr)
> {
> 	struct device_node *memory = NULL;
> -	int nid, found = 0;
> +	int nid;

Do we want to do this only for ibm,hotplug-aperture compatible ranges? I'm OK either way.

Acked-by: Balbir Singh
Re: [PATCH 2/4] powerpc/mm: create numa nodes for hotplug memory
On 09/08/16 04:27, Reza Arbab wrote: > When scanning the device tree to initialize the system NUMA topology, > process dt elements with compatible id "ibm,hotplug-aperture" to create > memoryless numa nodes. > > These nodes will be filled when hotplug occurs within the associated > address range. > > Signed-off-by: Reza Arbab > --- Looks good to me Acked-by: Balbir Singh
Re: [PATCH v5 11/13] powerpc: Allow userspace to set device tree properties in kexec_file_load
Hello Sam,

Thanks for the quick response.

On Friday, 12 August 2016 at 10:45:00, Samuel Mendoza-Jonas wrote:
> On Thu, 2016-08-11 at 20:08 -0300, Thiago Jung Bauermann wrote:
> > @@ -908,4 +909,245 @@ bool find_debug_console(const void *fdt, int
> > chosen_node) return false; > > } > > > > +/** > > + * struct allowed_node - a node in the whitelist and its allowed > > properties. + * @name: node name or full node path > > + * @properties:NULL-terminated array of names or > > name=value pairs + * > > + * If name starts with /, then the node has to be at the specified path > > in + * the device tree (including unit addresses for all nodes in the > > path). + * If it doesn't, then the node can be anywhere in the device > > tree. + * > > + * An entry in properties can specify a string value that the property > > must + * have by using the "name=value" format. If the entry ends with > > =, it means + * that the property must be empty. > > + */ > > +static struct allowed_node { > > + const char *name; > > + const char *properties[9]; > > +} allowed_nodes[] = { > > + { > > + .name = "/chosen", > > + .properties = { > > + "stdout-path", > > + "linux,stdout-path", > > + NULL, > > + } > > + }, > > + { > > + .name = "vga", > > + .properties = { > > + "device_type=display", > > + "assigned-addresses", > > + "width", > > + "height", > > + "depth", > > + "little-endian=", > > + "linux,opened=", > > + "linux,boot-display=", > > + NULL, > > + } > > + }, > > +};
>
> Hi Thiago,
>
> As much as this solves problems for *me*, I suspect adding 'vga' here
> might be the subject of some discussion. Having /chosen whitelisted makes
> sense on its own, but 'vga' and its properties are very specific without
> much explanation.
>
> If everyone's happy to have it there, cool! If not, I have the majority
> of a patch that handles the original reason for these property updates
> separately in the kernel rather than from userspace. If needed I'll clean
> it up and we can handle it that way.
Ok, that's good to know. I'm fine with it either way. In any case, 'vga' in this patch also serves as a good real-life example of a non-trivial binding outside of /chosen that we might want to whitelist in the future. -- []'s Thiago Jung Bauermann IBM Linux Technology Center
Re: [PATCH v5 11/13] powerpc: Allow userspace to set device tree properties in kexec_file_load
On Thu, 2016-08-11 at 20:08 -0300, Thiago Jung Bauermann wrote: > Implement the arch_kexec_verify_buffer hook to verify that a device > tree blob passed by userspace via kexec_file_load contains only nodes > and properties from a whitelist. > > In elf64_load we merge those properties into the device tree that > will be passed to the next kernel. > > Suggested-by: Michael Ellerman > Signed-off-by: Thiago Jung Bauermann > --- > arch/powerpc/include/asm/kexec.h | 1 + > arch/powerpc/kernel/kexec_elf_64.c | 9 ++ > arch/powerpc/kernel/machine_kexec_64.c | 242 > + > 3 files changed, 252 insertions(+) > > diff --git a/arch/powerpc/include/asm/kexec.h > b/arch/powerpc/include/asm/kexec.h > index f263cc867891..31bc64e07c8f 100644 > --- a/arch/powerpc/include/asm/kexec.h > +++ b/arch/powerpc/include/asm/kexec.h > @@ -99,6 +99,7 @@ int setup_purgatory(struct kimage *image, const void > *slave_code, > int setup_new_fdt(void *fdt, unsigned long initrd_load_addr, > unsigned long initrd_len, const char *cmdline); > bool find_debug_console(const void *fdt, int chosen_node); > +int merge_partial_dtb(void *to, const void *from); > #endif /* CONFIG_KEXEC_FILE */ > > #else /* !CONFIG_KEXEC */ > diff --git a/arch/powerpc/kernel/kexec_elf_64.c > b/arch/powerpc/kernel/kexec_elf_64.c > index 49cba9509464..1b902ad66e2a 100644 > --- a/arch/powerpc/kernel/kexec_elf_64.c > +++ b/arch/powerpc/kernel/kexec_elf_64.c > @@ -210,6 +210,15 @@ void *elf64_load(struct kimage *image, char *kernel_buf, > goto out; > } > > + /* Add nodes and properties from the DTB passed by userspace. 
*/ > + if (image->dtb_buf) { > + ret = merge_partial_dtb(fdt, image->dtb_buf); > + if (ret) { > + pr_err("Error merging partial device tree.\n"); > + goto out; > + } > + } > + > ret = setup_new_fdt(fdt, initrd_load_addr, initrd_len, cmdline); > if (ret) > goto out; > diff --git a/arch/powerpc/kernel/machine_kexec_64.c > b/arch/powerpc/kernel/machine_kexec_64.c > index 527f98efe651..a484a6346146 100644 > --- a/arch/powerpc/kernel/machine_kexec_64.c > +++ b/arch/powerpc/kernel/machine_kexec_64.c > @@ -35,6 +35,7 @@ > #include > > #define SLAVE_CODE_SIZE256 > +#define MAX_DT_PATH512 > > #ifdef CONFIG_KEXEC_FILE > static struct kexec_file_ops *kexec_file_loaders[] = { > @@ -908,4 +909,245 @@ bool find_debug_console(const void *fdt, int > chosen_node) > return false; > } > > +/** > + * struct allowed_node - a node in the whitelist and its allowed properties. > + * @name: node name or full node path > + * @properties:NULL-terminated array of names or name=value > pairs > + * > + * If name starts with /, then the node has to be at the specified path in > + * the device tree (including unit addresses for all nodes in the path). > + * If it doesn't, then the node can be anywhere in the device tree. > + * > + * An entry in properties can specify a string value that the property must > + * have by using the "name=value" format. If the entry ends with =, it means > + * that the property must be empty. 
> + */ > +static struct allowed_node { > + const char *name; > + const char *properties[9]; > +} allowed_nodes[] = { > + { > + .name = "/chosen", > + .properties = { > + "stdout-path", > + "linux,stdout-path", > + NULL, > + } > + }, > + { > + .name = "vga", > + .properties = { > + "device_type=display", > + "assigned-addresses", > + "width", > + "height", > + "depth", > + "little-endian=", > + "linux,opened=", > + "linux,boot-display=", > + NULL, > + } > + }, > +}; Hi Thiago, As much as this solves problems for *me*, I suspect adding 'vga' here might be the subject of some discussion. Having /chosen whitelisted makes sense on its own, but 'vga' and its properties are very specific without much explanation. If everyone's happy to have it there, cool! If not, I have the majority of a patch that handles the original reason for these property updates separately in the kernel rather than from userspace. If needed I'll clean it up and we can handle it that way. Cheers, Sam
Re: [PATCH] soc: fsl/qe: fix Oops on CPM1 (and likely CPM2)
On Mon, 2016-08-08 at 18:08 +0200, Christophe Leroy wrote: > Commit 0e6e01ff694ee ("CPM/QE: use genalloc to manage CPM/QE muram") > has changed the way muram is managed. > genalloc uses kmalloc(), hence requires the SLAB to be up and running. > > On powerpc 8xx, cpm_reset() is called early during startup. > cpm_reset() then calls cpm_muram_init() before SLAB is available, > hence the following Oops. > > cpm_reset() cannot be called during initcalls because the CPM is > needed for the console. > > This patch splits cpm_muram_init() into two parts. The first part, > related to mappings, is kept as cpm_muram_init(). > The second part is named cpm_muram_pool_init() and is called > the first time cpm_muram_alloc() is used. Why do you need to split it, versus calling the full cpm_muram_init() on demand? -Scott
[PATCH v2 06/20] selftests/powerpc: Check for VSX preservation across userspace preemption
Ensure the kernel switches VSX registers correctly. VSX registers are all volatile; despite the kernel preserving VSX across syscalls, it doesn't have to. Test that the VSX regs remain the same across interrupts and at the end of timeslices. Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/math/Makefile | 4 +- tools/testing/selftests/powerpc/math/vsx_asm.S | 61 + tools/testing/selftests/powerpc/math/vsx_preempt.c | 147 + tools/testing/selftests/powerpc/vsx_asm.h | 71 ++ 4 files changed, 282 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/math/vsx_asm.S create mode 100644 tools/testing/selftests/powerpc/math/vsx_preempt.c create mode 100644 tools/testing/selftests/powerpc/vsx_asm.h diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile index 5b88875..aa6598b 100644 --- a/tools/testing/selftests/powerpc/math/Makefile +++ b/tools/testing/selftests/powerpc/math/Makefile @@ -1,4 +1,4 @@ -TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt vmx_signal +TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt vmx_signal vsx_preempt all: $(TEST_PROGS) @@ -13,6 +13,8 @@ vmx_syscall: vmx_asm.S vmx_preempt: vmx_asm.S vmx_signal: vmx_asm.S +vsx_preempt: vsx_asm.S + include ../../lib.mk clean: diff --git a/tools/testing/selftests/powerpc/math/vsx_asm.S b/tools/testing/selftests/powerpc/math/vsx_asm.S new file mode 100644 index 000..a110dd8 --- /dev/null +++ b/tools/testing/selftests/powerpc/math/vsx_asm.S @@ -0,0 +1,61 @@ +/* + * Copyright 2015, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include "../basic_asm.h" +#include "../vsx_asm.h" + +#long check_vsx(vector int *r3); +#This function wraps storing VSX regs to the end of an array and a +#call to a comparison function in C which boils down to a memcmp() +FUNC_START(check_vsx) + PUSH_BASIC_STACK(32) + std r3,STACK_FRAME_PARAM(0)(sp) + addi r3, r3, 16 * 12 #Second half of array + bl store_vsx + ld r3,STACK_FRAME_PARAM(0)(sp) + bl vsx_memcmp + POP_BASIC_STACK(32) + blr +FUNC_END(check_vsx) + +# int preempt_vsx(vector int *varray, int *threads_starting, +# int *running); # On starting will (atomically) decrement threads_starting as a signal # that the VSX registers have been loaded with varray. Will proceed to check the # validity of the VSX registers while running is not zero. FUNC_START(preempt_vsx) + PUSH_BASIC_STACK(512) + std r3,STACK_FRAME_PARAM(0)(sp) # vector int *varray + std r4,STACK_FRAME_PARAM(1)(sp) # int *threads_starting + std r5,STACK_FRAME_PARAM(2)(sp) # int *running + + bl load_vsx + nop + + sync + # Atomic DEC + ld r3,STACK_FRAME_PARAM(1)(sp) +1: lwarx r4,0,r3 + addi r4,r4,-1 + stwcx. r4,0,r3 + bne- 1b + +2: ld r3,STACK_FRAME_PARAM(0)(sp) + bl check_vsx + nop + cmpdi r3,0 + bne 3f + ld r4,STACK_FRAME_PARAM(2)(sp) + ld r5,0(r4) + cmpwi r5,0 + bne 2b + +3: POP_BASIC_STACK(512) + blr +FUNC_END(preempt_vsx) diff --git a/tools/testing/selftests/powerpc/math/vsx_preempt.c b/tools/testing/selftests/powerpc/math/vsx_preempt.c new file mode 100644 index 000..6387f03 --- /dev/null +++ b/tools/testing/selftests/powerpc/math/vsx_preempt.c @@ -0,0 +1,147 @@ +/* + * Copyright 2015, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * This test attempts to see if the VSX registers change across preemption. 
+ * There is no way to be sure preemption happened so this test just + * uses many threads and a long wait. As such, a successful test + * doesn't mean much but a failure is bad. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "utils.h" + +/* Time to wait for workers to get preempted (seconds) */ +#define PREEMPT_TIME 20 +/* + * Factor by which to multiply number of online CPUs for total number of + * worker threads + */ +#define THREAD_FACTOR 8 + +/* + * Ensure there is twice the number of non-volatile VSX regs! + * check_vsx() is going to use the other half as space to put the live + * registers before calling vsx_memcmp() + */ +__thread vector int varray[24] = { + {1, 2, 3, 4 }, {5, 6, 7, 8 }, {9, 10,11,12}, + {13,14,15,16}, {17,18,19,20}, {21,22,23,24}, + {25,26,27,28}, {29,
[PATCH v2 01/20] selftests/powerpc: Compile selftests against headers without AT_HWCAP2
It might be nice to compile selftests against older kernels and headers which may not have HWCAP2. Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/utils.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/tools/testing/selftests/powerpc/utils.h b/tools/testing/selftests/powerpc/utils.h index fbd33e5..ecd11b5 100644 --- a/tools/testing/selftests/powerpc/utils.h +++ b/tools/testing/selftests/powerpc/utils.h @@ -32,10 +32,17 @@ static inline bool have_hwcap(unsigned long ftr) return ((unsigned long)get_auxv_entry(AT_HWCAP) & ftr) == ftr; } +#ifdef AT_HWCAP2 static inline bool have_hwcap2(unsigned long ftr2) { return ((unsigned long)get_auxv_entry(AT_HWCAP2) & ftr2) == ftr2; } +#else +static inline bool have_hwcap2(unsigned long ftr2) +{ + return false; +} +#endif /* Yes, this is evil */ #define FAIL_IF(x) \ -- 2.9.2
[PATCH v2 19/20] powerpc: tm: Rename transct_(*) to ck(\1)_state
Make the structures being used for checkpointed state named consistently with the pt_regs/ckpt_regs. Signed-off-by: Cyril Bur --- arch/powerpc/include/asm/processor.h | 8 ++--- arch/powerpc/kernel/asm-offsets.c| 12 arch/powerpc/kernel/fpu.S| 2 +- arch/powerpc/kernel/process.c| 4 +-- arch/powerpc/kernel/ptrace.c | 46 +-- arch/powerpc/kernel/signal.h | 8 ++--- arch/powerpc/kernel/signal_32.c | 60 ++-- arch/powerpc/kernel/signal_64.c | 32 +-- arch/powerpc/kernel/tm.S | 12 arch/powerpc/kernel/vector.S | 4 +-- 10 files changed, 94 insertions(+), 94 deletions(-) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index feab2ce..b3e0cfc 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -147,7 +147,7 @@ typedef struct { } mm_segment_t; #define TS_FPR(i) fp_state.fpr[i][TS_FPROFFSET] -#define TS_TRANS_FPR(i) transact_fp.fpr[i][TS_FPROFFSET] +#define TS_CKFPR(i) ckfp_state.fpr[i][TS_FPROFFSET] /* FP and VSX 0-31 register set */ struct thread_fp_state { @@ -275,9 +275,9 @@ struct thread_struct { * * These are analogous to how ckpt_regs and pt_regs work */ - struct thread_fp_state transact_fp; - struct thread_vr_state transact_vr; - unsigned long transact_vrsave; + struct thread_fp_state ckfp_state; /* Checkpointed FP state */ + struct thread_vr_state ckvr_state; /* Checkpointed VR state */ + unsigned long ckvrsave; /* Checkpointed VRSAVE */ #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ #ifdef CONFIG_KVM_BOOK3S_32_HANDLER void* kvm_shadow_vcpu; /* KVM internal data */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index b89d14c..dd0fc33 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -142,12 +142,12 @@ int main(void) DEFINE(THREAD_TM_PPR, offsetof(struct thread_struct, tm_ppr)); DEFINE(THREAD_TM_DSCR, offsetof(struct thread_struct, tm_dscr)); DEFINE(PT_CKPT_REGS, offsetof(struct thread_struct, ckpt_regs)); - 
DEFINE(THREAD_TRANSACT_VRSTATE, offsetof(struct thread_struct, -transact_vr)); - DEFINE(THREAD_TRANSACT_VRSAVE, offsetof(struct thread_struct, - transact_vrsave)); - DEFINE(THREAD_TRANSACT_FPSTATE, offsetof(struct thread_struct, -transact_fp)); + DEFINE(THREAD_CKVRSTATE, offsetof(struct thread_struct, +ckvr_state)); + DEFINE(THREAD_CKVRSAVE, offsetof(struct thread_struct, + ckvrsave)); + DEFINE(THREAD_CKFPSTATE, offsetof(struct thread_struct, +ckfp_state)); /* Local pt_regs on stack for Transactional Memory funcs. */ DEFINE(TM_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) + 16); diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S index 15da2b5..181c187 100644 --- a/arch/powerpc/kernel/fpu.S +++ b/arch/powerpc/kernel/fpu.S @@ -68,7 +68,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) SYNC MTMSRD(r5) - addir7,r3,THREAD_TRANSACT_FPSTATE + addir7,r3,THREAD_CKFPSTATE lfd fr0,FPSTATE_FPSCR(r7) MTFSF_L(fr0) REST_32FPVSRS(0, R4, R7) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 6836570..5216e04 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -860,8 +860,8 @@ static inline void tm_reclaim_task(struct task_struct *tsk) * * In switching we need to maintain a 2nd register state as * oldtask->thread.ckpt_regs. We tm_reclaim(oldproc); this saves the -* checkpointed (tbegin) state in ckpt_regs and saves the transactional -* (current) FPRs into oldtask->thread.transact_fpr[]. +* checkpointed (tbegin) state in ckpt_regs, ckfp_state and +* ckvr_state * * We also context switch (save) TFHAR/TEXASR/TFIAR in here. 
*/ diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c index ed53712..1e85d6b 100644 --- a/arch/powerpc/kernel/ptrace.c +++ b/arch/powerpc/kernel/ptrace.c @@ -384,7 +384,7 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset, /* * Regardless of transactions, 'fp_state' holds the current running - * value of all FPR registers and 'transact_fp' holds the last checkpointed + * value of all FPR registers and 'ckfp_state' holds the last checkpointed * value of all FPR registers for the current transaction. * * Userspace interface buffer layout: @@
[PATCH v2 20/20] powerpc: Remove do_load_up_transact_{fpu,altivec}
Previous rework of TM code leaves these functions unused Signed-off-by: Cyril Bur --- arch/powerpc/include/asm/tm.h | 5 - arch/powerpc/kernel/fpu.S | 26 -- arch/powerpc/kernel/vector.S | 25 - 3 files changed, 56 deletions(-) diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h index c22d704..82e06ca 100644 --- a/arch/powerpc/include/asm/tm.h +++ b/arch/powerpc/include/asm/tm.h @@ -9,11 +9,6 @@ #ifndef __ASSEMBLY__ -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM -extern void do_load_up_transact_fpu(struct thread_struct *thread); -extern void do_load_up_transact_altivec(struct thread_struct *thread); -#endif - extern void tm_enable(void); extern void tm_reclaim(struct thread_struct *thread, unsigned long orig_msr, uint8_t cause); diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S index 181c187..08d14b0 100644 --- a/arch/powerpc/kernel/fpu.S +++ b/arch/powerpc/kernel/fpu.S @@ -50,32 +50,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX); \ #define REST_32FPVSRS(n,c,base) __REST_32FPVSRS(n,__REG_##c,__REG_##base) #define SAVE_32FPVSRS(n,c,base) __SAVE_32FPVSRS(n,__REG_##c,__REG_##base) -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM -/* void do_load_up_transact_fpu(struct thread_struct *thread) - * - * This is similar to load_up_fpu but for the transactional version of the FP - * register set. It doesn't mess with the task MSR or valid flags. - * Furthermore, we don't do lazy FP with TM currently. - */ -_GLOBAL(do_load_up_transact_fpu) - mfmsr r6 - ori r5,r6,MSR_FP -#ifdef CONFIG_VSX -BEGIN_FTR_SECTION - orisr5,r5,MSR_VSX@h -END_FTR_SECTION_IFSET(CPU_FTR_VSX) -#endif - SYNC - MTMSRD(r5) - - addir7,r3,THREAD_CKFPSTATE - lfd fr0,FPSTATE_FPSCR(r7) - MTFSF_L(fr0) - REST_32FPVSRS(0, R4, R7) - - blr -#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ - /* * Load state from memory into FP registers including FPSCR. * Assumes the caller has enabled FP in the MSR. 
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S index 7dc4021..bc85bdf 100644 --- a/arch/powerpc/kernel/vector.S +++ b/arch/powerpc/kernel/vector.S @@ -7,31 +7,6 @@ #include #include -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM -/* void do_load_up_transact_altivec(struct thread_struct *thread) - * - * This is similar to load_up_altivec but for the transactional version of the - * vector regs. It doesn't mess with the task MSR or valid flags. - * Furthermore, VEC laziness is not supported with TM currently. - */ -_GLOBAL(do_load_up_transact_altivec) - mfmsr r6 - orisr5,r6,MSR_VEC@h - MTMSRD(r5) - isync - - li r4,1 - stw r4,THREAD_USED_VR(r3) - - li r10,THREAD_CKVRSTATE+VRSTATE_VSCR - lvx v0,r10,r3 - mtvscr v0 - addir10,r3,THREAD_CKVRSTATE - REST_32VRS(0,r4,r10) - - blr -#endif - /* * Load state from memory into VMX registers including VSCR. * Assumes the caller has enabled VMX in the MSR. -- 2.9.2
[PATCH v2 18/20] powerpc: tm: Always use fp_state and vr_state to store live registers
There is currently an inconsistency as to how the entire CPU register state is saved and restored when a thread uses transactional memory (TM). Using transactional memory results in the CPU duplicating (almost all of) its register state. This duplication results in a set of registers which can be considered 'live' (those currently being modified by the executing instructions) and another set that is frozen at a point in time. On context switch, both sets of state have to be saved and (later) restored. These two states are often called a variety of different things. Common terms for the state which only exists after the hardware has entered a transaction (performed a TBEGIN instruction) are 'transactional' or 'speculative'. Between a TBEGIN and a TEND or TABORT (or an event that causes the hardware to abort), regardless of the use of TSUSPEND, the transactional state can be referred to as the live state. The second state is often referred to as the 'checkpointed' state and is a duplication of the live state made when the TBEGIN instruction is executed. This state is kept in the hardware and will be rolled back to on transaction failure. Currently all the registers stored in pt_regs are ALWAYS the live registers; that is, when a thread has transactional registers their values are stored in pt_regs and the checkpointed state is in ckpt_regs. A strange opposite is true for fp_state. When a thread is non-transactional, fp_state holds the live registers. When a thread has initiated a transaction, fp_state holds the checkpointed state and transact_fp becomes the structure which holds the live state (at this point it is a transactional state). The same is true for vr_state. This method creates confusion as to where the live state is; in some circumstances it requires extra work to determine where to put the live state, and it prevents the use of common functions designed (probably before TM) to save the live state. 
With this patch pt_regs, fp_state and vr_state all represent the same thing and the other structures [pending rename] are for checkpointed state. Signed-off-by: Cyril Bur --- arch/powerpc/include/asm/processor.h | 7 +- arch/powerpc/kernel/process.c| 63 +++- arch/powerpc/kernel/ptrace.c | 278 +-- arch/powerpc/kernel/signal_32.c | 50 +++ arch/powerpc/kernel/signal_64.c | 53 +++ arch/powerpc/kernel/tm.S | 94 ++-- arch/powerpc/kernel/traps.c | 12 +- 7 files changed, 196 insertions(+), 361 deletions(-) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 68e3bf5..feab2ce 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -267,16 +267,13 @@ struct thread_struct { unsigned long tm_dscr; /* -* Transactional FP and VSX 0-31 register set. -* NOTE: the sense of these is the opposite of the integer ckpt_regs! +* Checkpointed FP and VSX 0-31 register set. * * When a transaction is active/signalled/scheduled etc., *regs is the * most recent set of/speculated GPRs with ckpt_regs being the older * checkpointed regs to which we roll back if transaction aborts. * -* However, fpr[] is the checkpointed 'base state' of FP regs, and -* transact_fpr[] is the new set of transactional values. -* VRs work the same way. +* These are analogous to how ckpt_regs and pt_regs work */ struct thread_fp_state transact_fp; struct thread_vr_state transact_vr; diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 0cfbc89..6836570 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -808,26 +808,14 @@ static inline bool hw_brk_match(struct arch_hw_breakpoint *a, static void tm_reclaim_thread(struct thread_struct *thr, struct thread_info *ti, uint8_t cause) { - unsigned long msr_diff = 0; + unsigned long msr_diff = thr->regs->msr; - /* -* If FP/VSX registers have been already saved to the -* thread_struct, move them to the transact_fp array. 
-* We clear the TIF_RESTORE_TM bit since after the reclaim -* the thread will no longer be transactional. -*/ if (test_ti_thread_flag(ti, TIF_RESTORE_TM)) { - msr_diff = thr->ckpt_regs.msr & ~thr->regs->msr; - if (msr_diff & MSR_FP) - memcpy(&thr->transact_fp, &thr->fp_state, - sizeof(struct thread_fp_state)); - if (msr_diff & MSR_VEC) - memcpy(&thr->transact_vr, &thr->vr_state, - sizeof(struct thread_vr_state)); + msr_diff = (thr->ckpt_regs.msr & ~thr->regs->msr) + & (MSR_FP | MSR_VEC | MSR_VSX | MSR_FE0 | MSR_FE1); +
[PATCH v2 17/20] selftests/powerpc: Add checks for transactional VSXs in signal contexts
If a thread receives a signal while transactional the kernel creates a second context to show the transactional state of the process. This test loads some known values and waits for a signal and confirms that the expected values are in the signal context. Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/tm/Makefile| 2 +- .../powerpc/tm/tm-signal-context-chk-vsx.c | 125 + 2 files changed, 126 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 06c44aa..9d53f8b 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -1,5 +1,5 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu \ - tm-signal-context-chk-vmx + tm-signal-context-chk-vmx tm-signal-context-chk-vsx TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ tm-vmxcopy tm-fork tm-tar tm-tmspr $(SIGNAL_CONTEXT_CHK_TESTS) diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c new file mode 100644 index 000..b99c3d8 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c @@ -0,0 +1,125 @@ +/* + * Copyright 2016, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * + * Test the kernel's signal frame code. + * + * The kernel sets up two sets of ucontexts if the signal was to be + * delivered while the thread was in a transaction. + * Expected behaviour is that the checkpointed state is in the user + * context passed to the signal handler. 
The speculated state can be + * accessed with the uc_link pointer. + * + * The rationale for this is that if TM unaware code (which linked + * against TM libs) installs a signal handler it will not know of the + * speculative nature of the 'live' registers and may infer the wrong + * thing. + */ + +#include +#include +#include +#include +#include + +#include + +#include "utils.h" +#include "tm.h" + +#define MAX_ATTEMPT 50 + +#define NV_VSX_REGS 12 + +long tm_signal_self_context_load(pid_t pid, long *gprs, double *fps, vector int *vms, vector int *vss); + +static sig_atomic_t fail; + +vector int vss[] = { + {1, 2, 3, 4 },{5, 6, 7, 8 },{9, 10,11,12}, + {13,14,15,16},{17,18,19,20},{21,22,23,24}, + {25,26,27,28},{29,30,31,32},{33,34,35,36}, + {37,38,39,40},{41,42,43,44},{45,46,47,48}, + {-1, -2, -3, -4 },{-5, -6, -7, -8 },{-9, -10,-11,-12}, + {-13,-14,-15,-16},{-17,-18,-19,-20},{-21,-22,-23,-24}, + {-25,-26,-27,-28},{-29,-30,-31,-32},{-33,-34,-35,-36}, + {-37,-38,-39,-40},{-41,-42,-43,-44},{-45,-46,-47,-48} +}; + +static void signal_usr1(int signum, siginfo_t *info, void *uc) +{ + int i; + uint8_t vsc[sizeof(vector int)]; + uint8_t vst[sizeof(vector int)]; + ucontext_t *ucp = uc; + ucontext_t *tm_ucp = ucp->uc_link; + + /* +* The other half of the VSX regs will be after v_regs. +* +* In short, vmx_reserve array holds everything. v_regs is a 16 +* byte aligned pointer at the start of vmx_reserve (vmx_reserve +* may or may not be 16 aligned) where the v_regs structure exists. +* (half of) The VSX registers are directly after v_regs, so that is the +* easiest way to find them below. 
+*/ + long *vsx_ptr = (long *)(ucp->uc_mcontext.v_regs + 1); + long *tm_vsx_ptr = (long *)(tm_ucp->uc_mcontext.v_regs + 1); + for (i = 0; i < NV_VSX_REGS && !fail; i++) { + memcpy(vsc, &ucp->uc_mcontext.fp_regs[i + 20], 8); + memcpy(vsc + 8, &vsx_ptr[20 + i], 8); + fail = memcmp(vsc, &vss[i], sizeof(vector int)); + memcpy(vst, &tm_ucp->uc_mcontext.fp_regs[i + 20], 8); + memcpy(vst + 8, &tm_vsx_ptr[20 + i], 8); + fail |= memcmp(vst, &vss[i + NV_VSX_REGS], sizeof(vector int)); + + if (fail) { + int j; + + fprintf(stderr, "Failed on %d vsx 0x", i); + for (j = 0; j < 16; j++) + fprintf(stderr, "%02x", vsc[j]); + fprintf(stderr, " vs 0x"); + for (j = 0; j < 16; j++) + fprintf(stderr, "%02x", vst[j]); + fprintf(stderr, "\n"); + } + } +} + +static int tm_signal_context_chk() +{ + struct sigaction act; + int i; + long rc; + pid_t pid = getpid(); + + SKIP_IF(!have
[PATCH v2 16/20] selftests/powerpc: Add checks for transactional VMXs in signal contexts
If a thread receives a signal while transactional the kernel creates a second context to show the transactional state of the process. This test loads some known values and waits for a signal and confirms that the expected values are in the signal context. Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/tm/Makefile| 3 +- .../powerpc/tm/tm-signal-context-chk-vmx.c | 110 + 2 files changed, 112 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 103648f..06c44aa 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -1,4 +1,5 @@ -SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu +SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu \ + tm-signal-context-chk-vmx TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ tm-vmxcopy tm-fork tm-tar tm-tmspr $(SIGNAL_CONTEXT_CHK_TESTS) diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c new file mode 100644 index 000..f0ee55f --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c @@ -0,0 +1,110 @@ +/* + * Copyright 2016, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * + * Test the kernel's signal frame code. + * + * The kernel sets up two sets of ucontexts if the signal was to be + * delivered while the thread was in a transaction. + * Expected behaviour is that the checkpointed state is in the user + * context passed to the signal handler. 
The speculated state can be + * accessed with the uc_link pointer. + * + * The rationale for this is that if TM unaware code (which linked + * against TM libs) installs a signal handler it will not know of the + * speculative nature of the 'live' registers and may infer the wrong + * thing. + */ + +#include +#include +#include +#include +#include + +#include + +#include "utils.h" +#include "tm.h" + +#define MAX_ATTEMPT 50 + +#define NV_VMX_REGS 12 + +long tm_signal_self_context_load(pid_t pid, long *gprs, double *fps, vector int *vms, vector int *vss); + +static sig_atomic_t fail; + +vector int vms[] = { + {1, 2, 3, 4 },{5, 6, 7, 8 },{9, 10,11,12}, + {13,14,15,16},{17,18,19,20},{21,22,23,24}, + {25,26,27,28},{29,30,31,32},{33,34,35,36}, + {37,38,39,40},{41,42,43,44},{45,46,47,48}, + {-1, -2, -3, -4}, {-5, -6, -7, -8}, {-9, -10,-11,-12}, + {-13,-14,-15,-16},{-17,-18,-19,-20},{-21,-22,-23,-24}, + {-25,-26,-27,-28},{-29,-30,-31,-32},{-33,-34,-35,-36}, + {-37,-38,-39,-40},{-41,-42,-43,-44},{-45,-46,-47,-48} +}; + +static void signal_usr1(int signum, siginfo_t *info, void *uc) +{ + int i; + ucontext_t *ucp = uc; + ucontext_t *tm_ucp = ucp->uc_link; + + for (i = 0; i < NV_VMX_REGS && !fail; i++) { + fail = memcmp(ucp->uc_mcontext.v_regs->vrregs[i + 20], + &vms[i], sizeof(vector int)); + fail |= memcmp(tm_ucp->uc_mcontext.v_regs->vrregs[i + 20], + &vms[i + NV_VMX_REGS], sizeof (vector int)); + + if (fail) { + int j; + + fprintf(stderr, "Failed on %d vmx 0x", i); + for (j = 0; j < 4; j++) + fprintf(stderr, "%04x", ucp->uc_mcontext.v_regs->vrregs[i + 20][j]); + fprintf(stderr, " vs 0x"); + for (j = 0 ; j < 4; j++) + fprintf(stderr, "%04x", tm_ucp->uc_mcontext.v_regs->vrregs[i + 20][j]); + fprintf(stderr, "\n"); + } + } +} + +static int tm_signal_context_chk() +{ + struct sigaction act; + int i; + long rc; + pid_t pid = getpid(); + + SKIP_IF(!have_htm()); + + act.sa_sigaction = signal_usr1; + sigemptyset(&act.sa_mask); + act.sa_flags = SA_SIGINFO; + if (sigaction(SIGUSR1, 
&act, NULL) < 0) { + perror("sigaction sigusr1"); + exit(1); + } + + i = 0; + while (i < MAX_ATTEMPT && !fail) { + rc = tm_signal_self_context_load(pid, NULL, NULL, vms, NULL); + FAIL_IF(rc != pid); + i++; + } + + return fail; +} + +int main(void) +{ + return test_harness(tm_signal_context_chk, "tm_signal_context_chk_vmx"); +} -- 2.9.2
[PATCH v2 15/20] selftests/powerpc: Add checks for transactional FPUs in signal contexts
If a thread receives a signal while transactional the kernel creates a second context to show the transactional state of the process. This test loads some known values and waits for a signal and confirms that the expected values are in the signal context. Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/tm/Makefile| 2 +- .../powerpc/tm/tm-signal-context-chk-fpu.c | 92 ++ 2 files changed, 93 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 2b6fe8f..103648f 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -1,4 +1,4 @@ -SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr +SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ tm-vmxcopy tm-fork tm-tar tm-tmspr $(SIGNAL_CONTEXT_CHK_TESTS) diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c new file mode 100644 index 000..c760deb --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c @@ -0,0 +1,92 @@ +/* + * Copyright 2016, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * + * Test the kernel's signal frame code. + * + * The kernel sets up two sets of ucontexts if the signal was to be + * delivered while the thread was in a transaction. + * Expected behaviour is that the checkpointed state is in the user + * context passed to the signal handler. The speculated state can be + * accessed with the uc_link pointer. 
+ * + * The rationale for this is that if TM unaware code (which linked + * against TM libs) installs a signal handler it will not know of the + * speculative nature of the 'live' registers and may infer the wrong + * thing. + */ + +#include +#include +#include +#include + +#include + +#include "utils.h" +#include "tm.h" + +#define MAX_ATTEMPT 50 + +#define NV_FPU_REGS 18 + +long tm_signal_self_context_load(pid_t pid, long *gprs, double *fps, vector int *vms, vector int *vss); + +/* Be sure there are 2x as many as there are NV FPU regs (2x18) */ +static double fps[] = { +1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, + -1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,-17,-18 +}; + +static sig_atomic_t fail; + +static void signal_usr1(int signum, siginfo_t *info, void *uc) +{ + int i; + ucontext_t *ucp = uc; + ucontext_t *tm_ucp = ucp->uc_link; + + for (i = 0; i < NV_FPU_REGS && !fail; i++) { + fail = (ucp->uc_mcontext.fp_regs[i + 14] != fps[i]); + fail |= (tm_ucp->uc_mcontext.fp_regs[i + 14] != fps[i + NV_FPU_REGS]); + if (fail) + printf("Failed on %d FP %g or %g\n", i, ucp->uc_mcontext.fp_regs[i + 14], tm_ucp->uc_mcontext.fp_regs[i + 14]); + } +} + +static int tm_signal_context_chk_fpu() +{ + struct sigaction act; + int i; + long rc; + pid_t pid = getpid(); + + SKIP_IF(!have_htm()); + + act.sa_sigaction = signal_usr1; + sigemptyset(&act.sa_mask); + act.sa_flags = SA_SIGINFO; + if (sigaction(SIGUSR1, &act, NULL) < 0) { + perror("sigaction sigusr1"); + exit(1); + } + + i = 0; + while (i < MAX_ATTEMPT && !fail) { + rc = tm_signal_self_context_load(pid, NULL, fps, NULL, NULL); + FAIL_IF(rc != pid); + i++; + } + + return fail; +} + +int main(void) +{ + return test_harness(tm_signal_context_chk_fpu, "tm_signal_context_chk_fpu"); +} -- 2.9.2
[PATCH v2 14/20] selftests/powerpc: Add checks for transactional GPRs in signal contexts
If a thread receives a signal while transactional the kernel creates a second context to show the transactional state of the process. This test loads some known values and waits for a signal and confirms that the expected values are in the signal context. Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/tm/Makefile| 7 +- .../powerpc/tm/tm-signal-context-chk-gpr.c | 90 tools/testing/selftests/powerpc/tm/tm-signal.S | 114 + 3 files changed, 210 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal.S diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 9d301d7..2b6fe8f 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -1,5 +1,7 @@ +SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr + TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ - tm-vmxcopy tm-fork tm-tar tm-tmspr tm-exec tm-execed + tm-vmxcopy tm-fork tm-tar tm-tmspr $(SIGNAL_CONTEXT_CHK_TESTS) all: $(TEST_PROGS) @@ -11,6 +13,9 @@ tm-syscall: tm-syscall-asm.S tm-syscall: CFLAGS += -I../../../../../usr/include tm-tmspr: CFLAGS += -pthread +$(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S +$(SIGNAL_CONTEXT_CHK_TESTS): CFLAGS += -mhtm -m64 + include ../../lib.mk clean: diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c new file mode 100644 index 000..df91330 --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c @@ -0,0 +1,90 @@ +/* + * Copyright 2016, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ * + * + * Test the kernel's signal frame code. + * + * The kernel sets up two sets of ucontexts if the signal was to be + * delivered while the thread was in a transaction. + * Expected behaviour is that the checkpointed state is in the user + * context passed to the signal handler. The speculated state can be + * accessed with the uc_link pointer. + * + * The rationale for this is that if TM unaware code (which linked + * against TM libs) installs a signal handler it will not know of the + * speculative nature of the 'live' registers and may infer the wrong + * thing. + */ + +#include +#include +#include +#include + +#include + +#include "utils.h" +#include "tm.h" + +#define MAX_ATTEMPT 50 + +#define NV_GPR_REGS 18 + +long tm_signal_self_context_load(pid_t pid, long *gprs, double *fps, vector int *vms, vector int *vss); + +static sig_atomic_t fail; + +static long gps[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, + -1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,-17,-18}; + +static void signal_usr1(int signum, siginfo_t *info, void *uc) +{ + int i; + ucontext_t *ucp = uc; + ucontext_t *tm_ucp = ucp->uc_link; + + for (i = 0; i < NV_GPR_REGS && !fail; i++) { + fail = (ucp->uc_mcontext.gp_regs[i + 14] != gps[i]); + fail |= (tm_ucp->uc_mcontext.gp_regs[i + 14] != gps[i + NV_GPR_REGS]); + if (fail) + printf("Failed on %d GPR %lu or %lu\n", i, + ucp->uc_mcontext.gp_regs[i + 14], tm_ucp->uc_mcontext.gp_regs[i + 14]); + } +} + +static int tm_signal_context_chk_gpr() +{ + struct sigaction act; + int i; + long rc; + pid_t pid = getpid(); + + SKIP_IF(!have_htm()); + + act.sa_sigaction = signal_usr1; + sigemptyset(&act.sa_mask); + act.sa_flags = SA_SIGINFO; + if (sigaction(SIGUSR1, &act, NULL) < 0) { + perror("sigaction sigusr1"); + exit(1); + } + + i = 0; + while (i < MAX_ATTEMPT && !fail) { + rc = tm_signal_self_context_load(pid, gps, NULL, NULL, NULL); + FAIL_IF(rc != pid); + i++; + } + + return fail; +} + +int main(void) +{ + return 
test_harness(tm_signal_context_chk_gpr, "tm_signal_context_chk_gpr"); +} diff --git a/tools/testing/selftests/powerpc/tm/tm-signal.S b/tools/testing/selftests/powerpc/tm/tm-signal.S new file mode 100644 index 000..4e13e8b --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-signal.S @@ -0,0 +1,114 @@ +/* + * Copyright 2015, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of
[PATCH v2 13/20] selftests/powerpc: Check that signals always get delivered
Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/Makefile | 1 + tools/testing/selftests/powerpc/signal/Makefile| 12 +++ tools/testing/selftests/powerpc/signal/signal.S| 50 ++ tools/testing/selftests/powerpc/signal/signal.c| 111 + tools/testing/selftests/powerpc/signal/signal_tm.c | 110 5 files changed, 284 insertions(+) create mode 100644 tools/testing/selftests/powerpc/signal/Makefile create mode 100644 tools/testing/selftests/powerpc/signal/signal.S create mode 100644 tools/testing/selftests/powerpc/signal/signal.c create mode 100644 tools/testing/selftests/powerpc/signal/signal_tm.c diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile index 3c40c9d..96a8593 100644 --- a/tools/testing/selftests/powerpc/Makefile +++ b/tools/testing/selftests/powerpc/Makefile @@ -19,6 +19,7 @@ SUB_DIRS = alignment \ dscr \ mm \ pmu \ + signal \ primitives \ stringloops \ switch_endian\ diff --git a/tools/testing/selftests/powerpc/signal/Makefile b/tools/testing/selftests/powerpc/signal/Makefile new file mode 100644 index 000..97944cf --- /dev/null +++ b/tools/testing/selftests/powerpc/signal/Makefile @@ -0,0 +1,12 @@ +TEST_PROGS := signal signal_tm + +all: $(TEST_PROGS) + +$(TEST_PROGS): ../harness.c ../utils.c signal.S + +signal_tm: CFLAGS += -mhtm + +include ../../lib.mk + +clean: + rm -f $(TEST_PROGS) *.o diff --git a/tools/testing/selftests/powerpc/signal/signal.S b/tools/testing/selftests/powerpc/signal/signal.S new file mode 100644 index 000..7043d52 --- /dev/null +++ b/tools/testing/selftests/powerpc/signal/signal.S @@ -0,0 +1,50 @@ +/* + * Copyright 2015, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include "../basic_asm.h" + +/* long signal_self(pid_t pid, int sig); */ +FUNC_START(signal_self) + li r0,37 /* sys_kill */ + /* r3 already has our pid in it */ + /* r4 already has signal type in it */ + sc + bc 4,3,1f + subfze r3,r3 +1: blr +FUNC_END(signal_self) + +/* long tm_signal_self(pid_t pid, int sig, int *ret); */ +FUNC_START(tm_signal_self) + PUSH_BASIC_STACK(8) + std r5,STACK_FRAME_PARAM(0)(sp) /* ret */ + tbegin. + beq 1f + tsuspend. + li r0,37 /* sys_kill */ + /* r3 already has our pid in it */ + /* r4 already has signal type in it */ + sc + ld r5,STACK_FRAME_PARAM(0)(sp) /* ret */ + bc 4,3,2f + subfze r3,r3 +2: std r3,0(r5) + tabort. 0 + tresume. /* Be nice to some cleanup, jumps back to tbegin then to 1: */ + /* +* Transaction should be proper doomed and we should never get +* here +*/ + li r3,1 + POP_BASIC_STACK(8) + blr +1: li r3,0 + POP_BASIC_STACK(8) + blr +FUNC_END(tm_signal_self) diff --git a/tools/testing/selftests/powerpc/signal/signal.c b/tools/testing/selftests/powerpc/signal/signal.c new file mode 100644 index 000..e7dedd2 --- /dev/null +++ b/tools/testing/selftests/powerpc/signal/signal.c @@ -0,0 +1,111 @@ +/* + * Copyright 2016, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Sending one self a signal should always get delivered. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "utils.h" + +#define MAX_ATTEMPT 50 +#define TIMEOUT 5 + +extern long signal_self(pid_t pid, int sig); + +static sig_atomic_t signaled; +static sig_atomic_t fail; + +static void signal_handler(int sig) +{ + if (sig == SIGUSR1) + signaled = 1; + else + fail = 1; +} + +static int test_signal() +{ + int i; + struct sigaction act; + pid_t ppid = getpid(); + pid_t pid; + + act.sa_handler = signal_handler; + act.sa_flags = 0; + sigemptyset(&act.sa_mask); + if (sigaction(SIGUSR1, &act, NULL) < 0) { + perror("sigaction SIGUSR1"); + exit(1); + } + if (sigaction(SIGALRM, &act, NULL) < 0) { + perror("sigaction SIGALRM"); + exit(1); + } + + /* Don't do this for MAX_ATTEMPT, its simply too long */ + for(i = 0; i < 1000; i++) { + pid = fo
[PATCH v2 12/20] selftests/powerpc: Add TM tcheck helpers in C
Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/tm/tm.h | 27 +++ 1 file changed, 27 insertions(+) diff --git a/tools/testing/selftests/powerpc/tm/tm.h b/tools/testing/selftests/powerpc/tm/tm.h index 60318ba..2c8da74 100644 --- a/tools/testing/selftests/powerpc/tm/tm.h +++ b/tools/testing/selftests/powerpc/tm/tm.h @@ -52,4 +52,31 @@ static inline bool failure_is_nesting(void) return (__builtin_get_texasru() & 0x40); } +static inline int tcheck(void) +{ + long cr; + asm volatile ("tcheck 0" : "=r"(cr) : : "cr0"); + return (cr >> 28) & 4; +} + +static inline bool tcheck_doomed(void) +{ + return tcheck() & 8; +} + +static inline bool tcheck_active(void) +{ + return tcheck() & 4; +} + +static inline bool tcheck_suspended(void) +{ + return tcheck() & 2; +} + +static inline bool tcheck_transactional(void) +{ + return tcheck() & 6; +} + #endif /* _SELFTESTS_POWERPC_TM_TM_H */ -- 2.9.2
[PATCH v2 09/20] selftests/powerpc: Introduce GPR asm helper header file
Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/gpr_asm.h | 96 +++ 1 file changed, 96 insertions(+) create mode 100644 tools/testing/selftests/powerpc/gpr_asm.h diff --git a/tools/testing/selftests/powerpc/gpr_asm.h b/tools/testing/selftests/powerpc/gpr_asm.h new file mode 100644 index 000..f6f3885 --- /dev/null +++ b/tools/testing/selftests/powerpc/gpr_asm.h @@ -0,0 +1,96 @@ +/* + * Copyright 2016, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifndef _SELFTESTS_POWERPC_GPR_ASM_H +#define _SELFTESTS_POWERPC_GPR_ASM_H + +#include "basic_asm.h" + +#define __PUSH_NVREGS(top_pos); \ + std r31,(top_pos)(%r1); \ + std r30,(top_pos - 8)(%r1); \ + std r29,(top_pos - 16)(%r1); \ + std r28,(top_pos - 24)(%r1); \ + std r27,(top_pos - 32)(%r1); \ + std r26,(top_pos - 40)(%r1); \ + std r25,(top_pos - 48)(%r1); \ + std r24,(top_pos - 56)(%r1); \ + std r23,(top_pos - 64)(%r1); \ + std r22,(top_pos - 72)(%r1); \ + std r21,(top_pos - 80)(%r1); \ + std r20,(top_pos - 88)(%r1); \ + std r19,(top_pos - 96)(%r1); \ + std r18,(top_pos - 104)(%r1); \ + std r17,(top_pos - 112)(%r1); \ + std r16,(top_pos - 120)(%r1); \ + std r15,(top_pos - 128)(%r1); \ + std r14,(top_pos - 136)(%r1) + +#define __POP_NVREGS(top_pos); \ + ld r31,(top_pos)(%r1); \ + ld r30,(top_pos - 8)(%r1); \ + ld r29,(top_pos - 16)(%r1); \ + ld r28,(top_pos - 24)(%r1); \ + ld r27,(top_pos - 32)(%r1); \ + ld r26,(top_pos - 40)(%r1); \ + ld r25,(top_pos - 48)(%r1); \ + ld r24,(top_pos - 56)(%r1); \ + ld r23,(top_pos - 64)(%r1); \ + ld r22,(top_pos - 72)(%r1); \ + ld r21,(top_pos - 80)(%r1); \ + ld r20,(top_pos - 88)(%r1); \ + ld r19,(top_pos - 96)(%r1); \ + ld r18,(top_pos - 104)(%r1); \ + ld r17,(top_pos - 112)(%r1); \ + ld r16,(top_pos - 120)(%r1); \ + ld r15,(top_pos - 
128)(%r1); \ + ld r14,(top_pos - 136)(%r1) + +#define PUSH_NVREGS(stack_size) \ + __PUSH_NVREGS(stack_size + STACK_FRAME_MIN_SIZE) + +/* 18 NV FPU REGS */ +#define PUSH_NVREGS_BELOW_FPU(stack_size) \ + __PUSH_NVREGS(stack_size + STACK_FRAME_MIN_SIZE - (18 * 8)) + +#define POP_NVREGS(stack_size) \ + __POP_NVREGS(stack_size + STACK_FRAME_MIN_SIZE) + +/* 18 NV FPU REGS */ +#define POP_NVREGS_BELOW_FPU(stack_size) \ + __POP_NVREGS(stack_size + STACK_FRAME_MIN_SIZE - (18 * 8)) + +/* + * Careful calling this, it will 'clobber' NVGPRs (by design) + * Don't call this from C + */ +FUNC_START(load_gpr) + ld r14,0(r3) + ld r15,8(r3) + ld r16,16(r3) + ld r17,24(r3) + ld r18,32(r3) + ld r19,40(r3) + ld r20,48(r3) + ld r21,56(r3) + ld r22,64(r3) + ld r23,72(r3) + ld r24,80(r3) + ld r25,88(r3) + ld r26,96(r3) + ld r27,104(r3) + ld r28,112(r3) + ld r29,120(r3) + ld r30,128(r3) + ld r31,136(r3) + blr +FUNC_END(load_gpr) + + +#endif /* _SELFTESTS_POWERPC_GPR_ASM_H */ -- 2.9.2
[PATCH v2 10/20] selftests/powerpc: Add transactional memory defines
Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/basic_asm.h | 4 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/powerpc/basic_asm.h b/tools/testing/selftests/powerpc/basic_asm.h index 3349a07..5131059 100644 --- a/tools/testing/selftests/powerpc/basic_asm.h +++ b/tools/testing/selftests/powerpc/basic_asm.h @@ -4,6 +4,10 @@ #include #include +#define TBEGIN .long 0x7C00051D +#define TSUSPEND .long 0x7C0005DD +#define TRESUME .long 0x7C2005DD + #define LOAD_REG_IMMEDIATE(reg,expr) \ lis reg,(expr)@highest; \ ori reg,reg,(expr)@higher; \ -- 2.9.2
[PATCH v2 11/20] selftests/powerpc: Allow tests to extend their kill timeout
Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/harness.c | 9 +++-- tools/testing/selftests/powerpc/utils.h | 2 +- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/powerpc/harness.c b/tools/testing/selftests/powerpc/harness.c index 52f9be7..248a820 100644 --- a/tools/testing/selftests/powerpc/harness.c +++ b/tools/testing/selftests/powerpc/harness.c @@ -19,9 +19,9 @@ #include "subunit.h" #include "utils.h" -#define TIMEOUT120 #define KILL_TIMEOUT 5 +static uint64_t timeout = 120; int run_test(int (test_function)(void), char *name) { @@ -44,7 +44,7 @@ int run_test(int (test_function)(void), char *name) setpgid(pid, pid); /* Wake us up in timeout seconds */ - alarm(TIMEOUT); + alarm(timeout); terminated = false; wait: @@ -94,6 +94,11 @@ static struct sigaction alarm_action = { .sa_handler = alarm_handler, }; +void test_harness_set_timeout(uint64_t time) +{ + timeout = time; +} + int test_harness(int (test_function)(void), char *name) { int rc; diff --git a/tools/testing/selftests/powerpc/utils.h b/tools/testing/selftests/powerpc/utils.h index ecd11b5..53405e8 100644 --- a/tools/testing/selftests/powerpc/utils.h +++ b/tools/testing/selftests/powerpc/utils.h @@ -22,7 +22,7 @@ typedef uint32_t u32; typedef uint16_t u16; typedef uint8_t u8; - +void test_harness_set_timeout(uint64_t time); int test_harness(int (test_function)(void), char *name); extern void *get_auxv_entry(int type); int pick_online_cpu(void); -- 2.9.2
[PATCH v2 08/20] selftests/powerpc: Move VMX stack frame macros to header file
Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/math/vmx_asm.S | 85 +- tools/testing/selftests/powerpc/vmx_asm.h | 98 ++ 2 files changed, 99 insertions(+), 84 deletions(-) create mode 100644 tools/testing/selftests/powerpc/vmx_asm.h diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S index 1b8c248..fd74da4 100644 --- a/tools/testing/selftests/powerpc/math/vmx_asm.S +++ b/tools/testing/selftests/powerpc/math/vmx_asm.S @@ -8,90 +8,7 @@ */ #include "../basic_asm.h" - -# POS MUST BE 16 ALIGNED! -#define PUSH_VMX(pos,reg) \ - li reg,pos; \ - stvxv20,reg,sp; \ - addireg,reg,16; \ - stvxv21,reg,sp; \ - addireg,reg,16; \ - stvxv22,reg,sp; \ - addireg,reg,16; \ - stvxv23,reg,sp; \ - addireg,reg,16; \ - stvxv24,reg,sp; \ - addireg,reg,16; \ - stvxv25,reg,sp; \ - addireg,reg,16; \ - stvxv26,reg,sp; \ - addireg,reg,16; \ - stvxv27,reg,sp; \ - addireg,reg,16; \ - stvxv28,reg,sp; \ - addireg,reg,16; \ - stvxv29,reg,sp; \ - addireg,reg,16; \ - stvxv30,reg,sp; \ - addireg,reg,16; \ - stvxv31,reg,sp; - -# POS MUST BE 16 ALIGNED! 
-#define POP_VMX(pos,reg) \ - li reg,pos; \ - lvx v20,reg,sp; \ - addireg,reg,16; \ - lvx v21,reg,sp; \ - addireg,reg,16; \ - lvx v22,reg,sp; \ - addireg,reg,16; \ - lvx v23,reg,sp; \ - addireg,reg,16; \ - lvx v24,reg,sp; \ - addireg,reg,16; \ - lvx v25,reg,sp; \ - addireg,reg,16; \ - lvx v26,reg,sp; \ - addireg,reg,16; \ - lvx v27,reg,sp; \ - addireg,reg,16; \ - lvx v28,reg,sp; \ - addireg,reg,16; \ - lvx v29,reg,sp; \ - addireg,reg,16; \ - lvx v30,reg,sp; \ - addireg,reg,16; \ - lvx v31,reg,sp; - -# Carefull this will 'clobber' vmx (by design) -# Don't call this from C -FUNC_START(load_vmx) - li r5,0 - lvx v20,r5,r3 - addir5,r5,16 - lvx v21,r5,r3 - addir5,r5,16 - lvx v22,r5,r3 - addir5,r5,16 - lvx v23,r5,r3 - addir5,r5,16 - lvx v24,r5,r3 - addir5,r5,16 - lvx v25,r5,r3 - addir5,r5,16 - lvx v26,r5,r3 - addir5,r5,16 - lvx v27,r5,r3 - addir5,r5,16 - lvx v28,r5,r3 - addir5,r5,16 - lvx v29,r5,r3 - addir5,r5,16 - lvx v30,r5,r3 - addir5,r5,16 - lvx v31,r5,r3 - blr -FUNC_END(load_vmx) +#include "../vmx_asm.h" # Should be safe from C, only touches r4, r5 and v0,v1,v2 FUNC_START(check_vmx) diff --git a/tools/testing/selftests/powerpc/vmx_asm.h b/tools/testing/selftests/powerpc/vmx_asm.h new file mode 100644 index 000..461845dd --- /dev/null +++ b/tools/testing/selftests/powerpc/vmx_asm.h @@ -0,0 +1,98 @@ +/* + * Copyright 2015, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include "basic_asm.h" + +/* POS MUST BE 16 ALIGNED! 
*/ +#define PUSH_VMX(pos,reg) \ + li reg,pos; \ + stvxv20,reg,%r1; \ + addireg,reg,16; \ + stvxv21,reg,%r1; \ + addireg,reg,16; \ + stvxv22,reg,%r1; \ + addireg,reg,16; \ + stvxv23,reg,%r1; \ + addireg,reg,16; \ + stvxv24,reg,%r1; \ + addireg,reg,16; \ + stvxv25,reg,%r1; \ + addireg,reg,16; \ + stvxv26,reg,%r1; \ + addireg,reg,16; \ + stvxv27,reg,%r1; \ + addireg,reg,16; \ + stvxv28,reg,%r1; \ + addireg,reg,16; \ + stvxv29,reg,%r1; \ + addireg,reg,16; \ + stvxv30,reg,%r1; \ + addireg,reg,16; \ + stvxv31,reg,%r1; + +/* POS MUST BE 16 ALIGNED! */ +#define POP_VMX(pos,reg) \ + li reg,pos; \ + lvx v20,reg,%r1; \ + addireg,reg,16; \ + lvx v21,reg,%r1; \ + addireg,reg,16; \ + lvx v22,reg,%r1; \ + addireg,reg,16; \ + lvx v23,reg,%r1; \ + addireg,reg,16; \ + lvx v24,reg,%r1; \ + addireg,reg,16; \ + lvx v25,reg,%r1; \ + addireg,reg,16; \ + lvx v26,reg,%r1; \ + addireg,reg,16; \ + lvx v27,reg,%r1; \ + addireg,reg,16; \ + lvx v28,reg,%r1; \ + addireg,reg,16; \ + lvx v29,reg
[PATCH v2 04/20] powerpc: Return the new MSR from msr_check_and_set()
mfmsr() is a fairly expensive call and callers of msr_check_and_set() may want to make decisions based on bits in the MSR that it did not change but whose values they do not otherwise know. This avoids a second call to mfmsr(). Signed-off-by: Cyril Bur --- arch/powerpc/include/asm/reg.h | 2 +- arch/powerpc/kernel/process.c | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index f69f40f..0a3dde9 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -1247,7 +1247,7 @@ static inline void mtmsr_isync(unsigned long val) : "memory") #endif -extern void msr_check_and_set(unsigned long bits); +extern unsigned long msr_check_and_set(unsigned long bits); extern bool strict_msr_control; extern void __msr_check_and_clear(unsigned long bits); static inline void msr_check_and_clear(unsigned long bits) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 79f0615..216cf05 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -104,7 +104,7 @@ static int __init enable_strict_msr_control(char *str) } early_param("ppc_strict_facility_enable", enable_strict_msr_control); -void msr_check_and_set(unsigned long bits) +unsigned long msr_check_and_set(unsigned long bits) { unsigned long oldmsr = mfmsr(); unsigned long newmsr; @@ -118,6 +118,8 @@ void msr_check_and_set(unsigned long bits) if (oldmsr != newmsr) mtmsr_isync(newmsr); + + return newmsr; } void __msr_check_and_clear(unsigned long bits) -- 2.9.2
[PATCH v2 07/20] selftests/powerpc: Rework FPU stack placement macros and move to header file
The FPU regs are placed at the top of the stack frame. Currently the position is expected to be passed to the macro. The macros now should be passed the stack frame size and from there they can calculate where to put the regs; this makes them simpler to use. Also move them to a header file to be used in a different area of the powerpc selftests. Signed-off-by: Cyril Bur --- tools/testing/selftests/powerpc/fpu_asm.h | 81 ++ tools/testing/selftests/powerpc/math/fpu_asm.S | 73 ++- 2 files changed, 86 insertions(+), 68 deletions(-) create mode 100644 tools/testing/selftests/powerpc/fpu_asm.h diff --git a/tools/testing/selftests/powerpc/fpu_asm.h b/tools/testing/selftests/powerpc/fpu_asm.h new file mode 100644 index 000..24061b8 --- /dev/null +++ b/tools/testing/selftests/powerpc/fpu_asm.h @@ -0,0 +1,81 @@ +/* + * Copyright 2016, Cyril Bur, IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version.
+ */ + +#ifndef _SELFTESTS_POWERPC_FPU_ASM_H +#define _SELFTESTS_POWERPC_FPU_ASM_H +#include "basic_asm.h" + +#define PUSH_FPU(stack_size) \ + stfdf31,(stack_size + STACK_FRAME_MIN_SIZE)(%r1); \ + stfdf30,(stack_size + STACK_FRAME_MIN_SIZE - 8)(%r1); \ + stfdf29,(stack_size + STACK_FRAME_MIN_SIZE - 16)(%r1); \ + stfdf28,(stack_size + STACK_FRAME_MIN_SIZE - 24)(%r1); \ + stfdf27,(stack_size + STACK_FRAME_MIN_SIZE - 32)(%r1); \ + stfdf26,(stack_size + STACK_FRAME_MIN_SIZE - 40)(%r1); \ + stfdf25,(stack_size + STACK_FRAME_MIN_SIZE - 48)(%r1); \ + stfdf24,(stack_size + STACK_FRAME_MIN_SIZE - 56)(%r1); \ + stfdf23,(stack_size + STACK_FRAME_MIN_SIZE - 64)(%r1); \ + stfdf22,(stack_size + STACK_FRAME_MIN_SIZE - 72)(%r1); \ + stfdf21,(stack_size + STACK_FRAME_MIN_SIZE - 80)(%r1); \ + stfdf20,(stack_size + STACK_FRAME_MIN_SIZE - 88)(%r1); \ + stfdf19,(stack_size + STACK_FRAME_MIN_SIZE - 96)(%r1); \ + stfdf18,(stack_size + STACK_FRAME_MIN_SIZE - 104)(%r1); \ + stfdf17,(stack_size + STACK_FRAME_MIN_SIZE - 112)(%r1); \ + stfdf16,(stack_size + STACK_FRAME_MIN_SIZE - 120)(%r1); \ + stfdf15,(stack_size + STACK_FRAME_MIN_SIZE - 128)(%r1); \ + stfdf14,(stack_size + STACK_FRAME_MIN_SIZE - 136)(%r1); + +#define POP_FPU(stack_size) \ + lfd f31,(stack_size + STACK_FRAME_MIN_SIZE)(%r1); \ + lfd f30,(stack_size + STACK_FRAME_MIN_SIZE - 8)(%r1); \ + lfd f29,(stack_size + STACK_FRAME_MIN_SIZE - 16)(%r1); \ + lfd f28,(stack_size + STACK_FRAME_MIN_SIZE - 24)(%r1); \ + lfd f27,(stack_size + STACK_FRAME_MIN_SIZE - 32)(%r1); \ + lfd f26,(stack_size + STACK_FRAME_MIN_SIZE - 40)(%r1); \ + lfd f25,(stack_size + STACK_FRAME_MIN_SIZE - 48)(%r1); \ + lfd f24,(stack_size + STACK_FRAME_MIN_SIZE - 56)(%r1); \ + lfd f23,(stack_size + STACK_FRAME_MIN_SIZE - 64)(%r1); \ + lfd f22,(stack_size + STACK_FRAME_MIN_SIZE - 72)(%r1); \ + lfd f21,(stack_size + STACK_FRAME_MIN_SIZE - 80)(%r1); \ + lfd f20,(stack_size + STACK_FRAME_MIN_SIZE - 88)(%r1); \ + lfd f19,(stack_size + STACK_FRAME_MIN_SIZE - 96)(%r1); \ + lfd 
f18,(stack_size + STACK_FRAME_MIN_SIZE - 104)(%r1); \ + lfd f17,(stack_size + STACK_FRAME_MIN_SIZE - 112)(%r1); \ + lfd f16,(stack_size + STACK_FRAME_MIN_SIZE - 120)(%r1); \ + lfd f15,(stack_size + STACK_FRAME_MIN_SIZE - 128)(%r1); \ + lfd f14,(stack_size + STACK_FRAME_MIN_SIZE - 136)(%r1); + +/* + * Careful calling this, it will 'clobber' fpu (by design) + * Don't call this from C + */ +FUNC_START(load_fpu) + lfd f14,0(r3) + lfd f15,8(r3) + lfd f16,16(r3) + lfd f17,24(r3) + lfd f18,32(r3) + lfd f19,40(r3) + lfd f20,48(r3) + lfd f21,56(r3) + lfd f22,64(r3) + lfd f23,72(r3) + lfd f24,80(r3) + lfd f25,88(r3) + lfd f26,96(r3) + lfd f27,104(r3) + lfd f28,112(r3) + lfd f29,120(r3) + lfd f30,128(r3) + lfd f31,136(r3) + blr +FUNC_END(load_fpu) + +#endif /* _SELFTESTS_POWERPC_FPU_ASM_H */ + diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S index f3711d8..241f067 100644 --- a/tools/testing/selftests/powerpc/math/fpu_asm.S +++ b/tools/testing/selftests/powerpc/math/fpu_asm.S @@ -8,70 +8,7 @@ */ #include "../basic_asm.h" - -#define PUSH_FPU(pos) \ - stfdf14,pos(sp); \ - stfdf15,pos+8(sp); \ - stfdf16,pos+16(sp); \ - stfdf17,pos+24(sp); \ - stfdf18,pos+32(sp)
[PATCH v2 05/20] powerpc: Never giveup a reclaimed thread when enabling kernel {fp, altivec, vsx}
After a thread is reclaimed from its active or suspended transactional state, the checkpointed state still exists on the CPU; this state (along with the live/transactional state) has been saved in its entirety by the reclaiming process. There exists a sequence of events that would cause the kernel to call one of enable_kernel_fp(), enable_kernel_altivec() or enable_kernel_vsx() after a thread has been reclaimed. These functions save away any user state on the CPU so that the kernel can use the registers. Not only is this saving away unnecessary at this point, it is actually incorrect. It causes a save of the checkpointed state to the live structures within the thread struct, thus destroying the true live state for that thread. Signed-off-by: Cyril Bur --- arch/powerpc/kernel/process.c | 39 --- 1 file changed, 36 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 216cf05..0cfbc89 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -198,12 +198,23 @@ EXPORT_SYMBOL_GPL(flush_fp_to_thread); void enable_kernel_fp(void) { + unsigned long cpumsr; + WARN_ON(preemptible()); - msr_check_and_set(MSR_FP); + cpumsr = msr_check_and_set(MSR_FP); if (current->thread.regs && (current->thread.regs->msr & MSR_FP)) { check_if_tm_restore_required(current); + /* +* If a thread has already been reclaimed then the +* checkpointed registers are on the CPU but have definitely +* been saved by the reclaim code. Don't need to and *cannot* +* giveup as this would save to the 'live' structure not the +* checkpointed structure.
+*/ + if(!MSR_TM_ACTIVE(cpumsr) && MSR_TM_ACTIVE(current->thread.regs->msr)) + return; __giveup_fpu(current); } } @@ -250,12 +261,23 @@ EXPORT_SYMBOL(giveup_altivec); void enable_kernel_altivec(void) { + unsigned long cpumsr; + WARN_ON(preemptible()); - msr_check_and_set(MSR_VEC); + cpumsr = msr_check_and_set(MSR_VEC); if (current->thread.regs && (current->thread.regs->msr & MSR_VEC)) { check_if_tm_restore_required(current); + /* +* If a thread has already been reclaimed then the +* checkpointed registers are on the CPU but have definitely +* been saved by the reclaim code. Don't need to and *cannot* +* giveup as this would save to the 'live' structure not the +* checkpointed structure. +*/ + if(!MSR_TM_ACTIVE(cpumsr) && MSR_TM_ACTIVE(current->thread.regs->msr)) + return; __giveup_altivec(current); } } @@ -324,12 +346,23 @@ static void save_vsx(struct task_struct *tsk) void enable_kernel_vsx(void) { + unsigned long cpumsr; + WARN_ON(preemptible()); - msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX); + cpumsr = msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX); if (current->thread.regs && (current->thread.regs->msr & MSR_VSX)) { check_if_tm_restore_required(current); + /* +* If a thread has already been reclaimed then the +* checkpointed registers are on the CPU but have definitely +* been saved by the reclaim code. Don't need to and *cannot* +* giveup as this would save to the 'live' structure not the +* checkpointed structure. +*/ + if(!MSR_TM_ACTIVE(cpumsr) && MSR_TM_ACTIVE(current->thread.regs->msr)) + return; if (current->thread.regs->msr & MSR_FP) __giveup_fpu(current); if (current->thread.regs->msr & MSR_VEC) -- 2.9.2
[PATCH v2 03/20] powerpc: Add check_if_tm_restore_required() to giveup_all()
giveup_all() causes FPU/VMX/VSX facilities to be disabled in a thread's MSR. If this thread was transactional, this should be recorded, as the reclaiming/recheckpointing code will need to know. Fixes: c208505 ("powerpc: create giveup_all()") Signed-off-by: Cyril Bur --- arch/powerpc/kernel/process.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index a5cdef9..79f0615 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -439,6 +439,7 @@ void giveup_all(struct task_struct *tsk) return; msr_check_and_set(msr_all_available); + check_if_tm_restore_required(tsk); #ifdef CONFIG_PPC_FPU if (usermsr & MSR_FP) -- 2.9.2
[PATCH v2 02/20] powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use
Comment from arch/powerpc/kernel/process.c:967: If userspace is inside a transaction (whether active or suspended) and FP/VMX/VSX instructions have ever been enabled inside that transaction, then we have to keep them enabled and keep the FP/VMX/VSX state loaded while ever the transaction continues. The reason is that if we didn't, and subsequently got a FP/VMX/VSX unavailable interrupt inside a transaction, we don't know whether it's the same transaction, and thus we don't know which of the checkpointed state and the transactional state to use. restore_math(), restore_fp() and restore_altivec() currently may not restore the registers. It doesn't appear that this is more serious than a performance penalty. If the math registers aren't restored the userspace thread will still be run with the facility disabled. Userspace will not be able to read invalid values. On the first access it will take a facility unavailable exception and the kernel will detect an active transaction, at which point it will abort the transaction. There is the possibility for a pathological case preventing any progress by transactions, however, transactions are never guaranteed to make progress.
Fixes: 70fe3d9 ("powerpc: Restore FPU/VEC/VSX if previously used") Signed-off-by: Cyril Bur --- arch/powerpc/kernel/process.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 58ccf86..a5cdef9 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -208,7 +208,7 @@ void enable_kernel_fp(void) EXPORT_SYMBOL(enable_kernel_fp); static int restore_fp(struct task_struct *tsk) { - if (tsk->thread.load_fp) { + if (tsk->thread.load_fp || MSR_TM_ACTIVE(tsk->thread.regs->msr)) { load_fp_state(&current->thread.fp_state); current->thread.load_fp++; return 1; @@ -278,7 +278,8 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread); static int restore_altivec(struct task_struct *tsk) { - if (cpu_has_feature(CPU_FTR_ALTIVEC) && tsk->thread.load_vec) { + if (cpu_has_feature(CPU_FTR_ALTIVEC) && + (tsk->thread.load_vec || MSR_TM_ACTIVE(tsk->thread.regs->msr))) { load_vr_state(&tsk->thread.vr_state); tsk->thread.used_vr = 1; tsk->thread.load_vec++; @@ -464,7 +465,8 @@ void restore_math(struct pt_regs *regs) { unsigned long msr; - if (!current->thread.load_fp && !loadvec(current->thread)) + if (!MSR_TM_ACTIVE(regs->msr) && + !current->thread.load_fp && !loadvec(current->thread)) return; msr = regs->msr; -- 2.9.2
[PATCH v2 00/20] Consistent TM structures
Hello, This series has grown considerably from v1. Similarities with v1 include: - Selftests are all the same; they have simply been split into several patches with comments from MPE and Daniel Axtens incorporated. It is possible some things slipped through the cracks selftest-wise, as the focus has been on the final three patches. - The final three patches have been reworked following extra testing and from review by Simon Guo. Differences include: - Patches 2-5 are fixes for existing problems found in the course of verifying the final three patches. In the case of "powerpc: Never giveup a reclaimed thread when enabling kernel {fp, altivec, vsx}" it has proven difficult to narrow down when the bug was introduced. It does not exist in 3.8 when TM was introduced but does exist in 4.4. I was unable to boot 3.13 (or 3.12) in an attempt to further bisect. - As ptrace code was merged between v1 and v2, work was needed there to make it fit in with the final three patches. The overall aim of this series may have gotten lost here: the final three patches are the goal.
Cyril Bur (20): selftests/powerpc: Compile selftests against headers without AT_HWCAP2 powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use powerpc: Add check_if_tm_restore_required() to giveup_all() powerpc: Return the new MSR from msr_check_and_set() powerpc: Never giveup a reclaimed thread when enabling kernel {fp, altivec, vsx} selftests/powerpc: Check for VSX preservation across userspace preemption selftests/powerpc: Rework FPU stack placement macros and move to header file selftests/powerpc: Move VMX stack frame macros to header file selftests/powerpc: Introduce GPR asm helper header file selftests/powerpc: Add transactional memory defines selftests/powerpc: Allow tests to extend their kill timeout selftests/powerpc: Add TM tcheck helpers in C selftests/powerpc: Check that signals always get delivered selftests/powerpc: Add checks for transactional GPRs in signal contexts selftests/powerpc: Add checks for transactional FPUs in signal contexts selftests/powerpc: Add checks for transactional VMXs in signal contexts selftests/powerpc: Add checks for transactional VSXs in signal contexts powerpc: tm: Always use fp_state and vr_state to store live registers powerpc: tm: Rename transct_(*) to ck(\1)_state powerpc: Remove do_load_up_transact_{fpu,altivec} arch/powerpc/include/asm/processor.h | 15 +- arch/powerpc/include/asm/reg.h | 2 +- arch/powerpc/include/asm/tm.h | 5 - arch/powerpc/kernel/asm-offsets.c | 12 +- arch/powerpc/kernel/fpu.S | 26 -- arch/powerpc/kernel/process.c | 119 + arch/powerpc/kernel/ptrace.c | 278 + arch/powerpc/kernel/signal.h | 8 +- arch/powerpc/kernel/signal_32.c| 84 +++ arch/powerpc/kernel/signal_64.c| 59 ++--- arch/powerpc/kernel/tm.S | 94 +++ arch/powerpc/kernel/traps.c| 12 +- arch/powerpc/kernel/vector.S | 25 -- tools/testing/selftests/powerpc/Makefile | 1 + tools/testing/selftests/powerpc/basic_asm.h| 4 + tools/testing/selftests/powerpc/fpu_asm.h | 81 ++ tools/testing/selftests/powerpc/gpr_asm.h | 96 +++ 
tools/testing/selftests/powerpc/harness.c | 9 +- tools/testing/selftests/powerpc/math/Makefile | 4 +- tools/testing/selftests/powerpc/math/fpu_asm.S | 73 +- tools/testing/selftests/powerpc/math/vmx_asm.S | 85 +-- tools/testing/selftests/powerpc/math/vsx_asm.S | 61 + tools/testing/selftests/powerpc/math/vsx_preempt.c | 147 +++ tools/testing/selftests/powerpc/signal/Makefile| 12 + tools/testing/selftests/powerpc/signal/signal.S| 50 tools/testing/selftests/powerpc/signal/signal.c| 111 tools/testing/selftests/powerpc/signal/signal_tm.c | 110 tools/testing/selftests/powerpc/tm/Makefile| 8 +- .../powerpc/tm/tm-signal-context-chk-fpu.c | 92 +++ .../powerpc/tm/tm-signal-context-chk-gpr.c | 90 +++ .../powerpc/tm/tm-signal-context-chk-vmx.c | 110 .../powerpc/tm/tm-signal-context-chk-vsx.c | 125 + tools/testing/selftests/powerpc/tm/tm-signal.S | 114 + tools/testing/selftests/powerpc/tm/tm.h| 27 ++ tools/testing/selftests/powerpc/utils.h| 9 +- tools/testing/selftests/powerpc/vmx_asm.h | 98 tools/testing/selftests/powerpc/vsx_asm.h | 71 ++ 37 files changed, 1709 insertions(+), 618 deletions(-) create mode 100644 tools/testing/selftests/powerpc/fpu_asm.h create mode 100644 tools/t
[PATCH v5 12/13] powerpc: Add purgatory for kexec_file_load implementation.
This purgatory implementation comes from kexec-tools, almost unchanged. The only changes were that the sha256_regions global variable was renamed to sha_regions to match what kexec_file_load expects, and to use the sha256.c file from x86's purgatory to avoid adding yet another SHA-256 implementation. Also, some formatting warnings found by checkpatch.pl were fixed. Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/Makefile | 4 + arch/powerpc/purgatory/.gitignore | 2 + arch/powerpc/purgatory/Makefile | 36 +++ arch/powerpc/purgatory/console-ppc64.c| 38 +++ arch/powerpc/purgatory/crashdump-ppc64.h | 42 arch/powerpc/purgatory/crashdump_backup.c | 36 +++ arch/powerpc/purgatory/crtsavres.S| 5 + arch/powerpc/purgatory/hvCall.S | 27 + arch/powerpc/purgatory/hvCall.h | 8 ++ arch/powerpc/purgatory/kexec-sha256.h | 11 ++ arch/powerpc/purgatory/ppc64_asm.h| 20 arch/powerpc/purgatory/printf.c | 164 ++ arch/powerpc/purgatory/purgatory-ppc64.c | 41 arch/powerpc/purgatory/purgatory-ppc64.h | 6 ++ arch/powerpc/purgatory/purgatory.c| 62 +++ arch/powerpc/purgatory/purgatory.h| 11 ++ arch/powerpc/purgatory/sha256.c | 6 ++ arch/powerpc/purgatory/sha256.h | 1 + arch/powerpc/purgatory/string.S | 1 + arch/powerpc/purgatory/v2wrap.S | 134 20 files changed, 655 insertions(+) diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index ca254546cd05..beb928ff6b77 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -254,6 +254,7 @@ core-y += arch/powerpc/kernel/ \ core-$(CONFIG_XMON)+= arch/powerpc/xmon/ core-$(CONFIG_KVM) += arch/powerpc/kvm/ core-$(CONFIG_PERF_EVENTS) += arch/powerpc/perf/ +core-$(CONFIG_KEXEC_FILE) += arch/powerpc/purgatory/ drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/ @@ -375,6 +376,9 @@ archclean: $(Q)$(MAKE) $(clean)=$(boot) archprepare: checkbin +ifeq ($(CONFIG_KEXEC_FILE),y) + $(Q)$(MAKE) $(build)=arch/powerpc/purgatory arch/powerpc/purgatory/kexec-purgatory.c +endif # Use the file '.tmp_gas_check' for binutils tests, as gas won't output # 
to stdout and these checks are run even on install targets. diff --git a/arch/powerpc/purgatory/.gitignore b/arch/powerpc/purgatory/.gitignore new file mode 100644 index ..e9e66f178a6d --- /dev/null +++ b/arch/powerpc/purgatory/.gitignore @@ -0,0 +1,2 @@ +kexec-purgatory.c +purgatory.ro diff --git a/arch/powerpc/purgatory/Makefile b/arch/powerpc/purgatory/Makefile new file mode 100644 index ..63daf95e5703 --- /dev/null +++ b/arch/powerpc/purgatory/Makefile @@ -0,0 +1,36 @@ +purgatory-y := purgatory.o printf.o string.o v2wrap.o hvCall.o \ + purgatory-ppc64.o console-ppc64.o crashdump_backup.o \ + crtsavres.o sha256.o + +targets += $(purgatory-y) +PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y)) + +LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostartfiles \ + -nostdlib -nodefaultlibs +targets += purgatory.ro + +# Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That +# in turn leaves some undefined symbols like __fentry__ in purgatory and not +# sure how to relocate those. Like kexec-tools, use custom flags. 
+ +KBUILD_CFLAGS := -Wall -Wstrict-prototypes -fno-strict-aliasing \ + -fno-zero-initialized-in-bss -fno-builtin -ffreestanding \ + -fno-PIC -fno-PIE -fno-stack-protector -fno-exceptions \ + -msoft-float -MD -Os +KBUILD_CFLAGS += -m$(CONFIG_WORD_SIZE) + +$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE + $(call if_changed,ld) + +targets += kexec-purgatory.c + +CMD_BIN2C = $(objtree)/scripts/basic/bin2c +quiet_cmd_bin2c = BIN2C $@ + cmd_bin2c = $(CMD_BIN2C) kexec_purgatory < $< > $@ + +$(obj)/kexec-purgatory.c: $(obj)/purgatory.ro FORCE + $(call if_changed,bin2c) + @: + + +obj-$(CONFIG_KEXEC_FILE) += kexec-purgatory.o diff --git a/arch/powerpc/purgatory/console-ppc64.c b/arch/powerpc/purgatory/console-ppc64.c new file mode 100644 index ..3d07be0b5d08 --- /dev/null +++ b/arch/powerpc/purgatory/console-ppc64.c @@ -0,0 +1,38 @@ +/* + * kexec: Linux boots Linux + * + * Created by: Mohan Kumar M (mo...@in.ibm.com) + * + * Copyright (C) IBM Corporation, 2005. All rights reserved + * + * Code taken from kexec-tools. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation (version 2 of the License). + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warrant
[PATCH v5 13/13] powerpc: Enable CONFIG_KEXEC_FILE in powerpc server defconfigs.
Enable CONFIG_KEXEC_FILE in powernv_defconfig, ppc64_defconfig and pseries_defconfig. It depends on CONFIG_CRYPTO_SHA256=y, so add that as well. Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/configs/powernv_defconfig | 2 ++ arch/powerpc/configs/ppc64_defconfig | 2 ++ arch/powerpc/configs/pseries_defconfig | 2 ++ 3 files changed, 6 insertions(+) diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig index dce352e9153b..319e1fb7b0c9 100644 --- a/arch/powerpc/configs/powernv_defconfig +++ b/arch/powerpc/configs/powernv_defconfig @@ -47,6 +47,7 @@ CONFIG_BINFMT_MISC=m CONFIG_PPC_TRANSACTIONAL_MEM=y CONFIG_HOTPLUG_CPU=y CONFIG_KEXEC=y +CONFIG_KEXEC_FILE=y CONFIG_IRQ_ALL_CPUS=y CONFIG_NUMA=y CONFIG_MEMORY_HOTPLUG=y @@ -298,6 +299,7 @@ CONFIG_CRYPTO_CCM=m CONFIG_CRYPTO_PCBC=m CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_MICHAEL_MIC=m +CONFIG_CRYPTO_SHA256=y CONFIG_CRYPTO_TGR192=m CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_ANUBIS=m diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig index 0a8d250cb97e..a0355ccc7f55 100644 --- a/arch/powerpc/configs/ppc64_defconfig +++ b/arch/powerpc/configs/ppc64_defconfig @@ -44,6 +44,7 @@ CONFIG_HZ_100=y CONFIG_BINFMT_MISC=m CONFIG_PPC_TRANSACTIONAL_MEM=y CONFIG_KEXEC=y +CONFIG_KEXEC_FILE=y CONFIG_CRASH_DUMP=y CONFIG_IRQ_ALL_CPUS=y CONFIG_MEMORY_HOTREMOVE=y @@ -333,6 +334,7 @@ CONFIG_CRYPTO_TEST=m CONFIG_CRYPTO_PCBC=m CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_MICHAEL_MIC=m +CONFIG_CRYPTO_SHA256=y CONFIG_CRYPTO_TGR192=m CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_ANUBIS=m diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig index 654aeffc57ef..23af4a72930e 100644 --- a/arch/powerpc/configs/pseries_defconfig +++ b/arch/powerpc/configs/pseries_defconfig @@ -50,6 +50,7 @@ CONFIG_HZ_100=y CONFIG_BINFMT_MISC=m CONFIG_PPC_TRANSACTIONAL_MEM=y CONFIG_KEXEC=y +CONFIG_KEXEC_FILE=y CONFIG_IRQ_ALL_CPUS=y CONFIG_MEMORY_HOTPLUG=y CONFIG_MEMORY_HOTREMOVE=y @@ 
-300,6 +301,7 @@ CONFIG_CRYPTO_TEST=m CONFIG_CRYPTO_PCBC=m CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_MICHAEL_MIC=m +CONFIG_CRYPTO_SHA256=y CONFIG_CRYPTO_TGR192=m CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_ANUBIS=m -- 1.9.1
[PATCH v5 11/13] powerpc: Allow userspace to set device tree properties in kexec_file_load
Implement the arch_kexec_verify_buffer hook to verify that a device tree blob passed by userspace via kexec_file_load contains only nodes and properties from a whitelist. In elf64_load we merge those properties into the device tree that will be passed to the next kernel. Suggested-by: Michael Ellerman Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/include/asm/kexec.h | 1 + arch/powerpc/kernel/kexec_elf_64.c | 9 ++ arch/powerpc/kernel/machine_kexec_64.c | 242 + 3 files changed, 252 insertions(+) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index f263cc867891..31bc64e07c8f 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -99,6 +99,7 @@ int setup_purgatory(struct kimage *image, const void *slave_code, int setup_new_fdt(void *fdt, unsigned long initrd_load_addr, unsigned long initrd_len, const char *cmdline); bool find_debug_console(const void *fdt, int chosen_node); +int merge_partial_dtb(void *to, const void *from); #endif /* CONFIG_KEXEC_FILE */ #else /* !CONFIG_KEXEC */ diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec_elf_64.c index 49cba9509464..1b902ad66e2a 100644 --- a/arch/powerpc/kernel/kexec_elf_64.c +++ b/arch/powerpc/kernel/kexec_elf_64.c @@ -210,6 +210,15 @@ void *elf64_load(struct kimage *image, char *kernel_buf, goto out; } + /* Add nodes and properties from the DTB passed by userspace. 
*/ + if (image->dtb_buf) { + ret = merge_partial_dtb(fdt, image->dtb_buf); + if (ret) { + pr_err("Error merging partial device tree.\n"); + goto out; + } + } + ret = setup_new_fdt(fdt, initrd_load_addr, initrd_len, cmdline); if (ret) goto out; diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 527f98efe651..a484a6346146 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -35,6 +35,7 @@ #include #define SLAVE_CODE_SIZE256 +#define MAX_DT_PATH512 #ifdef CONFIG_KEXEC_FILE static struct kexec_file_ops *kexec_file_loaders[] = { @@ -908,4 +909,245 @@ bool find_debug_console(const void *fdt, int chosen_node) return false; } +/** + * struct allowed_node - a node in the whitelist and its allowed properties. + * @name: node name or full node path + * @properties:NULL-terminated array of names or name=value pairs + * + * If name starts with /, then the node has to be at the specified path in + * the device tree (including unit addresses for all nodes in the path). + * If it doesn't, then the node can be anywhere in the device tree. + * + * An entry in properties can specify a string value that the property must + * have by using the "name=value" format. If the entry ends with =, it means + * that the property must be empty. + */ +static struct allowed_node { + const char *name; + const char *properties[9]; +} allowed_nodes[] = { + { + .name = "/chosen", + .properties = { + "stdout-path", + "linux,stdout-path", + NULL, + } + }, + { + .name = "vga", + .properties = { + "device_type=display", + "assigned-addresses", + "width", + "height", + "depth", + "little-endian=", + "linux,opened=", + "linux,boot-display=", + NULL, + } + }, +}; + +/** + * verify_properties() - verify that all properties in a node are allowed + * @properties:Array of allowed properties in the node. + * @fdt: Device tree blob. + * @node: Offset to node being verified. 
+ * + * Return: 0 on success, negative errno on error. + */ +static int verify_properties(const char *properties[], const void *fdt, int node) +{ + int prop; + + for (prop = fdt_first_property_offset(fdt, node); prop >= 0; +prop = fdt_next_property_offset(fdt, prop)) { + const char *prop_name; + const void *prop_val; + int i; + + prop_val = fdt_getprop_by_offset(fdt, prop, &prop_name, NULL); + if (prop_val == NULL) { + pr_debug("Error reading device tree.\n"); + return -EINVAL; + } + + for (i = 0; properties[i] != NULL; i++) { + size_t len; + const char *allowed_prop = properties[i]; + + len = strchrnul(allowed_prop, '=') - allowed_prop; +
[PATCH v5 10/13] powerpc: Add support for loading ELF kernels with kexec_file_load.
This uses all the infrastructure built up by the previous patches in the series to load an ELF vmlinux file and an initrd. It uses the flattened device tree at initial_boot_params as a base and adjusts memory reservations and its /chosen node for the next kernel. Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/include/asm/kexec_elf_64.h | 10 ++ arch/powerpc/kernel/Makefile| 1 + arch/powerpc/kernel/kexec_elf_64.c | 284 arch/powerpc/kernel/machine_kexec_64.c | 5 +- 4 files changed, 299 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kexec_elf_64.h b/arch/powerpc/include/asm/kexec_elf_64.h new file mode 100644 index ..30da6bc0ccf8 --- /dev/null +++ b/arch/powerpc/include/asm/kexec_elf_64.h @@ -0,0 +1,10 @@ +#ifndef __POWERPC_KEXEC_ELF_64_H__ +#define __POWERPC_KEXEC_ELF_64_H__ + +#ifdef CONFIG_KEXEC_FILE + +extern struct kexec_file_ops kexec_elf64_ops; + +#endif /* CONFIG_KEXEC_FILE */ + +#endif /* __POWERPC_KEXEC_ELF_64_H__ */ diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index ce18a985bcfc..d149f5ebac90 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -109,6 +109,7 @@ obj-$(CONFIG_PCI) += pci_$(CONFIG_WORD_SIZE).o $(pci64-y) \ obj-$(CONFIG_PCI_MSI) += msi.o obj-$(CONFIG_KEXEC)+= machine_kexec.o crash.o \ machine_kexec_$(CONFIG_WORD_SIZE).o +obj-$(CONFIG_KEXEC_FILE) += kexec_elf_$(CONFIG_WORD_SIZE).o obj-$(CONFIG_AUDIT)+= audit.o obj64-$(CONFIG_AUDIT) += compat_audit.o diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec_elf_64.c new file mode 100644 index ..49cba9509464 --- /dev/null +++ b/arch/powerpc/kernel/kexec_elf_64.c @@ -0,0 +1,284 @@ +/* + * Load ELF vmlinux file for the kexec_file_load syscall. + * + * Copyright (C) 2004 Adam Litke (a...@us.ibm.com) + * Copyright (C) 2004 IBM Corp. 
+ * Copyright (C) 2005 R Sharada (shar...@in.ibm.com) + * Copyright (C) 2006 Mohan Kumar M (mo...@in.ibm.com) + * Copyright (C) 2016 IBM Corporation + * + * Based on kexec-tools' kexec-elf-exec.c and kexec-elf-ppc64.c. + * Heavily modified for the kernel by + * Thiago Jung Bauermann . + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation (version 2 of the License). + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#define pr_fmt(fmt)"kexec_elf: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include + +extern size_t kexec_purgatory_size; + +#define PURGATORY_STACK_SIZE (16 * 1024) + +/** + * build_elf_exec_info - read ELF executable and check that we can use it + */ +static int build_elf_exec_info(const char *buf, size_t len, struct elfhdr *ehdr, + struct elf_info *elf_info) +{ + int i; + int ret; + + ret = elf_read_from_buffer(buf, len, ehdr, elf_info); + if (ret) + return ret; + + /* Big endian vmlinux has type ET_DYN. */ + if (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN) { + pr_err("Not an ELF executable.\n"); + goto error; + } else if (!elf_info->proghdrs) { + pr_err("No ELF program header.\n"); + goto error; + } + + for (i = 0; i < ehdr->e_phnum; i++) { + /* +* Kexec does not support loading interpreters. +* In addition this check keeps us from attempting +* to kexec ordinary executables. 
+*/ + if (elf_info->proghdrs[i].p_type == PT_INTERP) { + pr_err("Requires an ELF interpreter.\n"); + goto error; + } + } + + return 0; +error: + elf_free_info(elf_info); + return -ENOEXEC; +} + +static int elf64_probe(const char *buf, unsigned long len) +{ + struct elfhdr ehdr; + struct elf_info elf_info; + int ret; + + ret = build_elf_exec_info(buf, len, &ehdr, &elf_info); + if (ret) + return ret; + + elf_free_info(&elf_info); + + return elf_check_arch(&ehdr) ? 0 : -ENOEXEC; +} + +/** + * elf_exec_load - load ELF executable image + * @lowest_load_addr: On return, will be the address where the first PT_LOAD + * section will be loaded in memory. + * + * Return: + * 0 on success, negative value on failure. + */ +static int elf_exec_load(struct kimage *image, struct elfhdr *ehdr, +struct elf_info *elf_info, +un
[PATCH v5 09/13] powerpc: Add code to work with device trees in kexec_file_load.
kexec_file_load needs to set up the device tree that will be used by the next kernel and check whether it provides a console that can be used by the purgatory. Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/include/asm/kexec.h | 3 + arch/powerpc/kernel/machine_kexec_64.c | 222 + 2 files changed, 225 insertions(+) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 83b81b7bdca1..f263cc867891 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -96,6 +96,9 @@ int setup_purgatory(struct kimage *image, const void *slave_code, const void *fdt, unsigned long kernel_load_addr, unsigned long fdt_load_addr, unsigned long stack_top, int debug); +int setup_new_fdt(void *fdt, unsigned long initrd_load_addr, + unsigned long initrd_len, const char *cmdline); +bool find_debug_console(const void *fdt, int chosen_node); #endif /* CONFIG_KEXEC_FILE */ #else /* !CONFIG_KEXEC */ diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 1e678dc5096a..897b724ea9fd 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -683,4 +683,226 @@ int setup_purgatory(struct kimage *image, const void *slave_code, return 0; } +/* + * setup_new_fdt() - modify /chosen and memory reservation for the next kernel + * @fdt: + * @initrd_load_addr: Address where the next initrd will be loaded. + * @initrd_len:Size of the next initrd, or 0 if there will be none. + * @cmdline: Command line for the next kernel, or NULL if there will + * be none. + * + * Return: 0 on success, or negative errno on error. + */ +int setup_new_fdt(void *fdt, unsigned long initrd_load_addr, + unsigned long initrd_len, const char *cmdline) +{ + uint64_t oldfdt_addr; + int i, ret, chosen_node; + const void *prop; + + /* Remove memory reservation for the current device tree. 
*/ + oldfdt_addr = __pa(initial_boot_params); + for (i = 0; i < fdt_num_mem_rsv(fdt); i++) { + uint64_t rsv_start, rsv_size; + + ret = fdt_get_mem_rsv(fdt, i, &rsv_start, &rsv_size); + if (ret) { + pr_err("Malformed device tree.\n"); + return -EINVAL; + } + + if (rsv_start == oldfdt_addr && + rsv_size == fdt_totalsize(initial_boot_params)) { + ret = fdt_del_mem_rsv(fdt, i); + if (ret) { + pr_err("Error deleting fdt reservation.\n"); + return -EINVAL; + } + + pr_debug("Removed old device tree reservation.\n"); + break; + } + } + + chosen_node = fdt_path_offset(fdt, "/chosen"); + if (chosen_node == -FDT_ERR_NOTFOUND) { + chosen_node = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), + "chosen"); + if (chosen_node < 0) { + pr_err("Error creating /chosen.\n"); + return -EINVAL; + } + } else if (chosen_node < 0) { + pr_err("Malformed device tree: error reading /chosen.\n"); + return -EINVAL; + } + + /* Did we boot using an initrd? */ + prop = fdt_getprop(fdt, chosen_node, "linux,initrd-start", NULL); + if (prop) { + uint64_t tmp_start, tmp_end, tmp_size, tmp_sizepg; + + tmp_start = fdt64_to_cpu(*((const fdt64_t *) prop)); + + prop = fdt_getprop(fdt, chosen_node, "linux,initrd-end", NULL); + if (!prop) { + pr_err("Malformed device tree.\n"); + return -EINVAL; + } + tmp_end = fdt64_to_cpu(*((const fdt64_t *) prop)); + + /* +* kexec reserves exact initrd size, while firmware may +* reserve a multiple of PAGE_SIZE, so check for both. +*/ + tmp_size = tmp_end - tmp_start; + tmp_sizepg = round_up(tmp_size, PAGE_SIZE); + + /* Remove memory reservation for the current initrd. */ + for (i = 0; i < fdt_num_mem_rsv(fdt); i++) { + uint64_t rsv_start, rsv_size; + + ret = fdt_get_mem_rsv(fdt, i, &rsv_start, &rsv_size); + if (ret) { + pr_err("Malformed device tree.\n"); + return -EINVAL; + } + + if (rsv_start == tmp_start && + (rsv_size == tmp_size || rsv_size == tmp_sizepg)) { +
[PATCH v5 07/13] powerpc: Add functions to read ELF files of any endianness.
A little endian kernel might need to kexec a big endian kernel (the opposite is less likely but could happen as well), so we can't just cast the buffer with the binary to ELF structs and use them as is done elsewhere. This patch adds functions which do byte-swapping as necessary when populating the ELF structs. These functions will be used in the next patch in the series. Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/include/asm/elf_util.h | 19 ++ arch/powerpc/kernel/Makefile| 2 +- arch/powerpc/kernel/elf_util.c | 476 3 files changed, 496 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h index 3405eeabe542..18703d56eabd 100644 --- a/arch/powerpc/include/asm/elf_util.h +++ b/arch/powerpc/include/asm/elf_util.h @@ -20,6 +20,14 @@ #include struct elf_info { + /* +* Where the ELF binary contents are kept. +* Memory managed by the user of the struct. +*/ + const char *buffer; + + const struct elfhdr *ehdr; + const struct elf_phdr *proghdrs; struct elf_shdr *sechdrs; /* Index of stubs section. 
*/ @@ -63,6 +71,17 @@ static inline unsigned long my_r2(const struct elf_info *elf_info) return elf_info->sechdrs[elf_info->toc_section].sh_addr + 0x8000; } +static inline bool elf_is_elf_file(const struct elfhdr *ehdr) +{ + return memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0; +} + +int elf_read_from_buffer(const char *buf, size_t len, struct elfhdr *ehdr, +struct elf_info *elf_info); +void elf_init_elf_info(const struct elfhdr *ehdr, struct elf_shdr *sechdrs, + struct elf_info *elf_info); +void elf_free_info(struct elf_info *elf_info); + int elf64_apply_relocate_add(const struct elf_info *elf_info, const char *strtab, const Elf64_Rela *rela, unsigned int num_rela, void *syms_base, diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index e38aace0a6e7..6159ec6ac032 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -124,7 +124,7 @@ obj-y += iomap.o endif ifeq ($(CONFIG_MODULES)$(CONFIG_WORD_SIZE),y64) -obj-y += elf_util_64.o +obj-y += elf_util.o elf_util_64.o endif obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM) += tm.o diff --git a/arch/powerpc/kernel/elf_util.c b/arch/powerpc/kernel/elf_util.c new file mode 100644 index ..1df4a116ad90 --- /dev/null +++ b/arch/powerpc/kernel/elf_util.c @@ -0,0 +1,476 @@ +/* + * Utility functions to work with ELF files. + * + * Copyright (C) 2016, IBM Corporation + * + * Based on kexec-tools' kexec-elf.c. Heavily modified for the + * kernel by Thiago Jung Bauermann . + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation (version 2 of the License). + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include + +#if ELF_CLASS == ELFCLASS32 +#define elf_addr_to_cpuelf32_to_cpu + +#ifndef Elf_Rel +#define Elf_RelElf32_Rel +#endif /* Elf_Rel */ +#else /* ELF_CLASS == ELFCLASS32 */ +#define elf_addr_to_cpuelf64_to_cpu + +#ifndef Elf_Rel +#define Elf_RelElf64_Rel +#endif /* Elf_Rel */ + +static uint64_t elf64_to_cpu(const struct elfhdr *ehdr, uint64_t value) +{ + if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB) + value = le64_to_cpu(value); + else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB) + value = be64_to_cpu(value); + + return value; +} +#endif /* ELF_CLASS == ELFCLASS32 */ + +static uint16_t elf16_to_cpu(const struct elfhdr *ehdr, uint16_t value) +{ + if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB) + value = le16_to_cpu(value); + else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB) + value = be16_to_cpu(value); + + return value; +} + +static uint32_t elf32_to_cpu(const struct elfhdr *ehdr, uint32_t value) +{ + if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB) + value = le32_to_cpu(value); + else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB) + value = be32_to_cpu(value); + + return value; +} + +/** + * elf_is_ehdr_sane - check that it is safe to use the ELF header + * @buf_len: size of the buffer in which the ELF file is loaded. + */ +static bool elf_is_ehdr_sane(const struct elfhdr *ehdr, size_t buf_len) +{ + if (ehdr->e_phnum > 0 && ehdr->e_phentsize != sizeof(struc
[PATCH v5 08/13] powerpc: Implement kexec_file_load.
arch_kexec_walk_mem and arch_kexec_apply_relocations_add are used by generic kexec code, while setup_purgatory is powerpc-specific and sets runtime variables needed by the powerpc purgatory implementation. Signed-off-by: Josh Sklar Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/Kconfig | 13 ++ arch/powerpc/include/asm/kexec.h | 7 + arch/powerpc/include/asm/systbl.h | 1 + arch/powerpc/include/asm/unistd.h | 2 +- arch/powerpc/include/uapi/asm/unistd.h | 1 + arch/powerpc/kernel/Makefile | 4 +- arch/powerpc/kernel/machine_kexec_64.c | 252 + 7 files changed, 278 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index ec4047e170a0..ff362ca60d1b 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -459,6 +459,19 @@ config KEXEC interface is strongly in flux, so no good recommendation can be made. +config KEXEC_FILE + bool "kexec file based system call" + select KEXEC_CORE + select BUILD_BIN2C + depends on PPC64 + depends on CRYPTO=y + depends on CRYPTO_SHA256=y + help + This is a new version of the kexec system call. This call is + file based and takes in file descriptors as system call arguments + for kernel and initramfs as opposed to a list of segments as is the + case for the older kexec call. 
+ config RELOCATABLE bool "Build a relocatable kernel" depends on (PPC64 && !COMPILE_TEST) || (FLATMEM && (44x || FSL_BOOKE)) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index a46f5f45570c..83b81b7bdca1 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -91,6 +91,13 @@ static inline bool kdump_in_progress(void) return crashing_cpu >= 0; } +#ifdef CONFIG_KEXEC_FILE +int setup_purgatory(struct kimage *image, const void *slave_code, + const void *fdt, unsigned long kernel_load_addr, + unsigned long fdt_load_addr, unsigned long stack_top, + int debug); +#endif /* CONFIG_KEXEC_FILE */ + #else /* !CONFIG_KEXEC */ static inline void crash_kexec_secondary(struct pt_regs *regs) { } diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h index 2fc5d4db503c..4b369d83fe9c 100644 --- a/arch/powerpc/include/asm/systbl.h +++ b/arch/powerpc/include/asm/systbl.h @@ -386,3 +386,4 @@ SYSCALL(mlock2) SYSCALL(copy_file_range) COMPAT_SYS_SPU(preadv2) COMPAT_SYS_SPU(pwritev2) +SYSCALL(kexec_file_load) diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index cf12c580f6b2..a01e97d3f305 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -12,7 +12,7 @@ #include -#define NR_syscalls382 +#define NR_syscalls383 #define __NR__exit __NR_exit diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h index e9f5f41aa55a..2f26335a3c42 100644 --- a/arch/powerpc/include/uapi/asm/unistd.h +++ b/arch/powerpc/include/uapi/asm/unistd.h @@ -392,5 +392,6 @@ #define __NR_copy_file_range 379 #define __NR_preadv2 380 #define __NR_pwritev2 381 +#define __NR_kexec_file_load 382 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */ diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 6159ec6ac032..ce18a985bcfc 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -123,9 
+123,11 @@ ifneq ($(CONFIG_PPC_INDIRECT_PIO),y) obj-y += iomap.o endif -ifeq ($(CONFIG_MODULES)$(CONFIG_WORD_SIZE),y64) +ifneq ($(CONFIG_MODULES)$(CONFIG_KEXEC_FILE),) +ifeq ($(CONFIG_WORD_SIZE),64) obj-y += elf_util.o elf_util_64.o endif +endif obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM) += tm.o diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 4c780a342282..1e678dc5096a 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -18,6 +18,8 @@ #include #include #include +#include +#include #include #include @@ -31,6 +33,12 @@ #include #include +#define SLAVE_CODE_SIZE256 + +#ifdef CONFIG_KEXEC_FILE +static struct kexec_file_ops *kexec_file_loaders[] = { }; +#endif + #ifdef CONFIG_PPC_BOOK3E int default_machine_kexec_prepare(struct kimage *image) { @@ -432,3 +440,247 @@ static int __init export_htab_values(void) } late_initcall(export_htab_values); #endif /* CONFIG_PPC_STD_MMU_64 */ + +#ifdef CONFIG_KEXEC_FILE +int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, + unsigned long buf_len) +{ + int i, ret = -ENOEXEC; + struct kexec_file_ops *fops; + + /* We don't support crash kernels yet. */ + if (image->type =
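The `kexec_file_loaders[]` array above (empty for now) feeds the image-probe pattern that `arch_kexec_kernel_image_probe` uses: walk the table and pick the first loader whose `probe()` accepts the buffer. The sketch below is a minimal user-space illustration of that pattern, not the kernel code; the loader table, the mock ELF probe, and all names are made up for this example.

```c
#include <stddef.h>
#include <errno.h>
#include <assert.h>

/* Mock stand-in for struct kexec_file_ops: only the probe hook matters here. */
struct file_ops_mock {
    int (*probe)(const void *buf, unsigned long len);
};

/* A probe, as a real ELF loader's probe would do, checks the \x7fELF magic. */
static int probe_elf_mock(const void *buf, unsigned long len)
{
    const unsigned char *p = buf;

    return (len >= 4 && p[0] == 0x7f && p[1] == 'E' &&
            p[2] == 'L' && p[3] == 'F') ? 0 : -ENOEXEC;
}

static struct file_ops_mock loaders_mock[] = { { probe_elf_mock } };

/* Same shape as arch_kexec_kernel_image_probe's loop: first loader whose
 * probe() returns 0 wins; if none recognises the format, fail with -ENOEXEC. */
static int image_probe_mock(const void *buf, unsigned long len)
{
    for (size_t i = 0; i < sizeof(loaders_mock) / sizeof(loaders_mock[0]); i++)
        if (loaders_mock[i].probe(buf, len) == 0)
            return 0;   /* this loader will handle the image */

    return -ENOEXEC;    /* nobody recognised the format */
}
```

In the kernel, the winning loader's remaining hooks (`load`, `cleanup`, and so on) are then attached to the kimage; the sketch only shows the selection step.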
[PATCH v5 06/13] powerpc: Adapt elf64_apply_relocate_add for kexec_file_load.
Extend elf64_apply_relocate_add to support relative symbols. This is necessary because there is a difference between how the module loading mechanism and the kexec purgatory loading code use Elf64_Sym.st_value at relocation time: the former changes st_value to point to the absolute memory address before relocating the module, while the latter does that adjustment during relocation of the purgatory. Also, add a check_symbols argument so that the kexec code can be stricter about undefined symbols. Finally, add relocation types used by the purgatory. Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/include/asm/elf_util.h | 1 + arch/powerpc/kernel/elf_util_64.c | 84 - arch/powerpc/kernel/module_64.c | 5 ++- 3 files changed, 88 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h index a012ba03282d..3405eeabe542 100644 --- a/arch/powerpc/include/asm/elf_util.h +++ b/arch/powerpc/include/asm/elf_util.h @@ -67,6 +67,7 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info, const char *strtab, const Elf64_Rela *rela, unsigned int num_rela, void *syms_base, void *loc_base, Elf64_Addr addr_base, +bool relative_symbols, bool check_symbols, const char *obj_name); #endif /* _ASM_POWERPC_ELF_UTIL_H */ diff --git a/arch/powerpc/kernel/elf_util_64.c b/arch/powerpc/kernel/elf_util_64.c index 8e5d400ac9f2..80f209a42abd 100644 --- a/arch/powerpc/kernel/elf_util_64.c +++ b/arch/powerpc/kernel/elf_util_64.c @@ -74,6 +74,8 @@ static void squash_toc_save_inst(const char *name, unsigned long addr) { } * @syms_base: Contents of the associated symbol table. * @loc_base: Contents of the section to which relocations apply. * @addr_base: The address where the section will be loaded in memory. + * @relative_symbols: Are the symbols' st_value members relative? + * @check_symbols: Fail if an unexpected symbol is found? * @obj_name: The name of the ELF binary, for information messages. 
* * Applies RELA relocations to an ELF file already at its final location @@ -84,11 +86,13 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info, const char *strtab, const Elf64_Rela *rela, unsigned int num_rela, void *syms_base, void *loc_base, Elf64_Addr addr_base, +bool relative_symbols, bool check_symbols, const char *obj_name) { unsigned int i; unsigned long *location; unsigned long address; + unsigned long sec_base; unsigned long value; const char *name; Elf64_Sym *sym; @@ -121,8 +125,36 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info, name, (unsigned long)sym->st_value, (long)rela[i].r_addend); + if (check_symbols) { + /* +* TOC symbols appear as undefined but should be +* resolved as well, so allow them to be processed. +*/ + if (sym->st_shndx == SHN_UNDEF && + strcmp(name, ".TOC.") != 0) { + pr_err("Undefined symbol: %s\n", name); + return -ENOEXEC; + } else if (sym->st_shndx == SHN_COMMON) { + pr_err("Symbol '%s' in common section.\n", name); + return -ENOEXEC; + } + } + + if (relative_symbols && sym->st_shndx != SHN_ABS) { + if (sym->st_shndx >= elf_info->ehdr->e_shnum) { + pr_err("Invalid section %d for symbol %s\n", + sym->st_shndx, name); + return -ENOEXEC; + } else { + struct elf_shdr *sechdrs = elf_info->sechdrs; + + sec_base = sechdrs[sym->st_shndx].sh_addr; + } + } else + sec_base = 0; + /* `Everything is relative'. */ - value = sym->st_value + rela[i].r_addend; + value = sym->st_value + sec_base + rela[i].r_addend; switch (ELF64_R_TYPE(rela[i].r_info)) { case R_PPC64_ADDR32: @@ -135,6 +167,10 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info, *(unsigned long *)location = value; break; + case R_PPC64_REL32: + *(uint32_t *)location = value - (uint32_t)(uint64_t
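The `relative_symbols` logic in the hunk above boils down to one value computation: module-style symbols already carry absolute addresses in `st_value`, while purgatory-style symbols are section-relative and need the section's final load address added in. Here is a self-contained sketch of just that computation; `reloc_value`, `sec_addrs[]` (playing the role of `sechdrs[].sh_addr`), and the `SHN_ABS_MOCK` constant are hypothetical names introduced for illustration.

```c
#include <stdint.h>
#include <assert.h>

#define SHN_ABS_MOCK 0xfff1  /* stand-in for ELF's SHN_ABS */

/* Hypothetical helper mirroring the patch's logic: compute the value a
 * RELA relocation resolves to, before the type-specific store. */
static uint64_t reloc_value(uint64_t st_value, uint16_t st_shndx,
                            const uint64_t *sec_addrs, int relative_symbols,
                            int64_t addend)
{
    uint64_t sec_base = 0;

    /* Only section-relative symbols need their section base added in;
     * absolute symbols (SHN_ABS) are left alone in either mode. */
    if (relative_symbols && st_shndx != SHN_ABS_MOCK)
        sec_base = sec_addrs[st_shndx];

    /* `Everything is relative'. */
    return st_value + sec_base + addend;
}
```

With `relative_symbols` false this degenerates to the module loader's original `sym->st_value + rela[i].r_addend`, which is why both callers can share the function.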
[PATCH v5 05/13] powerpc: Generalize elf64_apply_relocate_add.
When apply_relocate_add is called, modules are already loaded at their final location in memory so Elf64_Shdr.sh_addr can be used for accessing the section contents as well as the base address for relocations. This is not the case for kexec's purgatory, because it will only be copied to its final location right before being executed. Therefore, it needs to be relocated while it is still in a temporary buffer. In this case, Elf64_Shdr.sh_addr can't be used to access the sections' contents. This patch allows elf64_apply_relocate_add to be used when the ELF binary is not yet at its final location by adding an addr_base argument to specify the address at which the section will be loaded, and rela, loc_base and syms_base to point to the sections' contents. Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/include/asm/elf_util.h | 6 ++-- arch/powerpc/kernel/elf_util_64.c | 63 + arch/powerpc/kernel/module_64.c | 17 -- 3 files changed, 61 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h index 37372559fe62..a012ba03282d 100644 --- a/arch/powerpc/include/asm/elf_util.h +++ b/arch/powerpc/include/asm/elf_util.h @@ -64,7 +64,9 @@ static inline unsigned long my_r2(const struct elf_info *elf_info) } int elf64_apply_relocate_add(const struct elf_info *elf_info, -const char *strtab, unsigned int symindex, -unsigned int relsec, const char *obj_name); +const char *strtab, const Elf64_Rela *rela, +unsigned int num_rela, void *syms_base, +void *loc_base, Elf64_Addr addr_base, +const char *obj_name); #endif /* _ASM_POWERPC_ELF_UTIL_H */ diff --git a/arch/powerpc/kernel/elf_util_64.c b/arch/powerpc/kernel/elf_util_64.c index decad2c34f38..8e5d400ac9f2 100644 --- a/arch/powerpc/kernel/elf_util_64.c +++ b/arch/powerpc/kernel/elf_util_64.c @@ -69,33 +69,56 @@ static void squash_toc_save_inst(const char *name, unsigned long addr) { } * elf64_apply_relocate_add - apply 64 bit RELA relocations * @elf_info: Support 
information for the ELF binary being relocated. * @strtab:String table for the associated symbol table. - * @symindex: Section header index for the associated symbol table. - * @relsec:Section header index for the relocations to apply. + * @rela: Contents of the section with the relocations to apply. + * @num_rela: Number of relocation entries in the section. + * @syms_base: Contents of the associated symbol table. + * @loc_base: Contents of the section to which relocations apply. + * @addr_base: The address where the section will be loaded in memory. * @obj_name: The name of the ELF binary, for information messages. + * + * Applies RELA relocations to an ELF file already at its final location + * in memory (in which case loc_base == addr_base), or still in a temporary + * buffer. */ int elf64_apply_relocate_add(const struct elf_info *elf_info, -const char *strtab, unsigned int symindex, -unsigned int relsec, const char *obj_name) +const char *strtab, const Elf64_Rela *rela, +unsigned int num_rela, void *syms_base, +void *loc_base, Elf64_Addr addr_base, +const char *obj_name) { unsigned int i; - Elf64_Shdr *sechdrs = elf_info->sechdrs; - Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr; - Elf64_Sym *sym; unsigned long *location; + unsigned long address; unsigned long value; + const char *name; + Elf64_Sym *sym; + + for (i = 0; i < num_rela; i++) { + /* +* rels[i].r_offset contains the byte offset from the beginning +* of section to the storage unit affected. +* +* This is the location to update in the temporary buffer where +* the section is currently loaded. The section will finally +* be loaded to a different address later, pointed to by +* addr_base. +*/ + location = loc_base + rela[i].r_offset; + + /* Final address of the location. */ + address = addr_base + rela[i].r_offset; + /* This is the symbol the relocation is referring to. 
*/ + sym = (Elf64_Sym *) syms_base + ELF64_R_SYM(rela[i].r_info); - for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rela); i++) { - /* This is where to make the change */ - location = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr - + rela[i].r_offset; - /* This i
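The key split this patch introduces is between *where to write* (`location`, inside the temporary buffer at `loc_base`) and *what address to compute against* (`address`, derived from `addr_base`, where the section will eventually run). The sketch below demonstrates that split with a simplified PC-relative 32-bit store; it is user-space illustration code, not the kernel routine, and `apply_rel32`/`read32` are names invented for the example.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Apply one PC-relative relocation to a section that currently lives in a
 * temporary buffer (loc_base) but will execute at addr_base. */
static void apply_rel32(void *loc_base, uint64_t addr_base,
                        uint64_t r_offset, uint64_t sym_value, int64_t addend)
{
    /* Where to write: the slot in the temporary buffer. */
    char *location = (char *)loc_base + r_offset;
    /* Final runtime address of that same slot. */
    uint64_t address = addr_base + r_offset;
    /* PC-relative: target minus the slot's *final* address, never the
     * temporary one -- the stored bytes must be right at run time. */
    uint32_t value = (uint32_t)(sym_value + addend - address);

    memcpy(location, &value, sizeof(value));
}

/* Small helper to read back what was stored. */
static uint32_t read32(const void *base, uint64_t off)
{
    uint32_t v;

    memcpy(&v, (const char *)base + off, sizeof(v));
    return v;
}
```

When the binary is already in place (the module case), `loc_base` and `addr_base` describe the same memory and the computation collapses to the original code's behaviour.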
[PATCH v5 04/13] powerpc: Factor out relocation code from module_64.c to elf_util_64.c.
The kexec_file_load system call needs to relocate the purgatory, so factor out the module relocation code so that it can be shared.

This patch's purpose is to move the ELF relocation logic from apply_relocate_add to elf_util_64.c with as few changes as possible. The following changes were needed:

To avoid having module-specific code in a general-purpose utility function, struct elf_info was created to contain the information needed for manipulating ELF binaries. my_r2, stub_for_addr and create_stub were changed to use it instead of having to receive a struct module, since they are called from elf64_apply_relocate_add.

local_entry_offset and squash_toc_save_inst were only used by apply_relocate_add, so they were moved to elf_util_64.c as well.

Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/include/asm/elf_util.h | 70 arch/powerpc/include/asm/module.h | 14 +- arch/powerpc/kernel/Makefile| 4 + arch/powerpc/kernel/elf_util_64.c | 269 +++ arch/powerpc/kernel/module_64.c | 312 5 files changed, 386 insertions(+), 283 deletions(-) diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h new file mode 100644 index ..37372559fe62 --- /dev/null +++ b/arch/powerpc/include/asm/elf_util.h @@ -0,0 +1,70 @@ +/* + * Utility functions to work with ELF files. + * + * Copyright (C) 2016, IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details.
+ */ + +#ifndef _ASM_POWERPC_ELF_UTIL_H +#define _ASM_POWERPC_ELF_UTIL_H + +#include + +struct elf_info { + struct elf_shdr *sechdrs; + + /* Index of stubs section. */ + unsigned int stubs_section; + /* Index of TOC section. */ + unsigned int toc_section; +}; + +#ifdef __powerpc64__ +#ifdef PPC64_ELF_ABI_v2 + +/* An address is simply the address of the function. */ +typedef unsigned long func_desc_t; +#else + +/* An address is address of the OPD entry, which contains address of fn. */ +typedef struct ppc64_opd_entry func_desc_t; +#endif /* PPC64_ELF_ABI_v2 */ + +/* Like PPC32, we need little trampolines to do > 24-bit jumps (into + the kernel itself). But on PPC64, these need to be used for every + jump, actually, to reset r2 (TOC+0x8000). */ +struct ppc64_stub_entry +{ + /* 28 byte jump instruction sequence (7 instructions). We only +* need 6 instructions on ABIv2 but we always allocate 7 so +* so we don't have to modify the trampoline load instruction. */ + u32 jump[7]; + /* Used by ftrace to identify stubs */ + u32 magic; + /* Data for the above code */ + func_desc_t funcdata; +}; +#endif + +/* r2 is the TOC pointer: it actually points 0x8000 into the TOC (this + gives the value maximum span in an instruction which uses a signed + offset) */ +static inline unsigned long my_r2(const struct elf_info *elf_info) +{ + return elf_info->sechdrs[elf_info->toc_section].sh_addr + 0x8000; +} + +int elf64_apply_relocate_add(const struct elf_info *elf_info, +const char *strtab, unsigned int symindex, +unsigned int relsec, const char *obj_name); + +#endif /* _ASM_POWERPC_ELF_UTIL_H */ diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h index cd4ffd86765f..f2073115d518 100644 --- a/arch/powerpc/include/asm/module.h +++ b/arch/powerpc/include/asm/module.h @@ -12,7 +12,14 @@ #include #include #include +#include +/* Both low and high 16 bits are added as SIGNED additions, so if low + 16 bits has high bit set, high 16 bits must be adjusted. 
These + macros do that (stolen from binutils). */ +#define PPC_LO(v) ((v) & 0xffff) +#define PPC_HI(v) (((v) >> 16) & 0xffff) +#define PPC_HA(v) PPC_HI ((v) + 0x8000) #ifndef __powerpc64__ /* @@ -33,8 +40,7 @@ struct ppc_plt_entry { struct mod_arch_specific { #ifdef __powerpc64__ - unsigned int stubs_section; /* Index of stubs section in module */ - unsigned int toc_section; /* What section is the TOC? */ + struct elf_info elf_info; bool toc_fixed; /* Have we fixed up .TOC.? */ #ifdef CONFIG_DYNAMIC_FTRACE unsigned long toc; @@ -90,6 +96,10 @@ static inline int module_finalize_ftrace(struct module *mod, const Elf_Shdr *sec } #endif +unsigned long stub_for_addr(const struct elf_info *elf_info, unsigned long addr, + const char *obj_name); +int restore_r2(u32 *instructi
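The PPC_LO/PPC_HI/PPC_HA macros in the hunk above encode a subtlety worth spelling out: the low 16 bits are applied as a *signed* addition (addis/addi), so whenever the low half has its sign bit set, the high half must be pre-adjusted by one. A small demonstration, with the macros as in the patch and a hypothetical `reassemble` helper that mimics the signed addition the hardware performs:

```c
#include <stdint.h>
#include <assert.h>

/* Same definitions as the patch (taken from binutils). */
#define PPC_LO(v) ((v) & 0xffff)
#define PPC_HI(v) (((v) >> 16) & 0xffff)
#define PPC_HA(v) PPC_HI((v) + 0x8000)

/* Reassemble an address the way an addis/addi pair would: shift the high
 * half up, then add the low half as a *signed* 16-bit immediate.  Using
 * PPC_HA (rather than PPC_HI) for the high half makes this round-trip. */
static uint32_t reassemble(uint32_t hi, uint32_t lo)
{
    return (hi << 16) + (int16_t)lo;
}
```

For a value like 0x12348000 the low half is negative as a signed 16-bit quantity, so PPC_HA yields 0x1235 where PPC_HI would yield 0x1234; only the former reassembles correctly.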
[PATCH v5 02/13] kexec_file: Change kexec_add_buffer to take kexec_buf as argument.
Adapt all callers to the new function prototype. In addition, change the type of kexec_buf.buffer from char * to void *. There is no particular reason for it to be a char *, and the change allows us to get rid of 3 existing casts to char * in the code. Signed-off-by: Thiago Jung Bauermann Acked-by: Dave Young --- arch/x86/kernel/crash.c | 37 arch/x86/kernel/kexec-bzimage64.c | 48 +++-- include/linux/kexec.h | 8 +--- kernel/kexec_file.c | 88 ++- 4 files changed, 87 insertions(+), 94 deletions(-) diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 9616cf76940c..38a1cdf6aa05 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -615,9 +615,9 @@ static int determine_backup_region(u64 start, u64 end, void *arg) int crash_load_segments(struct kimage *image) { - unsigned long src_start, src_sz, elf_sz; - void *elf_addr; int ret; + struct kexec_buf kbuf = { .image = image, .buf_min = 0, + .buf_max = ULONG_MAX, .top_down = false }; /* * Determine and load a segment for backup area. First 640K RAM @@ -631,43 +631,44 @@ int crash_load_segments(struct kimage *image) if (ret < 0) return ret; - src_start = image->arch.backup_src_start; - src_sz = image->arch.backup_src_sz; - /* Add backup segment. */ - if (src_sz) { + if (image->arch.backup_src_sz) { + kbuf.buffer = &crash_zero_bytes; + kbuf.bufsz = sizeof(crash_zero_bytes); + kbuf.memsz = image->arch.backup_src_sz; + kbuf.buf_align = PAGE_SIZE; /* * Ideally there is no source for backup segment. This is * copied in purgatory after crash. Just add a zero filled * segment for now to make sure checksum logic works fine. 
*/ - ret = kexec_add_buffer(image, (char *)&crash_zero_bytes, - sizeof(crash_zero_bytes), src_sz, - PAGE_SIZE, 0, -1, 0, - &image->arch.backup_load_addr); + ret = kexec_add_buffer(&kbuf); if (ret) return ret; + image->arch.backup_load_addr = kbuf.mem; pr_debug("Loaded backup region at 0x%lx backup_start=0x%lx memsz=0x%lx\n", -image->arch.backup_load_addr, src_start, src_sz); +image->arch.backup_load_addr, +image->arch.backup_src_start, kbuf.memsz); } /* Prepare elf headers and add a segment */ - ret = prepare_elf_headers(image, &elf_addr, &elf_sz); + ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz); if (ret) return ret; - image->arch.elf_headers = elf_addr; - image->arch.elf_headers_sz = elf_sz; + image->arch.elf_headers = kbuf.buffer; + image->arch.elf_headers_sz = kbuf.bufsz; - ret = kexec_add_buffer(image, (char *)elf_addr, elf_sz, elf_sz, - ELF_CORE_HEADER_ALIGN, 0, -1, 0, - &image->arch.elf_load_addr); + kbuf.memsz = kbuf.bufsz; + kbuf.buf_align = ELF_CORE_HEADER_ALIGN; + ret = kexec_add_buffer(&kbuf); if (ret) { vfree((void *)image->arch.elf_headers); return ret; } + image->arch.elf_load_addr = kbuf.mem; pr_debug("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n", -image->arch.elf_load_addr, elf_sz, elf_sz); +image->arch.elf_load_addr, kbuf.bufsz, kbuf.bufsz); return ret; } diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index f2356bda2b05..4b3a75329fb6 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -331,17 +331,17 @@ static void *bzImage64_load(struct kimage *image, char *kernel, struct setup_header *header; int setup_sects, kern16_size, ret = 0; - unsigned long setup_header_size, params_cmdline_sz, params_misc_sz; + unsigned long setup_header_size, params_cmdline_sz; struct boot_params *params; unsigned long bootparam_load_addr, kernel_load_addr, initrd_load_addr; unsigned long purgatory_load_addr; - unsigned long kernel_bufsz, kernel_memsz, kernel_align; - char 
*kernel_buf; struct bzimage64_data *ldata; struct kexec_entry64_regs regs64; void *stack; unsigned int setup_hdr_offset = offsetof(struct boot_params, hdr); unsigned int efi_map_offset, efi_map_sz, efi_setup_data_offset; + struct kexec_buf kbuf = { .image = image, .buf_max = ULONG_MAX, + .top_down = true };
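The calling-convention change this patch makes — one `struct kexec_buf` with designated initializers instead of eight positional arguments — is easiest to see in miniature. Below is a user-space mock of the reworked interface (the real struct lives in `<linux/kexec.h>` and also carries the `kimage` pointer, omitted here); `kexec_add_buffer_mock`'s placement logic is a trivial stand-in, not the kernel's.

```c
#include <assert.h>

/* Mock of the reworked struct: all placement parameters travel together. */
struct kexec_buf_mock {
    void *buffer;
    unsigned long bufsz, mem, memsz;
    unsigned long buf_align, buf_min, buf_max;
    int top_down;
};

/* Stand-in for kexec_add_buffer(): pretend placement succeeded at the
 * lowest admissible address, rounded up to the requested alignment,
 * and report it back through kbuf->mem as the real function does. */
static int kexec_add_buffer_mock(struct kexec_buf_mock *kbuf)
{
    unsigned long align = kbuf->buf_align ? kbuf->buf_align : 1;

    kbuf->mem = (kbuf->buf_min + align - 1) & ~(align - 1);
    return 0;
}
```

The pattern visible in the x86 diff above follows from this: callers initialize the invariant fields once (`.image`, `.buf_min`, `.buf_max`, `.top_down`) and reuse the same `kbuf` for several segments, tweaking only `buffer`/`bufsz`/`memsz` between calls.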
[PATCH v5 03/13] kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.
kexec_locate_mem_hole will be used by the PowerPC kexec_file_load implementation to find free memory for the purgatory stack. Signed-off-by: Thiago Jung Bauermann Acked-by: Dave Young --- include/linux/kexec.h | 1 + kernel/kexec_file.c | 25 - 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 28bc9f335d0d..ceccc5856aab 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -176,6 +176,7 @@ struct kexec_buf { int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *)); extern int kexec_add_buffer(struct kexec_buf *kbuf); +int kexec_locate_mem_hole(struct kexec_buf *kbuf); int __weak arch_kexec_verify_buffer(enum kexec_file_type type, const void *buf, unsigned long size); #endif /* CONFIG_KEXEC_FILE */ diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 58818264ad0e..772cb491715e 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -524,6 +524,23 @@ int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf, } /** + * kexec_locate_mem_hole - find free memory for the purgatory or the next kernel + * @kbuf: Parameters for the memory search. + * + * On success, kbuf->mem will have the start address of the memory region found. + * + * Return: 0 on success, negative errno on error. + */ +int kexec_locate_mem_hole(struct kexec_buf *kbuf) +{ + int ret; + + ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback); + + return ret == 1 ? 0 : -EADDRNOTAVAIL; +} + +/** * kexec_add_buffer - place a buffer in a kexec segment * @kbuf: Buffer contents and memory parameters. 
* @@ -563,11 +580,9 @@ int kexec_add_buffer(struct kexec_buf *kbuf) kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE); /* Walk the RAM ranges and allocate a suitable range for the buffer */ - ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback); - if (ret != 1) { - /* A suitable memory range could not be found for buffer */ - return -EADDRNOTAVAIL; - } + ret = kexec_locate_mem_hole(kbuf); + if (ret) + return ret; /* Found a suitable memory range */ ksegment = &kbuf->image->segment[kbuf->image->nr_segments]; -- 1.9.1
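The factoring above is small but worth restating: the memory walker signals success with the value 1 (the callback stopped the walk), and `kexec_locate_mem_hole` translates that convention into the usual 0 / negative-errno contract. A minimal sketch of that translation, with a trivial mock walker standing in for `arch_kexec_walk_mem`:

```c
#include <errno.h>
#include <assert.h>

/* Mock walker: returns 1 when a hole was found (mirroring
 * locate_mem_hole_callback stopping the walk), 0 when the walk
 * finished without finding anything. */
static int walk_mem_mock(int found)
{
    return found ? 1 : 0;
}

/* Same shape as the factored-out kexec_locate_mem_hole: translate the
 * walker's "1 = found" convention into 0 on success, -EADDRNOTAVAIL
 * when no suitable range exists. */
static int locate_mem_hole_mock(int found)
{
    int ret = walk_mem_mock(found);

    return ret == 1 ? 0 : -EADDRNOTAVAIL;
}
```

This is what lets the PowerPC code reuse the search for the purgatory stack without also committing a kexec segment, which `kexec_add_buffer` would do.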
[PATCH v5 01/13] kexec_file: Allow arch-specific memory walking for kexec_add_buffer
Allow architectures to specify a different memory walking function for kexec_add_buffer. x86 uses iomem to track reserved memory ranges, but PowerPC uses the memblock subsystem. Signed-off-by: Thiago Jung Bauermann Acked-by: Dave Young --- include/linux/kexec.h | 26 ++ kernel/kexec_file.c | 30 ++ kernel/kexec_internal.h | 16 3 files changed, 48 insertions(+), 24 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 29202935055d..5ffd0011395c 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -149,6 +149,32 @@ struct kexec_file_ops { #endif }; +/** + * struct kexec_buf - parameters for finding a place for a buffer in memory + * @image: kexec image in which memory to search. + * @buffer:Contents which will be copied to the allocated memory. + * @bufsz: Size of @buffer. + * @mem: On return will have address of the buffer in memory. + * @memsz: Size for the buffer in memory. + * @buf_align: Minimum alignment needed. + * @buf_min: The buffer can't be placed below this address. + * @buf_max: The buffer can't be placed above this address. + * @top_down: Allocate from top of memory. + */ +struct kexec_buf { + struct kimage *image; + char *buffer; + unsigned long bufsz; + unsigned long mem; + unsigned long memsz; + unsigned long buf_align; + unsigned long buf_min; + unsigned long buf_max; + bool top_down; +}; + +int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf, + int (*func)(u64, u64, void *)); int __weak arch_kexec_verify_buffer(enum kexec_file_type type, const void *buf, unsigned long size); #endif /* CONFIG_KEXEC_FILE */ diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index c32d1d65bb77..e63fd4592e20 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -502,6 +502,27 @@ static int locate_mem_hole_callback(u64 start, u64 end, void *arg) return locate_mem_hole_bottom_up(start, end, kbuf); } +/** + * arch_kexec_walk_mem - call func(data) on free memory regions + * @kbuf: Context info for the search. 
Also passed to @func. + * @func: Function to call for each memory region. + * + * Return: The memory walk will stop when func returns a non-zero value + * and that value will be returned. If all free regions are visited without + * func returning non-zero, then zero will be returned. + */ +int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf, + int (*func)(u64, u64, void *)) +{ + if (kbuf->image->type == KEXEC_TYPE_CRASH) + return walk_iomem_res_desc(crashk_res.desc, + IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY, + crashk_res.start, crashk_res.end, + kbuf, func); + else + return walk_system_ram_res(0, ULONG_MAX, kbuf, func); +} + /* * Helper function for placing a buffer in a kexec segment. This assumes * that kexec_mutex is held. @@ -548,14 +569,7 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz, kbuf->top_down = top_down; /* Walk the RAM ranges and allocate a suitable range for the buffer */ - if (image->type == KEXEC_TYPE_CRASH) - ret = walk_iomem_res_desc(crashk_res.desc, - IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY, - crashk_res.start, crashk_res.end, kbuf, - locate_mem_hole_callback); - else - ret = walk_system_ram_res(0, -1, kbuf, - locate_mem_hole_callback); + ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback); if (ret != 1) { /* A suitable memory range could not be found for buffer */ return -EADDRNOTAVAIL; diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h index 0a52315d9c62..4cef7e4706b0 100644 --- a/kernel/kexec_internal.h +++ b/kernel/kexec_internal.h @@ -20,22 +20,6 @@ struct kexec_sha_region { unsigned long len; }; -/* - * Keeps track of buffer parameters as provided by caller for requesting - * memory placement of buffer. 
- */ -struct kexec_buf { - struct kimage *image; - char *buffer; - unsigned long bufsz; - unsigned long mem; - unsigned long memsz; - unsigned long buf_align; - unsigned long buf_min; - unsigned long buf_max; - bool top_down; /* allocate from top of memory hole */ -}; - void kimage_file_post_load_cleanup(struct kimage *image); #else /* CONFIG_KEXEC_FILE */ static inline void kimage_file_post_load_cleanup(struct kimage *image) { } -- 1.9.1
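The contract the new `arch_kexec_walk_mem` hook relies on is the one documented in the kernel-doc above: the callback is invoked once per free region, a non-zero return stops the walk, and that value is propagated to the caller. The sketch below illustrates the contract with a made-up range table and callback; none of these names exist in the kernel — it is purely an illustration of the walk/stop protocol that both the iomem and memblock implementations must honour.

```c
#include <stdint.h>
#include <assert.h>

struct range { uint64_t start, end; };  /* inclusive bounds, like iomem */

/* Walk the table, calling func on each region; a non-zero return stops
 * the walk and is returned to the caller, exactly as arch_kexec_walk_mem
 * is documented to behave. */
static int walk_ranges(const struct range *r, int n,
                       int (*func)(uint64_t, uint64_t, void *), void *data)
{
    for (int i = 0; i < n; i++) {
        int ret = func(r[i].start, r[i].end, data);

        if (ret)
            return ret;  /* callback found what it wanted */
    }
    return 0;            /* every region visited, nothing found */
}

/* Example callback: find the first region of at least *data bytes.
 * On success *data is overwritten with the region's start address. */
static int find_big_enough(uint64_t start, uint64_t end, void *data)
{
    uint64_t *need = data;

    if (end - start + 1 >= *need) {
        *need = start;  /* report where we found it */
        return 1;       /* stop the walk, like locate_mem_hole_callback */
    }
    return 0;
}
```

This is why the hook can be swapped per architecture: `locate_mem_hole_callback` only depends on the callback protocol, not on whether the regions came from `walk_iomem_res_desc`, `walk_system_ram_res`, or memblock.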
[PATCH v5 00/13] kexec_file_load implementation for PowerPC
[ Andrew, since this series touches generic code, x86 and powerpc, Michael Ellerman and Dave Young think it should go via your tree. ]

The main differences in this version are (more detailed changelog at the end of this email):

- The code which is not specific to loading ELF format kernels was moved from kexec_elf_64.c to machine_kexec_64.c.

- There is a new patch implementing support for receiving a device tree blob from userspace, checking it against a whitelist of allowed nodes and properties, and copying it into the device tree for the next kernel. This is the only patch that depends on the "extend kexec_file_load system call" series. Everything else can be upstreamed independently of that series.

- Also, I realised that the patch "Add support for loading ELF kernels with kexec_file_load." was too big, so I moved some changes to other patches to facilitate review. Details of what went where are in the changelog.

Original cover letter:

This patch series implements the kexec_file_load system call on PowerPC. This system call moves the reading of the kernel, initrd and the device tree from the userspace kexec tool to the kernel. This is needed if you want to do one or both of the following:

1. only allow loading of signed kernels.
2. "measure" (i.e., record the hashes of) the kernel, initrd, kernel command line and other boot inputs for the Integrity Measurement Architecture subsystem.

The above are features kexec already has built into kexec_file_load. Yesterday I posted a set of patches which allows a third feature:

3. have IMA pass on its event log (where integrity measurements are registered) across kexec to the second kernel, so that the event history is preserved.

Because OpenPower uses an intermediary Linux instance as a boot loader (skiroot), feature 1 is needed to implement secure boot for the platform, while features 2 and 3 are needed to implement trusted boot.
This patch series starts by removing an x86 assumption from kexec_file: kexec_add_buffer uses iomem to find reserved memory ranges, but PowerPC uses the memblock subsystem. A hook is added so that each arch can specify how memory ranges can be found. Also, the memory-walking logic in kexec_add_buffer is useful in this implementation to find a free area for the purgatory's stack, so the next patch moves that logic to kexec_locate_mem_hole. The kexec_file_load system call needs to apply relocations to the purgatory but adding code for that would duplicate functionality with the module loading mechanism, which also needs to apply relocations to the kernel modules. Therefore, this patch series factors out the module relocation code so that it can be shared. One thing that is still missing is crashkernel support, which I intend to submit shortly. For now, arch_kexec_kernel_image_probe rejects crash kernels. This code is based on kexec-tools, but with many modifications to adapt it to the kernel environment and facilities. Except the purgatory, which only has minimal changes. Changes for v5: - Rebased series on v4.8-rc1 + the extend kexec_file_load series. - Patch "powerpc: Adapt elf64_apply_relocate_add for kexec_file_load." - New patch. These changes were previously in patch 10. The code itself is unchanged from v4. - Patch "powerpc: Implement kexec_file_load." - Moved arch_kexec_walk_mem, arch_kexec_apply_relocations_add and setup_purgatory from patch 10 to this patch. - arch_kexec_apply_relocations_add is unchanged from v4. - Fixed off-by-one error in arch_kexec_walk_mem when passing range to func. - Moved setup_purgatory from kexec_elf_64.c to machine_kexec_64.c, and changed it to receive a pointer to the slave code directly rather than a struct elf_info and getting the pointer from there. - Patch "powerpc: Add code to work with device trees in kexec_file_load." - New patch. These changes were previously in patch 10. 
- find_debug_console moved from kexec_elf_64.c to machine_kexec_64.c. The code is unchanged from v4. - setup_new_fdt is a new function factored out of elf64_load. The only code change from v4 is to create /chosen if it doesn't exist yet. - Patch "powerpc: Add support for loading ELF kernels with kexec_file_load." - This patch was too big, so moved some of its changes to other patches to facilitate review. - Allow loading ELF file type ET_DYN, which is what the BE kernel uses. - The code adapting the device tree for booting the new kernel was moved out of elf64_load to setup_new_fdt. - Patch "powerpc: Allow userspace to set device tree properties in kexec_file_load" - New patch. - The code in this patch didn't exist in v4. - This is the only patch that depends on the extend kexec_file_load series. - Patch "powerpc: Enable CONFIG_KEXEC_FILE in powerpc server defconfigs." - New patch. Changes for v4: - Rebased series on today's powerpc/next. - Patch "kexec_file: Remove unused members from struct kexec_buf.
[PATCH v2 2/2] kexec: extend kexec_file_load system call
From: AKASHI Takahiro Device tree blob must be passed to a second kernel on DTB-capable archs, like powerpc and arm64, but the current kernel interface lacks this support. This patch extends kexec_file_load system call by adding an extra argument to this syscall so that an arbitrary number of file descriptors can be handed out from user space to the kernel. long sys_kexec_file_load(int kernel_fd, int initrd_fd, unsigned long cmdline_len, const char __user *cmdline_ptr, unsigned long flags, const struct kexec_fdset __user *ufdset); If KEXEC_FILE_EXTRA_FDS is set to the "flags" argument, the "ufdset" argument points to the following struct buffer: struct kexec_fdset { int nr_fds; struct kexec_file_fd fds[0]; } Signed-off-by: AKASHI Takahiro Signed-off-by: Thiago Jung Bauermann --- include/linux/fs.h | 1 + include/linux/kexec.h | 7 ++-- include/linux/syscalls.h | 4 ++- include/uapi/linux/kexec.h | 22 kernel/kexec_file.c| 83 ++ 5 files changed, 108 insertions(+), 9 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 3523bf62f328..847d9c31f428 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2656,6 +2656,7 @@ extern int do_pipe_flags(int *, int); id(MODULE, kernel-module) \ id(KEXEC_IMAGE, kexec-image)\ id(KEXEC_INITRAMFS, kexec-initramfs)\ + id(KEXEC_PARTIAL_DTB, kexec-partial-dtb)\ id(POLICY, security-policy) \ id(MAX_ID, ) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 4f85d284ed0b..29202935055d 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -148,7 +148,10 @@ struct kexec_file_ops { kexec_verify_sig_t *verify_sig; #endif }; -#endif + +int __weak arch_kexec_verify_buffer(enum kexec_file_type type, const void *buf, + unsigned long size); +#endif /* CONFIG_KEXEC_FILE */ struct kimage { kimage_entry_t head; @@ -280,7 +283,7 @@ extern int kexec_load_disabled; /* List of defined/legal kexec file flags */ #define KEXEC_FILE_FLAGS (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \ -KEXEC_FILE_NO_INITRAMFS) 
+KEXEC_FILE_NO_INITRAMFS | KEXEC_FILE_EXTRA_FDS) #define VMCOREINFO_BYTES (4096) #define VMCOREINFO_NOTE_NAME "VMCOREINFO" diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index d02239022bd0..fc072bdb74e3 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -66,6 +66,7 @@ struct perf_event_attr; struct file_handle; struct sigaltstack; union bpf_attr; +struct kexec_fdset; #include #include @@ -321,7 +322,8 @@ asmlinkage long sys_kexec_load(unsigned long entry, unsigned long nr_segments, asmlinkage long sys_kexec_file_load(int kernel_fd, int initrd_fd, unsigned long cmdline_len, const char __user *cmdline_ptr, - unsigned long flags); + unsigned long flags, + const struct kexec_fdset __user *ufdset); asmlinkage long sys_exit(int error_code); asmlinkage long sys_exit_group(int error_code); diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index aae5ebf2022b..6279be79efba 100644 --- a/include/uapi/linux/kexec.h +++ b/include/uapi/linux/kexec.h @@ -23,6 +23,28 @@ #define KEXEC_FILE_UNLOAD 0x0001 #define KEXEC_FILE_ON_CRASH0x0002 #define KEXEC_FILE_NO_INITRAMFS0x0004 +#define KEXEC_FILE_EXTRA_FDS 0x0008 + +enum kexec_file_type { + KEXEC_FILE_TYPE_KERNEL, + KEXEC_FILE_TYPE_INITRAMFS, + + /* +* Device Tree Blob containing just the nodes and properties that +* the kexec_file_load caller wants to add or modify. +*/ + KEXEC_FILE_TYPE_PARTIAL_DTB, +}; + +struct kexec_file_fd { + enum kexec_file_type type; + int fd; +}; + +struct kexec_fdset { + int nr_fds; + struct kexec_file_fd fds[0]; +}; /* These values match the ELF architecture values. * Unless there is a good reason that should continue to be the case. 
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 113af2f219b9..d6803dd884e2 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -25,6 +25,9 @@ #include #include "kexec_internal.h" +#define MAX_FDSET_SIZE (sizeof(struct kexec_fdset) + \ + KEXEC_SEGMENT_MAX * sizeof(struct kexec_file_fd)) + /* * Declare these symbols weak so that if architecture provides a purgatory, * these will be overridden. @@ -116,6 +119,22 @@ voi
[PATCH v2 1/2] kexec: add dtb info to struct kimage
From: AKASHI Takahiro Device tree blob must be passed to a second kernel on DTB-capable archs, like powerpc and arm64, but the current kernel interface lacks this support. This patch adds dtb buffer information to struct kimage. When users don't specify dtb explicitly and the one used for the current kernel can be re-used, this change will be good enough for implementing kexec_file_load feature. Signed-off-by: AKASHI Takahiro --- include/linux/kexec.h | 3 +++ kernel/kexec_file.c | 3 +++ 2 files changed, 6 insertions(+) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index d743baaa..4f85d284ed0b 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -192,6 +192,9 @@ struct kimage { char *cmdline_buf; unsigned long cmdline_buf_len; + void *dtb_buf; + unsigned long dtb_buf_len; + /* File operations provided by image loader */ struct kexec_file_ops *fops; diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 503bc2d348e5..113af2f219b9 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -92,6 +92,9 @@ void kimage_file_post_load_cleanup(struct kimage *image) vfree(image->initrd_buf); image->initrd_buf = NULL; + vfree(image->dtb_buf); + image->dtb_buf = NULL; + kfree(image->cmdline_buf); image->cmdline_buf = NULL; -- 1.9.1
[PATCH v2 0/2] extend kexec_file_load system call
This patch series is from AKASHI Takahiro. I will use it in my next version of the kexec_file_load implementation for powerpc, so I am rebasing it on top of v4.8-rc1. I dropped the patch which adds __NR_kexec_file_load to for simplicity, since the powerpc patches already add it to powerpc's . I don't know which approach is better. The first patch in this series is unchanged from v1. The second patch is the same one I posted on July 26th. It has the following changes from v1: - Added the arch_kexec_verify_buffer hook, where each architecture can verify if the DTB is safe to load. - Renamed KEXEC_FILE_TYPE_DTB to KEXEC_FILE_TYPE_PARTIAL_DTB. - Limited max number of fds to KEXEC_SEGMENT_MAX. - Changed to use fixed size buffer for fdset instead of allocating it. - Changed to return -EINVAL if an unknown file type is found in fdset. I am also posting a new version of the kexec_file_load syscall implementation for powerpc which uses the arch_kexec_verify_buffer hook to enforce a whitelist of nodes and properties that userspace can pass to the next kernel, as suggested by Michael Ellerman. You can find it in a new patch in the powerpc series called "powerpc: Allow userspace to set device tree properties in kexec_file_load" Original cover letter: Device tree blob must be passed to a second kernel on DTB-capable archs, like powerpc and arm64, but the current kernel interface lacks this support. This patch extends kexec_file_load system call by adding an extra argument to this syscall so that an arbitrary number of file descriptors can be handed out from user space to the kernel. See the background [1]. Please note that the new interface looks quite similar to the current system call, but that it won't always mean that it provides the "binary compatibility." 
[1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html AKASHI Takahiro (1): kexec: add dtb info to struct kimage Thiago Jung Bauermann (1): kexec: extend kexec_file_load system call include/linux/fs.h | 1 + include/linux/kexec.h | 10 -- include/linux/syscalls.h | 4 ++- include/uapi/linux/kexec.h | 22 kernel/kexec_file.c| 86 ++ 5 files changed, 114 insertions(+), 9 deletions(-) -- 1.9.1
Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
On Thu, Aug 11, 2016 at 11:34:37PM +0200, Gabriel Paubert wrote: > On the other hand gcc did at the time a very poor job (quite an > understatement) at bswapdi when compiling for 64 bit processors > (see the example). > > But what do modern compilers generate for bswapdi these days? Do they > still call the library or not? Nope. > After all, bswapdi on 32 bit processors only takes 6 instructions if the > input and output registers don't overlap. For this testcase: === typedef unsigned long long u64; u64 bs(u64 x) { return __builtin_bswap64(x); } === we get with -m32: === bs: mr 9,3 rotlwi 3,4,24 rlwimi 3,4,8,8,15 rlwimi 3,4,8,24,31 rotlwi 4,9,24 rlwimi 4,9,8,8,15 rlwimi 4,9,8,24,31 blr === and with -m64: === .L.bs: srdi 10,3,32 mr 9,3 rotlwi 3,3,24 rotlwi 8,10,24 rlwimi 3,9,8,8,15 rlwimi 8,10,8,8,15 rlwimi 3,9,8,24,31 rlwimi 8,10,8,24,31 sldi 3,3,32 or 3,3,8 blr === Neither as tight as possible, but neither horrible either. Segher
Re: [PATCH] powerpc/32: Remove one insn in __bswapdi2
On Wed, Aug 10, 2016 at 12:18:15PM +0200, Christophe Leroy wrote: > > > Le 10/08/2016 à 10:56, Gabriel Paubert a écrit : > >On Fri, Aug 05, 2016 at 01:28:02PM +0200, Christophe Leroy wrote: > >>Signed-off-by: Christophe Leroy > >>--- > >> arch/powerpc/kernel/misc_32.S | 3 +-- > >> 1 file changed, 1 insertion(+), 2 deletions(-) > >> > >>diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S > >>index e025230..e18055c 100644 > >>--- a/arch/powerpc/kernel/misc_32.S > >>+++ b/arch/powerpc/kernel/misc_32.S > >>@@ -578,9 +578,8 @@ _GLOBAL(__bswapdi2) > >>rlwimi r9,r4,24,0,7 > >>rlwimi r10,r3,24,0,7 > >>rlwimi r9,r4,24,16,23 > >>- rlwimi r10,r3,24,16,23 > >>+ rlwimi r4,r3,24,16,23 > >>mr r3,r9 > >>- mr r4,r10 > >>blr > >> > > > >Hmmm, are you sure that it works? rlwimi is a bit special since the > >first operand is both an input and an output of the instruction. > > > > > > Oops, you are right ... I just found this: http://hardwarebug.org/2010/01/14/beware-the-builtins/ the bswapdi2 suggested sequence only needs a single mr instruction, the other one is absorbed in a rotlwi. The scheduling looks poor, but it seems impossible to interleave the operations between the two halves without adding another instructions, and the routine is 8 instructions long, which happens to be exactly a cache line on most 32 bit processors. On the other hand gcc did at the time a very poor job (quite an understatement) at bswapdi when compiling for 64 bit processors (see the example). But what do modern compilers generate for bswapdi these days? Do they still call the library or not? After all, bswapdi on 32 bit processors only takes 6 instructions if the input and output registers don't overlap. Gabriel
Re: [PATCH 0/2] ibmvfc: FC-TAPE Support
On 08/03/2016 02:36 PM, Tyrel Datwyler wrote: > This patchset introduces optional FC-TAPE/FC Class 3 Error Recovery to the > ibmvfc client driver. > > Tyrel Datwyler (2): > ibmvfc: Set READ FCP_XFER_READY DISABLED bit in PRLI > ibmvfc: add FC Class 3 Error Recovery support > > drivers/scsi/ibmvscsi/ibmvfc.c | 11 +++ > drivers/scsi/ibmvscsi/ibmvfc.h | 1 + > 2 files changed, 12 insertions(+) > ping?
[PATCH v4] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
This patch leverages 'struct pci_host_bridge' from the PCI subsystem in order to free the pci_controller only after the last reference to its devices is dropped (avoiding an oops in pcibios_release_device() if the last reference is dropped after pcibios_free_controller()). The patch relies on pci_host_bridge.release_fn() (and .release_data), which is called automatically by the PCI subsystem when the root bus is released (i.e., the last reference is dropped). Those fields are set via pci_set_host_bridge_release() (e.g. in the platform-specific implementation of pcibios_root_bridge_prepare()). It introduces the 'pcibios_free_controller_deferred()' .release_fn() and it expects .release_data to hold a pointer to the pci_controller. The function implicitly calls 'pcibios_free_controller()', so a user must *NOT* explicitly call it if using the new _deferred() callback. The functionality is enabled for pseries (although it isn't platform specific, and may be used by cxl). Details on not-so-elegant design choices: - Use 'pci_host_bridge.release_data' field as pointer to the associated 'struct pci_controller' so as *not* to call 'pci_bus_to_host(bridge->bus)' in pcibios_free_controller_deferred(). That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL' (so, if the last reference is released after pci_remove_root_bus() runs, which eventually reaches pcibios_free_controller_deferred(), that would hit a null pointer dereference). The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks are interested in this fix. Test-case #1 (hold references) # ls -ld /sys/block/sd* | grep -m1 0021:01:00.0 <...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...> # ls -ld /sys/block/sd* | grep -m1 0021:01:00.1 <...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...> # cat >/dev/sdaa & pid1=$! # cat >/dev/sdab & pid2=$! # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r Validating PHB DLPAR capability...yes. 
[ 594.306719] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01 [ 594.306738] pci_hp_remove_devices:Removing 0021:01:00.0... ... [ 598.236381] pci_hp_remove_devices:Removing 0021:01:00.1... ... [ 611.972077] pci_bus 0021:01: busn_res: [bus 01-ff] is released [ 611.972140] rpadlpar_io: slot PHB 33 removed # kill -9 $pid1 # kill -9 $pid2 [ 632.918088] pcibios_free_controller_deferred: domain 33, dynamic 1 Test-case #2 (don't hold references) # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r Validating PHB DLPAR capability...yes. [ 916.357363] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01 [ 916.357386] pci_hp_remove_devices:Removing 0021:01:00.0... ... [ 920.566527] pci_hp_remove_devices:Removing 0021:01:00.1... ... [ 933.955873] pci_bus 0021:01: busn_res: [bus 01-ff] is released [ 933.955977] pcibios_free_controller_deferred: domain 33, dynamic 1 [ 933.955999] rpadlpar_io: slot PHB 33 removed Suggested-By: Gavin Shan Signed-off-by: Mauricio Faria de Oliveira --- Changelog: - v4: improve usability/design/documentation: - rename function to pcibios_free_controller_deferred() - from function call pcibios_free_controller() - no more struct pci_controller.bridge field thanks: Gavin Shan, Andrew Donnellan - v3: different approach: struct pci_host_bridge.release_fn() - v2: different approach: struct pci_controller.refcount arch/powerpc/include/asm/pci-bridge.h | 1 + arch/powerpc/kernel/pci-common.c | 36 ++ arch/powerpc/platforms/pseries/pci.c | 4 arch/powerpc/platforms/pseries/pci_dlpar.c | 7 -- 4 files changed, 46 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index b5e88e4..c0309c5 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -301,6 +301,7 @@ extern void pci_process_bridge_OF_ranges(struct pci_controller *hose, /* Allocate & free a PCI host bridge structure */ extern struct pci_controller *pcibios_alloc_controller(struct device_node 
*dev); extern void pcibios_free_controller(struct pci_controller *phb); +extern void pcibios_free_controller_deferred(struct pci_host_bridge *bridge); #ifdef CONFIG_PCI extern int pcibios_vaddr_is_ioport(void __iomem *address); diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index a5c0153..8c48a78 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -151,6 +151,42 @@ void pcibios_free_controller(struct pci_controller *phb) EXPORT_SYMBOL_GPL(pcibios_free_controller); /* + * This function is used to call pcibios_free_controller() + * in a deferred manner: a callback from the PCI subsystem. + * + * _*DO NOT*_ call pcibios_free_controller() explicitly if + * this is used (or it may access an invalid *phb pointer). + * + * The callback occurs when all re
Re: [PATCH v3] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
Hi Gavin, tl;dr: thanks for the comments & suggestions; i'll submit v4. On 08/11/2016 03:40 AM, Gavin Shan wrote: [added some line breaks] It seems the user has two options here: (1) Setup bridge's release_fn() and call pcibios_free_controller() explicitly; I think the v3 design was non-intuitive in that point -- it does not seem right for an user to use both options: if release_fn() is set and is called before pcibios_free_controller() (normal case w/ DLPAR/PCI hotplug/cxl, as buses/devices are supposed to be removed before the controller is released) the latter will use an invalid 'phb' pointer. (what Andrew reported) In that scenario, it's not even possible for pcibios_free_controller() to try to detect if release_fn() was already run or not, as the only information it has is the 'phb' pointer, which may be invalid. So, I believe the elegant way out of this is your suggestion to have "immediate or deferred release" and make the user *choose* either one. Obviously, let's make this explicit to the user -- w/ rename & comments. > (2) Call pcibios_free_controller() without a valid bridge's release_fn() initialized. Ok, that looks legitimate for those using immediate release (default). i.e., once an user decides to use deferred released, it's understood that pcibios_free_controller() should not be called. > I think we can provide better interface to users: what we do in pcibios_free_controller() and pcibios_host_bridge_release() should be (almost) same. pcibios_host_bridge_release() can be a wrapper of pcibios_free_controller(). Right; I implemented only kfree() in pcibios_host_bridge_release() because I was focused on when it runs *after* pcibios_free_controller(); but it turns out that if it runs *before*, phb becomes invalid pointer. So, you're right -- both functions are expected to have the same effect (slightly different code), that is all of what pcibios_free_controller() does. The only difference should be the timing. 
(good point on wrapper) > With this, the users have two options: (1) Rely on bridge's release_fn() to free the PCI controller; (2) Call pcibios_free_controller() as we're doing currently. Those two options corresponds to immediately or deferred releasing. Looks very good. I'll submit a v4 like this: -rename pcibios_host_bridge_release()/pcibios_free_controller_deferred() -add comments about using _either_ one or another -pcibios_free_controller_deferred() calls pcibios_free_controller(). -- Mauricio Faria de Oliveira IBM Linux Technology Center
Re: [PATCH 1/4] dt-bindings: add doc for ibm,hotplug-aperture
On Thu, Aug 11, 2016 at 02:39:23PM +1000, Stewart Smith wrote: Forgive me for being absent on the whole discussion here, but is this an OPAL specific binding? If so, shouldn't the docs also appear in the skiboot tree? Good question. I guess it's not necessarily OPAL-specific, even though OPAL may initially be the only implementor of the binding. Would it be more appropriate to move the file up a directory, directly under Documentation/devicetree/bindings/powerpc? I hesitated at that because the binding is tied to "ibm,associativity". -- Reza Arbab
Re: [PATCH v3] powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
On 08/11/2016 02:01 AM, Andrew Donnellan wrote: In cxl, we currently call: pci_remove_root_bus(phb->bus); pcibios_free_controller(phb); which appears to break with this patch after I wire up pci_set_host_bridge_release() in cxl, as phb can be freed before we call pcibios_free_controller(). Ugh; you're right. I believe the user is expected to use either one way or another, but now I see it's not that intuitive -- a design fault. I'll address this w/ the other review/suggestion by Gavin; replying it. Missing a '---' here :) Changelog: Ok, thanks! -- Mauricio Faria de Oliveira IBM Linux Technology Center
[PATCH] mm: Initialize per_cpu_nodestats for hotadded pgdats
The following oops occurs after a pgdat is hotadded: [ 86.839956] Unable to handle kernel paging request for data at address 0x00c30001 [ 86.840132] Faulting instruction address: 0xc022f8f4 [ 86.840328] Oops: Kernel access of bad area, sig: 11 [#1] [ 86.840468] SMP NR_CPUS=2048 NUMA pSeries [ 86.840612] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter nls_utf8 isofs sg virtio_balloon uio_pdrv_genirq uio ip_tables xfs libcrc32c sr_mod cdrom sd_mod virtio_net ibmvscsi scsi_transport_srp virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [ 86.842955] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW 4.8.0-rc1-device #110 [ 86.843140] task: c0ef3080 task.stack: c0f6c000 [ 86.843323] NIP: c022f8f4 LR: c022f948 CTR: [ 86.843595] REGS: c0f6fa50 TRAP: 0300 Tainted: GW (4.8.0-rc1-device) [ 86.843889] MSR: 80010280b033 CR: 84002028 XER: 2000 [ 86.844624] CFAR: d1d2013c DAR: 00c30001 DSISR: 4000 SOFTE: 0 GPR00: c022f948 c0f6fcd0 c0f71400 0001 GPR04: 0100 00c3 GPR08: 0001 00c3 GPR12: 2200 c130 c0faefb4 c0faefa8 GPR16: c0f6c000 c0f6c080 c0bf15b0 c0f6c080 GPR20: c0bf4928 0003 c0bf4968 GPR24: c000ffed c0f6fd58 GPR28: 0001 0001 c0f6fcf0 c000ffed9c08 [ 86.847747] NIP [c022f8f4] refresh_cpu_vm_stats+0x1a4/0x2f0 [ 86.847897] LR [c022f948] refresh_cpu_vm_stats+0x1f8/0x2f0 [ 86.848060] Call Trace: [ 86.848183] [c0f6fcd0] [c022f948] refresh_cpu_vm_stats+0x1f8/0x2f0 (unreliable) Add per_cpu_nodestats initialization to the hotplug codepath. 
Signed-off-by: Reza Arbab --- mm/memory_hotplug.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 3894b65..41266dc 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1219,6 +1219,7 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) /* init node's zones as empty zones, we don't have any present pages.*/ free_area_init_node(nid, zones_size, start_pfn, zholes_size); + pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat); /* * The node we allocated has no zone fallback lists. For avoiding @@ -1249,6 +1250,7 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) static void rollback_node_hotadd(int nid, pg_data_t *pgdat) { arch_refresh_nodedata(nid, NULL); + free_percpu(pgdat->per_cpu_nodestats); arch_free_nodedata(pgdat); return; } -- 1.8.3.1
Re: mm: Initialise per_cpu_nodestats for all online pgdats at boot
On Thu, Aug 11, 2016 at 10:28:08AM +0100, Mel Gorman wrote: Fix looks ok. Can you add a proper changelog to it including an example oops or do you need me to do it? Sure, no problem. Patch to follow. -- Reza Arbab
Re: [TESTING] kbuild: link drivers subdirectories separately
On Thursday, August 11, 2016 3:49:03 PM CEST Arnd Bergmann wrote: > @@ -137,7 +134,8 @@ obj-$(CONFIG_PPC_PS3) += ps3/ > obj-$(CONFIG_OF) += of/ > obj-$(CONFIG_SSB) += ssb/ > obj-$(CONFIG_BCMA) += bcma/ > -obj-y += vhost/ > +obj-$(CONFIG_VHOST_RING) += vhost/ > +obj-$(CONFIG_VHOST)+= vhost/ > obj-$(CONFIG_VLYNQ)+= vlynq/ > obj-$(CONFIG_STAGING) += staging/ > obj-y += platform/ > This hunk should have been the other way round to apply and work correctly, I mixed up the number of reverts I had on my tree before it. Arnd
[pasemi] Internal CompactFlash (CF) card device not recognised after the powerpc-4.8-1 merge
Hi All, I was able to patch the RC1 with the Nemo and PHB-numbering patch. Additionally I added some printks in the file pata_of_platform.c. I wanted to know which values have the following variables: ctl_res = io_res; io_res.start += 0x800; ctl_res.start = ctl_res.start + 0x80e; io_res.end = ctl_res.start-1; It compiled without any problems but unfortunately I didn't see any printk outputs of these variables. The output of pata_of_platform is missing too. I see this output in the dmesg of the kernel 4.7 but I don't see it in the dmesg of the kernel 4.8. I have the feeling, that pata_of_platform doesn't work anymore. Maybe this is the reason, why the CF card doesn't work anymore. Maybe this is the problem: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/arch/powerpc/platforms/pasemi/setup.c?id=bad60e6f259a01cf9f29a1ef8d435ab6c60b2de9 Do you have any hints for me? Cheers, Christian On 05 August 2016 at 11:42 PM, Darren Stevens wrote: Hello Nicholas On 06/08/2016, Nicholas Piggin wrote: Hi Christian, On 05 August 2016 at 1:41 PM, Christian Zigotzky wrote: Hi All, The internal PASEMI CompactFlash (CF) card device doesn't work anymore after the powerpc-4.8-1 merge. That means the code for the internal CF card device in the Nemo patch doesn't work after the first PowerPC merge. The CompactFlash (CF) card slot is wired to the CPU local bus. It is typically used to hold the Linux kernel. I know it isn't well to use an own patch for that but I think it is a good time to integrate the PASEMI internal CompactFlash (CF) card device to the official kernel. What do you think? I am not a programmer so I can't integrate the source code for the internal CF card device. But maybe you can take the patch and integrate it. 
We use the following patch for the kernel 4.7: diff -rupN a/drivers/ata/pata_of_platform.c b/drivers/ata/pata_of_platform.c --- a/drivers/ata/pata_of_platform.c 2016-08-05 09:58:41.410569036 +0200 +++ b/drivers/ata/pata_of_platform.c 2016-08-05 09:59:54.41424 +0200 @@ -41,14 +41,36 @@ static int pata_of_platform_probe(struct return -EINVAL; } - ret = of_address_to_resource(dn, 1, &ctl_res); - if (ret) { - dev_err(&ofdev->dev, "can't get CTL address from " - "device tree\n"); - return -EINVAL; + if (of_device_is_compatible(dn, "electra-ide")) { + /* Altstatus is really at offset 0x3f6 from the primary window + * on electra-ide. Adjust ctl_res and io_res accordingly. + */ + ctl_res = io_res; + ctl_res.start = ctl_res.start+0x3f6; + io_res.end = ctl_res.start-1; + +#ifdef CONFIG_PPC_PASEMI_SB600 + } else if (of_device_is_compatible(dn, "electra-cf")) { + /* Task regs are at 0x800, with alt status @ 0x80e in the primary window +* on electra-cf. Adjust ctl_res and io_res accordingly. +*/ + ctl_res = io_res; + io_res.start += 0x800; + ctl_res.start = ctl_res.start + 0x80e; + io_res.end = ctl_res.start-1; +#endif + } else { + ret = of_address_to_resource(dn, 1, &ctl_res); + if (ret) { + dev_err(&ofdev->dev, "can't get CTL address from " +"device tree\n"); + return -EINVAL; + } } irq_res = platform_get_resource(ofdev, IORESOURCE_IRQ, 0); + if (irq_res) + irq_res->flags = 0; prop = of_get_property(dn, "reg-shift", NULL); if (prop) @@ -65,6 +87,11 @@ static int pata_of_platform_probe(struct dev_info(&ofdev->dev, "pio-mode unspecified, assuming PIO0\n"); } +#ifdef CONFIG_PPC_PASEMI_SB600 + irq_res = 0;// force irq off (doesn't seem to work) +#endif + + pio_mask = 1 << pio_mode; pio_mask |= (1 << pio_mode) - 1; @@ -74,7 +101,11 @@ static int pata_of_platform_probe(struct static struct of_device_id pata_of_platform_match[] = { { .compatible = "ata-generic", }, - { }, + { .compatible = "electra-ide", }, +#ifdef CONFIG_PPC_PASEMI_SB600 + { .compatible = "electra-cf",}, +#endif + 
{}, }; MODULE_DEVICE_TABLE(of, pata_of_platform_match); dmesg with the kernel 4.7: zcat /var/log/dmesg.1.gz | grep -i ata7 [2.939788] ata7: PATA max PIO0 no IRQ, using PIO polling mmio cmd 0xf800 ctl 0xf80e [3.099186] ata7.00: CFA: SanDisk SDCFB-256, HDX 2.33, max PIO4 [3.099191] ata7.00: 501760 sectors, multi 0: LBA [3.099199] ata7.00: configured for PIO The dmesg of the latest Git kernel doesn't have any output of our internal CF card device
[TESTING] kbuild: link drivers subdirectories separately
On ARM, relative branches between functions can not span more than 32MB, which limits the size of an ELF section. In the final link, the linker will introduce trampolines that perform long calls to avoid the limit, and during a recursive link, trampolines are added within the section. However, this does not work for cross-section branches when the source section is already larger than 32MB because there is no longer space to put the trampoline. We are unable to build an allyesconfig kernel on ARM because the .text section in drivers/built-in.o has that problem. This patch avoids it by linking drivers/*/built-in.o directly into vmlinux.o, rather than first linking them into drivers/built-in.o. Signed-off-by: Arnd Bergmann --- This patch gets allyesconfig to work for me on ARM. We have previously decided that this is too ugly, but you can use it for comparing the link times. diff --git a/Makefile b/Makefile index 2eae4bab0d9b..091ca3a3015b 100644 --- a/Makefile +++ b/Makefile @@ -557,13 +557,6 @@ scripts: scripts_basic include/config/auto.conf include/config/tristate.conf \ asm-generic gcc-plugins $(Q)$(MAKE) $(build)=$(@) -# Objects we will link into vmlinux / subdirs we need to visit -init-y := init/ -drivers-y := drivers/ sound/ firmware/ -net-y := net/ -libs-y := lib/ -core-y := usr/ -virt-y := virt/ endif # KBUILD_EXTMOD ifeq ($(dot-config),1) @@ -584,6 +577,20 @@ $(KCONFIG_CONFIG) include/config/auto.conf.cmd: ; # we execute the config step to be sure to catch updated Kconfig files include/config/%.conf: $(KCONFIG_CONFIG) include/config/auto.conf.cmd $(Q)$(MAKE) -f $(srctree)/Makefile silentoldconfig + +# Objects we will link into vmlinux / subdirs we need to visit +init-y := init/ +net-y := net/ +libs-y := lib/ +core-y := usr/ +virt-y := virt/ + +# split out objects from drivers to avoid recursively linking large .o files +include drivers/Makefile +drivers-y := $(addprefix drivers/,$(obj-y) $(obj-m)) +drivers-y += sound/ firmware/ +obj-y := + else # external 
modules needs include/generated/autoconf.h and include/config/auto.conf # but do not care if they are up-to-date. Use auto.conf to trigger the test diff --git a/drivers/Makefile b/drivers/Makefile index 9cfa547d67ce..38848742db1f 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -95,10 +95,7 @@ obj-$(CONFIG_ATA_OVER_ETH) += block/aoe/ obj-$(CONFIG_PARIDE) += block/paride/ obj-$(CONFIG_TC) += tc/ obj-$(CONFIG_UWB) += uwb/ -obj-$(CONFIG_USB_PHY) += usb/ -obj-$(CONFIG_USB) += usb/ -obj-$(CONFIG_PCI) += usb/ -obj-$(CONFIG_USB_GADGET) += usb/ +obj-y += usb/ obj-$(CONFIG_SERIO)+= input/serio/ obj-$(CONFIG_GAMEPORT) += input/gameport/ obj-$(CONFIG_INPUT)+= input/ @@ -137,7 +134,8 @@ obj-$(CONFIG_PPC_PS3) += ps3/ obj-$(CONFIG_OF) += of/ obj-$(CONFIG_SSB) += ssb/ obj-$(CONFIG_BCMA) += bcma/ -obj-y += vhost/ +obj-$(CONFIG_VHOST_RING) += vhost/ +obj-$(CONFIG_VHOST)+= vhost/ obj-$(CONFIG_VLYNQ)+= vlynq/ obj-$(CONFIG_STAGING) += staging/ obj-y += platform/
Re: [PATCH] powerpc: populate the default bus with machine_arch_initcall
On Thu, Aug 11, 2016 at 6:09 AM, Kevin Hao wrote: > With the commit 44a7185c2ae6 ("of/platform: Add common method to > populate default bus"), a default function is introduced to populate > the default bus and this function is invoked at the arch_initcall_sync > level. This will override the arch specific population of default bus > which run at a lower level than arch_initcall_sync. Since not all > powerpc specific buses are added to the of_default_bus_match_table[], > this causes some powerpc specific bus are not probed. Fix this by > using a more preceding initcall. > > Signed-off-by: Kevin Hao > --- > Of course we can adjust the powerpc arch codes to use the > of_platform_default_populate_init(), but it has high risk to break > other boards given the complicated powerpc specific buses. So I would > like just to fix the broken boards in the current release, and cook > a patch to change to of_platform_default_populate_init() for linux-next. The patch that broke things was sitting in -next for some time and no one reported anything. Are all these boards broken? I'm fine to just disable the default call for PPC instead if there's some chance this does not fix some boards. There could be some other initcall ordering dependencies. > > Only boot test on a mpc8315erdb board. Curious, what would it take to remove the of_platform_bus_probe and use the default here? We can add additional bus compatibles to match. The difference between of_platform_bus_probe and of_platform_bus_populate is the former will match root nodes with no compatible string. Most platforms should not need that behavior and it would be nice to know which ones. Rob
Re: powerpc allyesconfig / allmodconfig linux-next next-20160729 - next-20160729 build failures
On Thu, 11 Aug 2016 15:04:00 +0200 Arnd Bergmann wrote: > On Thursday, August 11, 2016 10:43:20 PM CEST Nicholas Piggin wrote: > > On Wed, 03 Aug 2016 22:13:28 +0200 > > Final ld time > > inclink > > real0m0.378s > > user0m0.304s > > sys 0m0.076s > > > > thinarc > > real0m0.894s > > user0m0.684s > > sys 0m0.200s > > This also still seems fine. > > > For both cases final link gets slower with thin archives. I guess there is > > some > > per-file overhead but I thought with --whole-archive it should not be that > > much > > slower. Still, overall time for main ar/ld phases comes out about the same > > in > > the end so I don't think it's too much problem. Unless ARM blows up > > significantly > > worse with a bigger config. > > Unfortunately I think it does. I haven't tried your latest series yet, > but I think the total time for removing built-in.o and relinking went > up from around 4 minutes (already way too much) to 18 minutes for me. > > > Linking with thin archives takes significantly more time in bfd hash lookup > > code. > > I haven't dug much further yet. > > Can you try the ARM allyesconfig with thin archives? I'll follow up with two > patches: one to get ARM to link without thin archives, and one that I used > to get --gc-sections to work. Okay send them over, I'll try digging into it. There is not much kbuild code to maintain so we don't have to switch every arch. It would be nice to though. Thanks, Nick
[PATCH v2] powerpc: move hmi.c to arch/powerpc/kvm/
hmi.c functions are unused unless sibling_subcore_state is nonzero, and that in turn happens only if KVM is in use. So move the code to arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also included in struct paca_struct only if KVM is supported by the kernel. Cc: Daniel Axtens Cc: Michael Ellerman Cc: Mahesh Salgaonkar Cc: Paul Mackerras Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm-...@vger.kernel.org Cc: k...@vger.kernel.org Signed-off-by: Paolo Bonzini --- v1->v2: use CONFIG_KVM_BOOK3S_HV_POSSIBLE, not CONFIG_KVM_BOOK3S_64_HANDLER. The former implies the latter, but the reverse is not true. arch/powerpc/include/asm/hmi.h | 2 +- arch/powerpc/include/asm/paca.h| 12 +++- arch/powerpc/kernel/Makefile | 2 +- arch/powerpc/kvm/Makefile | 1 + arch/powerpc/{kernel/hmi.c => kvm/book3s_hv_hmi.c} | 0 5 files changed, 10 insertions(+), 7 deletions(-) rename arch/powerpc/{kernel/hmi.c => kvm/book3s_hv_hmi.c} (100%) diff --git a/arch/powerpc/include/asm/hmi.h b/arch/powerpc/include/asm/hmi.h index 88b4901ac4ee..85b7a1a21e22 100644 --- a/arch/powerpc/include/asm/hmi.h +++ b/arch/powerpc/include/asm/hmi.h @@ -21,7 +21,7 @@ #ifndef __ASM_PPC64_HMI_H__ #define __ASM_PPC64_HMI_H__ -#ifdef CONFIG_PPC_BOOK3S_64 +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE #defineCORE_TB_RESYNC_REQ_BIT 63 #define MAX_SUBCORE_PER_CORE 4 diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 148303e7771f..6a6792bb39fb 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -183,11 +183,6 @@ struct paca_struct { */ u16 in_mce; u8 hmi_event_available; /* HMI event is available */ - /* -* Bitmap for sibling subcore status. 
See kvm/book3s_hv_ras.c for -* more details -*/ - struct sibling_subcore_state *sibling_subcore_state; #endif /* Stuff for accurate time accounting */ @@ -202,6 +197,13 @@ struct paca_struct { struct kvmppc_book3s_shadow_vcpu shadow_vcpu; #endif struct kvmppc_host_state kvm_hstate; +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE + /* +* Bitmap for sibling subcore status. See kvm/book3s_hv_ras.c for +* more details +*/ + struct sibling_subcore_state *sibling_subcore_state; +#endif #endif }; diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index b2027a5cf508..fe4c075bcf50 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -41,7 +41,7 @@ obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o -obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o hmi.o +obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o obj-$(CONFIG_PPC_BOOK3E_64)+= exceptions-64e.o idle_book3e.o obj-$(CONFIG_PPC64)+= vdso64/ obj-$(CONFIG_ALTIVEC) += vecemu.o diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index 1f9e5529e692..855d4b95d752 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -78,6 +78,7 @@ kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \ ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \ + book3s_hv_hmi.o \ book3s_hv_rmhandlers.o \ book3s_hv_rm_mmu.o \ book3s_hv_ras.o \ diff --git a/arch/powerpc/kernel/hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c similarity index 100% rename from arch/powerpc/kernel/hmi.c rename to arch/powerpc/kvm/book3s_hv_hmi.c -- 1.8.3.1
Re: powerpc allyesconfig / allmodconfig linux-next next-20160729 - next-20160729 build failures
On Thursday, August 11, 2016 10:43:20 PM CEST Nicholas Piggin wrote:
> On Wed, 03 Aug 2016 22:13:28 +0200 Arnd Bergmann wrote:
> > On Wednesday, August 3, 2016 2:44:29 PM CEST Segher Boessenkool wrote:
> > > Hi Arnd,
> > >
> > > On Wed, Aug 03, 2016 at 08:52:48PM +0200, Arnd Bergmann wrote:
> > > > From my first look, it seems that all of lib/*.o is now getting linked
> > > > into vmlinux, while we traditionally leave out everything from lib/
> > > > that is not referenced.
> > > >
> > > > I also see a noticeable overhead in link time, the numbers are for
> > > > a cache-hot rebuild after a successful allyesconfig build, using a
> > > > 24-way Opteron@2.5GHz, just relinking vmlinux:
> > > >
> > > > $ time make skj30 vmlinux   # before
> > > > real    2m8.092s
> > > > user    3m41.008s
> > > > sys     0m48.172s
> > > >
> > > > $ time make skj30 vmlinux   # after
> > > > real    4m10.189s
> > > > user    5m43.804s
> > > > sys     0m52.988s
> > >
> > > Is it better when using rcT instead of rcsT?
> >
> > It seems to be noticeably better for the clean rebuild case, though
> > not as good as the original:
> >
> > real    3m34.015s
> > user    5m7.104s
> > sys     0m49.172s
> >
> > I've also tried now with my own patch applied as well (linking
> > each drivers/*/built-in.o into vmlinux rather than having them
> > linked into drivers/built-in.o first), but that makes no
> > difference.
>
> I just want to come back to this, because I've submitted the thin
> archives kbuild patch, I wanted to make sure we're doing okay on
> ARM/ARM64. I cross compiled with my laptop.
>
> For ARM64 allyesconfig:
>
> After building then removing all built-in.o then rebuilding vmlinux:
> inclink
> time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
> real    1m18.977s
> user    2m14.512s
> sys     0m29.704s
>
> thinarc
> time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
> real    1m18.433s
> user    2m6.128s
> sys     0m28.372s
>
> Final ld time
> inclink
> real    0m4.005s
> user    0m3.464s
> sys     0m0.536s
>
> thinarc
> real    0m5.841s
> user    0m4.916s
> sys     0m0.916s
>
> Build directory size is of course much better (3953MB vs 5519MB).

Ok, looks great. Some downsides and some upsides here, but overall I
think this is a win.

> For ARM, defconfig
>
> After building then removing all built-in.o then rebuilding vmlinux:
> inclink
> real    0m19.593s
> user    0m22.372s
> sys     0m6.428s
>
> thinarc
> real    0m18.919s
> user    0m21.924s
> sys     0m6.400s
>
> Final ld time
> inclink
> real    0m0.378s
> user    0m0.304s
> sys     0m0.076s
>
> thinarc
> real    0m0.894s
> user    0m0.684s
> sys     0m0.200s

This also still seems fine.

> For both cases final link gets slower with thin archives. I guess there
> is some per-file overhead but I thought with --whole-archive it should
> not be that much slower. Still, overall time for main ar/ld phases comes
> out about the same in the end so I don't think it's too much problem.
> Unless ARM blows up significantly worse with a bigger config.

Unfortunately I think it does. I haven't tried your latest series yet,
but I think the total time for removing built-in.o and relinking went up
from around 4 minutes (already way too much) to 18 minutes for me.

> Linking with thin archives takes significantly more time in bfd hash
> lookup code. I haven't dug much further yet.

Can you try the ARM allyesconfig with thin archives? I'll follow up with
two patches: one to get ARM to link without thin archives, and one that
I used to get --gc-sections to work.

	Arnd
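The inclink-vs-thinarc numbers above come down to what `ar rcT` actually produces: with the `T` modifier, the archive stores only paths to its members rather than copies of them, which is where the build-directory savings come from. A quick stand-alone illustration (assumes GNU binutils `ar` is installed; the `.txt` payloads are placeholders, since `ar` will archive any file, not just objects):

```shell
set -e

# two trivial payload files standing in for built-in.o members
printf 'alpha' > a.txt
printf 'beta'  > b.txt

# regular archive: member contents are copied into the archive
ar rc  normal.a a.txt b.txt

# thin archive (what the kbuild series switches to): only member
# paths are recorded, so no data is duplicated on disk
ar rcT thin.a   a.txt b.txt

# the magic headers differ: "!<arch>" for regular, "!<thin>" for thin
head -c 7 normal.a; echo
head -c 7 thin.a;   echo

# both list the same members, but the thin archive stays small
ar t normal.a
ar t thin.a
ls -l normal.a thin.a
```

The `s` in `rcsT` vs `rcT` that Segher asked about only controls whether a symbol index is written as well.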
Re: powerpc allyesconfig / allmodconfig linux-next next-20160729 - next-20160729 build failures
On Wed, 03 Aug 2016 22:13:28 +0200 Arnd Bergmann wrote:
> On Wednesday, August 3, 2016 2:44:29 PM CEST Segher Boessenkool wrote:
> > Hi Arnd,
> >
> > On Wed, Aug 03, 2016 at 08:52:48PM +0200, Arnd Bergmann wrote:
> > > From my first look, it seems that all of lib/*.o is now getting linked
> > > into vmlinux, while we traditionally leave out everything from lib/
> > > that is not referenced.
> > >
> > > I also see a noticeable overhead in link time, the numbers are for
> > > a cache-hot rebuild after a successful allyesconfig build, using a
> > > 24-way Opteron@2.5GHz, just relinking vmlinux:
> > >
> > > $ time make skj30 vmlinux   # before
> > > real    2m8.092s
> > > user    3m41.008s
> > > sys     0m48.172s
> > >
> > > $ time make skj30 vmlinux   # after
> > > real    4m10.189s
> > > user    5m43.804s
> > > sys     0m52.988s
> >
> > Is it better when using rcT instead of rcsT?
>
> It seems to be noticeably better for the clean rebuild case, though
> not as good as the original:
>
> real    3m34.015s
> user    5m7.104s
> sys     0m49.172s
>
> I've also tried now with my own patch applied as well (linking
> each drivers/*/built-in.o into vmlinux rather than having them
> linked into drivers/built-in.o first), but that makes no
> difference.

I just want to come back to this, because I've submitted the thin
archives kbuild patch, I wanted to make sure we're doing okay on
ARM/ARM64. I cross compiled with my laptop.

For ARM64 allyesconfig:

After building then removing all built-in.o then rebuilding vmlinux:
inclink
time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
real    1m18.977s
user    2m14.512s
sys     0m29.704s

thinarc
time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
real    1m18.433s
user    2m6.128s
sys     0m28.372s

Final ld time
inclink
real    0m4.005s
user    0m3.464s
sys     0m0.536s

thinarc
real    0m5.841s
user    0m4.916s
sys     0m0.916s

Build directory size is of course much better (3953MB vs 5519MB).

For ARM, defconfig

After building then removing all built-in.o then rebuilding vmlinux:
inclink
real    0m19.593s
user    0m22.372s
sys     0m6.428s

thinarc
real    0m18.919s
user    0m21.924s
sys     0m6.400s

Final ld time
inclink
real    0m0.378s
user    0m0.304s
sys     0m0.076s

thinarc
real    0m0.894s
user    0m0.684s
sys     0m0.200s

For both cases final link gets slower with thin archives. I guess there
is some per-file overhead but I thought with --whole-archive it should
not be that much slower. Still, overall time for main ar/ld phases comes
out about the same in the end so I don't think it's too much problem.
Unless ARM blows up significantly worse with a bigger config.

Linking with thin archives takes significantly more time in bfd hash
lookup code. I haven't dug much further yet.

Thanks,
Nick
Re: [PATCH] perf/core: Fix the mask in perf_output_sample_regs
Sorry, found it in my inbox while clearing out backlog..

On Sun, Jul 03, 2016 at 11:31:58PM +0530, Madhavan Srinivasan wrote:
> When decoding the perf_regs mask in perf_output_sample_regs(),
> we loop through the mask using find_first_bit and find_next_bit functions.
> While the existing code works fine in most of the cases,
> the logic is broken for 32bit kernel (Big Endian).
> When reading u64 mask using (u32 *)(&val)[0], find_*_bit() assumes it gets
> lower 32bits of u64 but instead gets upper 32bits which is wrong.
> Proposed fix is to swap the words of the u64 to handle this case.
> This is _not_ endianness swap.

But it looks an awful lot like it..

> +++ b/kernel/events/core.c
> @@ -5205,8 +5205,10 @@ perf_output_sample_regs(struct perf_output_handle *handle,
>  			struct pt_regs *regs, u64 mask)
>  {
>  	int bit;
> +	DECLARE_BITMAP(_mask, 64);
>
> -	for_each_set_bit(bit, (const unsigned long *) &mask,
> +	bitmap_from_u64(_mask, mask);
> +	for_each_set_bit(bit, _mask,
>  			 sizeof(mask) * BITS_PER_BYTE) {
>  		u64 val;

> +++ b/lib/bitmap.c
> +void bitmap_from_u64(unsigned long *dst, u64 mask)
> +{
> +	dst[0] = mask & ULONG_MAX;
> +
> +	if (sizeof(mask) > sizeof(unsigned long))
> +		dst[1] = mask >> 32;
> +}
> +EXPORT_SYMBOL(bitmap_from_u64);

Looks small enough for an inline. Alternatively you can go all the way
and add bitmap_from_u64array(), but that seems massive overkill.

Tedious stuff.. I can't come up with anything prettier :/
Re: [PATCH] powerpc: sysdev: cpm: fix gpio save_regs functions
On Thu, Aug 11, 2016 at 10:50 AM, Christophe Leroy wrote:
> of_mm_gpiochip_add_data() calls mm_gc->save_regs() before
> setting the data. Therefore ->save_regs() cannot use
> gpiochip_get_data()
>
> [0.275940] Unable to handle kernel paging request for data at address 0x0130
> [0.283120] Faulting instruction address: 0xc01b44cc
> [0.288175] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.293343] PREEMPT CMPC885
> [0.296141] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-g65124df-dirty #68
> [0.304131] task: c6074000 ti: c608 task.ti: c608
> [0.309459] NIP: c01b44cc LR: c0011720 CTR: c0011708
> [0.314372] REGS: c6081d90 TRAP: 0300 Not tainted (4.7.0-g65124df-dirty)
> [0.322267] MSR: 9032 CR: 2428 XER: 2000
> [0.328813] DAR: 0130 DSISR: c000
> GPR00: c01b6d0c c6081e40 c6074000 c6017000 c9028000 c601d028 c6081dd8
> GPR08: c601d028 0001 2444 c0002790
> GPR16: c05643b0 0083
> GPR24: c04a1a6c c056 c04a8308 c04c6480 c0012498 c6017000 c7ffcc78 c6017000
> [0.360806] NIP [c01b44cc] gpiochip_get_data+0x4/0xc
> [0.365684] LR [c0011720] cpm1_gpio16_save_regs+0x18/0x44
> [0.370972] Call Trace:
> [0.373451] [c6081e50] [c01b6d0c] of_mm_gpiochip_add_data+0x70/0xdc
> [0.379624] [c6081e70] [c00124c0] cpm_init_par_io+0x28/0x118
> [0.385238] [c6081e80] [c04a8ac0] do_one_initcall+0xb0/0x17c
> [0.390819] [c6081ef0] [c04a8cbc] kernel_init_freeable+0x130/0x1dc
> [0.396924] [c6081f30] [c00027a4] kernel_init+0x14/0x110
> [0.402177] [c6081f40] [c000b424] ret_from_kernel_thread+0x5c/0x64
> [0.408233] Instruction dump:
> [0.411168] 4182fafc 3f80c040 48234c6d 3bc0fff0 3b9c5ed0 4bfffaf4 81290020 712a0004
> [0.418825] 4182fb34 48234c51 4bfffb2c 81230004 <80690130> 4e800020 7c0802a6 9421ffe0
> [0.426763] ---[ end trace fe4113ee21d72ffa ]---
>
> Fixes: e65078f1f3490 ("powerpc: sysdev: cpm1: use gpiochip data pointer")
> Fixes: a14a2d484b386 ("powerpc: cpm_common: use gpiochip data pointer")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy

Reviewed-by: Linus Walleij

Sorry for screwing stuff up :(

Yours,
Linus Walleij
Re: [PATCH 0/7] ima: carry the measurement list across kexec
On Thu, 2016-08-11 at 17:38 +1000, Balbir Singh wrote:
> On 09/08/16 22:36, Mimi Zohar wrote:
> > On Tue, 2016-08-09 at 15:19 +1000, Balbir Singh wrote:
> > > On 04/08/16 22:24, Mimi Zohar wrote:
> > > > The TPM PCRs are only reset on a hard reboot. In order to validate a
> > > > TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement
> > > > list of the running kernel must be saved and then restored on the
> > > > subsequent boot.
> > > >
> > > > The existing securityfs binary_runtime_measurements file conveniently
> > > > provides a serialized format of the IMA measurement list. This patch
> > > > set serializes the measurement list in this format and restores it.
> > > >
> > > > This patch set pre-req's Thiago Bauermann's "kexec_file: Add buffer
> > > > hand-over for the next kernel" patch set* for actually carrying the
> > > > serialized measurement list across the kexec.
> > > >
> > > > Mimi
> > >
> > > Hi, Mimi
> > >
> > > I am trying to convince myself of the security of the solution. I asked
> > > Thiago as well, but maybe I am lagging behind in understanding.
> > >
> > > We trust the kernel to hand over PCR values of the old kernel (which
> > > cannot be validated) to the IMA subsystem in the new kernel for storage.
> > > I guess the idea is for ima_add_boot_aggregate to do the right thing?
> > > How do we validate what the old kernel is giving us? Why do we care for
> > > the old measurement list? Is it still of significance in the new kernel?
> >
> > Hi Balbir,
> >
> > To validate the hardware TPM PCR values requires walking the measurement
> > list simulating the TPM extend operation. The resulting values should
> > match the hardware TPM PCRs.
> >
> > In the case of a soft reboot, the TPM PCRs are not reset to 0, so all
> > the measurements of the running system, including those from previous
> > soft reboots, need to be included in the measurement list. Without
> > these measurements, the simulated PCR values will not match the hardware
> > TPM PCR values. Thus the need for this patch set.
> >
> > Measurements can not be added/removed/changed in the measurement list
> > without it being detectable.
> >
> > Thanks Mimi
>
> I think that makes sense
>
> So effectively we do
>
> first kernel boot ->
> second kernel boot ->
>
> and so on

No, the running system doesn't verify the measurement list against the
PCRs, before saving and carrying it across kexec. If the system has been
compromised, it can't be trusted to verify itself. Verifying the
measurement list needs to be done by a trusted third party. The system
just carries the measurement list(s) across kexec.

Mimi
Re: powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
On Tue, 2016-09-08 at 12:43:46 UTC, Michael Ellerman wrote: > When we introduced the little endian support, we added the endian flags > to CC directly using override. I don't know the history of why we did > that, I suspect no one does. > > Although this mostly works, it has one bug, which is that CROSS32CC > doesn't get -mbig-endian. That means when the compiler is little endian > by default and the user is building big endian, vdso32 is incorrectly > compiled as little endian and the kernel fails to build. > > Instead we can add the endian flags to cflags-y/aflags-y, and then > append those to KBUILD_CFLAGS/KBUILD_AFLAGS. > > This has the advantage of being 1) less ugly, 2) the documented way of > adding flags in the arch Makefile and 3) it fixes building vdso32 with a > LE toolchain. > > Signed-off-by: Michael Ellerman Applied to powerpc fixes. https://git.kernel.org/powerpc/c/164af597ce945751e2dcd53d0a cheers
Re: selftests/powerpc: Specify we expect to build with std=gnu99
On Fri, 2016-29-07 at 10:48:09 UTC, Michael Ellerman wrote: > We have some tests that assume we're using std=gnu99, which is fine on > most compilers, but some old compilers use a different default. > > So make it explicit that we want to use std=gnu99. > > Signed-off-by: Michael Ellerman Applied to powerpc fixes. https://git.kernel.org/powerpc/c/ca49e64f0cb1368fc666a53b16 cheers
Re: powerpc: Update obsolete comment in setup_32.c about early_init()
On Wed, 2016-10-08 at 07:32:38 UTC, Benjamin Herrenschmidt wrote: > We don't identify the machine type anymore... > > Signed-off-by: Benjamin Herrenschmidt Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/f9cc1d1f808dbdfd56978259d2 cheers
Re: powerpc: rebuild vdsos correctly
On Mon, 2016-08-08 at 09:35:43 UTC, Nicholas Piggin wrote: > When using if_changed, we need to add FORCE to dependencies, otherwise > we don't get command line change checking amongst other things. This > has resulted in vdsos not being rebuilt when switching between big and > little endian. > > Signed-off-by: Nicholas Piggin Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/b9a4a0d02c5b8d9a1397c11d74 cheers
Re: powerpc: Fix crash during static key init on ppc32
On Wed, 2016-10-08 at 07:27:34 UTC, Benjamin Herrenschmidt wrote:
> We cannot do those initializations from apply_feature_fixups() as
> this function runs in a very restricted environment in 32-bit where
> the kernel isn't running at its linked address and the PTRRELOC()
> macro must be used for any global accesses.
>
> Instead, split them into a separate setup_feature_keys() function
> which is called in a more suitable spot on ppc32.
>
> Signed-off-by: Benjamin Herrenschmidt

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/97f6e0cc35026a2a09147a6da6

cheers
Re: powerpc: Print the kernel load address at the end of prom_init
On Wed, 2016-10-08 at 07:29:29 UTC, Benjamin Herrenschmidt wrote: > This makes it easier to debug crashes that happen very early before > the kernel takes over Open Firmware by allowing us to relate the OF > reported crashing addresses to offsets within the kernel. > > Signed-off-by: Benjamin Herrenschmidt Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/7d70c63c7132eb95e428e94524 cheers
[PATCH] powerpc: populate the default bus with machine_arch_initcall
With the commit 44a7185c2ae6 ("of/platform: Add common method to
populate default bus"), a default function is introduced to populate
the default bus, invoked at the arch_initcall_sync level. This
overrides the arch-specific population of the default bus, which runs
at a lower level than arch_initcall_sync. Since not all powerpc-specific
buses are added to of_default_bus_match_table[], some powerpc-specific
buses are not probed. Fix this by using an earlier initcall level.

Signed-off-by: Kevin Hao
---
Of course we can adjust the powerpc arch code to use
of_platform_default_populate_init(), but that has a high risk of
breaking other boards given the complicated powerpc-specific buses. So
I would like just to fix the broken boards in the current release, and
cook a patch to change to of_platform_default_populate_init() for
linux-next.

Only boot tested on a mpc8315erdb board.

 arch/powerpc/platforms/40x/ep405.c               | 2 +-
 arch/powerpc/platforms/40x/ppc40x_simple.c       | 2 +-
 arch/powerpc/platforms/40x/virtex.c              | 2 +-
 arch/powerpc/platforms/40x/walnut.c              | 2 +-
 arch/powerpc/platforms/44x/canyonlands.c         | 2 +-
 arch/powerpc/platforms/44x/ebony.c               | 2 +-
 arch/powerpc/platforms/44x/iss4xx.c              | 2 +-
 arch/powerpc/platforms/44x/ppc44x_simple.c       | 2 +-
 arch/powerpc/platforms/44x/ppc476.c              | 2 +-
 arch/powerpc/platforms/44x/sam440ep.c            | 2 +-
 arch/powerpc/platforms/44x/virtex.c              | 2 +-
 arch/powerpc/platforms/44x/warp.c                | 2 +-
 arch/powerpc/platforms/82xx/ep8248e.c            | 2 +-
 arch/powerpc/platforms/82xx/km82xx.c             | 2 +-
 arch/powerpc/platforms/82xx/mpc8272_ads.c        | 2 +-
 arch/powerpc/platforms/82xx/pq2fads.c            | 2 +-
 arch/powerpc/platforms/83xx/mpc831x_rdb.c        | 2 +-
 arch/powerpc/platforms/83xx/mpc834x_itx.c        | 2 +-
 arch/powerpc/platforms/85xx/ppa8548.c            | 2 +-
 arch/powerpc/platforms/8xx/adder875.c            | 2 +-
 arch/powerpc/platforms/8xx/ep88xc.c              | 2 +-
 arch/powerpc/platforms/8xx/mpc86xads_setup.c     | 2 +-
 arch/powerpc/platforms/8xx/mpc885ads_setup.c     | 2 +-
 arch/powerpc/platforms/8xx/tqm8xx_setup.c        | 2 +-
 arch/powerpc/platforms/cell/setup.c              | 2 +-
 arch/powerpc/platforms/embedded6xx/gamecube.c    | 2 +-
 arch/powerpc/platforms/embedded6xx/linkstation.c | 2 +-
 arch/powerpc/platforms/embedded6xx/mvme5100.c    | 2 +-
 arch/powerpc/platforms/embedded6xx/storcenter.c  | 2 +-
 arch/powerpc/platforms/embedded6xx/wii.c         | 2 +-
 arch/powerpc/platforms/pasemi/setup.c            | 2 +-
 31 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/platforms/40x/ep405.c b/arch/powerpc/platforms/40x/ep405.c
index 1c8aec6e9bb7..1328cb38e5d7 100644
--- a/arch/powerpc/platforms/40x/ep405.c
+++ b/arch/powerpc/platforms/40x/ep405.c
@@ -62,7 +62,7 @@ static int __init ep405_device_probe(void)
 	return 0;
 }
-machine_device_initcall(ep405, ep405_device_probe);
+machine_arch_initcall(ep405, ep405_device_probe);
 
 static void __init ep405_init_bcsr(void)
 {
diff --git a/arch/powerpc/platforms/40x/ppc40x_simple.c b/arch/powerpc/platforms/40x/ppc40x_simple.c
index 2a050007bbae..50dce54e6b3b 100644
--- a/arch/powerpc/platforms/40x/ppc40x_simple.c
+++ b/arch/powerpc/platforms/40x/ppc40x_simple.c
@@ -39,7 +39,7 @@ static int __init ppc40x_device_probe(void)
 	return 0;
 }
-machine_device_initcall(ppc40x_simple, ppc40x_device_probe);
+machine_arch_initcall(ppc40x_simple, ppc40x_device_probe);
 
 /* This is the list of boards that can be supported by this simple
  * platform code. This does _not_ mean the boards are compatible,
diff --git a/arch/powerpc/platforms/40x/virtex.c b/arch/powerpc/platforms/40x/virtex.c
index 91a08ea758a8..d262696b3cbc 100644
--- a/arch/powerpc/platforms/40x/virtex.c
+++ b/arch/powerpc/platforms/40x/virtex.c
@@ -33,7 +33,7 @@ static int __init virtex_device_probe(void)
 	return 0;
 }
-machine_device_initcall(virtex, virtex_device_probe);
+machine_arch_initcall(virtex, virtex_device_probe);
 
 static int __init virtex_probe(void)
 {
diff --git a/arch/powerpc/platforms/40x/walnut.c b/arch/powerpc/platforms/40x/walnut.c
index e5797815e2f1..9a9c0bccba47 100644
--- a/arch/powerpc/platforms/40x/walnut.c
+++ b/arch/powerpc/platforms/40x/walnut.c
@@ -42,7 +42,7 @@ static int __init walnut_device_probe(void)
 	return 0;
 }
-machine_device_initcall(walnut, walnut_device_probe);
+machine_arch_initcall(walnut, walnut_device_probe);
 
 static int __init walnut_probe(void)
 {
diff --git a/arch/powerpc/platforms/44x/canyonlands.c b/arch/powerpc/platforms/44x/canyonlands.c
index 157f4ce46386..681fa66ff194 100644
--- a/arch/powerpc/platforms/44x/canyonlands.c
+++ b/arch/powerpc/platforms/44x/canyonlands.c
@@ -47,7 +47,7 @@ static int __init ppc460ex_device_probe(void)
 	return 0;
 }
-machine_device_