Re: [PATCH 2/2] powerpc: Show PAGE_SIZE in __die() output
Le 08/01/2019 à 13:21, Christophe Leroy a écrit :
> Le 08/01/2019 à 13:05, Michael Ellerman a écrit :
>> The page size the kernel is built with is useful info when debugging a
>> crash, so add it to the output in __die(). Result looks like eg:
>>
>>   kernel BUG at drivers/misc/lkdtm/bugs.c:63!
>>   Oops: Exception in kernel mode, sig: 5 [#1]
>>   LE PAGE_SIZE=64K SMP NR_CPUS=2048 NUMA pSeries
>>   Modules linked in: vmx_crypto kvm binfmt_misc ip_tables
>>
>> Signed-off-by: Michael Ellerman
>> ---
>>  arch/powerpc/kernel/traps.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> index 431a86d3f772..fc972e4eee5f 100644
>> --- a/arch/powerpc/kernel/traps.c
>> +++ b/arch/powerpc/kernel/traps.c
>> @@ -268,6 +268,18 @@ static int __die(const char *str, struct pt_regs *regs, long err)
>>  	else
>>  		seq_buf_puts(, "BE ");
>>
>> +	seq_buf_puts(, "PAGE_SIZE=");
>> +	if (IS_ENABLED(CONFIG_PPC_4K_PAGES))
>> +		seq_buf_puts(, "4K ");
>> +	else if (IS_ENABLED(CONFIG_PPC_16K_PAGES))
>> +		seq_buf_puts(, "16K ");
>> +	else if (IS_ENABLED(CONFIG_PPC_64K_PAGES))
>> +		seq_buf_puts(, "64K ");
>> +	else if (IS_ENABLED(CONFIG_PPC_256K_PAGES))
>> +		seq_buf_puts(, "256K ");
>
> Can't we build all the above at once using PAGE_SHIFT ? Something like
> (untested):
>
> 	"%dK ", 1 << (PAGE_SHIFT - 10)

Or even simpler:

	"%dK ", PAGE_SIZE / 1024

Christophe

> Christophe
>
>> +	else
>> +		BUILD_BUG_ON(1);
>> +
>>  	if (IS_ENABLED(CONFIG_PREEMPT))
>>  		seq_buf_puts(, "PREEMPT ");
Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
On Wed, 2019-01-09 at 17:32 +1100, Alexey Kardashevskiy wrote:
> I have just moved the "Mellanox Technologies MT27700 Family
> [ConnectX-4]" from garrison to firestone machine and there it does not
> produce an EEH, with the same kernel and skiboot (both upstream + my
> debug). Hm. I cannot really blame the card but I cannot see what could
> cause the difference in skiboot either. I even tried disabling NPU so
> garrison would look like firestone, still EEH'ing.

The systems have a different chip though, firestone is P8 and garrison is
P8', which has a slightly different PHB revision. Worth checking if we
have anything significantly different in our inits, and poking at the HW
guys.

BTW. Are the cards behind a switch in either case ?

Cheers,
Ben.
Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
On Wed, 2019-01-09 at 15:53 +1100, Alexey Kardashevskiy wrote:
> "A PCI completion timeout occurred for an outstanding PCI-E transaction"
> it is.
>
> This is how I bind the device to vfio:
>
> echo vfio-pci > '/sys/bus/pci/devices/:01:00.0/driver_override'
> echo vfio-pci > '/sys/bus/pci/devices/:01:00.1/driver_override'
> echo ':01:00.0' > '/sys/bus/pci/devices/:01:00.0/driver/unbind'
> echo ':01:00.1' > '/sys/bus/pci/devices/:01:00.1/driver/unbind'
> echo ':01:00.0' > /sys/bus/pci/drivers/vfio-pci/bind
> echo ':01:00.1' > /sys/bus/pci/drivers/vfio-pci/bind
>
> and I noticed that EEH only happens with the last command. The order
> (.0,.1 or .1,.0) does not matter, it seems that putting one function to
> D3 is fine but putting another one when the first one is already in D3 -
> produces EEH. And I do not recall ever seeing this on the firestone
> machine.

Weird. Putting all functions into D3 is what allows the device to
actually go into D3. Does it work with other devices ?

We do have that bug on early P9 revisions where the attempt of bringing
the link to L1 as part of the D3 process fails in horrible ways, I
thought P8 would be ok but maybe not ...

Otherwise, it might be that our timeouts are too low (you may want to
talk to our PCIe guys internally).

Cheers,
Ben.
Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
On 09/01/2019 16:30, David Gibson wrote:
> On Wed, Jan 09, 2019 at 04:09:02PM +1100, Benjamin Herrenschmidt wrote:
>> On Mon, 2019-01-07 at 21:01 -0700, Jason Gunthorpe wrote:
>>> In a very cryptic way that requires manual parsing using non-public
>>> docs sadly but yes. From the look of it, it's a completion timeout.
>>> Looks to me like we don't get a response to a config space access
>>> during the change of D state. I don't know if it's the write of the
>>> D3 state itself or the read back though (it's probably detected on
>>> the read back or a subsequent read, but that doesn't tell me which
>>> specific one failed).
>>>
>>> If it is just one card doing it (again, check you have latest
>>> firmware) I wonder if it is a sketchy PCI-E electrical link that is
>>> causing a long re-training cycle? Can you tell if the PCI-E link is
>>> permanently gone or does it eventually return?
>>
>> No, it's 100% reproducable on systems with that specific card model,
>> not card instance, and maybe different systems/cards as well, I'll let
>> David & Alexey comment further on that.
>
> Well, it's 100% reproducable on a particular model of system
> (garrison) with a particular model of card. I've had some suggestions
> that it fails with some other system and card models, but nothing
> confirmed - the one other system model I've been able to try, which
> also had a newer card model, didn't reproduce the problem.

I have just moved the "Mellanox Technologies MT27700 Family [ConnectX-4]"
from garrison to firestone machine and there it does not produce an EEH,
with the same kernel and skiboot (both upstream + my debug). Hm. I cannot
really blame the card but I cannot see what could cause the difference in
skiboot either. I even tried disabling NPU so garrison would look like
firestone, still EEH'ing.

>>> Does the card work in Gen 3 when it starts? Is there any indication of
>>> PCI-E link errors?
>>
>> Nope.
>>
>>> Everytime or sometimes?
>>>
>>> POWER 8 firmware is good? If the link does eventually come back, is
>>> the POWER8's D3 resumption timeout long enough?
>>>
>>> If this doesn't lead to an obvious conclusion you'll probably need to
>>> connect to IBM's Mellanox support team to get more information from
>>> the card side.
>>
>> We are IBM :-) So far, it seems to be that the card is doing something
>> not quite right, but we don't know what. We might need to engage
>> Mellanox themselves.
>
> Possibly. On the other hand, I've had it reported that this is a
> software regression at least with downstream Red Hat kernels. I
> haven't yet been able to eliminate factors that might be confusing
> that, or try to find a working version upstream.

Do you have tarballs handy? I'd diff...

--
Alexey
Re: [PATCH] lkdtm: Add a tests for NULL pointer dereference
Le 09/01/2019 à 02:14, Kees Cook a écrit :
> On Fri, Dec 14, 2018 at 7:26 AM Christophe Leroy wrote:
>> Introduce lkdtm tests for NULL pointer dereference: check
>> access or exec at NULL address.
>
> Why is this not already covered by the existing tests? (Is there
> something special about NULL that is being missed?) I'd expect SMAP
> and SMEP to cover NULL as well.

Most arches print a different message depending on whether the faulting
address is above or below PAGE_SIZE. Below is an example from x86:

	pr_alert("BUG: unable to handle kernel %s at %px\n",
		 address < PAGE_SIZE ? "NULL pointer dereference" : "paging request",
		 (void *)address);

Until recently, the powerpc arch didn't do it. When I implemented it
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=49a502ea23bf9dec47f8f3c3960909ff409cd1bb),
I needed a way to test it and couldn't find an existing one, hence this
new LKDTM test. But maybe I missed something ?

Christophe

> -Kees
>
>> Signed-off-by: Christophe Leroy
>> ---
>>  drivers/misc/lkdtm/core.c  |  2 ++
>>  drivers/misc/lkdtm/lkdtm.h |  2 ++
>>  drivers/misc/lkdtm/perms.c | 18 ++++++++++++++++++
>>  3 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
>> index bc76756b7eda..36910e1d5c09 100644
>> --- a/drivers/misc/lkdtm/core.c
>> +++ b/drivers/misc/lkdtm/core.c
>> @@ -157,7 +157,9 @@ static const struct crashtype crashtypes[] = {
>>  	CRASHTYPE(EXEC_VMALLOC),
>>  	CRASHTYPE(EXEC_RODATA),
>>  	CRASHTYPE(EXEC_USERSPACE),
>> +	CRASHTYPE(EXEC_NULL),
>>  	CRASHTYPE(ACCESS_USERSPACE),
>> +	CRASHTYPE(ACCESS_NULL),
>>  	CRASHTYPE(WRITE_RO),
>>  	CRASHTYPE(WRITE_RO_AFTER_INIT),
>>  	CRASHTYPE(WRITE_KERN),
>> diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
>> index 3c6fd327e166..b69ee004a3f7 100644
>> --- a/drivers/misc/lkdtm/lkdtm.h
>> +++ b/drivers/misc/lkdtm/lkdtm.h
>> @@ -45,7 +45,9 @@ void lkdtm_EXEC_KMALLOC(void);
>>  void lkdtm_EXEC_VMALLOC(void);
>>  void lkdtm_EXEC_RODATA(void);
>>  void lkdtm_EXEC_USERSPACE(void);
>> +void lkdtm_EXEC_NULL(void);
>>  void lkdtm_ACCESS_USERSPACE(void);
>> +void lkdtm_ACCESS_NULL(void);
>>
>>  /* lkdtm_refcount.c */
>>  void lkdtm_REFCOUNT_INC_OVERFLOW(void);
>> diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
>> index fa54add6375a..62f76d506f04 100644
>> --- a/drivers/misc/lkdtm/perms.c
>> +++ b/drivers/misc/lkdtm/perms.c
>> @@ -164,6 +164,11 @@ void lkdtm_EXEC_USERSPACE(void)
>>  	vm_munmap(user_addr, PAGE_SIZE);
>>  }
>>
>> +void lkdtm_EXEC_NULL(void)
>> +{
>> +	execute_location(NULL, CODE_AS_IS);
>> +}
>> +
>>  void lkdtm_ACCESS_USERSPACE(void)
>>  {
>>  	unsigned long user_addr, tmp = 0;
>> @@ -195,6 +200,19 @@ void lkdtm_ACCESS_USERSPACE(void)
>>  	vm_munmap(user_addr, PAGE_SIZE);
>>  }
>>
>> +void lkdtm_ACCESS_NULL(void)
>> +{
>> +	unsigned long tmp;
>> +	unsigned long *ptr = (unsigned long *)NULL;
>> +
>> +	pr_info("attempting bad read at %px\n", ptr);
>> +	tmp = *ptr;
>> +	tmp += 0xc0dec0de;
>> +
>> +	pr_info("attempting bad write at %px\n", ptr);
>> +	*ptr = tmp;
>> +}
>> +
>>  void __init lkdtm_perms_init(void)
>>  {
>>  	/* Make sure we can write to __ro_after_init values during __init */
>> --
>> 2.13.3
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Hello Thiago,

Wish you a happy 2019!

On Sat, Dec 08, 2018 at 12:40:52AM -0200, Thiago Jung Bauermann wrote:
> Gautham R Shenoy writes:
> > On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote:
> >> Sure. I will test the patch and report back.
> >
> > I added the following debug patch on top of your patch, and after an
> > hour's run, the system crashed. Appending the log at the end.
>
> Thank you very much for testing! Your debug patch was very helpful as
> well.
>
> > I suppose we still need to increase the number of tries since we wait
> > only for 2.5ms looping before giving up.
>
> Do you think it would have helped? In the debug output you posted I
> would have expected the following message to show up if the loop
> finished too early, and it didn't:
>
> "Querying DEAD? cpu %i (%i) shows %i\n"
>
> So I don't think increasing the loop length would have made a
> difference. In fact, the call to smp_query_cpu_stopped() always
> succeeded on the first iteration.

I did some testing during the holidays. Here are the observations:

1) With just your patch (without any additional debug patch), if I run
   DLPAR on/off operations on a system that has SMT=off, I am able to see
   a crash involving RTAS stack corruption within an hour's time.

2) With the debug patch (appended below), which has additional debug to
   capture the callers of stop-self, start-cpu, and set-power-levels, the
   system is able to perform DLPAR on/off operations on a system with
   SMT=off for three days. It then crashed with the dead CPU showing a
   "Bad kernel stack pointer". From this log, I can clearly see that
   there were no concurrent calls to stop-self, start-cpu, or
   set-power-levels. The only concurrent RTAS calls were the dying CPU
   calling "stop-self" and the CPU running the DLPAR operation calling
   "query-cpu-stopped-state". The crash signature is appended below as
   well.

3) Modifying your patch to remove the udelay and increase the loop count
   from 25 to 1000 doesn't improve the situation. We are still able to
   see the crash.

4) With my patch, even without any additional debug, I was able to
   observe the system run the tests successfully for over a week (I
   started the tests before the Christmas weekend, and forgot to turn
   them off!).

It appears that there is a narrow race window involving the
rtas-stop-self and rtas-query-cpu-stopped-state calls that can be
observed with your patch. Adding any printk's seems to reduce the
probability of hitting this race window. It might be worthwhile to check
with the RTAS folks whether they suspect something here.

The crash log with this debug patch is as follows:

[DEBUG] CPU 32 waiting for CPU 130 to enter rtas
cpu 130 (hwid 130) Ready to die...
[DEBUG] CPU 32 noticed CPU 130 enter rtas: tries=0, time=539
[DEBUG] CPU 32 waiting for CPU 131 to enter rtas
cpu 131 (hwid 131) Ready to die...
[DEBUG] CPU 32 noticed CPU 131 enter rtas: tries=0, time=137
[DEBUG] CPU 32 waiting for CPU 132 to enter rtas
cpu 132 (hwid 132) Ready to die...
[DEBUG] CPU 32 noticed CPU 132 enter rtas: tries=0, time=1238
[DEBUG] CPU 32 waiting for CPU 133 to enter rtas
cpu 133 (hwid 133) Ready to die...
[DEBUG] CPU 32 noticed CPU 133 enter rtas: tries=1, time=1259
[DEBUG] CPU 32 waiting for CPU 134 to enter rtas
cpu 134 (hwid 134) Ready to die...
[DEBUG] CPU 32 noticed CPU 134 enter rtas: tries=0, time=1141
[DEBUG] CPU 32 waiting for CPU 135 to enter rtas
cpu 135 (hwid 135) Ready to die...
[DEBUG] CPU 32 noticed CPU 135 enter rtas: tries=0, time=636
cpu 120 (hwid 120) Ready to die...
[DEBUG] CPU 32 waiting for CPU 120 to enter rtas
[DEBUG] CPU 32 noticed CPU 120 enter rtas: tries=0, time=53
[DEBUG] CPU 32 waiting for CPU 121 to enter rtas
cpu 121 (hwid 121) Ready to die...
Bad kernel stack pointer 1fafb400 at 1fafb4b0
Bad kernel stack pointer 1fafb4b0 at 1ec15270
Oops: Bad kernel stack pointer, sig: 6 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 121 PID: 0 Comm: swapper/121 Not tainted 4.20.0-thiago-original-with-debug+ #57
NIP: 1ec15270 LR: 1ec1526c CTR: 1ecd26c4
REGS: c0001e577d30 TRAP: 0300 Not tainted (4.20.0-thiago-original-with-debug+)
MSR: 80001000 CR: 4800 XER: 0020
CFAR: 1ecd27a8 DAR: 8108 DSISR: 4000 IRQMASK: 80009033
GPR00: 1ec1526c 1fafb4b0
GPR04: 01a40968 00e0 dfe8 0018
GPR08: 1f82ae00 1ed025d4 1f827850
GPR12: 01b6d998 c0001eb4e080 c003a1bdbf90 1eea3020
GPR16: c006fdc45100 c004c8b0 c13d5300
GPR20: c006fdc45300 0008 c19d2cf8 c13d6888
GPR24: 0079 c13d688c 0002 c13d688c
GPR28: c19cecf0 03c8 1fafb4b0
NIP [1ec15270] 0x1ec15270
LR [1ec1526c] 0x1ec1526c
Call Trace:
Instruction dump:
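For context, the wait loop being tuned in observation 3) above has roughly the following shape. This is a hedged sketch of the pseries_cpu_die() loop under discussion, not the actual patch; smp_query_cpu_stopped() and the QCSS_* return codes are from the pseries platform code, and the tries/udelay values are the knobs being experimented with:

```c
/* Sketch: wait for the dying CPU to reach the RTAS stopped state.
 * 25 iterations of a 100us delay give the 2.5ms total mentioned above. */
for (tries = 0; tries < 25; tries++) {
	cpu_status = smp_query_cpu_stopped(pcpu);
	if (cpu_status == QCSS_STOPPED ||
	    cpu_status == QCSS_HARDWARE_ERROR)
		break;
	udelay(100);
}
```

The race described above would then be between this query-cpu-stopped-state polling on one CPU and the stop-self call still in flight on the dying CPU.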
Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
On Wed, Jan 09, 2019 at 04:09:02PM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2019-01-07 at 21:01 -0700, Jason Gunthorpe wrote:
> > > In a very cryptic way that requires manual parsing using non-public
> > > docs sadly but yes. From the look of it, it's a completion timeout.
> > >
> > > Looks to me like we don't get a response to a config space access
> > > during the change of D state. I don't know if it's the write of the D3
> > > state itself or the read back though (it's probably detected on the
> > > read back or a subsequent read, but that doesn't tell me which specific
> > > one failed).
> >
> > If it is just one card doing it (again, check you have latest
> > firmware) I wonder if it is a sketchy PCI-E electrical link that is
> > causing a long re-training cycle? Can you tell if the PCI-E link is
> > permanently gone or does it eventually return?
>
> No, it's 100% reproducable on systems with that specific card model,
> not card instance, and maybe different systems/cards as well, I'll let
> David & Alexey comment further on that.

Well, it's 100% reproducable on a particular model of system
(garrison) with a particular model of card. I've had some suggestions
that it fails with some other system and card models, but nothing
confirmed - the one other system model I've been able to try, which
also had a newer card model, didn't reproduce the problem.

> > Does the card work in Gen 3 when it starts? Is there any indication of
> > PCI-E link errors?
>
> Nope.
>
> > Everytime or sometimes?
> >
> > POWER 8 firmware is good? If the link does eventually come back, is
> > the POWER8's D3 resumption timeout long enough?
> >
> > If this doesn't lead to an obvious conclusion you'll probably need to
> > connect to IBM's Mellanox support team to get more information from
> > the card side.
>
> We are IBM :-) So far, it seems to be that the card is doing something
> not quite right, but we don't know what. We might need to engage
> Mellanox themselves.

Possibly. On the other hand, I've had it reported that this is a
software regression at least with downstream Red Hat kernels. I haven't
yet been able to eliminate factors that might be confusing that, or try
to find a working version upstream.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
On Mon, 2019-01-07 at 21:01 -0700, Jason Gunthorpe wrote: > > > In a very cryptic way that requires manual parsing using non-public > > docs sadly but yes. From the look of it, it's a completion timeout. > > > > Looks to me like we don't get a response to a config space access > > during the change of D state. I don't know if it's the write of the D3 > > state itself or the read back though (it's probably detected on the > > read back or a subsequent read, but that doesn't tell me which specific > > one failed). > > If it is just one card doing it (again, check you have latest > firmware) I wonder if it is a sketchy PCI-E electrical link that is > causing a long re-training cycle? Can you tell if the PCI-E link is > permanently gone or does it eventually return? No, it's 100% reproducable on systems with that specific card model, not card instance, and maybe different systems/cards as well, I'll let David & Alexey comment further on that. > Does the card work in Gen 3 when it starts? Is there any indication of > PCI-E link errors? Nope. > Everytime or sometimes? > > POWER 8 firmware is good? If the link does eventually come back, is > the POWER8's D3 resumption timeout long enough? > > If this doesn't lead to an obvious conclusion you'll probably need to > connect to IBM's Mellanox support team to get more information from > the card side. We are IBM :-) So far, it seems to be that the card is doing something not quite right, but we don't know what. We might need to engage Mellanox themselves. Cheers, Ben.
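For reference, a "no D3" quirk of the kind named in the subject line would normally take the standard shape used in drivers/pci/quirks.c. This is a hedged sketch rather than the actual patch under discussion; 0x1013 is the usual ConnectX-4 device ID but is an assumption here, as is the function name:

```c
/* Sketch only: keep the device out of D3 by setting the generic
 * PCI_DEV_FLAGS_NO_D3 flag, which pci_set_power_state() honours. */
static void quirk_no_d3(struct pci_dev *dev)
{
	dev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MELLANOX, 0x1013 /* ConnectX-4, assumed */,
			quirk_no_d3);
```

Such a quirk papers over the completion timeout by never attempting the D3hot transition on the affected devices, which is why the thread is trying to establish whether the fault is in the card, the PHB, or the firmware before merging it.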
Re: [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type
On Mon, Jan 07, 2019 at 07:43:15PM +0100, Cédric Le Goater wrote: > We will have different KVM devices for interrupts, one for the > XICS-over-XIVE mode and one for the XIVE native exploitation > mode. Let's add some checks to make sure we are not mixing the > interfaces in KVM. > > Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson > --- > arch/powerpc/kvm/book3s_xive.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c > index f78d002f0fe0..8a4fa45f07f8 100644 > --- a/arch/powerpc/kvm/book3s_xive.c > +++ b/arch/powerpc/kvm/book3s_xive.c > @@ -819,6 +819,9 @@ u64 kvmppc_xive_get_icp(struct kvm_vcpu *vcpu) > { > struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; > > + if (!kvmppc_xics_enabled(vcpu)) > + return -EPERM; > + > if (!xc) > return 0; > > @@ -835,6 +838,9 @@ int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval) > u8 cppr, mfrr; > u32 xisr; > > + if (!kvmppc_xics_enabled(vcpu)) > + return -EPERM; > + > if (!xc || !xive) > return -ENOENT; > -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH 02/19] powerpc/xive: add OPAL extensions for the XIVE native exploitation support
On Mon, Jan 07, 2019 at 07:43:14PM +0100, Cédric Le Goater wrote: > The support for XIVE native exploitation mode in Linux/KVM needs a > couple more OPAL calls to configure the sPAPR guest and to get/set the > state of the XIVE internal structures. > > Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson > --- > arch/powerpc/include/asm/opal-api.h | 11 ++- > arch/powerpc/include/asm/opal.h | 7 ++ > arch/powerpc/include/asm/xive.h | 14 +++ > arch/powerpc/sysdev/xive/native.c | 99 +++ > .../powerpc/platforms/powernv/opal-wrappers.S | 3 + > 5 files changed, 130 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/include/asm/opal-api.h > b/arch/powerpc/include/asm/opal-api.h > index 870fb7b239ea..cdfc54f78101 100644 > --- a/arch/powerpc/include/asm/opal-api.h > +++ b/arch/powerpc/include/asm/opal-api.h > @@ -186,8 +186,8 @@ > #define OPAL_XIVE_FREE_IRQ 140 > #define OPAL_XIVE_SYNC 141 > #define OPAL_XIVE_DUMP 142 > -#define OPAL_XIVE_RESERVED3 143 > -#define OPAL_XIVE_RESERVED4 144 > +#define OPAL_XIVE_GET_QUEUE_STATE143 > +#define OPAL_XIVE_SET_QUEUE_STATE144 > #define OPAL_SIGNAL_SYSTEM_RESET 145 > #define OPAL_NPU_INIT_CONTEXT146 > #define OPAL_NPU_DESTROY_CONTEXT 147 > @@ -209,8 +209,11 @@ > #define OPAL_SENSOR_GROUP_ENABLE 163 > #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR 164 > #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR 165 > -#define OPAL_NX_COPROC_INIT 167 > -#define OPAL_LAST167 > +#define OPAL_HANDLE_HMI2 166 > +#define OPAL_NX_COPROC_INIT 167 > +#define OPAL_NPU_SET_RELAXED_ORDER 168 > +#define OPAL_NPU_GET_RELAXED_ORDER 169 > +#define OPAL_XIVE_GET_VP_STATE 170 > > #define QUIESCE_HOLD 1 /* Spin all calls at entry */ > #define QUIESCE_REJECT 2 /* Fail all calls with > OPAL_BUSY */ > diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h > index a55b01c90bb1..4e978d4dea5c 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/powerpc/include/asm/opal.h > @@ -279,6 +279,13 @@ int64_t opal_xive_allocate_irq(uint32_t chip_id); > 
int64_t opal_xive_free_irq(uint32_t girq); > int64_t opal_xive_sync(uint32_t type, uint32_t id); > int64_t opal_xive_dump(uint32_t type, uint32_t id); > +int64_t opal_xive_get_queue_state(uint64_t vp, uint32_t prio, > + __be32 *out_qtoggle, > + __be32 *out_qindex); > +int64_t opal_xive_set_queue_state(uint64_t vp, uint32_t prio, > + uint32_t qtoggle, > + uint32_t qindex); > +int64_t opal_xive_get_vp_state(uint64_t vp, __be64 *out_w01); > int64_t opal_pci_set_p2p(uint64_t phb_init, uint64_t phb_target, > uint64_t desc, uint16_t pe_number); > > diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h > index 32f033bfbf42..d6be3e4d9fa4 100644 > --- a/arch/powerpc/include/asm/xive.h > +++ b/arch/powerpc/include/asm/xive.h > @@ -132,12 +132,26 @@ extern int xive_native_configure_queue(u32 vp_id, > struct xive_q *q, u8 prio, > extern void xive_native_disable_queue(u32 vp_id, struct xive_q *q, u8 prio); > > extern void xive_native_sync_source(u32 hw_irq); > +extern void xive_native_sync_queue(u32 hw_irq); > extern bool is_xive_irq(struct irq_chip *chip); > extern int xive_native_enable_vp(u32 vp_id, bool single_escalation); > extern int xive_native_disable_vp(u32 vp_id); > extern int xive_native_get_vp_info(u32 vp_id, u32 *out_cam_id, u32 > *out_chip_id); > extern bool xive_native_has_single_escalation(void); > > +extern int xive_native_get_queue_info(u32 vp_id, uint32_t prio, > + u64 *out_qpage, > + u64 *out_qsize, > + u64 *out_qeoi_page, > + u32 *out_escalate_irq, > + u64 *out_qflags); > + > +extern int xive_native_get_queue_state(u32 vp_id, uint32_t prio, u32 > *qtoggle, > +u32 *qindex); > +extern int xive_native_set_queue_state(u32 vp_id, uint32_t prio, u32 qtoggle, > +u32 qindex); > +extern int xive_native_get_vp_state(u32 vp_id, u64 *out_state); > + > #else > > static inline bool xive_enabled(void) { return false; } > diff --git a/arch/powerpc/sysdev/xive/native.c > b/arch/powerpc/sysdev/xive/native.c > index 1ca127d052a6..0c037e933e55 100644 > 
--- a/arch/powerpc/sysdev/xive/native.c > +++ b/arch/powerpc/sysdev/xive/native.c > @@ -437,6 +437,12 @@ void
Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
On 06/01/2019 09:43, Benjamin Herrenschmidt wrote: > On Sat, 2019-01-05 at 10:51 -0700, Jason Gunthorpe wrote: >> >>> Interesting. I've investigated this further, though I don't have as >>> many new clues as I'd like. The problem occurs reliably, at least on >>> one particular type of machine (a POWER8 "Garrison" with ConnectX-4). >>> I don't yet know if it occurs with other machines, I'm having trouble >>> getting access to other machines with a suitable card. I didn't >>> manage to reproduce it on a different POWER8 machine with a >>> ConnectX-5, but I don't know if it's the difference in machine or >>> difference in card revision that's important. >> >> Make sure the card has the latest firmware is always good advice.. >> >>> So possibilities that occur to me: >>> * It's something specific about how the vfio-pci driver uses D3 >>> state - have you tried rebinding your device to vfio-pci? >>> * It's something specific about POWER, either the kernel or the PCI >>> bridge hardware >>> * It's something specific about this particular type of machine >> >> Does the EEH indicate what happend to actually trigger it? > > In a very cryptic way that requires manual parsing using non-public > docs sadly but yes. From the look of it, it's a completion timeout. > > Looks to me like we don't get a response to a config space access > during the change of D state. I don't know if it's the write of the D3 > state itself or the read back though (it's probably detected on the > read back or a subsequent read, but that doesn't tell me which specific > one failed). It is write: pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); > > Some extra logging in OPAL might help pin that down by checking the InA > error state in the config accessor after the config write (and polling > on it for a while as from a CPU perspective I don't knw if the write is > synchronous, probably not). 
Extra logging gives these straight after that write:

nFir:   8080 0030006e 8000
PhbSts: 0018 0018
Lem:    02088000 42498e367f502eae 0008
OutErr: 0020 0020
InAErr: 3000 2000 8080

Decoded (my fancy script):

nFir: 8080 0030006e 8000
|- PCI Nest Fault Isolation Register (FIR) NestBase+0x00 _BE_ = 8080h:
|    [0..63] 1000 1000
|    #16 set: The PHB had a severe error and has fenced the AIB
|    #24 set: The internal SCOM to ASB bridge has an error
|    #29..30: Error bit from SCOM FIR engine = 0h
|- PCI Nest FIR Mask NestBase+0x03 _BE_ = 0030006eh:
|    [0..63] 0011 01101110
|    #10 set: Any PowerBus data hang poll error (Only checked for CI Stores)
|    #11 set: Any PowerBus command hang error (domestic address range)
|    #25 set: A command received ack_dead, foreign data hang, or Link_chk_abort from the foreign interface
|    #26 set: Any PowerBus command hang error (foreign address range)
|    #28 set: Error bit from BARS SCOM engines, Nest domain
|    #29..30: Error bit from SCOM FIR engine = 3h/[0..1] 11
|- PCI Nest FIR WOF ("Who's on First") NestBase+0x08 _BE_ = 8000h:
|    [0..63] 1000
|    #16 set: The PHB had a severe error and has fenced the AIB
|    #29..30: Error bit from SCOM FIR engine = 0h

PhbSts: 0018 0018
|- 0x0120 Processor Load/Store Status Register _BE_ = 0018h:
|    [0..63] 00011000
|    #27 set: One of the PHB3's error status register bits is set
|    #28 set: One of the PHB3's first error status register bits is set
|- 0x0110 DMA Channel Status Register _BE_ = 0018h:
|    [0..63] 00011000
|    #27 set: One of the PHB3's error status register bits is set
|    #28 set: One of the PHB3's first error status register bits is set

Lem: 02088000 42498e367f502eae 0008
|- 0xC00 LEM FIR Accumulator Register _BE_ = 02088000h:
|    [0..63] 0010 1000 1000
|    #22 set: CFG Access Error
|    #44 set: PCT Timeout Error
|    #48 set: PCT Unexpected Completion
|- 0xC18 LEM Error Mask Register = 42498e367f502eaeh
|- 0xC40 LEM WOF Register _BE_ = 0008h:
|    [0..63] 1000
|    #44 set: PCT Timeout Error

OutErr: 0020 0020
|- 0xD00 Outbound Error Status Register _BE_ = 0020h:
|    [0..63]
Re: perf trace syscall table generation for powerpc with syscall.tbl
Hi Arnaldo,

Yes. I'm aware of it. Just that I was busy with something else so
couldn't do it. Thanks for reminding :). Will post a patch soon.

Ravi

On 1/8/19 10:34 PM, Arnaldo Carvalho de Melo wrote:
> Hi Ravi,
>
> I noticed that in:
>
> commit ab66dcc76d6ab8fae9d69d149ae38c42605e7fc5
> Author: Firoz Khan
> Date: Mon Dec 17 16:10:36 2018 +0530
>
>     powerpc: generate uapi header and system call table files
>
> powerpc now generates its syscall table headers from a syscall.tbl just
> like x86 and s390, could you please switch to using it, taking the x86
> and s390 scripts as a starting point, then test on your systems that
> everything works?
>
> Thanks,
>
> - Arnaldo
Re: [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls
On Mon, Jan 07, 2019 at 07:43:13PM +0100, Cédric Le Goater wrote: > These flags are shared between Linux/KVM implementing the hypervisor > calls for the XIVE native exploitation mode and the driver for the > sPAPR guests. > > Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson > --- > arch/powerpc/include/asm/xive.h | 23 +++ > arch/powerpc/sysdev/xive/spapr.c | 28 > 2 files changed, 31 insertions(+), 20 deletions(-) > > diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h > index 3c704f5dd3ae..32f033bfbf42 100644 > --- a/arch/powerpc/include/asm/xive.h > +++ b/arch/powerpc/include/asm/xive.h > @@ -93,6 +93,29 @@ extern void xive_flush_interrupt(void); > /* xmon hook */ > extern void xmon_xive_do_dump(int cpu); > > +/* > + * Hcall flags shared by the sPAPR backend and KVM > + */ > + > +/* H_INT_GET_SOURCE_INFO */ > +#define XIVE_SPAPR_SRC_H_INT_ESB PPC_BIT(60) > +#define XIVE_SPAPR_SRC_LSI PPC_BIT(61) > +#define XIVE_SPAPR_SRC_TRIGGER PPC_BIT(62) > +#define XIVE_SPAPR_SRC_STORE_EOI PPC_BIT(63) > + > +/* H_INT_SET_SOURCE_CONFIG */ > +#define XIVE_SPAPR_SRC_SET_EISN PPC_BIT(62) > +#define XIVE_SPAPR_SRC_MASK PPC_BIT(63) /* unused */ > + > +/* H_INT_SET_QUEUE_CONFIG */ > +#define XIVE_SPAPR_EQ_ALWAYS_NOTIFY PPC_BIT(63) > + > +/* H_INT_SET_QUEUE_CONFIG */ > +#define XIVE_SPAPR_EQ_DEBUG PPC_BIT(63) > + > +/* H_INT_ESB */ > +#define XIVE_SPAPR_ESB_STORE PPC_BIT(63) > + > /* APIs used by KVM */ > extern u32 xive_native_default_eq_shift(void); > extern u32 xive_native_alloc_vp_block(u32 max_vcpus); > diff --git a/arch/powerpc/sysdev/xive/spapr.c > b/arch/powerpc/sysdev/xive/spapr.c > index 575db3b06a6b..730284f838c8 100644 > --- a/arch/powerpc/sysdev/xive/spapr.c > +++ b/arch/powerpc/sysdev/xive/spapr.c > @@ -184,9 +184,6 @@ static long plpar_int_get_source_info(unsigned long flags, > return 0; > } > > -#define XIVE_SRC_SET_EISN (1ull << (63 - 62)) > -#define XIVE_SRC_MASK (1ull << (63 - 63)) /* unused */ > - > static long 
plpar_int_set_source_config(unsigned long flags, > unsigned long lisn, > unsigned long target, > @@ -243,8 +240,6 @@ static long plpar_int_get_queue_info(unsigned long flags, > return 0; > } > > -#define XIVE_EQ_ALWAYS_NOTIFY (1ull << (63 - 63)) > - > static long plpar_int_set_queue_config(unsigned long flags, > unsigned long target, > unsigned long priority, > @@ -286,8 +281,6 @@ static long plpar_int_sync(unsigned long flags, unsigned > long lisn) > return 0; > } > > -#define XIVE_ESB_FLAG_STORE (1ull << (63 - 63)) > - > static long plpar_int_esb(unsigned long flags, > unsigned long lisn, > unsigned long offset, > @@ -321,7 +314,7 @@ static u64 xive_spapr_esb_rw(u32 lisn, u32 offset, u64 > data, bool write) > unsigned long read_data; > long rc; > > - rc = plpar_int_esb(write ? XIVE_ESB_FLAG_STORE : 0, > + rc = plpar_int_esb(write ? XIVE_SPAPR_ESB_STORE : 0, > lisn, offset, data, _data); > if (rc) > return -1; > @@ -329,11 +322,6 @@ static u64 xive_spapr_esb_rw(u32 lisn, u32 offset, u64 > data, bool write) > return write ? 
0 : read_data; > } > > -#define XIVE_SRC_H_INT_ESB (1ull << (63 - 60)) > -#define XIVE_SRC_LSI (1ull << (63 - 61)) > -#define XIVE_SRC_TRIGGER (1ull << (63 - 62)) > -#define XIVE_SRC_STORE_EOI (1ull << (63 - 63)) > - > static int xive_spapr_populate_irq_data(u32 hw_irq, struct xive_irq_data > *data) > { > long rc; > @@ -349,11 +337,11 @@ static int xive_spapr_populate_irq_data(u32 hw_irq, > struct xive_irq_data *data) > if (rc) > return -EINVAL; > > - if (flags & XIVE_SRC_H_INT_ESB) > + if (flags & XIVE_SPAPR_SRC_H_INT_ESB) > data->flags |= XIVE_IRQ_FLAG_H_INT_ESB; > - if (flags & XIVE_SRC_STORE_EOI) > + if (flags & XIVE_SPAPR_SRC_STORE_EOI) > data->flags |= XIVE_IRQ_FLAG_STORE_EOI; > - if (flags & XIVE_SRC_LSI) > + if (flags & XIVE_SPAPR_SRC_LSI) > data->flags |= XIVE_IRQ_FLAG_LSI; > data->eoi_page = eoi_page; > data->esb_shift = esb_shift; > @@ -374,7 +362,7 @@ static int xive_spapr_populate_irq_data(u32 hw_irq, > struct xive_irq_data *data) > data->hw_irq = hw_irq; > > /* Full function page supports trigger */ > - if (flags & XIVE_SRC_TRIGGER) { > + if (flags & XIVE_SPAPR_SRC_TRIGGER) { > data->trig_mmio = data->eoi_mmio; > return 0; > } > @@ -391,8 +379,8 @@ static int xive_spapr_configure_irq(u32 hw_irq, u32 > target, u8 prio, u32 sw_irq) > { > long rc; > > - rc =
Re: [PATCH V6 1/4] mm/cma: Add PF flag to force non cma alloc
On Tue, Jan 08, 2019 at 10:21:07AM +0530, Aneesh Kumar K.V wrote: > This patch adds PF_MEMALLOC_NOCMA which makes sure any allocation in that > context > is marked non-movable and hence cannot be satisfied by the CMA region. > > This is useful with get_user_pages_cma_migrate where we take a page pin by > migrating pages from the CMA region. Marking the section PF_MEMALLOC_NOCMA ensures > that we avoid unnecessary page migration later. > > Suggested-by: Andrea Arcangeli > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Andrea Arcangeli
Re: [PATCH V6 3/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get
Hello, On Tue, Jan 08, 2019 at 10:21:09AM +0530, Aneesh Kumar K.V wrote: > @@ -187,41 +149,25 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, > unsigned long ua, > goto unlock_exit; > } > > + ret = get_user_pages_cma_migrate(ua, entries, 1, mem->hpages); In terms of gup APIs, I've been wondering if this shall become get_user_pages_longerm(FOLL_CMA_MIGRATE). So basically moving this CMA migrate logic inside get_user_pages_longerm. It depends if powerpc will ever need to bail on dax and/or if other non-powerpc vfio drivers which are already bailing on dax may also later optionally need to avoid interfering with CMA. Aside from the API detail above, this CMA page migration logic seems a good solution for the problem. Thanks, Andrea
Re: [PATCH] lkdtm: Add tests for NULL pointer dereference
On Fri, Dec 14, 2018 at 7:26 AM Christophe Leroy wrote: > > Introduce lkdtm tests for NULL pointer dereference: check > access or exec at NULL address. Why is this not already covered by the existing tests? (Is there something special about NULL that is being missed?) I'd expect SMAP and SMEP to cover NULL as well. -Kees > > Signed-off-by: Christophe Leroy > --- > drivers/misc/lkdtm/core.c | 2 ++ > drivers/misc/lkdtm/lkdtm.h | 2 ++ > drivers/misc/lkdtm/perms.c | 18 ++ > 3 files changed, 22 insertions(+) > > diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c > index bc76756b7eda..36910e1d5c09 100644 > --- a/drivers/misc/lkdtm/core.c > +++ b/drivers/misc/lkdtm/core.c > @@ -157,7 +157,9 @@ static const struct crashtype crashtypes[] = { > CRASHTYPE(EXEC_VMALLOC), > CRASHTYPE(EXEC_RODATA), > CRASHTYPE(EXEC_USERSPACE), > + CRASHTYPE(EXEC_NULL), > CRASHTYPE(ACCESS_USERSPACE), > + CRASHTYPE(ACCESS_NULL), > CRASHTYPE(WRITE_RO), > CRASHTYPE(WRITE_RO_AFTER_INIT), > CRASHTYPE(WRITE_KERN), > diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h > index 3c6fd327e166..b69ee004a3f7 100644 > --- a/drivers/misc/lkdtm/lkdtm.h > +++ b/drivers/misc/lkdtm/lkdtm.h > @@ -45,7 +45,9 @@ void lkdtm_EXEC_KMALLOC(void); > void lkdtm_EXEC_VMALLOC(void); > void lkdtm_EXEC_RODATA(void); > void lkdtm_EXEC_USERSPACE(void); > +void lkdtm_EXEC_NULL(void); > void lkdtm_ACCESS_USERSPACE(void); > +void lkdtm_ACCESS_NULL(void); > > /* lkdtm_refcount.c */ > void lkdtm_REFCOUNT_INC_OVERFLOW(void); > diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c > index fa54add6375a..62f76d506f04 100644 > --- a/drivers/misc/lkdtm/perms.c > +++ b/drivers/misc/lkdtm/perms.c > @@ -164,6 +164,11 @@ void lkdtm_EXEC_USERSPACE(void) > vm_munmap(user_addr, PAGE_SIZE); > } > > +void lkdtm_EXEC_NULL(void) > +{ > + execute_location(NULL, CODE_AS_IS); > +} > + > void lkdtm_ACCESS_USERSPACE(void) > { > unsigned long user_addr, tmp = 0; > @@ -195,6 +200,19 @@ void 
lkdtm_ACCESS_USERSPACE(void) > vm_munmap(user_addr, PAGE_SIZE); > } > > +void lkdtm_ACCESS_NULL(void) > +{ > + unsigned long tmp; > + unsigned long *ptr = (unsigned long *)NULL; > + > + pr_info("attempting bad read at %px\n", ptr); > + tmp = *ptr; > + tmp += 0xc0dec0de; > + > + pr_info("attempting bad write at %px\n", ptr); > + *ptr = tmp; > +} > + > void __init lkdtm_perms_init(void) > { > /* Make sure we can write to __ro_after_init values during __init */ > -- > 2.13.3 > -- Kees Cook
Re: Kconfig label updates
On Wed, Jan 9, 2019 at 9:31 AM Bjorn Helgaas wrote: > > Hi, > > I want to update the PCI Kconfig labels so they're more consistent and > useful to users, something like the patch below. IIUC, the items > below are all IBM-related; please correct me if not. > > I'd also like to expand (or remove) "RPA" because Google doesn't find > anything about "IBM RPA", except Robotic Process Automation, which I > think must be something else. > > Is there some text expansion of RPA that we could use that would be > meaningful to a user, i.e., something he/she might find on a nameplate > or in a user manual? "RISC Platform Architecture" apparently. I couldn't find any documentation written in the last decade that uses the term so it's probably a hold-over from when pseries and iseries had separate hardware platforms. Replacing "RPA" with "pseries" would be the least confusing way to handle it. > Ideally the PCI Kconfig labels would match the terms used in > arch/.../Kconfig, e.g., > > config PPC_POWERNV > bool "IBM PowerNV (Non-Virtualized) platform support" > > config PPC_PSERIES > bool "IBM pSeries & new (POWER5-based) iSeries" > > config MARCH_Z900 > bool "IBM zSeries model z800 and z900" > > config MARCH_Z9_109 > bool "IBM System z9" > > Bjorn > > > diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig > index e9f78eb390d2..1c1d145bfd84 100644 > --- a/drivers/pci/hotplug/Kconfig > +++ b/drivers/pci/hotplug/Kconfig > @@ -112,7 +112,7 @@ config HOTPLUG_PCI_SHPC > When in doubt, say N. > > config HOTPLUG_PCI_POWERNV > - tristate "PowerPC PowerNV PCI Hotplug driver" > + tristate "IBM PowerNV PCI Hotplug driver" > depends on PPC_POWERNV && EEH > select OF_DYNAMIC > help > @@ -125,10 +125,11 @@ config HOTPLUG_PCI_POWERNV > When in doubt, say N. 
> > config HOTPLUG_PCI_RPA > - tristate "RPA PCI Hotplug driver" > + tristate "IBM Power Systems RPA PCI Hotplug driver" > depends on PPC_PSERIES && EEH > help > Say Y here if you have a RPA system that supports PCI Hotplug. > + This includes the earlier pSeries and iSeries. > > To compile this driver as a module, choose M here: the > module will be called rpaphp. > @@ -136,7 +137,7 @@ config HOTPLUG_PCI_RPA > When in doubt, say N. > > config HOTPLUG_PCI_RPA_DLPAR > - tristate "RPA Dynamic Logical Partitioning for I/O slots" > + tristate "IBM RPA Dynamic Logical Partitioning for I/O slots" > depends on HOTPLUG_PCI_RPA > help > Say Y here if your system supports Dynamic Logical Partitioning > @@ -157,7 +158,7 @@ config HOTPLUG_PCI_SGI > When in doubt, say N. > > config HOTPLUG_PCI_S390 > - bool "System z PCI Hotplug Support" > + bool "IBM System z PCI Hotplug Support" > depends on S390 && 64BIT > help > Say Y here if you want to use the System z PCI Hotplug
Re: [PATCH v8 24/25] powerpc: Adopt nvram module for PPC64
On Tue, 8 Jan 2019, Michael Ellerman wrote: > > The reason why that doesn't work boils down to introspection. (This > > was mentioned elsewhere in this email thread.) For example, we > > presently have code like this, > > > > ssize_t nvram_get_size(void) > > { > >if (ppc_md.nvram_size) > >return ppc_md.nvram_size(); > >return -1; > > } > > EXPORT_SYMBOL(nvram_get_size); > > > > This construction means we get to decide at run-time which of the NVRAM > > functions should be used. (Whereas your example makes a build-time > > decision.) > > Right, but we only need to make a runtime decision on powerpc (right?). It's needed in many places outside of powerpc. Otherwise the caller can't determine at run-time which ops are implemented. Hence you have to duplicate the caller for each supported configuration that you build. Already, this precludes a shared misc device implementation and belies the "generic" in drivers/char/generic_nvram.c. --
Kconfig label updates
Hi, I want to update the PCI Kconfig labels so they're more consistent and useful to users, something like the patch below. IIUC, the items below are all IBM-related; please correct me if not. I'd also like to expand (or remove) "RPA" because Google doesn't find anything about "IBM RPA", except Robotic Process Automation, which I think must be something else. Is there some text expansion of RPA that we could use that would be meaningful to a user, i.e., something he/she might find on a nameplate or in a user manual? Ideally the PCI Kconfig labels would match the terms used in arch/.../Kconfig, e.g., config PPC_POWERNV bool "IBM PowerNV (Non-Virtualized) platform support" config PPC_PSERIES bool "IBM pSeries & new (POWER5-based) iSeries" config MARCH_Z900 bool "IBM zSeries model z800 and z900" config MARCH_Z9_109 bool "IBM System z9" Bjorn diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig index e9f78eb390d2..1c1d145bfd84 100644 --- a/drivers/pci/hotplug/Kconfig +++ b/drivers/pci/hotplug/Kconfig @@ -112,7 +112,7 @@ config HOTPLUG_PCI_SHPC When in doubt, say N. config HOTPLUG_PCI_POWERNV - tristate "PowerPC PowerNV PCI Hotplug driver" + tristate "IBM PowerNV PCI Hotplug driver" depends on PPC_POWERNV && EEH select OF_DYNAMIC help @@ -125,10 +125,11 @@ config HOTPLUG_PCI_POWERNV When in doubt, say N. config HOTPLUG_PCI_RPA - tristate "RPA PCI Hotplug driver" + tristate "IBM Power Systems RPA PCI Hotplug driver" depends on PPC_PSERIES && EEH help Say Y here if you have a RPA system that supports PCI Hotplug. + This includes the earlier pSeries and iSeries. To compile this driver as a module, choose M here: the module will be called rpaphp. @@ -136,7 +137,7 @@ config HOTPLUG_PCI_RPA When in doubt, say N. 
config HOTPLUG_PCI_RPA_DLPAR - tristate "RPA Dynamic Logical Partitioning for I/O slots" + tristate "IBM RPA Dynamic Logical Partitioning for I/O slots" depends on HOTPLUG_PCI_RPA help Say Y here if your system supports Dynamic Logical Partitioning @@ -157,7 +158,7 @@ config HOTPLUG_PCI_SGI When in doubt, say N. config HOTPLUG_PCI_S390 - bool "System z PCI Hotplug Support" + bool "IBM System z PCI Hotplug Support" depends on S390 && 64BIT help Say Y here if you want to use the System z PCI Hotplug
[PATCH] powerpc/ps3: Use struct_size() in kzalloc()
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva --- arch/powerpc/platforms/ps3/device-init.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/powerpc/platforms/ps3/device-init.c b/arch/powerpc/platforms/ps3/device-init.c index e7075aaff1bb..59587b75493d 100644 --- a/arch/powerpc/platforms/ps3/device-init.c +++ b/arch/powerpc/platforms/ps3/device-init.c @@ -354,9 +354,7 @@ static int ps3_setup_storage_dev(const struct ps3_repository_device *repo, repo->dev_index, repo->dev_type, port, blk_size, num_blocks, num_regions); - p = kzalloc(sizeof(struct ps3_storage_device) + - num_regions * sizeof(struct ps3_storage_region), - GFP_KERNEL); + p = kzalloc(struct_size(p, regions, num_regions), GFP_KERNEL); if (!p) { result = -ENOMEM; goto fail_malloc; -- 2.20.1
Re: [PATCH] socket: move compat timeout handling into sock.c
On Tue, Jan 8, 2019 at 12:10 PM Arnd Bergmann wrote: > > This is a cleanup to prepare for the addition of 64-bit time_t > in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems > unnecessarily complex and error-prone, moving it all into the > main setsockopt()/getsockopt() implementation requires half > as much code and is easier to extend. > > 32-bit user space can now use old_timeval32 on both 32-bit > and 64-bit machines, while 64-bit code can use > __old_kernel_timeval. > > Signed-off-by: Arnd Bergmann This will make the other series so much nicer. Thank you. Acked-by: Deepa Dinamani
Re: [PATCH 2/3] socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes
On Tue, Jan 8, 2019 at 12:04 PM Arnd Bergmann wrote: > > On Tue, Jan 8, 2019 at 6:24 AM Deepa Dinamani wrote: > > > > SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval > > as the time format. struct timeval is not y2038 safe. > > The subsequent patches in the series add support for new socket > > timeout options with _NEW suffix that are y2038 safe. > > Rename the existing options with _OLD suffix forms so that the > > right option is enabled for userspace applications according > > to the architecture and time_t definition of libc. > > > > Signed-off-by: Deepa Dinamani > > Looks good overall. A few minor concerns: > > The description above makes it sound like there is a bug with y2038-safety > in this particular interface, which I think is just not what you meant, > as the change is only needed for compatibility with new C libraries > that work around the y2038 problem in general by changing their > timeval definition. Right, there is a y2038 safety issue; it is just the libc part that needs to be handled. I will fix the commit text. > > diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c > index 76976d6e50f9..c98ad9777ad9 100644 > --- a/fs/dlm/lowcomms.c > +++ b/fs/dlm/lowcomms.c > @@ -1089,12 +1089,12 @@ static void sctp_connect_to_sock(struct connection > *con) > * since O_NONBLOCK argument in connect() function does not work > here, > * then, we should restore the default value of this attribute.
> > */ > > - kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *), > > + kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *), > > sizeof(tv)); > > result = sock->ops->connect(sock, (struct sockaddr *), > > addr_len, > >0); > > memset(, 0, sizeof(tv)); > > - kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *), > > + kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *), > > sizeof(tv)); > > > > if (result == -EINPROGRESS) > > It took me a bit to realize there that this is safe as well even if > we don't use SO_SNDTIMEO_NEW, for the same reason. Correct. > > --- a/net/compat.c > > +++ b/net/compat.c > > @@ -378,7 +378,7 @@ static int compat_sock_setsockopt(struct socket *sock, > > int level, int optname, > > return do_set_attach_filter(sock, level, optname, > > optval, optlen); > > if (!COMPAT_USE_64BIT_TIME && > > - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) > > + (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD)) > > return do_set_sock_timeout(sock, level, optname, optval, > > optlen); > > > > return sock_setsockopt(sock, level, optname, optval, optlen); > > @@ -450,7 +450,7 @@ static int compat_sock_getsockopt(struct socket *sock, > > int level, int optname, > > char __user *optval, int __user *optlen) > > { > > if (!COMPAT_USE_64BIT_TIME && > > - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) > > + (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD)) > > return do_get_sock_timeout(sock, level, optname, optval, > > optlen); > > return sock_getsockopt(sock, level, optname, optval, optlen); > > } > > I looked at the original code and noticed that it's horrible, which of course > is not your fault, but I wonder if we should just fix it now to avoid that > get_fs()/set_fs() hack, since that code mostly implements what you > also have in your patch 3 (which is done more nicely). I did think of getting rid of set_fs()/ get_fs() here. 
But I wasn't sure, as the maintainers seemed to prefer to leave the old code as-is in the other series for timestamps. > I'll follow up with a patch to demonstrate what I mean here. Your third > patch will then just have to add another code path so we can handle > all of old_timespec32 (for existing 32-bit user space), __kernel_old_timespec > (for sparc64) and __kernel_sock_timeval (for everything else). Cool, I will rebase on top of your patch. Thanks, Deepa
Re: [PATCH v2 1/2] mm: add probe_user_read()
On Tue, Jan 8, 2019 at 1:11 PM Christophe Leroy wrote: > > > > Le 08/01/2019 à 20:48, Andrew Morton a écrit : > > On Tue, 8 Jan 2019 07:37:44 + (UTC) Christophe Leroy > > wrote: > > > >> In powerpc code, there are several places implementing safe > >> access to user data. This is sometimes implemented using > >> probe_kernel_address() with additional access_ok() verification, > >> sometimes with get_user() enclosed in a pagefault_disable()/enable() > >> pair, etc. : > >> show_user_instructions() > >> bad_stack_expansion() > >> p9_hmi_special_emu() > >> fsl_pci_mcheck_exception() > >> read_user_stack_64() > >> read_user_stack_32() on PPC64 > >> read_user_stack_32() on PPC32 > >> power_pmu_bhrb_to() > >> > >> In the same spirit as probe_kernel_read(), this patch adds > >> probe_user_read(). > >> > >> probe_user_read() does the same as probe_kernel_read() but > >> first checks that it is really a user address. > >> > >> ... > >> > >> --- a/include/linux/uaccess.h > >> +++ b/include/linux/uaccess.h > >> @@ -263,6 +263,40 @@ extern long strncpy_from_unsafe(char *dst, const void > >> *unsafe_addr, long count); > >> #define probe_kernel_address(addr, retval) \ > >> probe_kernel_read(, addr, sizeof(retval)) > >> > >> +/** > >> + * probe_user_read(): safely attempt to read from a user location > >> + * @dst: pointer to the buffer that shall take the data > >> + * @src: address to read from > >> + * @size: size of the data chunk > >> + * > >> + * Returns: 0 on success, -EFAULT on error. > >> + * > >> + * Safely read from address @src to the buffer at @dst. If a kernel fault > >> + * happens, handle that and return -EFAULT. > >> + * > >> + * We ensure that the copy_from_user is executed in atomic context so that > >> + * do_page_fault() doesn't attempt to take mmap_sem. This makes > >> + * probe_user_read() suitable for use within regions where the caller > >> + * already holds mmap_sem, or other locks which nest inside mmap_sem. 
> >> + */ > >> + > >> +#ifndef probe_user_read > >> +static __always_inline long probe_user_read(void *dst, const void __user > >> *src, > >> +size_t size) > >> +{ > >> +long ret; > >> + > >> +if (!access_ok(src, size)) > >> +return -EFAULT; > >> + > >> +pagefault_disable(); > >> +ret = __copy_from_user_inatomic(dst, src, size); > >> +pagefault_enable(); > >> + > >> +return ret ? -EFAULT : 0; > >> +} > >> +#endif > > > > Why was the __always_inline needed? > > > > This function is pretty large. Why is it inlined? > > > > Kees told to do that way, see https://patchwork.ozlabs.org/patch/986848/ Yeah, I'd like to make sure we can plumb the size checks down into the user copy primitives. -- Kees Cook
Re: [PATCH v2 1/2] mm: add probe_user_read()
Le 08/01/2019 à 20:48, Andrew Morton a écrit : On Tue, 8 Jan 2019 07:37:44 + (UTC) Christophe Leroy wrote: In powerpc code, there are several places implementing safe access to user data. This is sometimes implemented using probe_kernel_address() with additional access_ok() verification, sometimes with get_user() enclosed in a pagefault_disable()/enable() pair, etc. : show_user_instructions() bad_stack_expansion() p9_hmi_special_emu() fsl_pci_mcheck_exception() read_user_stack_64() read_user_stack_32() on PPC64 read_user_stack_32() on PPC32 power_pmu_bhrb_to() In the same spirit as probe_kernel_read(), this patch adds probe_user_read(). probe_user_read() does the same as probe_kernel_read() but first checks that it is really a user address. ... --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -263,6 +263,40 @@ extern long strncpy_from_unsafe(char *dst, const void *unsafe_addr, long count); #define probe_kernel_address(addr, retval)\ probe_kernel_read(, addr, sizeof(retval)) +/** + * probe_user_read(): safely attempt to read from a user location + * @dst: pointer to the buffer that shall take the data + * @src: address to read from + * @size: size of the data chunk + * + * Returns: 0 on success, -EFAULT on error. + * + * Safely read from address @src to the buffer at @dst. If a kernel fault + * happens, handle that and return -EFAULT. + * + * We ensure that the copy_from_user is executed in atomic context so that + * do_page_fault() doesn't attempt to take mmap_sem. This makes + * probe_user_read() suitable for use within regions where the caller + * already holds mmap_sem, or other locks which nest inside mmap_sem. + */ + +#ifndef probe_user_read +static __always_inline long probe_user_read(void *dst, const void __user *src, + size_t size) +{ + long ret; + + if (!access_ok(src, size)) + return -EFAULT; + + pagefault_disable(); + ret = __copy_from_user_inatomic(dst, src, size); + pagefault_enable(); + + return ret ? 
-EFAULT : 0; +} +#endif Why was the __always_inline needed? This function is pretty large. Why is it inlined? Kees told me to do it that way, see https://patchwork.ozlabs.org/patch/986848/ Christophe
[PATCH] socket: move compat timeout handling into sock.c
This is a cleanup to prepare for the addition of 64-bit time_t in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems unnecessarily complex and error-prone, moving it all into the main setsockopt()/getsockopt() implementation requires half as much code and is easier to extend. 32-bit user space can now use old_timeval32 on both 32-bit and 64-bit machines, while 64-bit code can use __old_kernel_timeval. Signed-off-by: Arnd Bergmann --- net/compat.c| 66 + net/core/sock.c | 65 +++- 2 files changed, 44 insertions(+), 87 deletions(-) diff --git a/net/compat.c b/net/compat.c index 959d1c51826d..ce8f6e8cdcd2 100644 --- a/net/compat.c +++ b/net/compat.c @@ -348,28 +348,6 @@ static int do_set_attach_filter(struct socket *sock, int level, int optname, sizeof(struct sock_fprog)); } -static int do_set_sock_timeout(struct socket *sock, int level, - int optname, char __user *optval, unsigned int optlen) -{ - struct compat_timeval __user *up = (struct compat_timeval __user *)optval; - struct timeval ktime; - mm_segment_t old_fs; - int err; - - if (optlen < sizeof(*up)) - return -EINVAL; - if (!access_ok(up, sizeof(*up)) || - __get_user(ktime.tv_sec, >tv_sec) || - __get_user(ktime.tv_usec, >tv_usec)) - return -EFAULT; - old_fs = get_fs(); - set_fs(KERNEL_DS); - err = sock_setsockopt(sock, level, optname, (char *), sizeof(ktime)); - set_fs(old_fs); - - return err; -} - static int compat_sock_setsockopt(struct socket *sock, int level, int optname, char __user *optval, unsigned int optlen) { @@ -377,10 +355,6 @@ static int compat_sock_setsockopt(struct socket *sock, int level, int optname, optname == SO_ATTACH_REUSEPORT_CBPF) return do_set_attach_filter(sock, level, optname, optval, optlen); - if (!COMPAT_USE_64BIT_TIME && - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) - return do_set_sock_timeout(sock, level, optname, optval, optlen); - return sock_setsockopt(sock, level, optname, optval, optlen); } @@ -417,44 +391,6 @@ COMPAT_SYSCALL_DEFINE5(setsockopt, int, fd, int, 
level, int, optname, return __compat_sys_setsockopt(fd, level, optname, optval, optlen); } -static int do_get_sock_timeout(struct socket *sock, int level, int optname, - char __user *optval, int __user *optlen) -{ - struct compat_timeval __user *up; - struct timeval ktime; - mm_segment_t old_fs; - int len, err; - - up = (struct compat_timeval __user *) optval; - if (get_user(len, optlen)) - return -EFAULT; - if (len < sizeof(*up)) - return -EINVAL; - len = sizeof(ktime); - old_fs = get_fs(); - set_fs(KERNEL_DS); - err = sock_getsockopt(sock, level, optname, (char *) , ); - set_fs(old_fs); - - if (!err) { - if (put_user(sizeof(*up), optlen) || - !access_ok(up, sizeof(*up)) || - __put_user(ktime.tv_sec, >tv_sec) || - __put_user(ktime.tv_usec, >tv_usec)) - err = -EFAULT; - } - return err; -} - -static int compat_sock_getsockopt(struct socket *sock, int level, int optname, - char __user *optval, int __user *optlen) -{ - if (!COMPAT_USE_64BIT_TIME && - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) - return do_get_sock_timeout(sock, level, optname, optval, optlen); - return sock_getsockopt(sock, level, optname, optval, optlen); -} - int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp) { struct compat_timeval __user *ctv; @@ -527,7 +463,7 @@ static int __compat_sys_getsockopt(int fd, int level, int optname, } if (level == SOL_SOCKET) - err = compat_sock_getsockopt(sock, level, + err = sock_getsockopt(sock, level, optname, optval, optlen); else if (sock->ops->compat_getsockopt) err = sock->ops->compat_getsockopt(sock, level, diff --git a/net/core/sock.c b/net/core/sock.c index 6aa2e7e0b4fb..e50b9a2abc92 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -335,14 +335,48 @@ int __sk_backlog_rcv(struct sock *sk, struct sk_buff *skb) } EXPORT_SYMBOL(__sk_backlog_rcv); +static int sock_get_timeout(long timeo, void *optval) +{ + struct __kernel_old_timeval tv; + + if (timeo == MAX_SCHEDULE_TIMEOUT) { + tv.tv_sec = 0; + tv.tv_usec = 0; + } 
else { + tv.tv_sec = timeo / HZ; +
Re: [PATCH 2/3] socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes
On Tue, Jan 8, 2019 at 6:24 AM Deepa Dinamani wrote: > > SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval > as the time format. struct timeval is not y2038 safe. > The subsequent patches in the series add support for new socket > timeout options with _NEW suffix that are y2038 safe. > Rename the existing options with _OLD suffix forms so that the > right option is enabled for userspace applications according > to the architecture and time_t definition of libc. > > Signed-off-by: Deepa Dinamani Looks good overall. A few minor concerns: The description above makes it sound like there is a bug with y2038-safety in this particular interface, which I think is just not what you meant, as the change is only needed for compatibility with new C libraries that work around the y2038 problem in general by changing their timeval definition. > diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c > index 76976d6e50f9..c98ad9777ad9 100644 > --- a/fs/dlm/lowcomms.c > +++ b/fs/dlm/lowcomms.c > @@ -1089,12 +1089,12 @@ static void sctp_connect_to_sock(struct connection > *con) > * since O_NONBLOCK argument in connect() function does not work here, > * then, we should restore the default value of this attribute. > */ > - kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *), > + kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *), > sizeof(tv)); > result = sock->ops->connect(sock, (struct sockaddr *), addr_len, >0); > memset(, 0, sizeof(tv)); > - kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *), > + kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *), > sizeof(tv)); > > if (result == -EINPROGRESS) It took me a bit to realize that this is safe as well even if we don't use SO_SNDTIMEO_NEW, for the same reason.
> --- a/net/compat.c > +++ b/net/compat.c > @@ -378,7 +378,7 @@ static int compat_sock_setsockopt(struct socket *sock, > int level, int optname, > return do_set_attach_filter(sock, level, optname, > optval, optlen); > if (!COMPAT_USE_64BIT_TIME && > - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) > + (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD)) > return do_set_sock_timeout(sock, level, optname, optval, > optlen); > > return sock_setsockopt(sock, level, optname, optval, optlen); > @@ -450,7 +450,7 @@ static int compat_sock_getsockopt(struct socket *sock, > int level, int optname, > char __user *optval, int __user *optlen) > { > if (!COMPAT_USE_64BIT_TIME && > - (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)) > + (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD)) > return do_get_sock_timeout(sock, level, optname, optval, > optlen); > return sock_getsockopt(sock, level, optname, optval, optlen); > } I looked at the original code and noticed that it's horrible, which of course is not your fault, but I wonder if we should just fix it now to avoid that get_fs()/set_fs() hack, since that code mostly implements what you also have in your patch 3 (which is done more nicely). I'll follow up with a patch to demonstrate what I mean here. Your third patch will then just have to add another code path so we can handle all of old_timespec32 (for existing 32-bit user space), __kernel_old_timespec (for sparc64) and __kernel_sock_timeval (for everything else). Arnd
Re: [PATCH V6 0/4] mm/kvm/vfio/ppc64: Migrate compound pages out of CMA region
On Tue, 8 Jan 2019 10:21:06 +0530 "Aneesh Kumar K.V" wrote: > ppc64 uses a CMA area for the allocation of the guest page table (hash page table). > We won't > be able to start a guest if we fail to allocate the hash page table. We have > observed > hash table allocation failure because we failed to migrate pages out of the CMA > region > because they were pinned. This happens when we are using VFIO. VFIO on ppc64 > pins > the entire guest RAM. If the guest RAM pages get allocated out of the CMA region, > we > won't be able to migrate those pages. The pages are also pinned for the > lifetime of the > guest. > > Currently we support migration of non-compound pages. With THP and with the > addition of > hugetlb migration we can end up allocating compound pages from the CMA region. > This > patch series adds support for migrating compound pages. The first patch adds > the helper > get_user_pages_cma_migrate() which pins the pages, making sure we migrate them > out of > the CMA region before incrementing the reference count. Does this code do anything for architectures other than powerpc? If not, should we be adding the ifdefs to avoid burdening other architectures with unused code?
Re: [PATCH v2 1/2] mm: add probe_user_read()
On Tue, 8 Jan 2019 07:37:44 + (UTC) Christophe Leroy wrote: > In powerpc code, there are several places implementing safe > access to user data. This is sometimes implemented using > probe_kernel_address() with additional access_ok() verification, > sometimes with get_user() enclosed in a pagefault_disable()/enable() > pair, etc. : > show_user_instructions() > bad_stack_expansion() > p9_hmi_special_emu() > fsl_pci_mcheck_exception() > read_user_stack_64() > read_user_stack_32() on PPC64 > read_user_stack_32() on PPC32 > power_pmu_bhrb_to() > > In the same spirit as probe_kernel_read(), this patch adds > probe_user_read(). > > probe_user_read() does the same as probe_kernel_read() but > first checks that it is really a user address. > > ... > > --- a/include/linux/uaccess.h > +++ b/include/linux/uaccess.h > @@ -263,6 +263,40 @@ extern long strncpy_from_unsafe(char *dst, const void > *unsafe_addr, long count); > #define probe_kernel_address(addr, retval) \ > probe_kernel_read(, addr, sizeof(retval)) > > +/** > + * probe_user_read(): safely attempt to read from a user location > + * @dst: pointer to the buffer that shall take the data > + * @src: address to read from > + * @size: size of the data chunk > + * > + * Returns: 0 on success, -EFAULT on error. > + * > + * Safely read from address @src to the buffer at @dst. If a kernel fault > + * happens, handle that and return -EFAULT. > + * > + * We ensure that the copy_from_user is executed in atomic context so that > + * do_page_fault() doesn't attempt to take mmap_sem. This makes > + * probe_user_read() suitable for use within regions where the caller > + * already holds mmap_sem, or other locks which nest inside mmap_sem. 
> + */ > + > +#ifndef probe_user_read > +static __always_inline long probe_user_read(void *dst, const void __user > *src, > + size_t size) > +{ > + long ret; > + > + if (!access_ok(src, size)) > + return -EFAULT; > + > + pagefault_disable(); > + ret = __copy_from_user_inatomic(dst, src, size); > + pagefault_enable(); > + > + return ret ? -EFAULT : 0; > +} > +#endif Why was the __always_inline needed? This function is pretty large. Why is it inlined?
[PATCH AUTOSEL 3.18 05/19] powerpc/pseries/cpuidle: Fix preempt warning
From: Breno Leitao [ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ] When booting a pseries kernel with PREEMPT enabled, it dumps the following warning: BUG: using smp_processor_id() in preemptible [] code: swapper/0/1 caller is pseries_processor_idle_init+0x5c/0x22c CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828 Call Trace: [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable) [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160 [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300 [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500 [c00429437db0] [c00112dc] kernel_init+0x2c/0x160 [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c This happens because the code calls get_lppaca() which calls get_paca() and it checks if preemption is disabled through check_preemption_disabled(). Preemption should be disabled because the per CPU variable may make no sense if there is a preemption (and a CPU switch) after it reads the per CPU data and when it is used. In this device driver specifically, it is not a problem, because this code just needs to have access to one lppaca struct, and it does not matter if it is the current per CPU lppaca struct or not (i.e. when there is a preemption and a CPU migration). That said, the most appropriate fix seems to be related to avoiding the debug_smp_processor_id() call at get_paca(), instead of calling preempt_disable() before get_paca(). 
Signed-off-by: Breno Leitao Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- drivers/cpuidle/cpuidle-pseries.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c index 6f7b01956885..b755d3a54e3c 100644 --- a/drivers/cpuidle/cpuidle-pseries.c +++ b/drivers/cpuidle/cpuidle-pseries.c @@ -237,7 +237,13 @@ static int pseries_idle_probe(void) return -ENODEV; if (firmware_has_feature(FW_FEATURE_SPLPAR)) { - if (lppaca_shared_proc(get_lppaca())) { + /* + * Use local_paca instead of get_lppaca() since + * preemption is not disabled, and it is not required in + * fact, since lppaca_ptr does not need to be the value + * associated to the current CPU, it can be from any CPU. + */ + if (lppaca_shared_proc(local_paca->lppaca_ptr)) { cpuidle_state_table = shared_states; max_idle_state = ARRAY_SIZE(shared_states); } else { -- 2.19.1
[PATCH AUTOSEL 4.4 10/28] powerpc/pseries/cpuidle: Fix preempt warning
From: Breno Leitao [ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ] When booting a pseries kernel with PREEMPT enabled, it dumps the following warning: BUG: using smp_processor_id() in preemptible [] code: swapper/0/1 caller is pseries_processor_idle_init+0x5c/0x22c CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828 Call Trace: [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable) [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160 [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300 [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500 [c00429437db0] [c00112dc] kernel_init+0x2c/0x160 [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c This happens because the code calls get_lppaca() which calls get_paca() and it checks if preemption is disabled through check_preemption_disabled(). Preemption should be disabled because the per CPU variable may make no sense if there is a preemption (and a CPU switch) after it reads the per CPU data and when it is used. In this device driver specifically, it is not a problem, because this code just needs to have access to one lppaca struct, and it does not matter if it is the current per CPU lppaca struct or not (i.e. when there is a preemption and a CPU migration). That said, the most appropriate fix seems to be related to avoiding the debug_smp_processor_id() call at get_paca(), instead of calling preempt_disable() before get_paca(). 
Signed-off-by: Breno Leitao Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- drivers/cpuidle/cpuidle-pseries.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c index 07135e009d8b..601a6c3acc7f 100644 --- a/drivers/cpuidle/cpuidle-pseries.c +++ b/drivers/cpuidle/cpuidle-pseries.c @@ -240,7 +240,13 @@ static int pseries_idle_probe(void) return -ENODEV; if (firmware_has_feature(FW_FEATURE_SPLPAR)) { - if (lppaca_shared_proc(get_lppaca())) { + /* + * Use local_paca instead of get_lppaca() since + * preemption is not disabled, and it is not required in + * fact, since lppaca_ptr does not need to be the value + * associated to the current CPU, it can be from any CPU. + */ + if (lppaca_shared_proc(local_paca->lppaca_ptr)) { cpuidle_state_table = shared_states; max_idle_state = ARRAY_SIZE(shared_states); } else { -- 2.19.1
[PATCH AUTOSEL 4.9 15/36] powerpc/pseries/cpuidle: Fix preempt warning
From: Breno Leitao [ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ] When booting a pseries kernel with PREEMPT enabled, it dumps the following warning: BUG: using smp_processor_id() in preemptible [] code: swapper/0/1 caller is pseries_processor_idle_init+0x5c/0x22c CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828 Call Trace: [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable) [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160 [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300 [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500 [c00429437db0] [c00112dc] kernel_init+0x2c/0x160 [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c This happens because the code calls get_lppaca() which calls get_paca() and it checks if preemption is disabled through check_preemption_disabled(). Preemption should be disabled because the per CPU variable may make no sense if there is a preemption (and a CPU switch) after it reads the per CPU data and when it is used. In this device driver specifically, it is not a problem, because this code just needs to have access to one lppaca struct, and it does not matter if it is the current per CPU lppaca struct or not (i.e. when there is a preemption and a CPU migration). That said, the most appropriate fix seems to be related to avoiding the debug_smp_processor_id() call at get_paca(), instead of calling preempt_disable() before get_paca(). 
Signed-off-by: Breno Leitao Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- drivers/cpuidle/cpuidle-pseries.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c index 166ccd711ec9..83203c59bf59 100644 --- a/drivers/cpuidle/cpuidle-pseries.c +++ b/drivers/cpuidle/cpuidle-pseries.c @@ -230,7 +230,13 @@ static int pseries_idle_probe(void) return -ENODEV; if (firmware_has_feature(FW_FEATURE_SPLPAR)) { - if (lppaca_shared_proc(get_lppaca())) { + /* + * Use local_paca instead of get_lppaca() since + * preemption is not disabled, and it is not required in + * fact, since lppaca_ptr does not need to be the value + * associated to the current CPU, it can be from any CPU. + */ + if (lppaca_shared_proc(local_paca->lppaca_ptr)) { cpuidle_state_table = shared_states; max_idle_state = ARRAY_SIZE(shared_states); } else { -- 2.19.1
[PATCH AUTOSEL 4.9 14/36] powerpc/xmon: Fix invocation inside lock region
From: Breno Leitao [ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ] Currently xmon needs to get devtree_lock (through rtas_token()) during its invocation (at crash time). If there is a crash while devtree_lock is being held, then xmon tries to get the lock but spins forever and never gets into the interactive debugger, as in the following case: int *ptr = NULL; raw_spin_lock_irqsave(&devtree_lock, flags); *ptr = 0xdeadbeef; This patch avoids calling rtas_token() (and thus taking the same lock) at crash time. This new mechanism proposes getting the token at initialization time (xmon_init()) and just consuming it at crash time. This allows xmon to be invoked regardless of whether devtree_lock is held. Signed-off-by: Breno Leitao Reviewed-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- arch/powerpc/xmon/xmon.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 760545519a0b..687e8b8bf5c6 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -73,6 +73,9 @@ static int xmon_gate; #define xmon_owner 0 #endif /* CONFIG_SMP */ +#ifdef CONFIG_PPC_PSERIES +static int set_indicator_token = RTAS_UNKNOWN_SERVICE; +#endif static unsigned long in_xmon __read_mostly = 0; static unsigned long adrs; @@ -340,7 +343,6 @@ static inline void disable_surveillance(void) #ifdef CONFIG_PPC_PSERIES /* Since this can't be a module, args should end up below 4GB. */ static struct rtas_args args; - int token; /* * At this point we have got all the cpus we can into @@ -349,11 +351,11 @@ static inline void disable_surveillance(void) * If we did try to take rtas.lock there would be a * real possibility of deadlock. 
*/ - token = rtas_token("set-indicator"); - if (token == RTAS_UNKNOWN_SERVICE) + if (set_indicator_token == RTAS_UNKNOWN_SERVICE) return; - rtas_call_unlocked(&args, token, 3, 1, NULL, SURVEILLANCE_TOKEN, 0, 0); + rtas_call_unlocked(&args, set_indicator_token, 3, 1, NULL, + SURVEILLANCE_TOKEN, 0, 0); #endif /* CONFIG_PPC_PSERIES */ } @@ -3227,6 +3229,14 @@ static void xmon_init(int enable) __debugger_iabr_match = xmon_iabr_match; __debugger_break_match = xmon_break_match; __debugger_fault_handler = xmon_fault_handler; + +#ifdef CONFIG_PPC_PSERIES + /* + * Get the token here to avoid trying to get a lock + * during the crash, causing a deadlock. + */ + set_indicator_token = rtas_token("set-indicator"); +#endif } else { __debugger = NULL; __debugger_ipi = NULL; -- 2.19.1
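The fix's pattern (resolve a lock-protected token once at init time, then only consume the cached value on the lock-free crash path) can be sketched in userspace C; the names, token value, and counter are illustrative, not the kernel's:

```c
#include <assert.h>
#include <string.h>

#define RTAS_UNKNOWN_SERVICE (-1)

static int set_indicator_token = RTAS_UNKNOWN_SERVICE;
static int lookup_calls; /* counts lock-taking lookups */

/* Stand-in for rtas_token(): in the kernel this takes devtree_lock. */
static int rtas_token_stub(const char *service)
{
    lookup_calls++;
    return strcmp(service, "set-indicator") == 0 ? 9 : RTAS_UNKNOWN_SERVICE;
}

/* Runs at xmon_init() time, when taking locks is still safe. */
static void xmon_init_token(void)
{
    set_indicator_token = rtas_token_stub("set-indicator");
}

/* Runs at crash time: must not take locks, only reads the cache. */
static int crash_path_token(void)
{
    return set_indicator_token; /* no rtas_token_stub() call here */
}
```

Even if the crash path runs many times, the lock is only ever taken during init, which is exactly why xmon can now be entered while devtree_lock is held.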
[PATCH AUTOSEL 4.14 19/53] powerpc/pseries/cpuidle: Fix preempt warning
From: Breno Leitao [ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ] When booting a pseries kernel with PREEMPT enabled, it dumps the following warning: BUG: using smp_processor_id() in preemptible [] code: swapper/0/1 caller is pseries_processor_idle_init+0x5c/0x22c CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828 Call Trace: [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable) [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160 [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300 [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500 [c00429437db0] [c00112dc] kernel_init+0x2c/0x160 [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c This happens because the code calls get_lppaca() which calls get_paca() and it checks if preemption is disabled through check_preemption_disabled(). Preemption should be disabled because the per CPU variable may make no sense if there is a preemption (and a CPU switch) after it reads the per CPU data and when it is used. In this device driver specifically, it is not a problem, because this code just needs to have access to one lppaca struct, and it does not matter if it is the current per CPU lppaca struct or not (i.e. when there is a preemption and a CPU migration). That said, the most appropriate fix seems to be related to avoiding the debug_smp_processor_id() call at get_paca(), instead of calling preempt_disable() before get_paca(). 
Signed-off-by: Breno Leitao Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- drivers/cpuidle/cpuidle-pseries.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c index a187a39fb866..7f21c6a57178 100644 --- a/drivers/cpuidle/cpuidle-pseries.c +++ b/drivers/cpuidle/cpuidle-pseries.c @@ -239,7 +239,13 @@ static int pseries_idle_probe(void) return -ENODEV; if (firmware_has_feature(FW_FEATURE_SPLPAR)) { - if (lppaca_shared_proc(get_lppaca())) { + /* + * Use local_paca instead of get_lppaca() since + * preemption is not disabled, and it is not required in + * fact, since lppaca_ptr does not need to be the value + * associated to the current CPU, it can be from any CPU. + */ + if (lppaca_shared_proc(local_paca->lppaca_ptr)) { cpuidle_state_table = shared_states; max_idle_state = ARRAY_SIZE(shared_states); } else { -- 2.19.1
[PATCH AUTOSEL 4.14 18/53] powerpc/xmon: Fix invocation inside lock region
From: Breno Leitao [ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ] Currently xmon needs to get devtree_lock (through rtas_token()) during its invocation (at crash time). If there is a crash while devtree_lock is being held, then xmon tries to get the lock but spins forever and never gets into the interactive debugger, as in the following case: int *ptr = NULL; raw_spin_lock_irqsave(&devtree_lock, flags); *ptr = 0xdeadbeef; This patch avoids calling rtas_token() (and thus taking the same lock) at crash time. This new mechanism proposes getting the token at initialization time (xmon_init()) and just consuming it at crash time. This allows xmon to be invoked regardless of whether devtree_lock is held. Signed-off-by: Breno Leitao Reviewed-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- arch/powerpc/xmon/xmon.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index a5938fadd031..f752f771f29d 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -78,6 +78,9 @@ static int xmon_gate; #define xmon_owner 0 #endif /* CONFIG_SMP */ +#ifdef CONFIG_PPC_PSERIES +static int set_indicator_token = RTAS_UNKNOWN_SERVICE; +#endif static unsigned long in_xmon __read_mostly = 0; static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT); @@ -357,7 +360,6 @@ static inline void disable_surveillance(void) #ifdef CONFIG_PPC_PSERIES /* Since this can't be a module, args should end up below 4GB. */ static struct rtas_args args; - int token; /* * At this point we have got all the cpus we can into @@ -366,11 +368,11 @@ static inline void disable_surveillance(void) * If we did try to take rtas.lock there would be a * real possibility of deadlock. 
*/ - token = rtas_token("set-indicator"); - if (token == RTAS_UNKNOWN_SERVICE) + if (set_indicator_token == RTAS_UNKNOWN_SERVICE) return; - rtas_call_unlocked(&args, token, 3, 1, NULL, SURVEILLANCE_TOKEN, 0, 0); + rtas_call_unlocked(&args, set_indicator_token, 3, 1, NULL, + SURVEILLANCE_TOKEN, 0, 0); #endif /* CONFIG_PPC_PSERIES */ } @@ -3472,6 +3474,14 @@ static void xmon_init(int enable) __debugger_iabr_match = xmon_iabr_match; __debugger_break_match = xmon_break_match; __debugger_fault_handler = xmon_fault_handler; + +#ifdef CONFIG_PPC_PSERIES + /* + * Get the token here to avoid trying to get a lock + * during the crash, causing a deadlock. + */ + set_indicator_token = rtas_token("set-indicator"); +#endif } else { __debugger = NULL; __debugger_ipi = NULL; -- 2.19.1
[PATCH AUTOSEL 4.19 38/97] powerpc/pseries/cpuidle: Fix preempt warning
From: Breno Leitao [ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ] When booting a pseries kernel with PREEMPT enabled, it dumps the following warning: BUG: using smp_processor_id() in preemptible [] code: swapper/0/1 caller is pseries_processor_idle_init+0x5c/0x22c CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828 Call Trace: [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable) [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160 [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300 [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500 [c00429437db0] [c00112dc] kernel_init+0x2c/0x160 [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c This happens because the code calls get_lppaca() which calls get_paca() and it checks if preemption is disabled through check_preemption_disabled(). Preemption should be disabled because the per CPU variable may make no sense if there is a preemption (and a CPU switch) after it reads the per CPU data and when it is used. In this device driver specifically, it is not a problem, because this code just needs to have access to one lppaca struct, and it does not matter if it is the current per CPU lppaca struct or not (i.e. when there is a preemption and a CPU migration). That said, the most appropriate fix seems to be related to avoiding the debug_smp_processor_id() call at get_paca(), instead of calling preempt_disable() before get_paca(). 
Signed-off-by: Breno Leitao Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- drivers/cpuidle/cpuidle-pseries.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c index 9e56bc411061..74c247972bb3 100644 --- a/drivers/cpuidle/cpuidle-pseries.c +++ b/drivers/cpuidle/cpuidle-pseries.c @@ -247,7 +247,13 @@ static int pseries_idle_probe(void) return -ENODEV; if (firmware_has_feature(FW_FEATURE_SPLPAR)) { - if (lppaca_shared_proc(get_lppaca())) { + /* + * Use local_paca instead of get_lppaca() since + * preemption is not disabled, and it is not required in + * fact, since lppaca_ptr does not need to be the value + * associated to the current CPU, it can be from any CPU. + */ + if (lppaca_shared_proc(local_paca->lppaca_ptr)) { cpuidle_state_table = shared_states; max_idle_state = ARRAY_SIZE(shared_states); } else { -- 2.19.1
[PATCH AUTOSEL 4.19 37/97] powerpc/xmon: Fix invocation inside lock region
From: Breno Leitao [ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ] Currently xmon needs to get devtree_lock (through rtas_token()) during its invocation (at crash time). If there is a crash while devtree_lock is being held, then xmon tries to get the lock but spins forever and never gets into the interactive debugger, as in the following case: int *ptr = NULL; raw_spin_lock_irqsave(&devtree_lock, flags); *ptr = 0xdeadbeef; This patch avoids calling rtas_token() (and thus taking the same lock) at crash time. This new mechanism proposes getting the token at initialization time (xmon_init()) and just consuming it at crash time. This allows xmon to be invoked regardless of whether devtree_lock is held. Signed-off-by: Breno Leitao Reviewed-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- arch/powerpc/xmon/xmon.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 4264aedc7775..dd6badc31f45 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -75,6 +75,9 @@ static int xmon_gate; #define xmon_owner 0 #endif /* CONFIG_SMP */ +#ifdef CONFIG_PPC_PSERIES +static int set_indicator_token = RTAS_UNKNOWN_SERVICE; +#endif static unsigned long in_xmon __read_mostly = 0; static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT); @@ -358,7 +361,6 @@ static inline void disable_surveillance(void) #ifdef CONFIG_PPC_PSERIES /* Since this can't be a module, args should end up below 4GB. */ static struct rtas_args args; - int token; /* * At this point we have got all the cpus we can into @@ -367,11 +369,11 @@ static inline void disable_surveillance(void) * If we did try to take rtas.lock there would be a * real possibility of deadlock. 
*/ - token = rtas_token("set-indicator"); - if (token == RTAS_UNKNOWN_SERVICE) + if (set_indicator_token == RTAS_UNKNOWN_SERVICE) return; - rtas_call_unlocked(&args, token, 3, 1, NULL, SURVEILLANCE_TOKEN, 0, 0); + rtas_call_unlocked(&args, set_indicator_token, 3, 1, NULL, + SURVEILLANCE_TOKEN, 0, 0); #endif /* CONFIG_PPC_PSERIES */ } @@ -3672,6 +3674,14 @@ static void xmon_init(int enable) __debugger_iabr_match = xmon_iabr_match; __debugger_break_match = xmon_break_match; __debugger_fault_handler = xmon_fault_handler; + +#ifdef CONFIG_PPC_PSERIES + /* + * Get the token here to avoid trying to get a lock + * during the crash, causing a deadlock. + */ + set_indicator_token = rtas_token("set-indicator"); +#endif } else { __debugger = NULL; __debugger_ipi = NULL; -- 2.19.1
[PATCH AUTOSEL 4.20 108/117] KVM: PPC: Book3S HV: Apply combination of host and l1 pte rc for nested guest
From: Suraj Jitindar Singh [ Upstream commit 8b23eee4e55a32a2b51a180dfd27a8d214acc7a1 ] The shadow page table contains ptes for translations from nested guest address to host address. Currently when creating these ptes we take the rc bits from the pte for the L1 guest address to host address translation. This is incorrect as we must also factor in the rc bits from the pte for the nested guest address to L1 guest address translation (as contained in the L1 guest partition table for the nested guest). By not calculating these bits correctly L1 may not have been correctly notified when it needed to update its rc bits in the partition table it maintains for its nested guest. Modify the code so that the rc bits in the resultant pte for the L2->L0 translation are the 'and' of the rc bits in the L2->L1 pte and the L1->L0 pte, also accounting for whether this was a write access when setting the dirty bit. Signed-off-by: Suraj Jitindar Singh Signed-off-by: Paul Mackerras Signed-off-by: Sasha Levin --- arch/powerpc/kvm/book3s_hv_nested.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index fc64535e4c00..f8176ae3a5a7 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -1229,6 +1229,9 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu, perm |= gpte.may_read ? 0UL : _PAGE_READ; perm |= gpte.may_write ? 0UL : _PAGE_WRITE; perm |= gpte.may_execute ? 0UL : _PAGE_EXEC; + /* Only set accessed/dirty (rc) bits if set in host and l1 guest ptes */ + perm |= (gpte.rc & _PAGE_ACCESSED) ? 0UL : _PAGE_ACCESSED; + perm |= ((gpte.rc & _PAGE_DIRTY) && writing) ? 0UL : _PAGE_DIRTY; pte = __pte(pte_val(pte) & ~perm); /* What size pte can we insert? */ -- 2.19.1
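The rc-combination rule described above ('and' of the bits from the L2->L1 and L1->L0 ptes, with dirty additionally gated on the access being a write) can be sketched in isolation. The bit values below are made up for the example, not powerpc's real pte layout:

```c
#include <assert.h>

/* Illustrative bit values, not the real powerpc definitions. */
#define _PAGE_ACCESSED 0x1UL
#define _PAGE_DIRTY    0x2UL

/*
 * Combine the rc (referenced/changed) bits of the two translations into
 * the rc bits for the shadow (L2->L0) pte: a bit survives only if both
 * levels have it, and dirty is only kept for a write access.
 */
static unsigned long shadow_rc(unsigned long l2_rc, unsigned long l1_rc,
                               int writing)
{
    unsigned long rc = l2_rc & l1_rc;   /* 'and' of both translations */

    if (!writing)
        rc &= ~_PAGE_DIRTY;             /* dirty only on a write access */
    return rc;
}
```

This is why L1 gets notified correctly: the shadow pte can never carry a dirty bit that the L2->L1 partition-table entry does not also carry.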
[PATCH AUTOSEL 4.20 107/117] KVM: PPC: Book3S HV: Align gfn to L1 page size when inserting nest-rmap entry
From: Suraj Jitindar Singh [ Upstream commit 8400f8740651c1a3081c30b46004451c448f4d5f ] Nested rmap entries are used to store the translation from L1 gpa to L2 gpa when entries are inserted into the shadow (nested) page tables. This rmap list is located by indexing the rmap array in the memslot by L1 gfn. When we come to search for these entries we only know the L1 page size (which could be PAGE_SIZE, 2M or a 1G page) and so can only select a gfn aligned to that size. This means that when we insert the entry, so we can find it later, we need to align the gfn we use to select the rmap list in which to insert the entry to L1 page size as well. By not doing this we were missing nested rmap entries when modifying L1 ptes which were for a page also passed through to an L2 guest. Signed-off-by: Suraj Jitindar Singh Signed-off-by: Paul Mackerras Signed-off-by: Sasha Levin --- arch/powerpc/kvm/book3s_hv_nested.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 401d2ecbebc5..fc64535e4c00 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -1220,6 +1220,8 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu, return ret; shift = kvmppc_radix_level_to_shift(level); } + /* Align gfn to the start of the page */ + gfn = (gpa & ~((1UL << shift) - 1)) >> PAGE_SHIFT; /* 3. Compute the pte we need to insert for nest_gpa -> host r_addr */ -- 2.19.1
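The alignment computation in the hunk above can be checked in isolation: mask the gpa down to the start of its L1 page, then convert to a gfn in host page frames. PAGE_SHIFT and the sample addresses are assumptions for the example:

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumption for the example: 4K host pages */

/*
 * Align a guest physical address down to the start of its L1 page and
 * convert it to a gfn. 'shift' is the L1 page-size shift: 12 for 4K,
 * 21 for 2M, 30 for 1G.
 */
static unsigned long align_gfn(unsigned long gpa, unsigned int shift)
{
    return (gpa & ~((1UL << shift) - 1)) >> PAGE_SHIFT;
}
```

Without the masking step, a 2M or 1G L1 page would index a different rmap slot at insert time than the lookup later uses, which is exactly the missed-entry bug the patch fixes.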
[PATCH AUTOSEL 4.20 048/117] powerpc/pseries/cpuidle: Fix preempt warning
From: Breno Leitao [ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ] When booting a pseries kernel with PREEMPT enabled, it dumps the following warning: BUG: using smp_processor_id() in preemptible [] code: swapper/0/1 caller is pseries_processor_idle_init+0x5c/0x22c CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828 Call Trace: [c00429437ab0] [c09c8878] dump_stack+0xec/0x164 (unreliable) [c00429437b00] [c05f2f24] check_preemption_disabled+0x154/0x160 [c00429437b90] [c0cab8e8] pseries_processor_idle_init+0x5c/0x22c [c00429437c10] [c0010ed4] do_one_initcall+0x64/0x300 [c00429437ce0] [c0c54500] kernel_init_freeable+0x3f0/0x500 [c00429437db0] [c00112dc] kernel_init+0x2c/0x160 [c00429437e20] [c000c1d0] ret_from_kernel_thread+0x5c/0x6c This happens because the code calls get_lppaca() which calls get_paca() and it checks if preemption is disabled through check_preemption_disabled(). Preemption should be disabled because the per CPU variable may make no sense if there is a preemption (and a CPU switch) after it reads the per CPU data and when it is used. In this device driver specifically, it is not a problem, because this code just needs to have access to one lppaca struct, and it does not matter if it is the current per CPU lppaca struct or not (i.e. when there is a preemption and a CPU migration). That said, the most appropriate fix seems to be related to avoiding the debug_smp_processor_id() call at get_paca(), instead of calling preempt_disable() before get_paca(). 
Signed-off-by: Breno Leitao Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- drivers/cpuidle/cpuidle-pseries.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c index 9e56bc411061..74c247972bb3 100644 --- a/drivers/cpuidle/cpuidle-pseries.c +++ b/drivers/cpuidle/cpuidle-pseries.c @@ -247,7 +247,13 @@ static int pseries_idle_probe(void) return -ENODEV; if (firmware_has_feature(FW_FEATURE_SPLPAR)) { - if (lppaca_shared_proc(get_lppaca())) { + /* + * Use local_paca instead of get_lppaca() since + * preemption is not disabled, and it is not required in + * fact, since lppaca_ptr does not need to be the value + * associated to the current CPU, it can be from any CPU. + */ + if (lppaca_shared_proc(local_paca->lppaca_ptr)) { cpuidle_state_table = shared_states; max_idle_state = ARRAY_SIZE(shared_states); } else { -- 2.19.1
[PATCH AUTOSEL 4.20 047/117] powerpc/xmon: Fix invocation inside lock region
From: Breno Leitao [ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ] Currently xmon needs to get devtree_lock (through rtas_token()) during its invocation (at crash time). If there is a crash while devtree_lock is being held, then xmon tries to get the lock but spins forever and never gets into the interactive debugger, as in the following case: int *ptr = NULL; raw_spin_lock_irqsave(&devtree_lock, flags); *ptr = 0xdeadbeef; This patch avoids calling rtas_token() (and thus taking the same lock) at crash time. This new mechanism proposes getting the token at initialization time (xmon_init()) and just consuming it at crash time. This allows xmon to be invoked regardless of whether devtree_lock is held. Signed-off-by: Breno Leitao Reviewed-by: Thiago Jung Bauermann Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin --- arch/powerpc/xmon/xmon.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 36b8dc47a3c3..b566203d09c5 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -75,6 +75,9 @@ static int xmon_gate; #define xmon_owner 0 #endif /* CONFIG_SMP */ +#ifdef CONFIG_PPC_PSERIES +static int set_indicator_token = RTAS_UNKNOWN_SERVICE; +#endif static unsigned long in_xmon __read_mostly = 0; static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT); @@ -358,7 +361,6 @@ static inline void disable_surveillance(void) #ifdef CONFIG_PPC_PSERIES /* Since this can't be a module, args should end up below 4GB. */ static struct rtas_args args; - int token; /* * At this point we have got all the cpus we can into @@ -367,11 +369,11 @@ static inline void disable_surveillance(void) * If we did try to take rtas.lock there would be a * real possibility of deadlock. 
*/ - token = rtas_token("set-indicator"); - if (token == RTAS_UNKNOWN_SERVICE) + if (set_indicator_token == RTAS_UNKNOWN_SERVICE) return; - rtas_call_unlocked(&args, token, 3, 1, NULL, SURVEILLANCE_TOKEN, 0, 0); + rtas_call_unlocked(&args, set_indicator_token, 3, 1, NULL, + SURVEILLANCE_TOKEN, 0, 0); #endif /* CONFIG_PPC_PSERIES */ } @@ -3688,6 +3690,14 @@ static void xmon_init(int enable) __debugger_iabr_match = xmon_iabr_match; __debugger_break_match = xmon_break_match; __debugger_fault_handler = xmon_fault_handler; + +#ifdef CONFIG_PPC_PSERIES + /* + * Get the token here to avoid trying to get a lock + * during the crash, causing a deadlock. + */ + set_indicator_token = rtas_token("set-indicator"); +#endif } else { __debugger = NULL; __debugger_ipi = NULL; -- 2.19.1
[PATCH] powerpc/spufs: use struct_size() in kmalloc()
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva --- arch/powerpc/platforms/cell/spufs/file.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c index ae8123edddc6..48c2477e7e2a 100644 --- a/arch/powerpc/platforms/cell/spufs/file.c +++ b/arch/powerpc/platforms/cell/spufs/file.c @@ -2338,9 +2338,8 @@ static int spufs_switch_log_open(struct inode *inode, struct file *file) goto out; } - ctx->switch_log = kmalloc(sizeof(struct switch_log) + - SWITCH_LOG_BUFSIZE * sizeof(struct switch_log_entry), - GFP_KERNEL); + ctx->switch_log = kmalloc(struct_size(ctx->switch_log, log, + SWITCH_LOG_BUFSIZE), GFP_KERNEL); if (!ctx->switch_log) { rc = -ENOMEM; -- 2.20.1
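The transformation can be modeled with a simplified struct_size()-style macro; the real helper in the kernel's include/linux/overflow.h also saturates on arithmetic overflow, which this sketch omits, and the struct layout below is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of struct_size(p, member, n): size of the struct
 * header plus n elements of the flexible array member. The kernel
 * version additionally guards against overflow.
 */
#define my_struct_size(p, member, n) \
    (sizeof(*(p)) + (n) * sizeof(*(p)->member))

/* Hypothetical layout, loosely shaped like the spufs switch log. */
struct switch_log {
    unsigned long head;
    unsigned long tail;
    struct { unsigned long data; } log[]; /* flexible array member */
};
```

The macro only inspects types, so passing a null pointer is fine at compile time; the point of the helper is that the element type is derived from the member, eliminating the class of mistakes where the open-coded sizeof names the wrong type.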
Re: [PATCH v4 10/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
On Tue, Jan 08, 2019 at 11:36:33AM -0500, Boris Ostrovsky wrote: > On 1/8/19 5:48 AM, Peter Zijlstra wrote: > > On the various exclude options; they are as follows (IIUC): > > > > - exclude_guest: we're a HV/host-kernel and we don't want the counter > > to run when we run a guest context. > > > > - exclude_host: we're a HV/host-kernel and we don't want the counter > > to run when we run in host context. > > > > - exclude_hv: we're a guest and don't want the counter to run in HV > > context. > > > > Now, KVM always implies exclude_hv afaict (for guests), I'm not sure > > what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf > > works on Xen) -- nor quite sure who to ask, Boris, Juergen? > > perf does work inside guests. > > VPMU is managed by Xen and it presents to the guest only samples > that are associated with the guest. So from that perspective exclude_hv > doesn't seem to be needed. > > There is a VPMU mode that allows profiling the whole system (host and > guests) from dom0, and this is where exclude_hv might be useful. But this > mode, ahem, needs some work. Thanks Boris!
Re: [1/2] powerpc/4xx/ocm: Fix phys_addr_t printf warnings
On Tuesday, January 8, 2019 10:54:28 AM CET Michael Ellerman wrote: > Christian Lamparter writes: > > On Wednesday, January 2, 2019 12:31:50 PM CET Michael Ellerman wrote: > >> On Tue, 2019-01-01 at 03:56:00 UTC, Michael Ellerman wrote: > >> > Currently the code produces several warnings, eg: > >> > > >> > arch/powerpc/platforms/4xx/ocm.c:240:38: error: format '%llx' > >> > expects argument of type 'long long unsigned int', but argument 3 > >> > has type 'phys_addr_t {aka unsigned int}' > >> > seq_printf(m, "PhysAddr : 0x%llx\n", ocm->phys); > >> >~~~^ ~ > >> > > >> > Fix it by using the special %pa[p] format for printing phys_addr_t. > >> > Note we need to pass the value by reference for the special specifier > >> > to work. > >> > > >> > Signed-off-by: Michael Ellerman > >> > >> Series applied to powerpc fixes. > >> > >> https://git.kernel.org/powerpc/c/52b88fa1e8c7bae03bb691178a9f8b > > > > Well, I guess I'm a late. I had issues with the getting 4.20+ > > crosscompiled on debian with make-kpkg. > > > > Nevertheless, I finally got a working kernel and > > on the MyBook Live APM82181: > > > > --- > > root@mbl:/sys/kernel/debug# cat ppc4xx_ocm/info > > PPC4XX OCM : 1 > > PhysAddr : 0x00040004[p] > > MemTotal : 32768 Bytes > > MemTotal(NC) : 32768 Bytes > > MemTotal(C) : 0 Bytes > > > > NC.PhysAddr : 0x00040004[p] > > NC.VirtAddr : 0x6bc84b36 > > NC.MemTotal : 32768 Bytes > > NC.MemFree : 32768 Bytes > > > > C.PhysAddr : 0x[p] > > C.VirtAddr : 0x (null) > > C.MemTotal : 0 Bytes > > C.MemFree: 0 Bytes > > Oh right, I'm an idiot :) > > The docs say: > > Physical address types phys_addr_t > -- > > %pa[p] 0x01234567 or 0x0123456789abcdef > > > And if you grep for that there's eg: > > drivers/ntb/test/ntb_tool.c: "Window Size > \t%pa[p]\n", > > So I just literally copied that. > > But it's trying to indicate that the p is optional. 
> > This should fix it, I won't merge it until you've tested it this time :) > > cheers > > > diff --git a/arch/powerpc/platforms/4xx/ocm.c > b/arch/powerpc/platforms/4xx/ocm.c > index a1aaa1569d7c..f0e488d97567 100644 > --- a/arch/powerpc/platforms/4xx/ocm.c > +++ b/arch/powerpc/platforms/4xx/ocm.c > @@ -237,12 +237,12 @@ static int ocm_debugfs_show(struct seq_file *m, void *v) > continue; > > seq_printf(m, "PPC4XX OCM : %d\n", ocm->index); > - seq_printf(m, "PhysAddr : %pa[p]\n", &(ocm->phys)); > + seq_printf(m, "PhysAddr : %pa\n", &(ocm->phys)); > seq_printf(m, "MemTotal : %d Bytes\n", ocm->memtotal); > seq_printf(m, "MemTotal(NC) : %d Bytes\n", ocm->nc.memtotal); > seq_printf(m, "MemTotal(C) : %d Bytes\n\n", ocm->c.memtotal); > > - seq_printf(m, "NC.PhysAddr : %pa[p]\n", &(ocm->nc.phys)); > + seq_printf(m, "NC.PhysAddr : %pa\n", &(ocm->nc.phys)); > seq_printf(m, "NC.VirtAddr : 0x%p\n", ocm->nc.virt); > seq_printf(m, "NC.MemTotal : %d Bytes\n", ocm->nc.memtotal); > seq_printf(m, "NC.MemFree : %d Bytes\n", ocm->nc.memfree); > @@ -252,7 +252,7 @@ static int ocm_debugfs_show(struct seq_file *m, void *v) > blk->size, blk->owner); > } > > - seq_printf(m, "\nC.PhysAddr : %pa[p]\n", &(ocm->c.phys)); > + seq_printf(m, "\nC.PhysAddr : %pa\n", &(ocm->c.phys)); > seq_printf(m, "C.VirtAddr : 0x%p\n", ocm->c.virt); > seq_printf(m, "C.MemTotal : %d Bytes\n", ocm->c.memtotal); > seq_printf(m, "C.MemFree: %d Bytes\n", ocm->c.memfree); Ok, with the patch applied it now looks like: |root@mbl:/sys/kernel/debug/ppc4xx_ocm# cat info |PPC4XX OCM : 1 |PhysAddr : 0x00040004 |MemTotal : 32768 Bytes |MemTotal(NC) : 32768 Bytes |MemTotal(C) : 0 Bytes | |NC.PhysAddr : 0x00040004 |NC.VirtAddr : 0x54f5bce2 |NC.MemTotal : 32768 Bytes |NC.MemFree : 32768 Bytes | |C.PhysAddr : 0x |C.VirtAddr : 0x (null) |C.MemTotal : 0 Bytes |C.MemFree: 0 Bytes ... just as expected. ;) Thanks, Christian
perf trace syscall table generation for powerpc with syscall.tbl
Hi Ravi, I noticed that in: commit ab66dcc76d6ab8fae9d69d149ae38c42605e7fc5 Author: Firoz Khan Date: Mon Dec 17 16:10:36 2018 +0530 powerpc: generate uapi header and system call table files powerpc now generates its syscall tables headers from a syscall.tbl just like x86 and s390, could you please switch to using it, taking the x86 and s390 scripts as a starting point, then test on your systems everything works? Thanks, - Arnaldo
Re: [PATCH v4 10/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
On 1/8/19 5:48 AM, Peter Zijlstra wrote: > On Mon, Jan 07, 2019 at 04:27:27PM +, Andrew Murray wrote: >> For drivers that do not support context exclusion let's advertise the >> PERF_PMU_CAP_NOEXCLUDE capability. This ensures that perf will >> prevent us from handling events where any exclusion flags are set. >> Let's also remove the now unnecessary check for exclusion flags. >> >> Signed-off-by: Andrew Murray >> --- >> arch/x86/events/amd/ibs.c | 13 + >> arch/x86/events/amd/power.c| 10 ++ >> arch/x86/events/intel/cstate.c | 12 +++- >> arch/x86/events/intel/rapl.c | 9 ++--- >> arch/x86/events/intel/uncore_snb.c | 9 ++--- >> arch/x86/events/msr.c | 10 ++ >> 6 files changed, 12 insertions(+), 51 deletions(-) > You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but > then you also don't check if it handles all the various exclude options > correctly/consistently. > > Now; I must admit that that is a bit of a maze, but I think we can at > least add exclude_idle and exclude_hv fails in there, nothing uses those > afaict. > > On the various exclude options; they are as follows (IIUC): > > - exclude_guest: we're a HV/host-kernel and we don't want the counter >to run when we run a guest context. > > - exclude_host: we're a HV/host-kernel and we don't want the counter > to run when we run in host context. > > - exclude_hv: we're a guest and don't want the counter to run in HV > context. > > Now, KVM always implies exclude_hv afaict (for guests), I'm not sure > what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf > works on Xen) -- nor quite sure who to ask, Boris, Jeurgen? perf does work inside guests. VPMU is managed by the Xen and it presents to the guest only samples that are associated with the guest. So from that perspective exclude_hv doesn't seem to be needed. There is a VPMU mode that allows profiling whole system (host and guests) from dom0, and this where exclude_hv might be useful. But this mode, ahem, needs some work. -boris
[PATCH] powerpc/ipic: drop unused functions
ipic_set_highest_priority(), ipic_enable_mcp() and ipic_disable_mcp() are unused. This patch drops them. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/ipic.h | 3 --- arch/powerpc/sysdev/ipic.c | 35 --- 2 files changed, 38 deletions(-) diff --git a/arch/powerpc/include/asm/ipic.h b/arch/powerpc/include/asm/ipic.h index 3dbd47f2bffe..abad50a745db 100644 --- a/arch/powerpc/include/asm/ipic.h +++ b/arch/powerpc/include/asm/ipic.h @@ -69,10 +69,7 @@ enum ipic_mcp_irq { IPIC_MCP_MU = 7, }; -extern void ipic_set_highest_priority(unsigned int irq); extern void ipic_set_default_priority(void); -extern void ipic_enable_mcp(enum ipic_mcp_irq mcp_irq); -extern void ipic_disable_mcp(enum ipic_mcp_irq mcp_irq); extern u32 ipic_get_mcp_status(void); extern void ipic_clear_mcp_status(u32 mask); diff --git a/arch/powerpc/sysdev/ipic.c b/arch/powerpc/sysdev/ipic.c index 8030a0f55e96..fd129c8ecceb 100644 --- a/arch/powerpc/sysdev/ipic.c +++ b/arch/powerpc/sysdev/ipic.c @@ -771,21 +771,6 @@ struct ipic * __init ipic_init(struct device_node *node, unsigned int flags) return ipic; } -void ipic_set_highest_priority(unsigned int virq) -{ - struct ipic *ipic = ipic_from_irq(virq); - unsigned int src = virq_to_hw(virq); - u32 temp; - - temp = ipic_read(ipic->regs, IPIC_SICFR); - - /* clear and set HPI */ - temp &= 0x7f00; - temp |= (src & 0x7f) << 24; - - ipic_write(ipic->regs, IPIC_SICFR, temp); -} - void ipic_set_default_priority(void) { ipic_write(primary_ipic->regs, IPIC_SIPRR_A, IPIC_PRIORITY_DEFAULT); @@ -796,26 +781,6 @@ void ipic_set_default_priority(void) ipic_write(primary_ipic->regs, IPIC_SMPRR_B, IPIC_PRIORITY_DEFAULT); } -void ipic_enable_mcp(enum ipic_mcp_irq mcp_irq) -{ - struct ipic *ipic = primary_ipic; - u32 temp; - - temp = ipic_read(ipic->regs, IPIC_SERMR); - temp |= (1 << (31 - mcp_irq)); - ipic_write(ipic->regs, IPIC_SERMR, temp); -} - -void ipic_disable_mcp(enum ipic_mcp_irq mcp_irq) -{ - struct ipic *ipic = primary_ipic; - u32 temp; - - temp = 
ipic_read(ipic->regs, IPIC_SERMR); - temp &= ~(1 << (31 - mcp_irq)); - ipic_write(ipic->regs, IPIC_SERMR, temp); -} - u32 ipic_get_mcp_status(void) { return primary_ipic ? ipic_read(primary_ipic->regs, IPIC_SERSR) : 0; -- 2.13.3
Re: [PATCH] powerpc/irq: drop arch_early_irq_init()
On Tue, 8 Jan 2019, Christophe Leroy wrote: > Oops, forgot to actually copy Thomas. > > Le 08/01/2019 à 12:37, Christophe Leroy a écrit : > > arch_early_irq_init() does nothing different than > > the weak arch_early_irq_init() in kernel/softirq.c > > > > Fixes: 089fb442f301 ("powerpc: Use ARCH_IRQ_INIT_FLAGS") > > Cc: Thomas Gleixner > > Signed-off-by: Christophe Leroy Acked-by: Thomas Gleixner
Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE
On Tue, Jan 08, 2019 at 01:13:57PM +, Andrew Murray wrote: > On Tue, Jan 08, 2019 at 02:10:31PM +0100, Peter Zijlstra wrote: > > On Tue, Jan 08, 2019 at 01:07:41PM +, Andrew Murray wrote: > > > > > Yes I found lots of examples like this across the tree whilst doing this > > > work. However I decided to initially start with simply removing duplicated > > > code as a result of adding this flag and attempting to preserve existing > > > functionality. I thought that if I add missing checks then the patchset > > > will get much bigger and be harder to merge. I would like to do this > > > though > > > as another non-cross-arch series. > > > > > > Can we limit this patch series to the minimal changes required to fully > > > use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing > > > problems > > > in subsequent patch sets? > > > > Ok, but it would've been nice to see that mentioned somewhere. > > I'll update the cover leter on any next revision. I'll try to be clearer next > time with my intentions. Could you maybe include it in the relevant patches too; like for example the ARM one where we rely on set_event_filter() to DTRT. So with the changelogs and subjects fixed I can take these patches and then you can get on with cleaning up the individual drivers.
Re: [PATCH v12 01/10] powerpc/irq: use memblock functions returning virtual address
Hi, On Tue, Jan 08, 2019 at 01:43:09PM +, Christophe Leroy wrote: > Since only the virtual address of allocated blocks is used, > lets use functions returning directly virtual address. > > Those functions have the advantage of also zeroing the block. > > Suggested-by: Mike Rapoport > Signed-off-by: Christophe Leroy > --- > @Mike: Part of this is taken from your serie. It seems you've found some places I've missed :) Note that my series was merged to mmotm, so you may get a conflict with it. > I was not sure how to include you in the commit log, I used > Suggested-by: but could use any other property, so if you prefer > something different just tell me. Also, if you could review this patch, > it would be nice. Suggested-by sounds reasonable. > arch/powerpc/kernel/irq.c | 5 - > arch/powerpc/kernel/setup_32.c | 15 +-- > arch/powerpc/kernel/setup_64.c | 16 ++-- > 3 files changed, 7 insertions(+), 29 deletions(-) > > diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c > index 916ddc4aac44..4a44bc395fbc 100644 > --- a/arch/powerpc/kernel/irq.c > +++ b/arch/powerpc/kernel/irq.c > @@ -725,18 +725,15 @@ void exc_lvl_ctx_init(void) > #endif > #endif > > - memset((void *)critirq_ctx[cpu_nr], 0, THREAD_SIZE); > tp = critirq_ctx[cpu_nr]; > tp->cpu = cpu_nr; > tp->preempt_count = 0; > > #ifdef CONFIG_BOOKE > - memset((void *)dbgirq_ctx[cpu_nr], 0, THREAD_SIZE); > tp = dbgirq_ctx[cpu_nr]; > tp->cpu = cpu_nr; > tp->preempt_count = 0; > > - memset((void *)mcheckirq_ctx[cpu_nr], 0, THREAD_SIZE); > tp = mcheckirq_ctx[cpu_nr]; > tp->cpu = cpu_nr; > tp->preempt_count = HARDIRQ_OFFSET; > @@ -754,12 +751,10 @@ void irq_ctx_init(void) > int i; > > for_each_possible_cpu(i) { > - memset((void *)softirq_ctx[i], 0, THREAD_SIZE); > tp = softirq_ctx[i]; > tp->cpu = i; > klp_init_thread_info(tp); > > - memset((void *)hardirq_ctx[i], 0, THREAD_SIZE); > tp = hardirq_ctx[i]; > tp->cpu = i; > klp_init_thread_info(tp); > diff --git a/arch/powerpc/kernel/setup_32.c 
b/arch/powerpc/kernel/setup_32.c > index 947f904688b0..bfe1b46a26dc 100644 > --- a/arch/powerpc/kernel/setup_32.c > +++ b/arch/powerpc/kernel/setup_32.c > @@ -203,10 +203,8 @@ void __init irqstack_early_init(void) > /* interrupt stacks must be in lowmem, we get that for free on ppc32 >* as the memblock is limited to lowmem by default */ > for_each_possible_cpu(i) { > - softirq_ctx[i] = (struct thread_info *) > - __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE)); > - hardirq_ctx[i] = (struct thread_info *) > - __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE)); > + softirq_ctx[i] = memblock_alloc(THREAD_SIZE, THREAD_SIZE); > + hardirq_ctx[i] = memblock_alloc(THREAD_SIZE, THREAD_SIZE); > } > } > > @@ -224,13 +222,10 @@ void __init exc_lvl_early_init(void) > hw_cpu = 0; > #endif > > - critirq_ctx[hw_cpu] = (struct thread_info *) > - __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE)); > + critirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, THREAD_SIZE); > #ifdef CONFIG_BOOKE > - dbgirq_ctx[hw_cpu] = (struct thread_info *) > - __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE)); > - mcheckirq_ctx[hw_cpu] = (struct thread_info *) > - __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE)); > + dbgirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, THREAD_SIZE); > + mcheckirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, > THREAD_SIZE); > #endif > } > } > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c > index 236c1151a3a7..943503f1e2c0 100644 > --- a/arch/powerpc/kernel/setup_64.c > +++ b/arch/powerpc/kernel/setup_64.c > @@ -634,19 +634,10 @@ __init u64 ppc64_bolted_size(void) > > static void *__init alloc_stack(unsigned long limit, int cpu) > { > - unsigned long pa; > - > BUILD_BUG_ON(STACK_INT_FRAME_SIZE % 16); > > - pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit, > - early_cpu_to_node(cpu), MEMBLOCK_NONE); > - if (!pa) { > - pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit); > - if (!pa) > - panic("cannot allocate stacks"); > - } 
> - > - return __va(pa); > + return memblock_alloc_try_nid(THREAD_SIZE, THREAD_SIZE, > MEMBLOCK_LOW_LIMIT, > + limit, early_cpu_to_node(cpu)); I'd prefer if you would keep the return value check and panic() here, as I'm working on removing
Re: [PATCH v4 1/6] powerpc: prefer memblock APIs returning virtual address
Hi, On Tue, Jan 08, 2019 at 11:02:24AM +0100, Christophe Leroy wrote: > > Le 31/12/2018 à 10:29, Mike Rapoport a écrit : > >There are a several places that allocate memory using memblock APIs that > >return a physical address, convert the returned address to the virtual > >address and frequently also memset(0) the allocated range. > > > >Update these places to use memblock allocators already returning a virtual > >address. Use memblock functions that clear the allocated memory instead of > >calling memset(0) where appropriate. > > > >The calls to memblock_alloc_base() that were not followed by memset(0) are > >replaced with memblock_alloc_try_nid_raw(). Since the latter does not > >panic() when the allocation fails, the appropriate panic() calls are added > >to the call sites. > > > >Signed-off-by: Mike Rapoport > >--- > > arch/powerpc/kernel/paca.c | 16 ++-- > > arch/powerpc/kernel/setup_64.c | 24 ++-- > > arch/powerpc/mm/hash_utils_64.c| 6 +++--- > > arch/powerpc/mm/pgtable-book3e.c | 8 ++-- > > arch/powerpc/mm/pgtable-book3s64.c | 5 + > > arch/powerpc/mm/pgtable-radix.c| 25 +++-- > > arch/powerpc/platforms/pasemi/iommu.c | 5 +++-- > > arch/powerpc/platforms/pseries/setup.c | 18 ++ > > arch/powerpc/sysdev/dart_iommu.c | 7 +-- > > 9 files changed, 51 insertions(+), 63 deletions(-) > > > >diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c > >index 913bfca..276d36d4 100644 > >--- a/arch/powerpc/kernel/paca.c > >+++ b/arch/powerpc/kernel/paca.c > >@@ -27,7 +27,7 @@ > > static void *__init alloc_paca_data(unsigned long size, unsigned long > > align, > > unsigned long limit, int cpu) > > { > >-unsigned long pa; > >+void *ptr; > > int nid; > > /* > >@@ -42,17 +42,15 @@ static void *__init alloc_paca_data(unsigned long size, > >unsigned long align, > > nid = early_cpu_to_node(cpu); > > } > >-pa = memblock_alloc_base_nid(size, align, limit, nid, MEMBLOCK_NONE); > >-if (!pa) { > >-pa = memblock_alloc_base(size, align, limit); > >-if (!pa) > 
>-panic("cannot allocate paca data"); > >-} > >+ptr = memblock_alloc_try_nid(size, align, MEMBLOCK_LOW_LIMIT, > >+ limit, nid); > >+if (!ptr) > >+panic("cannot allocate paca data"); > > AFAIKS, memblock_alloc_try_nid() panics if memblock_alloc_internal() returns > NULL, so the above panic is useless, isn't it ? My plan is to make all memblock_alloc() APIs to return NULL rather then panic and then get rid of _nopanic variants. It's currently WIP and hopefully I'll have the patches ready next week. > > if (cpu == boot_cpuid) > > memblock_set_bottom_up(false); > >-return __va(pa); > >+return ptr; > > } > > #ifdef CONFIG_PPC_PSERIES > >@@ -118,7 +116,6 @@ static struct slb_shadow * __init new_slb_shadow(int > >cpu, unsigned long limit) > > } > > s = alloc_paca_data(sizeof(*s), L1_CACHE_BYTES, limit, cpu); > >-memset(s, 0, sizeof(*s)); > > s->persistent = cpu_to_be32(SLB_NUM_BOLTED); > > s->buffer_length = cpu_to_be32(sizeof(*s)); > >@@ -222,7 +219,6 @@ void __init allocate_paca(int cpu) > > paca = alloc_paca_data(sizeof(struct paca_struct), L1_CACHE_BYTES, > > limit, cpu); > > paca_ptrs[cpu] = paca; > >-memset(paca, 0, sizeof(struct paca_struct)); > > initialise_paca(paca, cpu); > > #ifdef CONFIG_PPC_PSERIES > >diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c > >index 236c115..3dcd779 100644 > >--- a/arch/powerpc/kernel/setup_64.c > >+++ b/arch/powerpc/kernel/setup_64.c > >@@ -634,19 +634,17 @@ __init u64 ppc64_bolted_size(void) > > static void *__init alloc_stack(unsigned long limit, int cpu) > > { > >-unsigned long pa; > >+void *ptr; > > BUILD_BUG_ON(STACK_INT_FRAME_SIZE % 16); > >-pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit, > >-early_cpu_to_node(cpu), MEMBLOCK_NONE); > >-if (!pa) { > >-pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit); > >-if (!pa) > >-panic("cannot allocate stacks"); > >-} > >+ptr = memblock_alloc_try_nid(THREAD_SIZE, THREAD_SIZE, > >+ MEMBLOCK_LOW_LIMIT, limit, > >+ early_cpu_to_node(cpu)); > 
>+if (!ptr) > >+panic("cannot allocate stacks"); > > Same ? > > Christophe > > >-return __va(pa); > >+return ptr; > > } > > void __init irqstack_early_init(void) > >@@ -739,20 +737,17 @@ void __init emergency_stack_init(void) > > struct thread_info *ti; > > ti = alloc_stack(limit, i); > >-memset(ti, 0, THREAD_SIZE); > >
RE: [PATCH 2/3] powerpc/dts/fsl: t4240rdb: use the Cortina PHY driver compatible
> -Original Message- > From: Scott Wood > Sent: Saturday, December 22, 2018 06:07 > To: Camelia Alexandra Groza ; > robh...@kernel.org; mark.rutl...@arm.com; b...@kernel.crashing.org > Cc: devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; > pau...@samba.org; linuxppc-dev@lists.ozlabs.org > Subject: Re: [PATCH 2/3] powerpc/dts/fsl: t4240rdb: use the Cortina PHY > driver compatible > > On Wed, 2018-07-18 at 14:46 +0300, Camelia Groza wrote: > > The Cortina PHY requires the use of the dedicated Cortina PHY driver > > instead of the generic one. > > > > Signed-off-by: Camelia Groza > > --- > > arch/powerpc/boot/dts/fsl/t4240rdb.dts | 8 > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/arch/powerpc/boot/dts/fsl/t4240rdb.dts > > b/arch/powerpc/boot/dts/fsl/t4240rdb.dts > > index 15eb0a3..a56a705 100644 > > --- a/arch/powerpc/boot/dts/fsl/t4240rdb.dts > > +++ b/arch/powerpc/boot/dts/fsl/t4240rdb.dts > > @@ -267,22 +267,22 @@ > > > > mdio@fd000 { > > xfiphy1: ethernet-phy@10 { > > - compatible = "ethernet-phy- > ieee802.3- > > c45"; > > + compatible = "ethernet-phy- > > id13e5.1002"; > > reg = <0x10>; > > }; > > > > xfiphy2: ethernet-phy@11 { > > - compatible = "ethernet-phy- > ieee802.3- > > c45"; > > + compatible = "ethernet-phy- > > id13e5.1002"; > > reg = <0x11>; > > }; > > > > xfiphy3: ethernet-phy@13 { > > - compatible = "ethernet-phy- > ieee802.3- > > c45"; > > + compatible = "ethernet-phy- > > id13e5.1002"; > > reg = <0x13>; > > }; > > > > xfiphy4: ethernet-phy@12 { > > - compatible = "ethernet-phy- > ieee802.3- > > c45"; > > + compatible = "ethernet-phy- > > id13e5.1002"; > > reg = <0x12>; > > }; > > }; > > I get crashes on boot when using a dtb with this change: Hi Sorry for the late replay. I was on vacation. I'm looking into it. Regards, Camelia
[PATCH v12 10/10] powerpc: clean stack pointers naming
Some stack pointers used to also be thread_info pointers and were called tp. Now that they are only stack pointers, rename them sp. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/irq.c | 17 +++-- arch/powerpc/kernel/setup_64.c | 11 +++ 2 files changed, 10 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 1aef9316345f..983e53b427c0 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -659,21 +659,21 @@ void __do_irq(struct pt_regs *regs) void do_IRQ(struct pt_regs *regs) { struct pt_regs *old_regs = set_irq_regs(regs); - void *curtp, *irqtp, *sirqtp; + void *cursp, *irqsp, *sirqsp; /* Switch to the irq stack to handle this */ - curtp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1)); - irqtp = hardirq_ctx[raw_smp_processor_id()]; - sirqtp = softirq_ctx[raw_smp_processor_id()]; + cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1)); + irqsp = hardirq_ctx[raw_smp_processor_id()]; + sirqsp = softirq_ctx[raw_smp_processor_id()]; /* Already there ? 
*/ - if (unlikely(curtp == irqtp || curtp == sirqtp)) { + if (unlikely(cursp == irqsp || cursp == sirqsp)) { __do_irq(regs); set_irq_regs(old_regs); return; } /* Switch stack and call */ - call_do_irq(regs, irqtp); + call_do_irq(regs, irqsp); set_irq_regs(old_regs); } @@ -695,10 +695,7 @@ void *hardirq_ctx[NR_CPUS] __read_mostly; void do_softirq_own_stack(void) { - void *irqtp; - - irqtp = softirq_ctx[smp_processor_id()]; - call_do_softirq(irqtp); + call_do_softirq(softirq_ctx[smp_processor_id()]); } irq_hw_number_t virq_to_hw(unsigned int virq) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index bc55493ff164..debd8ac3834f 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -709,19 +709,14 @@ void __init emergency_stack_init(void) limit = min(ppc64_bolted_size(), ppc64_rma_size); for_each_possible_cpu(i) { - void *ti; - - ti = alloc_stack(limit, i); - paca_ptrs[i]->emergency_sp = ti + THREAD_SIZE; + paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + THREAD_SIZE; #ifdef CONFIG_PPC_BOOK3S_64 /* emergency stack for NMI exception handling. */ - ti = alloc_stack(limit, i); - paca_ptrs[i]->nmi_emergency_sp = ti + THREAD_SIZE; + paca_ptrs[i]->nmi_emergency_sp = alloc_stack(limit, i) + THREAD_SIZE; /* emergency stack for machine check exception handling. */ - ti = alloc_stack(limit, i); - paca_ptrs[i]->mc_emergency_sp = ti + THREAD_SIZE; + paca_ptrs[i]->mc_emergency_sp = alloc_stack(limit, i) + THREAD_SIZE; #endif } } -- 2.13.3
[PATCH v12 09/10] powerpc/64: Remove CURRENT_THREAD_INFO
Now that current_thread_info is located at the beginning of 'current' task struct, CURRENT_THREAD_INFO macro is not really needed any more. This patch replaces it by loads of the value at PACACURRENT(r13). Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/exception-64s.h | 4 ++-- arch/powerpc/include/asm/thread_info.h | 4 arch/powerpc/kernel/entry_64.S | 10 +- arch/powerpc/kernel/exceptions-64e.S | 2 +- arch/powerpc/kernel/exceptions-64s.S | 2 +- arch/powerpc/kernel/idle_book3e.S | 2 +- arch/powerpc/kernel/idle_power4.S | 2 +- arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 6 +++--- 8 files changed, 14 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 3b4767ed3ec5..dd6a5ae7a769 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -671,7 +671,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define RUNLATCH_ON\ BEGIN_FTR_SECTION \ - CURRENT_THREAD_INFO(r3, r1);\ + ld r3, PACACURRENT(r13); \ ld r4,TI_LOCAL_FLAGS(r3); \ andi. r0,r4,_TLF_RUNLATCH;\ beqlppc64_runlatch_on_trampoline; \ @@ -721,7 +721,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) #ifdef CONFIG_PPC_970_NAP #define FINISH_NAP \ BEGIN_FTR_SECTION \ - CURRENT_THREAD_INFO(r11, r1); \ + ld r11, PACACURRENT(r13); \ ld r9,TI_LOCAL_FLAGS(r11); \ andi. 
r10,r9,_TLF_NAPPING;\ bnelpower4_fixup_nap; \ diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index c959b8d66cac..8e1d0195ac36 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -17,10 +17,6 @@ #define THREAD_SIZE(1 << THREAD_SHIFT) -#ifdef CONFIG_PPC64 -#define CURRENT_THREAD_INFO(dest, sp) stringify_in_c(ld dest, PACACURRENT(r13)) -#endif - #ifndef __ASSEMBLY__ #include #include diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 01d0706d873f..83bddacd7a17 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -166,7 +166,7 @@ system_call:/* label this so stack traces look sane */ li r10,IRQS_ENABLED std r10,SOFTE(r1) - CURRENT_THREAD_INFO(r11, r1) + ld r11, PACACURRENT(r13) ld r10,TI_FLAGS(r11) andi. r11,r10,_TIF_SYSCALL_DOTRACE bne .Lsyscall_dotrace /* does not return */ @@ -213,7 +213,7 @@ system_call:/* label this so stack traces look sane */ ld r3,RESULT(r1) #endif - CURRENT_THREAD_INFO(r12, r1) + ld r12, PACACURRENT(r13) ld r8,_MSR(r1) #ifdef CONFIG_PPC_BOOK3S @@ -348,7 +348,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) /* Repopulate r9 and r10 for the syscall path */ addir9,r1,STACK_FRAME_OVERHEAD - CURRENT_THREAD_INFO(r10, r1) + ld r10, PACACURRENT(r13) ld r10,TI_FLAGS(r10) cmpldi r0,NR_syscalls @@ -746,7 +746,7 @@ _GLOBAL(ret_from_except_lite) mtmsrd r10,1 /* Update machine state */ #endif /* CONFIG_PPC_BOOK3E */ - CURRENT_THREAD_INFO(r9, r1) + ld r9, PACACURRENT(r13) ld r3,_MSR(r1) #ifdef CONFIG_PPC_BOOK3E ld r10,PACACURRENT(r13) @@ -860,7 +860,7 @@ resume_kernel: 1: bl preempt_schedule_irq /* Re-test flags and eventually loop */ - CURRENT_THREAD_INFO(r9, r1) + ld r9, PACACURRENT(r13) ld r4,TI_FLAGS(r9) andi. 
r0,r4,_TIF_NEED_RESCHED bne 1b diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 20f14996281d..04ee24789f80 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -493,7 +493,7 @@ exc_##n##_bad_stack: \ * interrupts happen before the wait instruction. */ #define CHECK_NAPPING() \ - CURRENT_THREAD_INFO(r11, r1); \ + ld r11, PACACURRENT(r13); \ ld r10,TI_LOCAL_FLAGS(r11);\ andi. r9,r10,_TLF_NAPPING;\ beq+1f; \ diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 9e253ce27e08..c7c4e2d6f98f 100644
[PATCH v12 08/10] powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
Now that thread_info is similar to task_struct, its address is in r2 so CURRENT_THREAD_INFO() macro is useless. This patch removes it. At the same time, as the 'cpu' field is not anymore in thread_info, this patch renames it to TASK_CPU. Signed-off-by: Christophe Leroy --- arch/powerpc/Makefile | 2 +- arch/powerpc/include/asm/thread_info.h | 2 -- arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/kernel/entry_32.S | 43 -- arch/powerpc/kernel/epapr_hcalls.S | 5 ++-- arch/powerpc/kernel/head_fsl_booke.S | 5 ++-- arch/powerpc/kernel/idle_6xx.S | 8 +++ arch/powerpc/kernel/idle_e500.S| 8 +++ arch/powerpc/kernel/misc_32.S | 3 +-- arch/powerpc/mm/hash_low_32.S | 14 --- arch/powerpc/sysdev/6xx-suspend.S | 5 ++-- 11 files changed, 35 insertions(+), 62 deletions(-) diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 326e8ba6e314..5a9bf22a7534 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -431,7 +431,7 @@ ifdef CONFIG_SMP prepare: task_cpu_prepare task_cpu_prepare: prepare0 - $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h)) + $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TASK_CPU") print $$3;}' include/generated/asm-offsets.h)) endif # Check toolchain versions: diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index d91523c2c7d8..c959b8d66cac 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -19,8 +19,6 @@ #ifdef CONFIG_PPC64 #define CURRENT_THREAD_INFO(dest, sp) stringify_in_c(ld dest, PACACURRENT(r13)) -#else -#define CURRENT_THREAD_INFO(dest, sp) stringify_in_c(mr dest, r2) #endif #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 94ac190a0b16..03439785c2ea 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -96,7 +96,7 @@ int main(void) #endif /* CONFIG_PPC64 */ 
OFFSET(TASK_STACK, task_struct, stack); #ifdef CONFIG_SMP - OFFSET(TI_CPU, task_struct, cpu); + OFFSET(TASK_CPU, task_struct, cpu); #endif #ifdef CONFIG_LIVEPATCH diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index b547bd4168d8..52a061f14c7d 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -168,8 +168,7 @@ transfer_to_handler: tophys(r11,r11) addir11,r11,global_dbcr0@l #ifdef CONFIG_SMP - CURRENT_THREAD_INFO(r9, r1) - lwz r9,TI_CPU(r9) + lwz r9,TASK_CPU(r2) slwir9,r9,3 add r11,r11,r9 #endif @@ -180,8 +179,7 @@ transfer_to_handler: stw r12,4(r11) #endif #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE - CURRENT_THREAD_INFO(r9, r1) - tophys(r9, r9) + tophys(r9, r2) ACCOUNT_CPU_USER_ENTRY(r9, r11, r12) #endif @@ -195,8 +193,7 @@ transfer_to_handler: ble-stack_ovf /* then the kernel stack overflowed */ 5: #if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500) - CURRENT_THREAD_INFO(r9, r1) - tophys(r9,r9) /* check local flags */ + tophys(r9,r2) /* check local flags */ lwz r12,TI_LOCAL_FLAGS(r9) mtcrf 0x01,r12 bt- 31-TLF_NAPPING,4f @@ -345,8 +342,7 @@ _GLOBAL(DoSyscall) mtmsr r11 1: #endif /* CONFIG_TRACE_IRQFLAGS */ - CURRENT_THREAD_INFO(r10, r1) - lwz r11,TI_FLAGS(r10) + lwz r11,TI_FLAGS(r2) andi. r11,r11,_TIF_SYSCALL_DOTRACE bne-syscall_dotrace syscall_dotrace_cont: @@ -379,13 +375,12 @@ ret_from_syscall: lwz r3,GPR3(r1) #endif mr r6,r3 - CURRENT_THREAD_INFO(r12, r1) /* disable interrupts so current_thread_info()->flags can't change */ LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */ /* Note: We don't bother telling lockdep about it */ SYNC MTMSRD(r10) - lwz r9,TI_FLAGS(r12) + lwz r9,TI_FLAGS(r2) li r8,-MAX_ERRNO andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK) bne-syscall_exit_work @@ -432,8 +427,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX) #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE andi. 
r4,r8,MSR_PR beq 3f - CURRENT_THREAD_INFO(r4, r1) - ACCOUNT_CPU_USER_EXIT(r4, r5, r7) + ACCOUNT_CPU_USER_EXIT(r2, r5, r7) 3: #endif lwz r4,_LINK(r1) @@ -526,7 +520,7 @@ syscall_exit_work: /* Clear per-syscall TIF flags if any are set. */ li r11,_TIF_PERSYSCALL_MASK - addir12,r12,TI_FLAGS + addir12,r2,TI_FLAGS 3: lwarx
[PATCH v12 07/10] powerpc: 'current_set' is now a table of task_struct pointers
The table of pointers 'current_set' has been used for retrieving the stack and current. They used to be thread_info pointers as they were pointing to the stack, and current was taken from the 'task' field of the thread_info.

Now, the pointers of the 'current_set' table are both pointers to task_struct and pointers to thread_info.

As they are used to get current, and the stack pointer is retrieved from current's stack field, this patch changes their type to task_struct, and renames secondary_ti to secondary_current.

Reviewed-by: Nicholas Piggin
Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ++--
 arch/powerpc/kernel/head_32.S             |  6 +++---
 arch/powerpc/kernel/head_44x.S            |  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S      |  4 ++--
 arch/powerpc/kernel/smp.c                 | 10 --
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 1d911f68a23b..1484df6779ab 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -23,8 +23,8 @@
 #include

 /* SMP */
-extern struct thread_info *current_set[NR_CPUS];
-extern struct thread_info *secondary_ti;
+extern struct task_struct *current_set[NR_CPUS];
+extern struct task_struct *secondary_current;
 void start_secondary(void *unused);

 /* kexec */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 309a45779ad5..146385b1c2da 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -846,9 +846,9 @@ __secondary_start:
 #endif /* CONFIG_PPC_BOOK3S_32 */

 	/* get current's stack and current */
-	lis	r1,secondary_ti@ha
-	tophys(r1,r1)
-	lwz	r2,secondary_ti@l(r1)
+	lis	r2,secondary_current@ha
+	tophys(r2,r2)
+	lwz	r2,secondary_current@l(r2)
 	tophys(r1,r2)
 	lwz	r1,TASK_STACK(r1)

diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index f94a93b6c2f2..37117ab11584 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1020,8 +1020,8 @@ _GLOBAL(start_secondary_47x)
 	/* Now we can get our task struct and real stack pointer */

 	/* Get current's stack and current */
-	lis	r1,secondary_ti@ha
-	lwz	r2,secondary_ti@l(r1)
+	lis	r2,secondary_current@ha
+	lwz	r2,secondary_current@l(r2)
 	lwz	r1,TASK_STACK(r2)

 	/* Current stack pointer */
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index 11f38adbe020..4ed2a7c8e89b 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1091,8 +1091,8 @@ __secondary_start:
 	bl	call_setup_cpu

 	/* get current's stack and current */
-	lis	r1,secondary_ti@ha
-	lwz	r2,secondary_ti@l(r1)
+	lis	r2,secondary_current@ha
+	lwz	r2,secondary_current@l(r2)
 	lwz	r1,TASK_STACK(r2)

 	/* stack */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index aa4517686f90..a41fa8924004 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -76,7 +76,7 @@
 static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 #endif

-struct thread_info *secondary_ti;
+struct task_struct *secondary_current;
 bool has_big_cores;

 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
@@ -664,7 +664,7 @@ void smp_send_stop(void)
 }
 #endif /* CONFIG_NMI_IPI */

-struct thread_info *current_set[NR_CPUS];
+struct task_struct *current_set[NR_CPUS];

 static void smp_store_cpu_info(int id)
 {
@@ -929,7 +929,7 @@ void smp_prepare_boot_cpu(void)
 	paca_ptrs[boot_cpuid]->__current = current;
 #endif
 	set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
-	current_set[boot_cpuid] = task_thread_info(current);
+	current_set[boot_cpuid] = current;
 }

 #ifdef CONFIG_HOTPLUG_CPU
@@ -1014,15 +1014,13 @@ static bool secondaries_inhibited(void)

 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 {
-	struct thread_info *ti = task_thread_info(idle);
-
 #ifdef CONFIG_PPC64
 	paca_ptrs[cpu]->__current = idle;
 	paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
 				 THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
 	idle->cpu = cpu;
-	secondary_ti = current_set[cpu] = ti;
+	secondary_current = current_set[cpu] = idle;
 }

 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
--
2.13.3
[PATCH v12 06/10] powerpc: regain entire stack space
thread_info is no longer in the stack, so the entire stack can now be used.

There is also no longer any risk of corrupting task_cpu(p) with a stack overflow, so the patch removes the test.

When doing this, an explicit test for a NULL stack pointer is needed in validate_sp() as it is no longer implicitly covered by the sizeof(thread_info) gap.

In the meantime, with the previous patch all pointers to the stacks are no longer pointers to thread_info, so this patch changes them to void*.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/irq.h       | 10 +-
 arch/powerpc/include/asm/processor.h |  3 +--
 arch/powerpc/kernel/asm-offsets.c    |  1 -
 arch/powerpc/kernel/entry_32.S       | 14 --
 arch/powerpc/kernel/irq.c            | 19 +--
 arch/powerpc/kernel/misc_32.S        |  6 ++
 arch/powerpc/kernel/process.c        | 32 +---
 arch/powerpc/kernel/setup_64.c       |  8 
 8 files changed, 38 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 28a7ace0a1b9..c91a60cda4fa 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -48,16 +48,16 @@ struct pt_regs;
  * Per-cpu stacks for handling critical, debug and machine check
  * level interrupts.
  */
-extern struct thread_info *critirq_ctx[NR_CPUS];
-extern struct thread_info *dbgirq_ctx[NR_CPUS];
-extern struct thread_info *mcheckirq_ctx[NR_CPUS];
+extern void *critirq_ctx[NR_CPUS];
+extern void *dbgirq_ctx[NR_CPUS];
+extern void *mcheckirq_ctx[NR_CPUS];
 #endif

 /*
  * Per-cpu stacks for handling hard and soft interrupts.
  */
-extern struct thread_info *hardirq_ctx[NR_CPUS];
-extern struct thread_info *softirq_ctx[NR_CPUS];
+extern void *hardirq_ctx[NR_CPUS];
+extern void *softirq_ctx[NR_CPUS];

 void call_do_softirq(void *sp);
 void call_do_irq(struct pt_regs *regs, void *sp);
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 15acb282a876..8179b64871ed 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -325,8 +325,7 @@ struct thread_struct {
 #define ARCH_MIN_TASKALIGN 16

 #define INIT_SP		(sizeof(init_stack) + (unsigned long) &init_stack)
-#define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
+#define INIT_SP_LIMIT	((unsigned long)&init_stack)

 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 1fb52206c106..94ac190a0b16 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -92,7 +92,6 @@ int main(void)
 	DEFINE(SIGSEGV, SIGSEGV);
 	DEFINE(NMI_MASK, NMI_MASK);
 #else
-	DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
 	OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
 	OFFSET(TASK_STACK, task_struct, stack);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1f5a76283bd4..b547bd4168d8 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -97,14 +97,11 @@ crit_transfer_to_handler:
 	mfspr	r0,SPRN_SRR1
 	stw	r0,_SRR1(r11)

-	/* set the stack limit to the current stack
-	 * and set the limit to protect the thread_info
-	 * struct
-	 */
+	/* set the stack limit to the current stack */
 	mfspr	r8,SPRN_SPRG_THREAD
 	lwz	r0,KSP_LIMIT(r8)
 	stw	r0,SAVED_KSP_LIMIT(r11)
-	rlwimi	r0,r1,0,0,(31-THREAD_SHIFT)
+	rlwinm	r0,r1,0,0,(31 - THREAD_SHIFT)
 	stw	r0,KSP_LIMIT(r8)
 	/* fall through */
 #endif
@@ -121,14 +118,11 @@ crit_transfer_to_handler:
 	mfspr	r0,SPRN_SRR1
 	stw	r0,crit_srr1@l(0)

-	/* set the stack limit to the current stack
-	 * and set the limit to protect the thread_info
-	 * struct
-	 */
+	/* set the stack limit to the current stack */
 	mfspr	r8,SPRN_SPRG_THREAD
 	lwz	r0,KSP_LIMIT(r8)
 	stw	r0,saved_ksp_limit@l(0)
-	rlwimi	r0,r1,0,0,(31-THREAD_SHIFT)
+	rlwinm	r0,r1,0,0,(31 - THREAD_SHIFT)
 	stw	r0,KSP_LIMIT(r8)
 	/* fall through */
 #endif
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index acaeae1f11f5..1aef9316345f 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -618,9 +618,8 @@ static inline void check_stack_overflow(void)
 	sp = current_stack_pointer() & (THREAD_SIZE-1);

 	/* check for stack overflow: is there less than 2KB free? */
-	if (unlikely(sp < (sizeof(struct thread_info) + 2048))) {
-		pr_err("do_IRQ: stack overflow: %ld\n",
-			sp - sizeof(struct thread_info));
+	if (unlikely(sp < 2048)) {
+		pr_err("do_IRQ: stack overflow: %ld\n", sp);
[PATCH v12 05/10] powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
This patch activates CONFIG_THREAD_INFO_IN_TASK which moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack overflows.
- Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult.

This has the following consequences:
- thread_info is now located at the beginning of task_struct.
- The 'cpu' field is now in task_struct, and only exists when CONFIG_SMP is active.
- thread_info no longer has the 'task' field.

This patch:
- Removes all recopy of thread_info struct when the stack changes.
- Changes the CURRENT_THREAD_INFO() macro to point to current.
- Selects CONFIG_THREAD_INFO_IN_TASK.
- Modifies raw_smp_processor_id() to get ->cpu from current without including linux/sched.h, to avoid circular inclusion, and without including asm/asm-offsets.h, to avoid symbol name duplication between ASM constants and C constants.

Signed-off-by: Christophe Leroy
Reviewed-by: Nicholas Piggin
---
 arch/powerpc/Kconfig                   |  1 +
 arch/powerpc/Makefile                  |  7 +++
 arch/powerpc/include/asm/irq.h         |  4 --
 arch/powerpc/include/asm/ptrace.h      |  2 +-
 arch/powerpc/include/asm/smp.h         | 17 +++-
 arch/powerpc/include/asm/thread_info.h | 17 +---
 arch/powerpc/kernel/asm-offsets.c      |  7 ++-
 arch/powerpc/kernel/entry_32.S         |  9 ++--
 arch/powerpc/kernel/exceptions-64e.S   | 11 -
 arch/powerpc/kernel/head_32.S          |  6 +--
 arch/powerpc/kernel/head_44x.S         |  4 +-
 arch/powerpc/kernel/head_64.S          |  1 +
 arch/powerpc/kernel/head_booke.h       |  8 +---
 arch/powerpc/kernel/head_fsl_booke.S   |  7 ++-
 arch/powerpc/kernel/irq.c              | 79 +-
 arch/powerpc/kernel/kgdb.c             | 28 
 arch/powerpc/kernel/machine_kexec_64.c |  6 +--
 arch/powerpc/kernel/setup_64.c         | 21 -
 arch/powerpc/kernel/smp.c              |  2 +-
 arch/powerpc/net/bpf_jit32.h           |  5 +--
 20 files changed, 52 insertions(+), 190 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2890d36eb531..0a26e0075ce5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -241,6 +241,7 @@ config PPC
 	select RTC_LIB
 	select SPARSE_IRQ
 	select SYSCTL_EXCEPTION_TRACE
+	select THREAD_INFO_IN_TASK
 	select VIRT_TO_BUS if !PPC64
 	#
 	# Please keep this list sorted alphabetically.
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 488c9edffa58..326e8ba6e314 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -427,6 +427,13 @@ else
 endif
 endif

+ifdef CONFIG_SMP
+prepare: task_cpu_prepare
+
+task_cpu_prepare: prepare0
+	$(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h))
+endif
+
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
 checkbin:
diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 2efbae8d93be..28a7ace0a1b9 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -51,9 +51,6 @@ struct pt_regs;
 extern struct thread_info *critirq_ctx[NR_CPUS];
 extern struct thread_info *dbgirq_ctx[NR_CPUS];
 extern struct thread_info *mcheckirq_ctx[NR_CPUS];
-extern void exc_lvl_ctx_init(void);
-#else
-#define exc_lvl_ctx_init()
 #endif

 /*
@@ -62,7 +59,6 @@ extern void exc_lvl_ctx_init(void);
 extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];

-extern void irq_ctx_init(void);
 void call_do_softirq(void *sp);
 void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 0b8a735b6d85..64271e562fed 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -157,7 +157,7 @@ extern int ptrace_put_reg(struct task_struct *task, int regno,
 			  unsigned long data);

 #define current_pt_regs() \
-	((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) - 1)
+	((struct pt_regs *)((unsigned long)task_stack_page(current) + THREAD_SIZE) - 1)
 /*
  * We use the least-significant bit of the trap field to indicate
  * whether we have saved the full set of registers, or only a
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 41695745032c..0de717e16dd6 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -83,7 +83,22 @@ int is_cpu_dead(unsigned int cpu);
 /* 32-bit */
 extern int smp_hw_index[];

-#define raw_smp_processor_id()	(current_thread_info()->cpu)
+/*
+ * This is particularly ugly: it appears we can't actually get the definition
+ * of
[PATCH v12 04/10] powerpc: Prepare for moving thread_info into task_struct
This patch cleans the powerpc kernel before activating CONFIG_THREAD_INFO_IN_TASK:
- The purpose of the pointer given to call_do_softirq() and call_do_irq() is to point to the new stack ==> change it to void* and rename it 'sp'.
- Don't use CURRENT_THREAD_INFO() to locate the stack.
- Fix a few comments.
- Replace current_thread_info()->task by current.
- Rename THREAD_INFO to TASK_STACK: as it is in fact the offset of the pointer to the stack in task_struct, this pointer will not be impacted by the move of THREAD_INFO.
- Make TASK_STACK available to PPC64. PPC64 will need it to get the stack pointer from current once the thread_info has been moved.
- Modify klp_init_thread_info() to take a task_struct pointer argument.

Signed-off-by: Christophe Leroy
Reviewed-by: Nicholas Piggin
---
 arch/powerpc/include/asm/irq.h       | 4 ++--
 arch/powerpc/include/asm/livepatch.h | 6 +++---
 arch/powerpc/include/asm/processor.h | 4 ++--
 arch/powerpc/include/asm/reg.h       | 2 +-
 arch/powerpc/kernel/asm-offsets.c    | 2 +-
 arch/powerpc/kernel/entry_32.S       | 2 +-
 arch/powerpc/kernel/entry_64.S       | 2 +-
 arch/powerpc/kernel/head_32.S        | 4 ++--
 arch/powerpc/kernel/head_40x.S       | 4 ++--
 arch/powerpc/kernel/head_44x.S       | 2 +-
 arch/powerpc/kernel/head_8xx.S       | 2 +-
 arch/powerpc/kernel/head_booke.h     | 4 ++--
 arch/powerpc/kernel/head_fsl_booke.S | 4 ++--
 arch/powerpc/kernel/irq.c            | 2 +-
 arch/powerpc/kernel/misc_32.S        | 4 ++--
 arch/powerpc/kernel/process.c        | 8 
 arch/powerpc/kernel/setup-common.c   | 2 +-
 arch/powerpc/kernel/smp.c            | 4 +++-
 18 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index ee39ce56b2a2..2efbae8d93be 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -63,8 +63,8 @@
 extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];

 extern void irq_ctx_init(void);
-extern void call_do_softirq(struct thread_info *tp);
-extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp);
+void call_do_softirq(void *sp);
+void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
 extern void __init init_IRQ(void);
 extern void __do_irq(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h
index 47a03b9b528b..7cb514865a28 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -43,13 +43,13 @@ static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
 	return ftrace_location_range(faddr, faddr + 16);
 }

-static inline void klp_init_thread_info(struct thread_info *ti)
+static inline void klp_init_thread_info(struct task_struct *p)
 {
 	/* + 1 to account for STACK_END_MAGIC */
-	ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
+	task_thread_info(p)->livepatch_sp = end_of_stack(p) + 1;
 }
 #else
-static void klp_init_thread_info(struct thread_info *ti) { }
+static inline void klp_init_thread_info(struct task_struct *p) { }
 #endif /* CONFIG_LIVEPATCH */

 #endif /* _ASM_POWERPC_LIVEPATCH_H */
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 692f7383d461..15acb282a876 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -40,7 +40,7 @@
 #ifndef __ASSEMBLY__
 #include
-#include
+#include
 #include
 #include

@@ -326,7 +326,7 @@ struct thread_struct {
 #define INIT_SP		(sizeof(init_stack) + (unsigned long) &init_stack)

 #define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(init_thread_info), 16) + (unsigned long) &init_stack)
+	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)

 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1c98ef1f2d5b..581e61db2dcf 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1062,7 +1062,7 @@
  *	- SPRG9 debug exception scratch
  *
  * All 32-bit:
- *	- SPRG3 current thread_info pointer
+ *	- SPRG3 current thread_struct physical addr pointer
  *	  (virtual on BookE, physical on others)
  *
  * 32-bit classic:
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9ffc72ded73a..b2b52e002a76 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -90,10 +90,10 @@ int main(void)
 	DEFINE(SIGSEGV, SIGSEGV);
 	DEFINE(NMI_MASK, NMI_MASK);
 #else
-	OFFSET(THREAD_INFO, task_struct, stack);
 	DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
 	OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
+	OFFSET(TASK_STACK, task_struct, stack);
 #ifdef CONFIG_LIVEPATCH
 	OFFSET(TI_livepatch_sp,
[PATCH v12 03/10] powerpc: Only use task_struct 'cpu' field on SMP
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field gets moved into task_struct and is only defined when CONFIG_SMP is set.

This patch ensures that TI_CPU is only used when CONFIG_SMP is set and that the task_struct 'cpu' field is not used directly outside of SMP code.

Signed-off-by: Christophe Leroy
Reviewed-by: Nicholas Piggin
---
 arch/powerpc/kernel/head_fsl_booke.S | 2 ++
 arch/powerpc/kernel/misc_32.S        | 4 ++++
 arch/powerpc/xmon/xmon.c             | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index 2386ce2a9c6e..2c21e8642a00 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -243,8 +243,10 @@ set_ivor:
 	li	r0,0
 	stwu	r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)

+#ifdef CONFIG_SMP
 	CURRENT_THREAD_INFO(r22, r1)
 	stw	r24, TI_CPU(r22)
+#endif

 	bl	early_init

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 57d2ffb2d45c..02b8cdd73792 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -183,10 +183,14 @@ _GLOBAL(low_choose_750fx_pll)
 	or	r4,r4,r5
 	mtspr	SPRN_HID1,r4

+#ifdef CONFIG_SMP
 	/* Store new HID1 image */
 	CURRENT_THREAD_INFO(r6, r1)
 	lwz	r6,TI_CPU(r6)
 	slwi	r6,r6,2
+#else
+	li	r6, 0
+#endif
 	addis	r6,r6,nap_save_hid1@ha
 	stw	r4,nap_save_hid1@l(r6)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 757b8499aba2..a0f44f992360 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2997,7 +2997,7 @@ static void show_task(struct task_struct *tsk)
 	printf("%px %016lx %6d %6d %c %2d %s\n", tsk,
 		tsk->thread.ksp,
 		tsk->pid, rcu_dereference(tsk->parent)->pid,
-		state, task_thread_info(tsk)->cpu,
+		state, task_cpu(tsk),
 		tsk->comm);
 }
--
2.13.3
[PATCH v12 02/10] book3s/64: avoid circular header inclusion in mmu-hash.h
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes asm/current.h. This generates a circular dependency. To avoid that, asm/processor.h shall not be included in mmu-hash.h.

In order to do that, this patch moves the information from asm/processor.h required by mmu-hash.h into a new header called asm/task_size_user64.h.

Signed-off-by: Christophe Leroy
Reviewed-by: Nicholas Piggin
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
 arch/powerpc/include/asm/processor.h          | 34 +-
 arch/powerpc/include/asm/task_size_user64.h   | 42 +++
 arch/powerpc/kvm/book3s_hv_hmi.c              |  1 +
 4 files changed, 45 insertions(+), 34 deletions(-)
 create mode 100644 arch/powerpc/include/asm/task_size_user64.h

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 12e522807f9f..b2aba048301e 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -23,7 +23,7 @@
  */
 #include
 #include
-#include <asm/processor.h>
+#include <asm/task_size_user64.h>
 #include

 /*
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ee58526cb6c2..692f7383d461 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -95,40 +95,8 @@ void release_thread(struct task_struct *);
 #endif

 #ifdef CONFIG_PPC64
-/*
- * 64-bit user address space can have multiple limits
- * For now supported values are:
- */
-#define TASK_SIZE_64TB  (0x0000400000000000UL)
-#define TASK_SIZE_128TB (0x0000800000000000UL)
-#define TASK_SIZE_512TB (0x0002000000000000UL)
-#define TASK_SIZE_1PB   (0x0004000000000000UL)
-#define TASK_SIZE_2PB   (0x0008000000000000UL)
-/*
- * With 52 bits in the address we can support
- * upto 4PB of range.
- */
-#define TASK_SIZE_4PB   (0x0010000000000000UL)
-/*
- * For now 512TB is only supported with book3s and 64K linux page size.
- */
-#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
-/*
- * Max value currently used:
- */
-#define TASK_SIZE_USER64		TASK_SIZE_4PB
-#define DEFAULT_MAP_WINDOW_USER64	TASK_SIZE_128TB
-#define TASK_CONTEXT_SIZE		TASK_SIZE_512TB
-#else
-#define TASK_SIZE_USER64		TASK_SIZE_64TB
-#define DEFAULT_MAP_WINDOW_USER64	TASK_SIZE_64TB
-/*
- * We don't need to allocate extended context ids for 4K page size, because
- * we limit the max effective address on this config to 64TB.
- */
-#define TASK_CONTEXT_SIZE		TASK_SIZE_64TB
-#endif
+#include <asm/task_size_user64.h>

 /*
  * 32-bit user address space is 4GB - 1 page
diff --git a/arch/powerpc/include/asm/task_size_user64.h b/arch/powerpc/include/asm/task_size_user64.h
new file mode 100644
index ..a4043075864b
--- /dev/null
+++ b/arch/powerpc/include/asm/task_size_user64.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_TASK_SIZE_USER64_H
+#define _ASM_POWERPC_TASK_SIZE_USER64_H
+
+#ifdef CONFIG_PPC64
+/*
+ * 64-bit user address space can have multiple limits
+ * For now supported values are:
+ */
+#define TASK_SIZE_64TB  (0x0000400000000000UL)
+#define TASK_SIZE_128TB (0x0000800000000000UL)
+#define TASK_SIZE_512TB (0x0002000000000000UL)
+#define TASK_SIZE_1PB   (0x0004000000000000UL)
+#define TASK_SIZE_2PB   (0x0008000000000000UL)
+/*
+ * With 52 bits in the address we can support
+ * upto 4PB of range.
+ */
+#define TASK_SIZE_4PB   (0x0010000000000000UL)
+
+/*
+ * For now 512TB is only supported with book3s and 64K linux page size.
+ */
+#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
+/*
+ * Max value currently used:
+ */
+#define TASK_SIZE_USER64		TASK_SIZE_4PB
+#define DEFAULT_MAP_WINDOW_USER64	TASK_SIZE_128TB
+#define TASK_CONTEXT_SIZE		TASK_SIZE_512TB
+#else
+#define TASK_SIZE_USER64		TASK_SIZE_64TB
+#define DEFAULT_MAP_WINDOW_USER64	TASK_SIZE_64TB
+/*
+ * We don't need to allocate extended context ids for 4K page size, because
+ * we limit the max effective address on this config to 64TB.
+ */
+#define TASK_CONTEXT_SIZE		TASK_SIZE_64TB
+#endif
+
+#endif /* CONFIG_PPC64 */
+#endif /* _ASM_POWERPC_TASK_SIZE_USER64_H */
diff --git a/arch/powerpc/kvm/book3s_hv_hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c
index e3f738eb1cac..64b5011475c7 100644
--- a/arch/powerpc/kvm/book3s_hv_hmi.c
+++ b/arch/powerpc/kvm/book3s_hv_hmi.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include

 void wait_for_subcore_guest_exit(void)
 {
--
2.13.3
[PATCH v12 00/10] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK
The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK which moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack overflows.
- Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult.

Changes since v11:
- Rebased on 81775f5563fa ("Automatic merge of branches 'master', 'next' and 'fixes' into merge")
- Added a first patch to change memblock allocs to functions returning virtual addrs. This removes the memset() calls which were the only remaining content of irq_ctx_init() and exc_lvl_ctx_init() at the end.
- Dropped irq_ctx_init() and exc_lvl_ctx_init() in patch 5 (powerpc: Activate CONFIG_THREAD_INFO_IN_TASK)
- A few cosmetic changes in commit logs and code.

Changes since v10:
- Rebased on 21622a0d2023 ("Automatic merge of branches 'master', 'next' and 'fixes' into merge")
  ==> Fixed conflict in setup_32.S

Changes since v9:
- Rebased on 183cbf93be88 ("Automatic merge of branches 'master', 'next' and 'fixes' into merge")
  ==> Fixed conflict on xmon

Changes since v8:
- Rebased on e589b79e40d9 ("Automatic merge of branches 'master', 'next' and 'fixes' into merge")
  ==> Main impact was conflicts due to commit 9a8dd708d547 ("memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*")

Changes since v7:
- Rebased on fb6c6ce7907d ("Automatic merge of branches 'master', 'next' and 'fixes' into merge")

Changes since v6:
- Fixed validate_sp() to exclude NULL sp in 'regain entire stack space' patch (early crash with CONFIG_KMEMLEAK)

Changes since v5:
- Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding
- Fixed PPC_BPF_LOAD_CPU() macro

Changes since v4:
- Fixed a build failure on 32-bit SMP when include/generated/asm-offsets.h is not already existing; was due to spaces instead of a tab in the Makefile

Changes since RFC v3: (based on Nick's review)
- Renamed task_size.h to task_size_user64.h to better relate to what it contains.
- Handling of the isolation of thread_info cpu field inside CONFIG_SMP #ifdefs moved to a separate patch.
- Removed CURRENT_THREAD_INFO macro completely.
- Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is defined.
- Added a patch at the end to rename 'tp' pointers to 'sp' pointers
- Renamed 'tp' into 'sp' pointers in preparation patch when relevant
- Fixed a few commit logs
- Fixed checkpatch report.

Changes since RFC v2:
- Removed the modification of names in asm-offsets
- Created a rule in arch/powerpc/Makefile to append the offset of current->cpu in CFLAGS
- Modified asm/smp.h to use the offset set in CFLAGS
- Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
- Moved the modification of current_pt_regs in the patch activating CONFIG_THREAD_INFO_IN_TASK

Changes since RFC v1:
- Removed the first patch which was modifying header inclusion order in timer
- Modified some names in asm-offsets to avoid conflicts when including asm-offsets in C files
- Modified asm/smp.h to avoid having to include linux/sched.h (using asm-offsets instead)
- Moved some changes from the activation patch to the preparation patch.

Christophe Leroy (10):
  powerpc/irq: use memblock functions returning virtual address
  book3s/64: avoid circular header inclusion in mmu-hash.h
  powerpc: Only use task_struct 'cpu' field on SMP
  powerpc: Prepare for moving thread_info into task_struct
  powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
  powerpc: regain entire stack space
  powerpc: 'current_set' is now a table of task_struct pointers
  powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
  powerpc/64: Remove CURRENT_THREAD_INFO
  powerpc: clean stack pointers naming

 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/Makefile                         |  7 ++
 arch/powerpc/include/asm/asm-prototypes.h     |  4 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
 arch/powerpc/include/asm/exception-64s.h      |  4 +-
 arch/powerpc/include/asm/irq.h                | 18 ++--
 arch/powerpc/include/asm/livepatch.h          |  6 +-
 arch/powerpc/include/asm/processor.h          | 39 +
 arch/powerpc/include/asm/ptrace.h             |  2 +-
 arch/powerpc/include/asm/reg.h                |  2 +-
 arch/powerpc/include/asm/smp.h                | 17 +++-
 arch/powerpc/include/asm/task_size_user64.h   | 42 +
 arch/powerpc/include/asm/thread_info.h        | 19 -
 arch/powerpc/kernel/asm-offsets.c             | 10 ++-
 arch/powerpc/kernel/entry_32.S                | 66 +-
 arch/powerpc/kernel/entry_64.S                | 12 +--
 arch/powerpc/kernel/epapr_hcalls.S            |  5 +-
 arch/powerpc/kernel/exceptions-64e.S          | 13 +--
 arch/powerpc/kernel/exceptions-64s.S          |  2 +-
 arch/powerpc/kernel/head_32.S
[PATCH v12 01/10] powerpc/irq: use memblock functions returning virtual address
Since only the virtual address of allocated blocks is used, let's use functions returning the virtual address directly. Those functions have the advantage of also zeroing the block.

Suggested-by: Mike Rapoport
Signed-off-by: Christophe Leroy
---
@Mike: Part of this is taken from your series. I was not sure how to include you in the commit log; I used Suggested-by: but could use any other tag, so if you prefer something different just tell me. Also, if you could review this patch, it would be nice.

 arch/powerpc/kernel/irq.c      |  5 -
 arch/powerpc/kernel/setup_32.c | 15 +--
 arch/powerpc/kernel/setup_64.c | 16 ++--
 3 files changed, 7 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 916ddc4aac44..4a44bc395fbc 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -725,18 +725,15 @@ void exc_lvl_ctx_init(void)
 #endif
 #endif

-		memset((void *)critirq_ctx[cpu_nr], 0, THREAD_SIZE);
 		tp = critirq_ctx[cpu_nr];
 		tp->cpu = cpu_nr;
 		tp->preempt_count = 0;

 #ifdef CONFIG_BOOKE
-		memset((void *)dbgirq_ctx[cpu_nr], 0, THREAD_SIZE);
 		tp = dbgirq_ctx[cpu_nr];
 		tp->cpu = cpu_nr;
 		tp->preempt_count = 0;

-		memset((void *)mcheckirq_ctx[cpu_nr], 0, THREAD_SIZE);
 		tp = mcheckirq_ctx[cpu_nr];
 		tp->cpu = cpu_nr;
 		tp->preempt_count = HARDIRQ_OFFSET;
@@ -754,12 +751,10 @@ void irq_ctx_init(void)
 	int i;

 	for_each_possible_cpu(i) {
-		memset((void *)softirq_ctx[i], 0, THREAD_SIZE);
 		tp = softirq_ctx[i];
 		tp->cpu = i;
 		klp_init_thread_info(tp);

-		memset((void *)hardirq_ctx[i], 0, THREAD_SIZE);
 		tp = hardirq_ctx[i];
 		tp->cpu = i;
 		klp_init_thread_info(tp);
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 947f904688b0..bfe1b46a26dc 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -203,10 +203,8 @@ void __init irqstack_early_init(void)
 	/* interrupt stacks must be in lowmem, we get that for free on ppc32
 	 * as the memblock is limited to lowmem by default */
 	for_each_possible_cpu(i) {
-		softirq_ctx[i] = (struct thread_info *)
-			__va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
-		hardirq_ctx[i] = (struct thread_info *)
-			__va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+		softirq_ctx[i] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
+		hardirq_ctx[i] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
 	}
 }

@@ -224,13 +222,10 @@ void __init exc_lvl_early_init(void)
 		hw_cpu = 0;
 #endif

-		critirq_ctx[hw_cpu] = (struct thread_info *)
-			__va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+		critirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
 #ifdef CONFIG_BOOKE
-		dbgirq_ctx[hw_cpu] = (struct thread_info *)
-			__va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
-		mcheckirq_ctx[hw_cpu] = (struct thread_info *)
-			__va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+		dbgirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
+		mcheckirq_ctx[hw_cpu] = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
 #endif
 	}
 }
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 236c1151a3a7..943503f1e2c0 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -634,19 +634,10 @@ __init u64 ppc64_bolted_size(void)
 static void *__init alloc_stack(unsigned long limit, int cpu)
 {
-	unsigned long pa;
-
 	BUILD_BUG_ON(STACK_INT_FRAME_SIZE % 16);

-	pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit,
-				     early_cpu_to_node(cpu), MEMBLOCK_NONE);
-	if (!pa) {
-		pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit);
-		if (!pa)
-			panic("cannot allocate stacks");
-	}
-
-	return __va(pa);
+	return memblock_alloc_try_nid(THREAD_SIZE, THREAD_SIZE, MEMBLOCK_LOW_LIMIT,
+				      limit, early_cpu_to_node(cpu));
 }

 void __init irqstack_early_init(void)
@@ -739,20 +730,17 @@ void __init emergency_stack_init(void)
 		struct thread_info *ti;

 		ti = alloc_stack(limit, i);
-		memset(ti, 0, THREAD_SIZE);
 		emerg_stack_init_thread_info(ti, i);
 		paca_ptrs[i]->emergency_sp = (void *)ti + THREAD_SIZE;

 #ifdef CONFIG_PPC_BOOK3S_64
 		/* emergency stack for NMI exception
Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE
On Tue, Jan 08, 2019 at 02:10:31PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 08, 2019 at 01:07:41PM +, Andrew Murray wrote:
>
> > Yes I found lots of examples like this across the tree whilst doing this
> > work. However I decided to initially start with simply removing duplicated
> > code as a result of adding this flag and attempting to preserve existing
> > functionality. I thought that if I add missing checks then the patchset
> > will get much bigger and be harder to merge. I would like to do this though
> > as another non-cross-arch series.
> >
> > Can we limit this patch series to the minimal changes required to fully
> > use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems
> > in subsequent patch sets?
>
> Ok, but it would've been nice to see that mentioned somewhere.

I'll update the cover letter on any next revision. I'll try to be clearer next time with my intentions.

Andrew Murray
Re: [PATCH v4 10/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
On Tue, Jan 08, 2019 at 11:48:41AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:27PM +, Andrew Murray wrote:
> > For drivers that do not support context exclusion let's advertise the
> > PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
> > prevent us from handling events where any exclusion flags are set.
> > Let's also remove the now unnecessary check for exclusion flags.
> >
> > Signed-off-by: Andrew Murray
> > ---
> >  arch/x86/events/amd/ibs.c          | 13 +
> >  arch/x86/events/amd/power.c        | 10 ++
> >  arch/x86/events/intel/cstate.c     | 12 +++-
> >  arch/x86/events/intel/rapl.c       |  9 ++---
> >  arch/x86/events/intel/uncore_snb.c |  9 ++---
> >  arch/x86/events/msr.c              | 10 ++
> >  6 files changed, 12 insertions(+), 51 deletions(-)
>
> You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but
> then you also don't check if it handles all the various exclude options
> correctly/consistently.
>
> Now; I must admit that that is a bit of a maze, but I think we can at
> least add exclude_idle and exclude_hv fails in there, nothing uses those
> afaict.

Yes it took me some time to make sense of it. As per my comments in the other patch, I think you're suggesting that I add additional checks to x86. I think they are needed but I'd prefer to make functional changes in a separate series; I'm happy to do this.

> On the various exclude options; they are as follows (IIUC):
>
>  - exclude_guest: we're a HV/host-kernel and we don't want the counter
>    to run when we run a guest context.
>
>  - exclude_host: we're a HV/host-kernel and we don't want the counter
>    to run when we run in host context.
>
>  - exclude_hv: we're a guest and don't want the counter to run in HV
>    context.
>
> Now, KVM always implies exclude_hv afaict (for guests),

It certainly does for ARM.

> I'm not sure
> what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf
> works on Xen) -- nor quite sure who to ask, Boris, Juergen?

Thanks,

Andrew Murray
Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE
On Tue, Jan 08, 2019 at 01:07:41PM +, Andrew Murray wrote: > Yes I found lots of examples like this across the tree whilst doing this > work. However I decided to initially start with simply removing duplicated > code as a result of adding this flag and attempting to preserve existing > functionality. I thought that if I add missing checks then the patchset > will get much bigger and be harder to merge. I would like to do this though > as another non-cross-arch series. > > Can we limit this patch series to the minimal changes required to fully > use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems > in subsequent patch sets? Ok, but it would've been nice to see that mentioned somewhere.
Re: [PATCH v4 11/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
On Tue, Jan 08, 2019 at 11:49:40AM +0100, Peter Zijlstra wrote: > On Mon, Jan 07, 2019 at 04:27:28PM +, Andrew Murray wrote: > > This patch has the exact same subject as the previous one.. that seems > sub-optimal. Ah yes, I'll update that in subsequent revisions. (The reason for two patches was to separate functional vs non-functional changes). Andrew Murray
Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE
On Tue, Jan 08, 2019 at 11:28:02AM +0100, Peter Zijlstra wrote: > On Mon, Jan 07, 2019 at 04:27:22PM +, Andrew Murray wrote: > > @@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event) > > /* > > * Check whether we need to exclude the counter from certain modes. > > */ > > + if (armpmu->set_event_filter && > > + armpmu->set_event_filter(hwc, &event->attr)) { > > pr_debug("ARM performance counters do not support " > > "mode exclusion\n"); > > return -EOPNOTSUPP; > > This then requires all set_event_filter() implementations to check all > the various exclude options; Yes, but this isn't a new requirement: this hunk uses the absence of set_event_filter as a blanket indication that no exclusion flags are supported. > also, set_event_filter() failing then > returns with -EOPNOTSUPP instead of the -EINVAL the CAP_NO_EXCLUDE > generates, which is again inconsistent. Yes, it's not ideal - but a step in the right direction. I wanted to limit user visible changes as much as possible; where I've identified them I've noted it in the commit log. > > If I look at (the very first git-grep found me) > armv7pmu_set_event_filter(), then I find it returning -EPERM (again > inconsistent but irrelevant because the actual value is not preserved) > for exclude_idle. > > But it doesn't seem to check exclude_host at all for example. Yes, I found lots of examples like this across the tree whilst doing this work. However, I decided to initially start with simply removing the code duplicated as a result of adding this flag, while attempting to preserve existing functionality. I thought that if I add missing checks then the patchset will get much bigger and be harder to merge. I would like to do this though as another non-cross-arch series. Can we limit this patch series to the minimal changes required to fully use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems in subsequent patch sets? 
Thanks, Andrew Murray > > > @@ -867,6 +859,9 @@ int armpmu_register(struct arm_pmu *pmu) > > if (ret) > > return ret; > > > > + if (!pmu->set_event_filter) > > + pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE; > > + > > ret = perf_pmu_register(&pmu->pmu, pmu->name, -1); > > if (ret) > > goto out_destroy; > > -- > > 2.7.4 > >
[PATCH v2 4/4] ASoC: add imx-audmix DT binding documentation
Add the DT binding documentation for Audio Mixer machine driver. Signed-off-by: Viorel Suman --- .../devicetree/bindings/sound/imx-audmix.txt | 24 ++ 1 file changed, 24 insertions(+) create mode 100644 Documentation/devicetree/bindings/sound/imx-audmix.txt diff --git a/Documentation/devicetree/bindings/sound/imx-audmix.txt b/Documentation/devicetree/bindings/sound/imx-audmix.txt new file mode 100644 index 000..6ac1230 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/imx-audmix.txt @@ -0,0 +1,24 @@ +NXP Audio Mixer (AUDMIX) machine driver. + +Required properties: +=== + - compatible : Compatible list, contains "fsl,imx-audmix" + + - model : Short audio card description. + + - dais : Must contain a list of phandles to AUDMIX connected + DAIs. The current implementation requires two phandles + to SAI interfaces to be provided, the first SAI in the + list being used to route the AUDMIX output. + + - audmix-controller : Must contain the phandle to the AUDMIX device node. + +Machine driver configuration example: + + sound-audmix-sai { +compatible = "fsl,imx-audmix"; +model = "audmix-sai"; +dais = <>, <>; +audmix-controller = <>; + }; + -- 2.7.4
[PATCH v2 3/4] ASoC: fsl: Add Audio Mixer machine driver
This patch implements Audio Mixer machine driver for NXP iMX8 SOCs. It connects together Audio Mixer and related SAI instances. Signed-off-by: Viorel Suman --- sound/soc/fsl/Kconfig | 9 ++ sound/soc/fsl/Makefile | 2 + sound/soc/fsl/imx-audmix.c | 333 + 3 files changed, 344 insertions(+) create mode 100644 sound/soc/fsl/imx-audmix.c diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig index 0af2e056..d87c842 100644 --- a/sound/soc/fsl/Kconfig +++ b/sound/soc/fsl/Kconfig @@ -303,6 +303,15 @@ config SND_SOC_FSL_ASOC_CARD CS4271, CS4272 and SGTL5000. Say Y if you want to add support for Freescale Generic ASoC Sound Card. +config SND_SOC_IMX_AUDMIX + tristate "SoC Audio support for i.MX boards with AUDMIX" + select SND_SOC_FSL_AUDMIX + select SND_SOC_FSL_SAI + help + SoC Audio support for i.MX boards with Audio Mixer + Say Y if you want to add support for SoC audio on an i.MX board with + an Audio Mixer. + endif # SND_IMX_SOC endmenu diff --git a/sound/soc/fsl/Makefile b/sound/soc/fsl/Makefile index 4172d5a..c0dd044 100644 --- a/sound/soc/fsl/Makefile +++ b/sound/soc/fsl/Makefile @@ -62,6 +62,7 @@ snd-soc-imx-es8328-objs := imx-es8328.o snd-soc-imx-sgtl5000-objs := imx-sgtl5000.o snd-soc-imx-spdif-objs := imx-spdif.o snd-soc-imx-mc13783-objs := imx-mc13783.o +snd-soc-imx-audmix-objs := imx-audmix.o obj-$(CONFIG_SND_SOC_EUKREA_TLV320) += snd-soc-eukrea-tlv320.o obj-$(CONFIG_SND_SOC_PHYCORE_AC97) += snd-soc-phycore-ac97.o @@ -71,3 +72,4 @@ obj-$(CONFIG_SND_SOC_IMX_ES8328) += snd-soc-imx-es8328.o obj-$(CONFIG_SND_SOC_IMX_SGTL5000) += snd-soc-imx-sgtl5000.o obj-$(CONFIG_SND_SOC_IMX_SPDIF) += snd-soc-imx-spdif.o obj-$(CONFIG_SND_SOC_IMX_MC13783) += snd-soc-imx-mc13783.o +obj-$(CONFIG_SND_SOC_IMX_AUDMIX) += snd-soc-imx-audmix.o diff --git a/sound/soc/fsl/imx-audmix.c b/sound/soc/fsl/imx-audmix.c new file mode 100644 index 000..30cce52 --- /dev/null +++ b/sound/soc/fsl/imx-audmix.c @@ -0,0 +1,333 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2017 NXP + * + * 
The code contained herein is licensed under the GNU General Public + * License. You may obtain a copy of the GNU General Public License + * Version 2 or later at the following locations: + * + * http://www.opensource.org/licenses/gpl-license.html + * http://www.gnu.org/copyleft/gpl.html + */ + +#include +#include +#include +#include +#include +#include +#include "fsl_sai.h" +#include "fsl_audmix.h" + +struct imx_audmix { + struct platform_device *pdev; + struct snd_soc_card card; + struct platform_device *audmix_pdev; + struct platform_device *out_pdev; + struct clk *cpu_mclk; + int num_dai; + struct snd_soc_dai_link *dai; + int num_dai_conf; + struct snd_soc_codec_conf *dai_conf; + int num_dapm_routes; + struct snd_soc_dapm_route *dapm_routes; +}; + +static const u32 imx_audmix_rates[] = { + 8000, 12000, 16000, 24000, 32000, 48000, 64000, 96000, +}; + +static const struct snd_pcm_hw_constraint_list imx_audmix_rate_constraints = { + .count = ARRAY_SIZE(imx_audmix_rates), + .list = imx_audmix_rates, +}; + +static int imx_audmix_fe_startup(struct snd_pcm_substream *substream) +{ + struct snd_soc_pcm_runtime *rtd = substream->private_data; + struct imx_audmix *priv = snd_soc_card_get_drvdata(rtd->card); + struct snd_pcm_runtime *runtime = substream->runtime; + struct device *dev = rtd->card->dev; + unsigned long clk_rate = clk_get_rate(priv->cpu_mclk); + int ret; + + if (clk_rate % 24576000 == 0) { + ret = snd_pcm_hw_constraint_list(runtime, 0, +SNDRV_PCM_HW_PARAM_RATE, +&imx_audmix_rate_constraints); + if (ret < 0) + return ret; + } else { + dev_warn(dev, "mclk may be not supported %lu\n", clk_rate); + } + + ret = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_CHANNELS, + 1, 8); + if (ret < 0) + return ret; + + return snd_pcm_hw_constraint_mask64(runtime, SNDRV_PCM_HW_PARAM_FORMAT, + FSL_AUDMIX_FORMATS); +} + +static int imx_audmix_fe_hw_params(struct snd_pcm_substream *substream, + struct snd_pcm_hw_params *params) +{ + struct snd_soc_pcm_runtime *rtd = 
substream->private_data; + struct device *dev = rtd->card->dev; + bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK; + unsigned int fmt = SND_SOC_DAIFMT_DSP_A | SND_SOC_DAIFMT_NB_NF; + u32 channels = params_channels(params); + int ret, dir; + + /* For playback the AUDMIX is slave, and for record is master */ + fmt |= tx ? SND_SOC_DAIFMT_CBS_CFS :
[PATCH v2 2/4] ASoC: add fsl_audmix DT binding documentation
Add the DT binding documentation for NXP Audio Mixer CPU DAI driver. Signed-off-by: Viorel Suman --- .../devicetree/bindings/sound/fsl,audmix.txt | 44 ++ 1 file changed, 44 insertions(+) create mode 100644 Documentation/devicetree/bindings/sound/fsl,audmix.txt diff --git a/Documentation/devicetree/bindings/sound/fsl,audmix.txt b/Documentation/devicetree/bindings/sound/fsl,audmix.txt new file mode 100644 index 000..512d39b --- /dev/null +++ b/Documentation/devicetree/bindings/sound/fsl,audmix.txt @@ -0,0 +1,44 @@ +NXP Audio Mixer (AUDMIX). + +The Audio Mixer is an on-chip functional module that allows mixing of two +audio streams into a single audio stream. Audio Mixer has two input serial +audio interfaces. These are driven by two Synchronous Audio interface +modules (SAI). Each input serial interface carries 8 audio channels in its +frame in TDM manner. Mixer mixes audio samples of corresponding channels +from two interfaces into a single sample. Before mixing, audio samples of +two inputs can be attenuated based on configuration. The output of the +Audio Mixer is also a serial audio interface. Like input interfaces it has +the same TDM frame format. This output is used to drive the serial DAC TDM +interface of audio codec and also sent to the external pins along with the +receive path of normal audio SAI module for readback by the CPU. + +The output of Audio Mixer can be selected from any of the three streams + - serial audio input 1 + - serial audio input 2 + - mixed audio + +Mixing operation is independent of audio sample rate but the two audio +input streams must have the same audio sample rate with the same number of +channels in TDM frame to be eligible for mixing. + +Device driver required properties: += + - compatible : Compatible list, contains "fsl,imx8qm-audmix" + + - reg: Offset and length of the register set for the device. + + - clocks : Must contain an entry for each entry in clock-names. + + - clock-names: Must include the "ipg" for register access. 
+ + - power-domains : Must contain the phandle to AUDMIX power domain node + +Device driver configuration example: +== + audmix: audmix@5984 { +compatible = "fsl,imx8qm-audmix"; +reg = <0x0 0x5984 0x0 0x1>; +clocks = < IMX8QXP_AUD_AUDMIX_IPG>; +clock-names = "ipg"; +power-domains = <_audmix>; + }; -- 2.7.4
[PATCH v2 1/4] ASoC: fsl: Add Audio Mixer CPU DAI driver
This patch implements Audio Mixer CPU DAI driver for NXP iMX8 SOCs. The Audio Mixer is an on-chip functional module that allows mixing of two audio streams into a single audio stream. Audio Mixer datasheet is available here: https://www.nxp.com/docs/en/reference-manual/IMX8DQXPRM.pdf Signed-off-by: Viorel Suman --- sound/soc/fsl/Kconfig | 7 + sound/soc/fsl/Makefile | 3 + sound/soc/fsl/fsl_audmix.c | 551 + sound/soc/fsl/fsl_audmix.h | 102 + 4 files changed, 663 insertions(+) create mode 100644 sound/soc/fsl/fsl_audmix.c create mode 100644 sound/soc/fsl/fsl_audmix.h diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig index 7b1d997..0af2e056 100644 --- a/sound/soc/fsl/Kconfig +++ b/sound/soc/fsl/Kconfig @@ -24,6 +24,13 @@ config SND_SOC_FSL_SAI This option is only useful for out-of-tree drivers since in-tree drivers select it automatically. +config SND_SOC_FSL_AUDMIX + tristate "Audio Mixer (AUDMIX) module support" + select REGMAP_MMIO + help + Say Y if you want to add Audio Mixer (AUDMIX) + support for the NXP iMX CPUs. 
+ config SND_SOC_FSL_SSI tristate "Synchronous Serial Interface module (SSI) support" select SND_SOC_IMX_PCM_DMA if SND_IMX_SOC != n diff --git a/sound/soc/fsl/Makefile b/sound/soc/fsl/Makefile index 3c0ff31..4172d5a 100644 --- a/sound/soc/fsl/Makefile +++ b/sound/soc/fsl/Makefile @@ -12,6 +12,7 @@ snd-soc-p1022-rdk-objs := p1022_rdk.o obj-$(CONFIG_SND_SOC_P1022_RDK) += snd-soc-p1022-rdk.o # Freescale SSI/DMA/SAI/SPDIF Support +snd-soc-fsl-audmix-objs := fsl_audmix.o snd-soc-fsl-asoc-card-objs := fsl-asoc-card.o snd-soc-fsl-asrc-objs := fsl_asrc.o fsl_asrc_dma.o snd-soc-fsl-sai-objs := fsl_sai.o @@ -22,6 +23,8 @@ snd-soc-fsl-esai-objs := fsl_esai.o snd-soc-fsl-micfil-objs := fsl_micfil.o snd-soc-fsl-utils-objs := fsl_utils.o snd-soc-fsl-dma-objs := fsl_dma.o + +obj-$(CONFIG_SND_SOC_FSL_AUDMIX) += snd-soc-fsl-audmix.o obj-$(CONFIG_SND_SOC_FSL_ASOC_CARD) += snd-soc-fsl-asoc-card.o obj-$(CONFIG_SND_SOC_FSL_ASRC) += snd-soc-fsl-asrc.o obj-$(CONFIG_SND_SOC_FSL_SAI) += snd-soc-fsl-sai.o diff --git a/sound/soc/fsl/fsl_audmix.c b/sound/soc/fsl/fsl_audmix.c new file mode 100644 index 000..f1267e5 --- /dev/null +++ b/sound/soc/fsl/fsl_audmix.c @@ -0,0 +1,551 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * NXP AUDMIX ALSA SoC Digital Audio Interface (DAI) driver + * + * Copyright 2017 NXP + */ + +#include +#include +#include +#include +#include +#include + +#include "fsl_audmix.h" + +#define SOC_ENUM_SINGLE_S(xreg, xshift, xtexts) \ + SOC_ENUM_SINGLE(xreg, xshift, ARRAY_SIZE(xtexts), xtexts) + +static const char + *tdm_sel[] = { "TDM1", "TDM2", }, + *mode_sel[] = { "Disabled", "TDM1", "TDM2", "Mixed", }, + *width_sel[] = { "16b", "18b", "20b", "24b", "32b", }, + *endis_sel[] = { "Disabled", "Enabled", }, + *updn_sel[] = { "Downward", "Upward", }, + *mask_sel[] = { "Unmask", "Mask", }; + +static const struct soc_enum fsl_audmix_enum[] = { +/* FSL_AUDMIX_CTR enums */ +SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_MIXCLK_SHIFT, tdm_sel), +SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, 
FSL_AUDMIX_CTR_OUTSRC_SHIFT, mode_sel), +SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_OUTWIDTH_SHIFT, width_sel), +SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_MASKRTDF_SHIFT, mask_sel), +SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_MASKCKDF_SHIFT, mask_sel), +SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_SYNCMODE_SHIFT, endis_sel), +SOC_ENUM_SINGLE_S(FSL_AUDMIX_CTR, FSL_AUDMIX_CTR_SYNCSRC_SHIFT, tdm_sel), +/* FSL_AUDMIX_ATCR0 enums */ +SOC_ENUM_SINGLE_S(FSL_AUDMIX_ATCR0, 0, endis_sel), +SOC_ENUM_SINGLE_S(FSL_AUDMIX_ATCR0, 1, updn_sel), +/* FSL_AUDMIX_ATCR1 enums */ +SOC_ENUM_SINGLE_S(FSL_AUDMIX_ATCR1, 0, endis_sel), +SOC_ENUM_SINGLE_S(FSL_AUDMIX_ATCR1, 1, updn_sel), +}; + +struct fsl_audmix_state { + u8 tdms; + u8 clk; + char msg[64]; +}; + +static const struct fsl_audmix_state prms[4][4] = {{ + /* DIS->DIS, do nothing */ + { .tdms = 0, .clk = 0, .msg = "" }, + /* DIS->TDM1*/ + { .tdms = 1, .clk = 1, .msg = "DIS->TDM1: TDM1 not started!\n" }, + /* DIS->TDM2*/ + { .tdms = 2, .clk = 2, .msg = "DIS->TDM2: TDM2 not started!\n" }, + /* DIS->MIX */ + { .tdms = 3, .clk = 0, .msg = "DIS->MIX: Please start both TDMs!\n" } +}, { /* TDM1->DIS */ + { .tdms = 1, .clk = 0, .msg = "TDM1->DIS: TDM1 not started!\n" }, + /* TDM1->TDM1, do nothing */ + { .tdms = 0, .clk = 0, .msg = "" }, + /* TDM1->TDM2 */ + { .tdms = 3, .clk = 2, .msg = "TDM1->TDM2: Please start both TDMs!\n" }, + /* TDM1->MIX */ + { .tdms = 3, .clk = 0, .msg = "TDM1->MIX: Please start both TDMs!\n" } +}, { /* TDM2->DIS */ + { .tdms = 2, .clk = 0, .msg = "TDM2->DIS: TDM2 not started!\n" }, + /* TDM2->TDM1 */ + { .tdms = 3,
[PATCH v2 0/4] Add NXP AUDMIX device and machine drivers
The patchset adds NXP Audio Mixer (AUDMIX) device and machine drivers and related DT bindings documentation. Changes since V1: 1. Original patch split into distinct patches for the device driver and DT binding documentation. 2. Replaced AMIX with AUDMIX in both code and file names as it looks more RM-compliant. 3. Removed polarity control from CPU DAI driver as suggested by Nicolin. 4. Added machine driver and related DT binding documentation. Viorel Suman (4): ASoC: fsl: Add Audio Mixer CPU DAI driver ASoC: add fsl_audmix DT binding documentation ASoC: fsl: Add Audio Mixer machine driver ASoC: add imx-audmix DT binding documentation .../devicetree/bindings/sound/fsl,audmix.txt | 44 ++ .../devicetree/bindings/sound/imx-audmix.txt | 24 + sound/soc/fsl/Kconfig | 16 + sound/soc/fsl/Makefile | 5 + sound/soc/fsl/fsl_audmix.c | 551 + sound/soc/fsl/fsl_audmix.h | 101 sound/soc/fsl/imx-audmix.c | 328 7 files changed, 1069 insertions(+) create mode 100644 Documentation/devicetree/bindings/sound/fsl,audmix.txt create mode 100644 Documentation/devicetree/bindings/sound/imx-audmix.txt create mode 100644 sound/soc/fsl/fsl_audmix.c create mode 100644 sound/soc/fsl/fsl_audmix.h create mode 100644 sound/soc/fsl/imx-audmix.c -- 2.7.4
[PATCH] powerpc: build virtex dtb
I wanted to test the virtex440-ml507 qemu machine and found that the dtb for it was not built. All powerpc DTBs are built only when CONFIG_OF_ALL_DTBS is set, which depends on COMPILE_TEST. This patch adds the build of the virtex dtbs depending on the CONFIG_XILINX_VIRTEX440_GENERIC_BOARD option. Signed-off-by: Corentin Labbe --- arch/powerpc/boot/dts/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/boot/dts/Makefile b/arch/powerpc/boot/dts/Makefile index fb335d05aae8..cae14fca682e 100644 --- a/arch/powerpc/boot/dts/Makefile +++ b/arch/powerpc/boot/dts/Makefile @@ -4,3 +4,5 @@ subdir-y += fsl dtstree:= $(srctree)/$(src) dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard $(dtstree)/*.dts)) +dtb-$(CONFIG_XILINX_VIRTEX440_GENERIC_BOARD) += virtex440-ml507.dtb +dtb-$(CONFIG_XILINX_VIRTEX440_GENERIC_BOARD) += virtex440-ml510.dtb -- 2.19.2
Re: [PATCH 2/2] powerpc: Show PAGE_SIZE in __die() output
Le 08/01/2019 à 13:05, Michael Ellerman a écrit : The page size the kernel is built with is useful info when debugging a crash, so add it to the output in __die(). Result looks like eg: kernel BUG at drivers/misc/lkdtm/bugs.c:63! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K SMP NR_CPUS=2048 NUMA pSeries Modules linked in: vmx_crypto kvm binfmt_misc ip_tables Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/traps.c | 12 1 file changed, 12 insertions(+) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 431a86d3f772..fc972e4eee5f 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -268,6 +268,18 @@ static int __die(const char *str, struct pt_regs *regs, long err) else seq_buf_puts(&s, "BE "); + seq_buf_puts(&s, "PAGE_SIZE="); + if (IS_ENABLED(CONFIG_PPC_4K_PAGES)) + seq_buf_puts(&s, "4K "); + else if (IS_ENABLED(CONFIG_PPC_16K_PAGES)) + seq_buf_puts(&s, "16K "); + else if (IS_ENABLED(CONFIG_PPC_64K_PAGES)) + seq_buf_puts(&s, "64K "); + else if (IS_ENABLED(CONFIG_PPC_256K_PAGES)) + seq_buf_puts(&s, "256K "); Can't we build all the above at once using PAGE_SHIFT ? Something like (untested): "%dK ", 1 << (PAGE_SHIFT - 10) Christophe + else + BUILD_BUG_ON(1); + if (IS_ENABLED(CONFIG_PREEMPT)) seq_buf_puts(&s, "PREEMPT ");
Re: [PATCH 1/2] powerpc: Use seq_buf to avoid pr_cont() in __die()
Le 08/01/2019 à 13:04, Michael Ellerman a écrit : Using pr_cont() risks having our output interleaved with other output from other CPUs. Instead use a seq_buf to construct the line and then print it as a whole. Why not simply do a single printk() or similar, on the same model as x86 for instance? (https://elixir.bootlin.com/linux/v5.0-rc1/source/arch/x86/kernel/dumpstack.c#L368) Christophe Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/traps.c | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 64936b60d521..431a86d3f772 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -255,26 +256,34 @@ NOKPROBE_SYMBOL(oops_end); static int __die(const char *str, struct pt_regs *regs, long err) { + char buf[128]; /* enough for all flags and a long platform name */ + struct seq_buf s; + + seq_buf_init(&s, buf, sizeof(buf)); + printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter); if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN)) - printk("LE "); + seq_buf_puts(&s, "LE "); else - printk("BE "); + seq_buf_puts(&s, "BE "); if (IS_ENABLED(CONFIG_PREEMPT)) - pr_cont("PREEMPT "); + seq_buf_puts(&s, "PREEMPT "); if (IS_ENABLED(CONFIG_SMP)) - pr_cont("SMP NR_CPUS=%d ", NR_CPUS); + seq_buf_printf(&s, "SMP NR_CPUS=%d ", NR_CPUS); if (debug_pagealloc_enabled()) - pr_cont("DEBUG_PAGEALLOC "); + seq_buf_puts(&s, "DEBUG_PAGEALLOC "); if (IS_ENABLED(CONFIG_NUMA)) - pr_cont("NUMA "); + seq_buf_puts(&s, "NUMA "); + + if (ppc_md.name) + seq_buf_puts(&s, ppc_md.name); - pr_cont("%s\n", ppc_md.name ? ppc_md.name : ""); + printk("%s\n", buf); if (notify_die(DIE_OOPS, str, regs, err, 255, SIGSEGV) == NOTIFY_STOP) return 1;
[PATCH 2/2] powerpc: Show PAGE_SIZE in __die() output
The page size the kernel is built with is useful info when debugging a crash, so add it to the output in __die(). Result looks like eg: kernel BUG at drivers/misc/lkdtm/bugs.c:63! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K SMP NR_CPUS=2048 NUMA pSeries Modules linked in: vmx_crypto kvm binfmt_misc ip_tables Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/traps.c | 12 1 file changed, 12 insertions(+) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 431a86d3f772..fc972e4eee5f 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -268,6 +268,18 @@ static int __die(const char *str, struct pt_regs *regs, long err) else seq_buf_puts(&s, "BE "); + seq_buf_puts(&s, "PAGE_SIZE="); + if (IS_ENABLED(CONFIG_PPC_4K_PAGES)) + seq_buf_puts(&s, "4K "); + else if (IS_ENABLED(CONFIG_PPC_16K_PAGES)) + seq_buf_puts(&s, "16K "); + else if (IS_ENABLED(CONFIG_PPC_64K_PAGES)) + seq_buf_puts(&s, "64K "); + else if (IS_ENABLED(CONFIG_PPC_256K_PAGES)) + seq_buf_puts(&s, "256K "); + else + BUILD_BUG_ON(1); + if (IS_ENABLED(CONFIG_PREEMPT)) seq_buf_puts(&s, "PREEMPT "); -- 2.20.1
[PATCH 1/2] powerpc: Use seq_buf to avoid pr_cont() in __die()
Using pr_cont() risks having our output interleaved with other output from other CPUs. Instead use a seq_buf to construct the line and then print it as a whole. Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/traps.c | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 64936b60d521..431a86d3f772 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -255,26 +256,34 @@ NOKPROBE_SYMBOL(oops_end); static int __die(const char *str, struct pt_regs *regs, long err) { + char buf[128]; /* enough for all flags and a long platform name */ + struct seq_buf s; + + seq_buf_init(&s, buf, sizeof(buf)); + printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter); if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN)) - printk("LE "); + seq_buf_puts(&s, "LE "); else - printk("BE "); + seq_buf_puts(&s, "BE "); if (IS_ENABLED(CONFIG_PREEMPT)) - pr_cont("PREEMPT "); + seq_buf_puts(&s, "PREEMPT "); if (IS_ENABLED(CONFIG_SMP)) - pr_cont("SMP NR_CPUS=%d ", NR_CPUS); + seq_buf_printf(&s, "SMP NR_CPUS=%d ", NR_CPUS); if (debug_pagealloc_enabled()) - pr_cont("DEBUG_PAGEALLOC "); + seq_buf_puts(&s, "DEBUG_PAGEALLOC "); if (IS_ENABLED(CONFIG_NUMA)) - pr_cont("NUMA "); + seq_buf_puts(&s, "NUMA "); + + if (ppc_md.name) + seq_buf_puts(&s, ppc_md.name); - pr_cont("%s\n", ppc_md.name ? ppc_md.name : ""); + printk("%s\n", buf); if (notify_die(DIE_OOPS, str, regs, err, 255, SIGSEGV) == NOTIFY_STOP) return 1; -- 2.20.1
Re: [PATCH] powerpc/irq: drop arch_early_irq_init()
Oops, forgot to actually copy Thomas. Le 08/01/2019 à 12:37, Christophe Leroy a écrit : arch_early_irq_init() does nothing different than the weak arch_early_irq_init() in kernel/softirq.c Fixes: 089fb442f301 ("powerpc: Use ARCH_IRQ_INIT_FLAGS") Cc: Thomas Gleixner Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/irq.c | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 916ddc4aac44..bb299613a462 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -827,11 +827,6 @@ int irq_choose_cpu(const struct cpumask *mask) } #endif -int arch_early_irq_init(void) -{ - return 0; -} - #ifdef CONFIG_PPC64 static int __init setup_noirqdistrib(char *str) {
[PATCH] powerpc/irq: drop arch_early_irq_init()
arch_early_irq_init() does nothing different than the weak arch_early_irq_init() in kernel/softirq.c Fixes: 089fb442f301 ("powerpc: Use ARCH_IRQ_INIT_FLAGS") Cc: Thomas Gleixner Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/irq.c | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 916ddc4aac44..bb299613a462 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -827,11 +827,6 @@ int irq_choose_cpu(const struct cpumask *mask) } #endif -int arch_early_irq_init(void) -{ - return 0; -} - #ifdef CONFIG_PPC64 static int __init setup_noirqdistrib(char *str) { -- 2.13.3
Re: [PATCH v3 1/2] selftests/powerpc: Add MSR bits
On 1/8/19 9:14 AM, Christophe Leroy wrote: > Le 08/01/2019 à 12:11, Breno Leitao a écrit : >> Hi Michael, >> >> On 1/8/19 7:20 AM, Michael Ellerman wrote: >>> Breno Leitao writes: >>> hi Christophe, On 1/3/19 3:19 PM, LEROY Christophe wrote: > Breno Leitao a écrit : > >> This patch simply adds definitions for the MSR bits and some macros to >> test for MSR TM bits. >> >> This was copied from arch/powerpc/include/asm/reg.h generic MSR part. > > Can't we find a way to avoid duplicating such defines ? I think there are three possible ways, but none of them respect the premises we are used too. These are the possible ways I can think of: 1) Including arch/powerpc/include/asm as part of the selftest compilation process. Problem: This might break the selftest independence of the kbuild system. 2) Generate a temporary header file inside selftests/include which contains these macros at compilation time. Problem: The problem as above. 3) Define MSR fields at userspace headers (/usr/include). Problem: I am not sure userspace should have MSR bits information. Do you suggest me to investigate any other way? >>> >>> In this case I think we can probably just copy the few #defines we need. >> >> Cool. That is the very same thing as suggested by Christophe. I sent the v4 >> yesterday, with this change already. > > It didn't get through. Latest is still v3 it seems, look > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=71262 Weird, I am not finding it as well, although I've just confirmed that I have, in fact, sent it. Anyway, let me send it again. Thanks!
[PATCH v4] selftests/powerpc: New TM signal self test
A new self test that forces MSR[TS] to be set without calling any TM instruction. This test also tries to cause a page fault at a signal handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG when tm_recheckpoint() is called. This test is not deterministic, since it is hard to guarantee that the page access will cause a page fault. In order to force more page faults at signal context, the signal handler and the ucontext are being mapped into a MADV_DONTNEED memory chunks. Tests have shown that the bug could be exposed with few interactions in a buggy kernel. This test is configured to loop 5000x, having a good chance to hit the kernel issue in just one run. This self test takes less than two seconds to run. This test uses set/getcontext because the kernel will recheckpoint zeroed structures, causing the test to segfault, which is undesired because the test needs to rerun, so, there is a signal handler for SIGSEGV which will restart the test. 
v2: Uses the MADV_DONTNEED memory advice v3: Fix memcpy and 32-bits compilation v4: Does not define unused macros Signed-off-by: Breno Leitao --- tools/testing/selftests/powerpc/include/reg.h | 8 + .../testing/selftests/powerpc/include/utils.h | 2 + tools/testing/selftests/powerpc/tm/.gitignore | 1 + tools/testing/selftests/powerpc/tm/Makefile | 4 +- .../powerpc/tm/tm-signal-context-force-tm.c | 184 ++ 5 files changed, 198 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-force-tm.c diff --git a/tools/testing/selftests/powerpc/include/reg.h b/tools/testing/selftests/powerpc/include/reg.h index 52b4710469d2..a69ab6e2afd9 100644 --- a/tools/testing/selftests/powerpc/include/reg.h +++ b/tools/testing/selftests/powerpc/include/reg.h @@ -77,6 +77,14 @@ #define TEXASR_TE 0x0400 #define TEXASR_ROT 0x0200 +/* MSR register bits */ +#define MSR_TS_S_LG 33 /* Trans Mem state: Suspended */ + +#define __MASK(X) (1UL<<(X)) + +/* macro to check TM MSR bits */ +#define MSR_TS_S __MASK(MSR_TS_S_LG) /* Transaction Suspended */ + /* Vector Instructions */ #define VSX_XX1(xs, ra, rb) (((xs) & 0x1f) << 21 | ((ra) << 16) | \ ((rb) << 11) | (((xs) >> 5))) diff --git a/tools/testing/selftests/powerpc/include/utils.h b/tools/testing/selftests/powerpc/include/utils.h index ae43a614835d..7636bf45d5d5 100644 --- a/tools/testing/selftests/powerpc/include/utils.h +++ b/tools/testing/selftests/powerpc/include/utils.h @@ -102,8 +102,10 @@ do { \ #if defined(__powerpc64__) #define UCONTEXT_NIA(UC) (UC)->uc_mcontext.gp_regs[PT_NIP] +#define UCONTEXT_MSR(UC) (UC)->uc_mcontext.gp_regs[PT_MSR] #elif defined(__powerpc__) #define UCONTEXT_NIA(UC) (UC)->uc_mcontext.uc_regs->gregs[PT_NIP] +#define UCONTEXT_MSR(UC) (UC)->uc_mcontext.uc_regs->gregs[PT_MSR] #else #error implement UCONTEXT_NIA #endif diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore index 208452a93e2c..951fe855f7cd 100644 ---
a/tools/testing/selftests/powerpc/tm/.gitignore +++ b/tools/testing/selftests/powerpc/tm/.gitignore @@ -11,6 +11,7 @@ tm-signal-context-chk-fpu tm-signal-context-chk-gpr tm-signal-context-chk-vmx tm-signal-context-chk-vsx +tm-signal-context-force-tm tm-signal-sigreturn-nt tm-vmx-unavail tm-unavailable diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index 75a685359129..c0734ed0ef56 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -4,7 +4,8 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \ - $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-sigreturn-nt + $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-sigreturn-nt \ + tm-signal-context-force-tm top_srcdir = ../../../../.. include ../../lib.mk @@ -20,6 +21,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64 +$(OUTPUT)/tm-signal-context-force-tm: CFLAGS += -pthread -m64 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS)) $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-force-tm.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-force-tm.c new file mode 100644 index ..31717625f318 --- /dev/null +++
Re: [PATCH v3 1/2] selftests/powerpc: Add MSR bits
Le 08/01/2019 à 12:11, Breno Leitao a écrit : Hi Michael, On 1/8/19 7:20 AM, Michael Ellerman wrote: Breno Leitao writes: hi Christophe, On 1/3/19 3:19 PM, LEROY Christophe wrote: Breno Leitao a écrit : This patch simply adds definitions for the MSR bits and some macros to test for MSR TM bits. This was copied from arch/powerpc/include/asm/reg.h generic MSR part. Can't we find a way to avoid duplicating such defines ? I think there are three possible ways, but none of them respect the premises we are used to. These are the possible ways I can think of: 1) Including arch/powerpc/include/asm as part of the selftest compilation process. Problem: This might break the selftest independence of the kbuild system. 2) Generate a temporary header file inside selftests/include which contains these macros at compilation time. Problem: The same problem as above. 3) Define MSR fields at userspace headers (/usr/include). Problem: I am not sure userspace should have MSR bits information. Do you suggest me to investigate any other way? In this case I think we can probably just copy the few #defines we need. Cool. That is the very same thing as suggested by Christophe. I sent the v4 yesterday, with this change already. It didn't get through. Latest is still v3 it seems, see https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=71262 Christophe Thanks for looking at it.
Re: [PATCH v3 1/2] selftests/powerpc: Add MSR bits
Hi Michael,

On 1/8/19 7:20 AM, Michael Ellerman wrote:
> Breno Leitao writes:
>
>> hi Christophe,
>>
>> On 1/3/19 3:19 PM, LEROY Christophe wrote:
>>> Breno Leitao a écrit :
>>> This patch simply adds definitions for the MSR bits and some macros to
>>> test for MSR TM bits. This was copied from arch/powerpc/include/asm/reg.h
>>> generic MSR part.
>>>
>>> Can't we find a way to avoid duplicating such defines ?
>>
>> I think there are three possible ways, but none of them respect the premises
>> we are used to. These are the possible ways I can think of:
>>
>> 1) Including arch/powerpc/include/asm as part of the selftest compilation
>>    process.
>>    Problem: This might break the selftest independence of the kbuild system.
>>
>> 2) Generate a temporary header file inside selftests/include which contains
>>    these macros at compilation time.
>>    Problem: The same problem as above.
>>
>> 3) Define MSR fields at userspace headers (/usr/include).
>>    Problem: I am not sure userspace should have MSR bits information.
>>
>> Do you suggest me to investigate any other way?
>
> In this case I think we can probably just copy the few #defines we need.

Cool. That is the very same thing as suggested by Christophe. I sent the v4
yesterday, with this change already.

Thanks for looking at it.
Re: [PATCH v4 11/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
On Mon, Jan 07, 2019 at 04:27:28PM +0000, Andrew Murray wrote: This patch has the exact same subject as the previous one.. that seems sub-optimal.
Re: [PATCH v4 10/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
On Mon, Jan 07, 2019 at 04:27:27PM +0000, Andrew Murray wrote:
> For drivers that do not support context exclusion let's advertise the
> PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
> prevent us from handling events where any exclusion flags are set.
> Let's also remove the now unnecessary check for exclusion flags.
>
> Signed-off-by: Andrew Murray
> ---
> arch/x86/events/amd/ibs.c | 13 +
> arch/x86/events/amd/power.c| 10 ++
> arch/x86/events/intel/cstate.c | 12 +++-
> arch/x86/events/intel/rapl.c | 9 ++---
> arch/x86/events/intel/uncore_snb.c | 9 ++---
> arch/x86/events/msr.c | 10 ++
> 6 files changed, 12 insertions(+), 51 deletions(-)

You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but then
you also don't check if it handles all the various exclude options
correctly/consistently.

Now; I must admit that that is a bit of a maze, but I think we can at least
add exclude_idle and exclude_hv fails in there, nothing uses those afaict.

On the various exclude options; they are as follows (IIUC):

- exclude_guest: we're a HV/host-kernel and we don't want the counter to
  run when we run a guest context.

- exclude_host: we're a HV/host-kernel and we don't want the counter to
  run when we run in host context.

- exclude_hv: we're a guest and don't want the counter to run in HV
  context.

Now, KVM always implies exclude_hv afaict (for guests), I'm not sure what,
if anything Xen does on x86 (IIRC Brendan Gregg once said perf works on
Xen) -- nor quite sure who to ask, Boris, Juergen?
Re: [PATCH v4 05/13] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE
On Mon, Jan 07, 2019 at 04:27:22PM +0000, Andrew Murray wrote:
> @@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event)
> 	/*
> 	 * Check whether we need to exclude the counter from certain modes.
> 	 */
> +	if (armpmu->set_event_filter &&
> +	    armpmu->set_event_filter(hwc, &event->attr)) {
> 		pr_debug("ARM performance counters do not support "
> 			 "mode exclusion\n");
> 		return -EOPNOTSUPP;

This then requires all set_event_filter() implementations to check all the
various exclude options; also, set_event_filter() failing then returns with
-EOPNOTSUPP instead of the -EINVAL the CAP_NO_EXCLUDE generates, which is
again inconsistent.

If I look at (the very first git-grep found me) armv7pmu_set_event_filter(),
then I find it returning -EPERM (again inconsistent but irrelevant because
the actual value is not preserved) for exclude_idle. But it doesn't seem to
check exclude_host at all for example.

> @@ -867,6 +859,9 @@ int armpmu_register(struct arm_pmu *pmu)
> 	if (ret)
> 		return ret;
>
> +	if (!pmu->set_event_filter)
> +		pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
>
> 	ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
> 	if (ret)
> 		goto out_destroy;
> --
> 2.7.4
>
Re: powerpc/pasemi: Add Nemo board IRQ init routine
Darren Stevens writes:
> Hello Michael
>
> On 22/12/2018, Michael Ellerman wrote:
>> On Sun, 2018-08-19 at 20:21:47 UTC, Darren Stevens wrote:
>>> Add an IRQ init routine for the Nemo board which inits and attaches
>>> the i8259 found in the SB600, and a cascade routine to dispatch the
>>> interrupts.
>>>
>>> Signed-off-by: Darren Stevens
>>
>> Applied to powerpc next, thanks.
>
> Thank you very much! We can now build an X1000 kernel without an out-of-tree
> patch, which is good. Yay!

>> https://git.kernel.org/powerpc/c/51f4cc2047a4b7e9bf1b49acf06c11
>
> Sorry for not mentioning this earlier (holidays etc..), but this commit
> has my 4th patch, but the commit message from the 2nd patch. I don't
> know how, and it is not a major problem, but can we get it updated?

Urgh, cr#p. Not sure how I messed that up.

I did have to edit all the patches a bit to get them to apply so presumably
somewhere along the line the commit message got messed up too.

Unfortunately there's no way to fix it. This email will have to suffice as
documentation that it was me that screwed it up and not you.

cheers
Re: [PATCH] selftests/powerpc: New TM signal self test
Breno Leitao writes: > On 12/20/18 10:51 AM, Michael Ellerman wrote: >> Breno Leitao writes: >> >>> A new self test that forces MSR[TS] to be set without calling any TM >>> instruction. This test also tries to cause a page fault at a signal >>> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing >>> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG >>> when tm_recheckpoint() is called. >>> >>> This test is not deterministic since it is hard to guarantee that the page >>> access will cause a page fault. Tests have shown that the bug could be >>> exposed with few interactions in a buggy kernel. This test is configured to >>> loop 5000x, having a good chance to hit the kernel issue in just one run. >>> This self test takes less than two seconds to run. >>> >>> This test uses set/getcontext because the kernel will recheckpoint >>> zeroed structures, causing the test to segfault, which is undesired because >>> the test needs to rerun, so, there is a signal handler for SIGSEGV which >>> will restart the test. >> And reference the ucontext->mcontext MSR using UCONTEXT_MSR() macro. >> Hi Breno, >> >> Thanks for the test, some of these TM tests are getting pretty advanced! :) >> >> Unfortunately it doesn't build in a few configurations. >> >> On Ubuntu 18.10 built with powerpc-linux-gnu-gcc I get: >> >> tm-signal-force-msr.c: In function 'trap_signal_handler': >> tm-signal-force-msr.c:42:19: error: 'union uc_regs_ptr' has no member >> named 'gp_regs'; did you mean 'uc_regs'? 
>> ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S; >> ^~~ >> uc_regs >> tm-signal-force-msr.c:17:29: error: left shift count >= width of type >> [-Werror=shift-count-overflow] >>#define __MASK(X) (1UL<<(X)) >>^~ >> tm-signal-force-msr.c:20:25: note: in expansion of macro '__MASK' >>#define MSR_TS_S__MASK(MSR_TS_S_LG) /* Transaction Suspended */ >>^~ >> tm-signal-force-msr.c:42:38: note: in expansion of macro 'MSR_TS_S' >> ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S; >> ^~~~ >> > > That is because I missed the -m64 compilation flag on Makefile. I understand > that this test only make sense when compiled in 64 bits. Do you agree? I think the test could work as a 32-bit binary on a 64-bit kernel, but I don't mind if you force it to build 64-bit. cheers
Re: [PATCH] powerpc: build dtb even without COMPILE_TEST
Rob Herring writes:
> On Fri, Jan 4, 2019 at 3:58 AM Corentin Labbe wrote:
>>
>> I wanted to test the virtex440-ml507 qemu machine and found that the dtb
>> for it was not built.
>
> Just do:
>
> make virtex440-ml507.dtb

I actually thought you had to do that, so I've never done anything different.

>> All powerpc DTB are only built when CONFIG_OF_ALL_DTBS is set which depends on
>> COMPILE_TEST.
>> But building DTB is not related to a "compile build test".
>
> But it is. We normally only build the dtbs for enabled platforms just
> like we only build platform/driver code for enabled platforms. It's
> hidden behind COMPILE_TEST so it only gets enabled for
> allmodconfig/allyesconfig builds.
>
>> So this patch made building of DTB independent of COMPILE_TEST (by
>> depending only on the PPC arch)
>> A better selection of which DTB to build could be done in the future,
>> as the ARM arch does.
>
> No reason you can't start that now for the platform you care about.
> You don't have to convert everyone.

Yeah that seems like the best option. I'm happy to take a patch just for
that target.

cheers
Re: [PATCH v4 1/6] powerpc: prefer memblock APIs returning virtual address
Le 31/12/2018 à 10:29, Mike Rapoport a écrit : There are a several places that allocate memory using memblock APIs that return a physical address, convert the returned address to the virtual address and frequently also memset(0) the allocated range. Update these places to use memblock allocators already returning a virtual address. Use memblock functions that clear the allocated memory instead of calling memset(0) where appropriate. The calls to memblock_alloc_base() that were not followed by memset(0) are replaced with memblock_alloc_try_nid_raw(). Since the latter does not panic() when the allocation fails, the appropriate panic() calls are added to the call sites. Signed-off-by: Mike Rapoport --- arch/powerpc/kernel/paca.c | 16 ++-- arch/powerpc/kernel/setup_64.c | 24 ++-- arch/powerpc/mm/hash_utils_64.c| 6 +++--- arch/powerpc/mm/pgtable-book3e.c | 8 ++-- arch/powerpc/mm/pgtable-book3s64.c | 5 + arch/powerpc/mm/pgtable-radix.c| 25 +++-- arch/powerpc/platforms/pasemi/iommu.c | 5 +++-- arch/powerpc/platforms/pseries/setup.c | 18 ++ arch/powerpc/sysdev/dart_iommu.c | 7 +-- 9 files changed, 51 insertions(+), 63 deletions(-) diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c index 913bfca..276d36d4 100644 --- a/arch/powerpc/kernel/paca.c +++ b/arch/powerpc/kernel/paca.c @@ -27,7 +27,7 @@ static void *__init alloc_paca_data(unsigned long size, unsigned long align, unsigned long limit, int cpu) { - unsigned long pa; + void *ptr; int nid; /* @@ -42,17 +42,15 @@ static void *__init alloc_paca_data(unsigned long size, unsigned long align, nid = early_cpu_to_node(cpu); } - pa = memblock_alloc_base_nid(size, align, limit, nid, MEMBLOCK_NONE); - if (!pa) { - pa = memblock_alloc_base(size, align, limit); - if (!pa) - panic("cannot allocate paca data"); - } + ptr = memblock_alloc_try_nid(size, align, MEMBLOCK_LOW_LIMIT, +limit, nid); + if (!ptr) + panic("cannot allocate paca data"); AFAIKS, memblock_alloc_try_nid() panics if memblock_alloc_internal() returns 
NULL, so the above panic is useless, isn't it ? if (cpu == boot_cpuid) memblock_set_bottom_up(false); - return __va(pa); + return ptr; } #ifdef CONFIG_PPC_PSERIES @@ -118,7 +116,6 @@ static struct slb_shadow * __init new_slb_shadow(int cpu, unsigned long limit) } s = alloc_paca_data(sizeof(*s), L1_CACHE_BYTES, limit, cpu); - memset(s, 0, sizeof(*s)); s->persistent = cpu_to_be32(SLB_NUM_BOLTED); s->buffer_length = cpu_to_be32(sizeof(*s)); @@ -222,7 +219,6 @@ void __init allocate_paca(int cpu) paca = alloc_paca_data(sizeof(struct paca_struct), L1_CACHE_BYTES, limit, cpu); paca_ptrs[cpu] = paca; - memset(paca, 0, sizeof(struct paca_struct)); initialise_paca(paca, cpu); #ifdef CONFIG_PPC_PSERIES diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 236c115..3dcd779 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -634,19 +634,17 @@ __init u64 ppc64_bolted_size(void) static void *__init alloc_stack(unsigned long limit, int cpu) { - unsigned long pa; + void *ptr; BUILD_BUG_ON(STACK_INT_FRAME_SIZE % 16); - pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit, - early_cpu_to_node(cpu), MEMBLOCK_NONE); - if (!pa) { - pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit); - if (!pa) - panic("cannot allocate stacks"); - } + ptr = memblock_alloc_try_nid(THREAD_SIZE, THREAD_SIZE, +MEMBLOCK_LOW_LIMIT, limit, +early_cpu_to_node(cpu)); + if (!ptr) + panic("cannot allocate stacks"); Same ? Christophe - return __va(pa); + return ptr; } void __init irqstack_early_init(void) @@ -739,20 +737,17 @@ void __init emergency_stack_init(void) struct thread_info *ti; ti = alloc_stack(limit, i); - memset(ti, 0, THREAD_SIZE); emerg_stack_init_thread_info(ti, i); paca_ptrs[i]->emergency_sp = (void *)ti + THREAD_SIZE; #ifdef CONFIG_PPC_BOOK3S_64 /* emergency stack for NMI exception handling. 
*/ ti = alloc_stack(limit, i); - memset(ti, 0, THREAD_SIZE); emerg_stack_init_thread_info(ti, i); paca_ptrs[i]->nmi_emergency_sp = (void *)ti + THREAD_SIZE; /* emergency stack for machine
Re: [PATCH v2 2/2] powerpc: use probe_user_read()
On Tue, 2019-01-08 at 10:37 +0100, Christophe Leroy wrote: > Hi Michael and Russell, > > Any idea why: > - checkpatch reports missing Signed-off-by: > - Snowpatch build fails on PPC64 (it seems unrelated to the patch, > something wrong in lib/genalloc.c) Upstream kernel broke for powerpc (snowpatch applies patches on top of powerpc/next), it was fixed in commit 35004f2e55807a1a1491db24ab512dd2f770a130 which I believe is in powerpc/next now. I will look at rerunning tests for all the patches that this impacted. As for the S-o-b, no clue, I'll have a look. Thanks for the report! - Russell > > Thanks > Christophe > > Le 08/01/2019 à 08:37, Christophe Leroy a écrit : > > Instead of opencoding, use probe_user_read() to failessly > > read a user location. > > > > Signed-off-by: Christophe Leroy > > --- > > v2: Using probe_user_read() instead of probe_user_address() > > > > arch/powerpc/kernel/process.c | 12 +--- > > arch/powerpc/mm/fault.c | 6 +- > > arch/powerpc/perf/callchain.c | 20 +++- > > arch/powerpc/perf/core-book3s.c | 8 +--- > > arch/powerpc/sysdev/fsl_pci.c | 10 -- > > 5 files changed, 10 insertions(+), 46 deletions(-) > > > > diff --git a/arch/powerpc/kernel/process.c > > b/arch/powerpc/kernel/process.c > > index ce393df243aa..6a4b59d574c2 100644 > > --- a/arch/powerpc/kernel/process.c > > +++ b/arch/powerpc/kernel/process.c > > @@ -1298,16 +1298,6 @@ void show_user_instructions(struct pt_regs > > *regs) > > > > pc = regs->nip - (NR_INSN_TO_PRINT * 3 / 4 * sizeof(int)); > > > > - /* > > -* Make sure the NIP points at userspace, not kernel text/data > > or > > -* elsewhere. 
> > -*/ > > - if (!__access_ok(pc, NR_INSN_TO_PRINT * sizeof(int), USER_DS)) > > { > > - pr_info("%s[%d]: Bad NIP, not dumping instructions.\n", > > - current->comm, current->pid); > > - return; > > - } > > - > > seq_buf_init(, buf, sizeof(buf)); > > > > while (n) { > > @@ -1318,7 +1308,7 @@ void show_user_instructions(struct pt_regs > > *regs) > > for (i = 0; i < 8 && n; i++, n--, pc += sizeof(int)) { > > int instr; > > > > - if (probe_kernel_address((const void *)pc, > > instr)) { > > + if (probe_user_read(, (void __user *)pc, > > sizeof(instr))) { > > seq_buf_printf(, " "); > > continue; > > } > > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > > index 887f11bcf330..ec74305fa330 100644 > > --- a/arch/powerpc/mm/fault.c > > +++ b/arch/powerpc/mm/fault.c > > @@ -276,12 +276,8 @@ static bool bad_stack_expansion(struct pt_regs > > *regs, unsigned long address, > > if ((flags & FAULT_FLAG_WRITE) && (flags & > > FAULT_FLAG_USER) && > > access_ok(nip, sizeof(*nip))) { > > unsigned int inst; > > - int res; > > > > - pagefault_disable(); > > - res = __get_user_inatomic(inst, nip); > > - pagefault_enable(); > > - if (!res) > > + if (!probe_user_read(, nip, sizeof(inst))) > > return !store_updates_sp(inst); > > *must_retry = true; > > } > > diff --git a/arch/powerpc/perf/callchain.c > > b/arch/powerpc/perf/callchain.c > > index 0af051a1974e..0680efb2237b 100644 > > --- a/arch/powerpc/perf/callchain.c > > +++ b/arch/powerpc/perf/callchain.c > > @@ -159,12 +159,8 @@ static int read_user_stack_64(unsigned long > > __user *ptr, unsigned long *ret) > > ((unsigned long)ptr & 7)) > > return -EFAULT; > > > > - pagefault_disable(); > > - if (!__get_user_inatomic(*ret, ptr)) { > > - pagefault_enable(); > > + if (!probe_user_read(ret, ptr, sizeof(*ret))) > > return 0; > > - } > > - pagefault_enable(); > > > > return read_user_stack_slow(ptr, ret, 8); > > } > > @@ -175,12 +171,8 @@ static int read_user_stack_32(unsigned int > > __user *ptr, unsigned int *ret) > > 
((unsigned long)ptr & 3)) > > return -EFAULT; > > > > - pagefault_disable(); > > - if (!__get_user_inatomic(*ret, ptr)) { > > - pagefault_enable(); > > + if (!probe_user_read(ret, ptr, sizeof(*ret))) > > return 0; > > - } > > - pagefault_enable(); > > > > return read_user_stack_slow(ptr, ret, 4); > > } > > @@ -307,17 +299,11 @@ static inline int current_is_64bit(void) > >*/ > > static int read_user_stack_32(unsigned int __user *ptr, unsigned > > int *ret) > > { > > - int rc; > > - > > if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) || > > ((unsigned long)ptr & 3)) > > return -EFAULT; > > > > - pagefault_disable(); > > - rc = __get_user_inatomic(*ret, ptr); > > -
Re: [1/2] powerpc/4xx/ocm: Fix phys_addr_t printf warnings
Christian Lamparter writes: > On Wednesday, January 2, 2019 12:31:50 PM CET Michael Ellerman wrote: >> On Tue, 2019-01-01 at 03:56:00 UTC, Michael Ellerman wrote: >> > Currently the code produces several warnings, eg: >> > >> > arch/powerpc/platforms/4xx/ocm.c:240:38: error: format '%llx' >> > expects argument of type 'long long unsigned int', but argument 3 >> > has type 'phys_addr_t {aka unsigned int}' >> > seq_printf(m, "PhysAddr : 0x%llx\n", ocm->phys); >> >~~~^ ~ >> > >> > Fix it by using the special %pa[p] format for printing phys_addr_t. >> > Note we need to pass the value by reference for the special specifier >> > to work. >> > >> > Signed-off-by: Michael Ellerman >> >> Series applied to powerpc fixes. >> >> https://git.kernel.org/powerpc/c/52b88fa1e8c7bae03bb691178a9f8b > > Well, I guess I'm a late. I had issues with the getting 4.20+ > crosscompiled on debian with make-kpkg. > > Nevertheless, I finally got a working kernel and > on the MyBook Live APM82181: > > --- > root@mbl:/sys/kernel/debug# cat ppc4xx_ocm/info > PPC4XX OCM : 1 > PhysAddr : 0x00040004[p] > MemTotal : 32768 Bytes > MemTotal(NC) : 32768 Bytes > MemTotal(C) : 0 Bytes > > NC.PhysAddr : 0x00040004[p] > NC.VirtAddr : 0x6bc84b36 > NC.MemTotal : 32768 Bytes > NC.MemFree : 32768 Bytes > > C.PhysAddr : 0x[p] > C.VirtAddr : 0x (null) > C.MemTotal : 0 Bytes > C.MemFree: 0 Bytes Oh right, I'm an idiot :) The docs say: Physical address types phys_addr_t -- %pa[p] 0x01234567 or 0x0123456789abcdef And if you grep for that there's eg: drivers/ntb/test/ntb_tool.c: "Window Size\t%pa[p]\n", So I just literally copied that. But it's trying to indicate that the p is optional. 
This should fix it, I won't merge it until you've tested it this time :) cheers diff --git a/arch/powerpc/platforms/4xx/ocm.c b/arch/powerpc/platforms/4xx/ocm.c index a1aaa1569d7c..f0e488d97567 100644 --- a/arch/powerpc/platforms/4xx/ocm.c +++ b/arch/powerpc/platforms/4xx/ocm.c @@ -237,12 +237,12 @@ static int ocm_debugfs_show(struct seq_file *m, void *v) continue; seq_printf(m, "PPC4XX OCM : %d\n", ocm->index); - seq_printf(m, "PhysAddr : %pa[p]\n", &(ocm->phys)); + seq_printf(m, "PhysAddr : %pa\n", &(ocm->phys)); seq_printf(m, "MemTotal : %d Bytes\n", ocm->memtotal); seq_printf(m, "MemTotal(NC) : %d Bytes\n", ocm->nc.memtotal); seq_printf(m, "MemTotal(C) : %d Bytes\n\n", ocm->c.memtotal); - seq_printf(m, "NC.PhysAddr : %pa[p]\n", &(ocm->nc.phys)); + seq_printf(m, "NC.PhysAddr : %pa\n", &(ocm->nc.phys)); seq_printf(m, "NC.VirtAddr : 0x%p\n", ocm->nc.virt); seq_printf(m, "NC.MemTotal : %d Bytes\n", ocm->nc.memtotal); seq_printf(m, "NC.MemFree : %d Bytes\n", ocm->nc.memfree); @@ -252,7 +252,7 @@ static int ocm_debugfs_show(struct seq_file *m, void *v) blk->size, blk->owner); } - seq_printf(m, "\nC.PhysAddr : %pa[p]\n", &(ocm->c.phys)); + seq_printf(m, "\nC.PhysAddr : %pa\n", &(ocm->c.phys)); seq_printf(m, "C.VirtAddr : 0x%p\n", ocm->c.virt); seq_printf(m, "C.MemTotal : %d Bytes\n", ocm->c.memtotal); seq_printf(m, "C.MemFree: %d Bytes\n", ocm->c.memfree);
Re: [PATCH v4 1/2] crypto: talitos - reorder code in talitos_edesc_alloc()
Le 08/01/2019 à 07:56, Christophe Leroy a écrit : This patch moves the mapping of IV after the kmalloc(). This avoids having to unmap in case kmalloc() fails. Signed-off-by: Christophe Leroy Cc: sta...@vger.kernel.org --- new in v4 drivers/crypto/talitos.c | 25 +++-- 1 file changed, 7 insertions(+), 18 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 45e20707cef8..54d80e7edb86 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -1361,23 +1361,18 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev, struct talitos_private *priv = dev_get_drvdata(dev); bool is_sec1 = has_ftr_sec1(priv); int max_len = is_sec1 ? TALITOS1_MAX_DATA_LEN : TALITOS2_MAX_DATA_LEN; - void *err; if (cryptlen + authsize > max_len) { dev_err(dev, "length exceeds h/w max limit\n"); return ERR_PTR(-EINVAL); } - if (ivsize) - iv_dma = dma_map_single(dev, iv, ivsize, DMA_TO_DEVICE); - if (!dst || dst == src) { src_len = assoclen + cryptlen + authsize; src_nents = sg_nents_for_len(src, src_len); if (src_nents < 0) { dev_err(dev, "Invalid number of src SG.\n"); - err = ERR_PTR(-EINVAL); - goto error_sg; + return ERR_PTR(-EINVAL); } src_nents = (src_nents == 1) ? 0 : src_nents; dst_nents = dst ? src_nents : 0; @@ -1387,16 +1382,14 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev, src_nents = sg_nents_for_len(src, src_len); if (src_nents < 0) { dev_err(dev, "Invalid number of src SG.\n"); - err = ERR_PTR(-EINVAL); - goto error_sg; + return ERR_PTR(-EINVAL); } src_nents = (src_nents == 1) ? 0 : src_nents; dst_len = assoclen + cryptlen + (encrypt ? authsize : 0); dst_nents = sg_nents_for_len(dst, dst_len); if (dst_nents < 0) { dev_err(dev, "Invalid number of dst SG.\n"); - err = ERR_PTR(-EINVAL); - goto error_sg; + return ERR_PTR(-EINVAL); } dst_nents = (dst_nents == 1) ? 
0 : dst_nents; } @@ -1425,10 +1418,10 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev, alloc_len += sizeof(struct talitos_desc); edesc = kmalloc(alloc_len, GFP_DMA | flags); - if (!edesc) { - err = ERR_PTR(-ENOMEM); - goto error_sg; - } + if (!edesc) + return ERR_PTR(-ENOMEM); + if (ivsize) + iv_dma = dma_map_single(dev, iv, ivsize, DMA_TO_DEVICE); memset(>desc, 0, sizeof(edesc->desc)); edesc->src_nents = src_nents; @@ -1445,10 +1438,6 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev, DMA_BIDIRECTIONAL); } return edesc; -error_sg: - if (iv_dma) - dma_unmap_single(dev, iv_dma, ivsize, DMA_TO_DEVICE); - return err; } static struct talitos_edesc *aead_edesc_alloc(struct aead_request *areq, u8 *iv,
Re: [PATCH v4 2/2] crypto: talitos - fix ablkcipher for CONFIG_VMAP_STACK
On 1/8/2019 8:56 AM, Christophe Leroy wrote: > [2.364486] WARNING: CPU: 0 PID: 60 at ./arch/powerpc/include/asm/io.h:837 > dma_nommu_map_page+0x44/0xd4 > [2.373579] CPU: 0 PID: 60 Comm: cryptomgr_test Tainted: GW > 4.20.0-rc5-00560-g6bfb52e23a00-dirty #531 > [2.384740] NIP: c000c540 LR: c000c584 CTR: > [2.389743] REGS: c95abab0 TRAP: 0700 Tainted: GW > (4.20.0-rc5-00560-g6bfb52e23a00-dirty) > [2.400042] MSR: 00029032 CR: 24042204 XER: > [2.406669] > [2.406669] GPR00: c02f2244 c95abb60 c6262990 c95abd80 256a 0001 > 0001 0001 > [2.406669] GPR08: 2000 0010 0010 24042202 > 0100 c95abd88 > [2.406669] GPR16: c05569d4 0001 0010 c95abc88 c0615664 > 0004 > [2.406669] GPR24: 0010 c95abc88 c95abc88 c61ae210 c7ff6d40 > c61ae210 3d68 > [2.441559] NIP [c000c540] dma_nommu_map_page+0x44/0xd4 > [2.446720] LR [c000c584] dma_nommu_map_page+0x88/0xd4 > [2.451762] Call Trace: > [2.454195] [c95abb60] [82000808] 0x82000808 (unreliable) > [2.459572] [c95abb80] [c02f2244] talitos_edesc_alloc+0xbc/0x3c8 > [2.465493] [c95abbb0] [c02f2600] ablkcipher_edesc_alloc+0x4c/0x5c > [2.471606] [c95abbd0] [c02f4ed0] ablkcipher_encrypt+0x20/0x64 > [2.477389] [c95abbe0] [c02023b0] __test_skcipher+0x4bc/0xa08 > [2.483049] [c95abe00] [c0204b60] test_skcipher+0x2c/0xcc > [2.488385] [c95abe20] [c0204c48] alg_test_skcipher+0x48/0xbc > [2.494064] [c95abe40] [c0205cec] alg_test+0x164/0x2e8 > [2.499142] [c95abf00] [c0200dec] cryptomgr_test+0x48/0x50 > [2.504558] [c95abf10] [c0039ff4] kthread+0xe4/0x110 > [2.509471] [c95abf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c > [2.515532] Instruction dump: > [2.518468] 7c7e1b78 7c9d2378 7cbf2b78 41820054 3d20c076 8089c200 3d20c076 > 7c84e850 > [2.526127] 8129c204 7c842e70 7f844840 419c0008 <0fe0> 2f9e > 54847022 7c84fa14 > [2.533960] ---[ end trace bf78d94af73fe3b8 ]--- > [2.539123] talitos ff02.crypto: master data transfer error > [2.544775] talitos ff02.crypto: TEA error: ISR 0x2000_0040 > [2.551625] alg: skcipher: encryption failed on test 1 for > ecb-aes-talitos: 
ret=22 > > IV cannot be on stack when CONFIG_VMAP_STACK is selected because the stack > cannot be DMA mapped anymore. > > This patch copies the IV into the extended descriptor. > > Fixes: 4de9d0b547b9 ("crypto: talitos - Add ablkcipher algorithms") > Cc: sta...@vger.kernel.org > Signed-off-by: Christophe Leroy Reviewed-by: Horia Geantă Thanks, Horia
Re: [PATCH v4 1/2] crypto: talitos - reorder code in talitos_edesc_alloc()
On 1/8/2019 8:56 AM, Christophe Leroy wrote: > This patch moves the mapping of IV after the kmalloc(). This > avoids having to unmap in case kmalloc() fails. > > Signed-off-by: Christophe Leroy Reviewed-by: Horia Geantă Since patch 2/2 is Cc-ing stable, this one should do the same. Herbert, could you append the commit message before applying it? Thanks, Horia
Re: [PATCH v2 2/2] powerpc: use probe_user_read()
Hi Michael and Russell, Any idea why: - checkpatch reports missing Signed-off-by: - Snowpatch build fails on PPC64 (it seems unrelated to the patch, something wrong in lib/genalloc.c) Thanks Christophe Le 08/01/2019 à 08:37, Christophe Leroy a écrit : Instead of opencoding, use probe_user_read() to failessly read a user location. Signed-off-by: Christophe Leroy --- v2: Using probe_user_read() instead of probe_user_address() arch/powerpc/kernel/process.c | 12 +--- arch/powerpc/mm/fault.c | 6 +- arch/powerpc/perf/callchain.c | 20 +++- arch/powerpc/perf/core-book3s.c | 8 +--- arch/powerpc/sysdev/fsl_pci.c | 10 -- 5 files changed, 10 insertions(+), 46 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index ce393df243aa..6a4b59d574c2 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1298,16 +1298,6 @@ void show_user_instructions(struct pt_regs *regs) pc = regs->nip - (NR_INSN_TO_PRINT * 3 / 4 * sizeof(int)); - /* -* Make sure the NIP points at userspace, not kernel text/data or -* elsewhere. 
-*/ - if (!__access_ok(pc, NR_INSN_TO_PRINT * sizeof(int), USER_DS)) { - pr_info("%s[%d]: Bad NIP, not dumping instructions.\n", - current->comm, current->pid); - return; - } - seq_buf_init(, buf, sizeof(buf)); while (n) { @@ -1318,7 +1308,7 @@ void show_user_instructions(struct pt_regs *regs) for (i = 0; i < 8 && n; i++, n--, pc += sizeof(int)) { int instr; - if (probe_kernel_address((const void *)pc, instr)) { + if (probe_user_read(, (void __user *)pc, sizeof(instr))) { seq_buf_printf(, " "); continue; } diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 887f11bcf330..ec74305fa330 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -276,12 +276,8 @@ static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address, if ((flags & FAULT_FLAG_WRITE) && (flags & FAULT_FLAG_USER) && access_ok(nip, sizeof(*nip))) { unsigned int inst; - int res; - pagefault_disable(); - res = __get_user_inatomic(inst, nip); - pagefault_enable(); - if (!res) + if (!probe_user_read(, nip, sizeof(inst))) return !store_updates_sp(inst); *must_retry = true; } diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c index 0af051a1974e..0680efb2237b 100644 --- a/arch/powerpc/perf/callchain.c +++ b/arch/powerpc/perf/callchain.c @@ -159,12 +159,8 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret) ((unsigned long)ptr & 7)) return -EFAULT; - pagefault_disable(); - if (!__get_user_inatomic(*ret, ptr)) { - pagefault_enable(); + if (!probe_user_read(ret, ptr, sizeof(*ret))) return 0; - } - pagefault_enable(); return read_user_stack_slow(ptr, ret, 8); } @@ -175,12 +171,8 @@ static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret) ((unsigned long)ptr & 3)) return -EFAULT; - pagefault_disable(); - if (!__get_user_inatomic(*ret, ptr)) { - pagefault_enable(); + if (!probe_user_read(ret, ptr, sizeof(*ret))) return 0; - } - pagefault_enable(); return read_user_stack_slow(ptr, ret, 4); } @@ -307,17 
+299,11 @@ static inline int current_is_64bit(void) */ static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret) { - int rc; - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) || ((unsigned long)ptr & 3)) return -EFAULT; - pagefault_disable(); - rc = __get_user_inatomic(*ret, ptr); - pagefault_enable(); - - return rc; + return probe_user_read(ret, ptr, sizeof(*ret)); } static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry, diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index b0723002a396..4b64ddf0db68 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -416,7 +416,6 @@ static void power_pmu_sched_task(struct perf_event_context *ctx, bool sched_in) static __u64 power_pmu_bhrb_to(u64 addr) { unsigned int instr; - int ret; __u64 target; if (is_kernel_addr(addr)) { @@ -427,13 +426,8 @@ static __u64 power_pmu_bhrb_to(u64 addr) } /* Userspace: need copy instruction here
Re: [PATCH v8 24/25] powerpc: Adopt nvram module for PPC64
Finn Thain writes: > On Mon, 7 Jan 2019, Michael Ellerman wrote: > >> Arnd Bergmann writes: >> > On Wed, Dec 26, 2018 at 1:43 AM Finn Thain >> > wrote: >> > >> >> +static ssize_t ppc_nvram_get_size(void) >> >> +{ >> >> + if (ppc_md.nvram_size) >> >> + return ppc_md.nvram_size(); >> >> + return -ENODEV; >> >> +} >> > >> >> +const struct nvram_ops arch_nvram_ops = { >> >> + .read = ppc_nvram_read, >> >> + .write = ppc_nvram_write, >> >> + .get_size = ppc_nvram_get_size, >> >> + .sync = ppc_nvram_sync, >> >> +}; >> > >> > Coming back to this after my comment on the m68k side, I notice that >> > there is now a double indirection through function pointers. Have you >> > considered completely removing the operations from ppc_md instead >> > by having multiple copies of nvram_ops? >> > >> > With the current method, it does seem odd to have a single >> > per-architecture instance of the exported structure containing >> > function pointers. This doesn't give us the flexibility of having >> > multiple copies in the kernel the way that ppc_md does, but it adds >> > overhead compared to simply exporting the functions directly. >> >> Yeah TBH I'm not convinced the arch ops is the best solution. >> >> Why can't each arch just implement the required ops functions? On ppc >> we'd still use ppc_md but that would be a ppc detail. >> >> Optional ops are fairly easy to support by providing a default >> implementation, eg. instead of: >> >> +if (arch_nvram_ops.get_size == NULL) >> +return -ENODEV; >> + >> +nvram_size = arch_nvram_ops.get_size(); >> +if (nvram_size < 0) >> +return nvram_size; >> >> >> We do in some header: >> >> #ifndef arch_nvram_get_size >> static inline int arch_nvram_get_size(void) >> { >> return -ENODEV; >> } >> #endif >> >> And then: >> >> nvram_size = arch_nvram_get_size(); >> if (nvram_size < 0) >> return nvram_size; >> >> >> But I haven't digested the whole series so maybe I'm missing something? 
>> > > The reason why that doesn't work boils down to introspection. (This was > mentioned elsewhere in this email thread.) For example, we presently have > code like this, > > ssize_t nvram_get_size(void) > { >if (ppc_md.nvram_size) >return ppc_md.nvram_size(); >return -1; > } > EXPORT_SYMBOL(nvram_get_size); > > This construction means we get to decide at run-time which of the NVRAM > functions should be used. (Whereas your example makes a build-time decision.) Right, but we only need to make a runtime decision on powerpc (right?). So it seems to me we should isolate that in the powerpc code. > The purpose of arch_nvram_ops is much the same. That is, it does for m68k > and x86 what ppc_md already does for ppc32 and ppc64. (And once these > platforms share an API like this, they can share more driver code, and > reduce duplicated code.) > The approach taken in this series was to push the arch_nvram_ops approach > as far as possible, because by making everything fit into that regime it > immediately became apparent where architectures and platforms have > diverged, creating weird inconsistencies like the differences in sync > ioctl behaviour between ppc32 and ppc64 for core99. (Early revisions of > this series exposed more issues like bugs and dead code that got addressed > elsewhere.) I just don't see the advantage of having arch_nvram_ops which is a structure of function pointers that are always the same on a given arch, vs some static inlines that implement the same ops for that arch. > Problem is, as Arnd pointed out, powerpc doesn't need both kinds of ops > struct. So I'm rewriting this series in such a way that powerpc doesn't > have to implement both. This rewrite is going to look totally different > for powerpc (though not for x86 or m68k) so you might want to wait for me > to post v9 before spending more time on code review. OK. 
I know you've been working on this series for a long time and I don't want to roadblock it, so at the end of the day I don't feel that strongly about it as long as the code works. I'll wait for v9 and have another look then. cheers
Re: [PATCH v3 1/2] selftests/powerpc: Add MSR bits
Breno Leitao writes: > hi Christophe, > > On 1/3/19 3:19 PM, LEROY Christophe wrote: >> Breno Leitao wrote: >> >>> This patch simply adds definitions for the MSR bits and some macros to >>> test for MSR TM bits. >>> >>> This was copied from arch/powerpc/include/asm/reg.h generic MSR part. >> >> Can't we find a way to avoid duplicating such defines? > > I think there are three possible ways, but none of them respect the premises > we are used to. These are the possible ways I can think of: > > 1) Including arch/powerpc/include/asm as part of the selftest compilation > process. >Problem: This might break the selftest independence of the kbuild system. > > 2) Generating a temporary header file inside selftests/include which contains > these macros at compilation time. >Problem: The same problem as above. > > 3) Defining MSR fields at userspace headers (/usr/include). >Problem: I am not sure userspace should have MSR bits information. > > Do you suggest I investigate any other way? In this case I think we can probably just copy the few #defines we need. If we decide in future we need all or most of the MSR defines then I would suggest we: - split *only* the MSR defines into arch/powerpc/include/asm/msr.h - include that file from reg.h - symlink that into the selftest code cheers
Re: [PATCH v2 2/2] powerpc: use probe_user_read()
On 08/01/2019 at 10:04, David Hildenbrand wrote: On 08.01.19 08:37, Christophe Leroy wrote: Instead of opencoding, use probe_user_read() to safely read a user location. Signed-off-by: Christophe Leroy --- v2: Using probe_user_read() instead of probe_user_address() arch/powerpc/kernel/process.c | 12 +--- arch/powerpc/mm/fault.c | 6 +- arch/powerpc/perf/callchain.c | 20 +++- arch/powerpc/perf/core-book3s.c | 8 +--- arch/powerpc/sysdev/fsl_pci.c | 10 -- 5 files changed, 10 insertions(+), 46 deletions(-) [snip] diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c index 918be816b097..c8a1b26489f5 100644 --- a/arch/powerpc/sysdev/fsl_pci.c +++ b/arch/powerpc/sysdev/fsl_pci.c @@ -1068,13 +1068,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs) addr += mfspr(SPRN_MCAR); if (is_in_pci_mem_space(addr)) { - if (user_mode(regs)) { - pagefault_disable(); - ret = get_user(inst, (__u32 __user *)regs->nip); - pagefault_enable(); - } else { + if (user_mode(regs)) + ret = probe_user_read(&inst, (void __user *)regs->nip, + sizeof(inst)); What about also adding probe_user_address? Michael doesn't like it, see https://patchwork.ozlabs.org/patch/1007117/ Christophe + else ret = probe_kernel_address((void *)regs->nip, inst); - } if (!ret && mcheck_handle_load(regs, inst)) { regs->nip += 4;
Re: [PATCH v2 2/2] powerpc: use probe_user_read()
On 08.01.19 08:37, Christophe Leroy wrote: > Instead of opencoding, use probe_user_read() to safely > read a user location. > > Signed-off-by: Christophe Leroy > --- > v2: Using probe_user_read() instead of probe_user_address() > > arch/powerpc/kernel/process.c | 12 +--- > arch/powerpc/mm/fault.c | 6 +- > arch/powerpc/perf/callchain.c | 20 +++- > arch/powerpc/perf/core-book3s.c | 8 +--- > arch/powerpc/sysdev/fsl_pci.c | 10 -- > 5 files changed, 10 insertions(+), 46 deletions(-) > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index ce393df243aa..6a4b59d574c2 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -1298,16 +1298,6 @@ void show_user_instructions(struct pt_regs *regs) > > pc = regs->nip - (NR_INSN_TO_PRINT * 3 / 4 * sizeof(int)); > > - /* > - * Make sure the NIP points at userspace, not kernel text/data or > - * elsewhere. > - */ > - if (!__access_ok(pc, NR_INSN_TO_PRINT * sizeof(int), USER_DS)) { > - pr_info("%s[%d]: Bad NIP, not dumping instructions.\n", > - current->comm, current->pid); > - return; > - } > - > seq_buf_init(&s, buf, sizeof(buf)); > > while (n) { > @@ -1318,7 +1308,7 @@ void show_user_instructions(struct pt_regs *regs) > for (i = 0; i < 8 && n; i++, n--, pc += sizeof(int)) { > int instr; > > - if (probe_kernel_address((const void *)pc, instr)) { > + if (probe_user_read(&instr, (void __user *)pc, > sizeof(instr))) { > seq_buf_printf(&s, " "); > continue; > } > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > index 887f11bcf330..ec74305fa330 100644 > --- a/arch/powerpc/mm/fault.c > +++ b/arch/powerpc/mm/fault.c > @@ -276,12 +276,8 @@ static bool bad_stack_expansion(struct pt_regs *regs, > unsigned long address, > if ((flags & FAULT_FLAG_WRITE) && (flags & FAULT_FLAG_USER) && > access_ok(nip, sizeof(*nip))) { > unsigned int inst; > - int res; > > - pagefault_disable(); > - res = __get_user_inatomic(inst, nip); > - pagefault_enable(); > - if (!res) > + if 
(!probe_user_read(&inst, nip, sizeof(inst))) > return !store_updates_sp(inst); > *must_retry = true; > } > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c > index 0af051a1974e..0680efb2237b 100644 > --- a/arch/powerpc/perf/callchain.c > +++ b/arch/powerpc/perf/callchain.c > @@ -159,12 +159,8 @@ static int read_user_stack_64(unsigned long __user *ptr, > unsigned long *ret) > ((unsigned long)ptr & 7)) > return -EFAULT; > > - pagefault_disable(); > - if (!__get_user_inatomic(*ret, ptr)) { > - pagefault_enable(); > + if (!probe_user_read(ret, ptr, sizeof(*ret))) > return 0; > - } > - pagefault_enable(); > > return read_user_stack_slow(ptr, ret, 8); > } > @@ -175,12 +171,8 @@ static int read_user_stack_32(unsigned int __user *ptr, > unsigned int *ret) > ((unsigned long)ptr & 3)) > return -EFAULT; > > - pagefault_disable(); > - if (!__get_user_inatomic(*ret, ptr)) { > - pagefault_enable(); > + if (!probe_user_read(ret, ptr, sizeof(*ret))) > return 0; > - } > - pagefault_enable(); > > return read_user_stack_slow(ptr, ret, 4); > } > @@ -307,17 +299,11 @@ static inline int current_is_64bit(void) > */ > static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret) > { > - int rc; > - > if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) || > ((unsigned long)ptr & 3)) > return -EFAULT; > > - pagefault_disable(); > - rc = __get_user_inatomic(*ret, ptr); > - pagefault_enable(); > - > - return rc; > + return probe_user_read(ret, ptr, sizeof(*ret)); > } > > static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx > *entry, > diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c > index b0723002a396..4b64ddf0db68 100644 > --- a/arch/powerpc/perf/core-book3s.c > +++ b/arch/powerpc/perf/core-book3s.c > @@ -416,7 +416,6 @@ static void power_pmu_sched_task(struct > perf_event_context *ctx, bool sched_in) > static __u64 power_pmu_bhrb_to(u64 addr) > { > unsigned int instr; > - int ret; > __u64 target > 
> if (is_kernel_addr(addr)) { > @@ -427,13 +426,8 @@ static __u64 power_pmu_bhrb_to(u64 addr) > } > > /* Userspace: need copy instruction here then translate it */ >