[PATCH v2] powerpc/perf: Give generic PMU a nice name
When booting on a machine that uses the compat pmu driver we see this: [0.071192] GENERIC_COMPAT performance monitor hardware support registered Which is a bit shouty. Give it a nicer name. Signed-off-by: Joel Stanley --- v2: Go with ISAv3 arch/powerpc/perf/generic-compat-pmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/perf/generic-compat-pmu.c b/arch/powerpc/perf/generic-compat-pmu.c index f3db88aee4dd..16392962c511 100644 --- a/arch/powerpc/perf/generic-compat-pmu.c +++ b/arch/powerpc/perf/generic-compat-pmu.c @@ -292,7 +292,7 @@ static int generic_compute_mmcr(u64 event[], int n_ev, } static struct power_pmu generic_compat_pmu = { - .name = "GENERIC_COMPAT", + .name = "ISAv3", .n_counter = MAX_PMU_COUNTERS, .add_fields = ISA207_ADD_FIELDS, .test_adder = ISA207_TEST_ADDER, -- 2.35.1
Re: [PATCH] powerpc/perf: Give generic PMU a nice name
On Tue, 31 May 2022 at 08:53, Madhavan Srinivasan wrote: > > > On 5/26/22 12:07 PM, Joel Stanley wrote: > > When booting on a machine that uses the compat pmu driver we see this: > > > > [0.071192] GENERIC_COMPAT performance monitor hardware support > > registered > Sorry that was my mistake. > I agree having it as ISAv3 is better. Okay. The downside of this is it's not as clear that you're using a fallback driver. I'll send a v2 with ISAv3 > > Maddy > > > > > Which is a bit shouty. Give it a nicer name. > > > > Signed-off-by: Joel Stanley > > --- > > > > Other options: > > > > - ISAv3 (because it is relevant for PowerISA 3.0B and beyond, see the > > comment in init_generic_compat_pmu) > > > > - Generic Compat (same, but less shouty) > > > > arch/powerpc/perf/generic-compat-pmu.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/perf/generic-compat-pmu.c > > b/arch/powerpc/perf/generic-compat-pmu.c > > index f3db88aee4dd..5be5a5ebaf42 100644 > > --- a/arch/powerpc/perf/generic-compat-pmu.c > > +++ b/arch/powerpc/perf/generic-compat-pmu.c > > @@ -292,7 +292,7 @@ static int generic_compute_mmcr(u64 event[], int n_ev, > > } > > > > static struct power_pmu generic_compat_pmu = { > > - .name = "GENERIC_COMPAT", > > + .name = "Architected", > > .n_counter = MAX_PMU_COUNTERS, > > .add_fields = ISA207_ADD_FIELDS, > > .test_adder = ISA207_TEST_ADDER,
Re: [PATCH 1/6] powerpc: Add ZERO_GPRS macros for register clears
> On 2 Jun 2022, at 2:00 am, Segher Boessenkool > wrote: > > Hi! > > On Wed, Jun 01, 2022 at 03:48:45PM +1000, Rohan McLure wrote: >> +.macro BINOP_REGS op, rhs, start, end >> +.Lreg=\start >> +.rept (\end - \start + 1) >> +\op .Lreg, \rhs >> +.Lreg=.Lreg+1 >> +.endr >> +.endm > > This is for unary operations, not binary operations (there is only one > item on the RHS). You can in principle put a string "a,b" in the rhs > parameter, but in practice you need a or b to depend on the loop counter > as well, so even such trickiness won't do. Make the naming less > confusing, maybe? Or don't have an unused extra level of abstraction in > the first place :-) > > > Segher Thanks Segher, Christophe for reviewing this. Yep I see how having a macro to perform rX = rX <> Y for arbitrary infix <> and operand is unlikely to find much use outside of ZERO_GPRS. As I resubmit this patch series I will rename it to ZERO_REGS or similar to be more explicitly coupled to ZERO_GPRS. Something like this I was thinking: .macro ZERO_REGS start, end .Lreg=\start .rept (\end - \start + 1) li .Lreg, 0 .Lreg=.Lreg+1 .endr .endm Thanks, Rohan
Re: [PATCH 11/12] powerpc: wiiu: don't enforce flat memory
On Friday 20 May 2022 14:30:02 Pali Rohár wrote: > + linux-mm > > Do you know what are requirements for kernel to support non-contiguous > memory support and what is needed to enable it for 32-bit powerpc? Any hints? > Currently powerpc arch code does not support "memblock.memory.cnt > 1" > except for WII which seems like a hack... See below. > > On Friday 20 May 2022 20:44:04 Ash Logan wrote: > > On 20/5/22 18:04, Pali Rohár wrote: > > > On Friday 20 May 2022 13:41:04 Ash Logan wrote: > > >> On 14/5/22 08:43, Pali Rohár wrote: > > >>> On Wednesday 02 March 2022 15:44:05 Ash Logan wrote: > > pgtable_32.c:mapin_ram loops over each valid memory range, which means > > non-contiguous memory just works. > > >>> > > >>> Hello! Does it mean that non-contiguous memory works for any 32-bit > > >>> powerpc platform, and not only for wiiu? If yes, should not be > > >>> non-contiguous memory support enabled for all 32-bit ppc boards then? > > >> > > >> Hi! Sorry for my delayed response. As best I can tell, it does indeed > > >> Just Work, but I have only been able to test on wiiu which is missing a > > >> lot of features other boards have (like PCI) - so it's possible there's > > >> still an assumption elsewhere in the kernel that I haven't hit. > > >> > > >> As best I can tell, the Wii and Wii U are the only 32-bit powerpc boards > > >> out there where it's even possible to have non-contiguous memory. > > > > > > What is the reason that those two boards are the **only**? Is there some > > > specific requirement from bootloader or hardware to "enable" > > > non-contiguous memory support? > > > > Not that I know of, I was just saying that I was only aware of those two > > boards where the memory map isn't contiguous, and that is the only place > > where it has been tested. Evidently you know of another board! > > > > > I'm interested in enabling non-contiguous memory support for P2020-based > > > board as it has gaps in its 32-bit memory layout and which could be used > > > for RAM mapping when 4GB DDR3 module is plugged in (default is 2GB). > > > > If it's like the Wii or Wii U (some memory at 0, a gap for MMIO or > > whatever, then more memory at a higher address) then you should try a > > patch along these lines, because barring the unknowns I mentioned before > > it should work. At least as far as I'm aware ;) > > > > Signed-off-by: Ash Logan > > --- > > arch/powerpc/mm/init_32.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c > > index 3d690be48e84..59a84629d9a0 100644 > > --- a/arch/powerpc/mm/init_32.c > > +++ b/arch/powerpc/mm/init_32.c > > @@ -125,10 +125,10 @@ void __init MMU_init(void) > > * lowmem_end_addr is initialized below. > > */ > > if (memblock.memory.cnt > 1) { > > -#ifndef CONFIG_WII > > +#if !defined(CONFIG_WII) && !defined(CONFIG_WIIU) > > > > memblock_enforce_memory_limit(memblock.memory.regions[0].size); > > pr_warn("Only using first contiguous memory region\n"); > > -#else > > +#elif defined(CONFIG_WII) > > wii_memory_fixups(); > > #endif > > } > > -- > > 2.35.1 > >
Re: [PATCH] powerpc/pci: Add config option for using OF 'reg' for PCI domain
On 6/9/22 13:21, Guilherme G. Piccoli wrote: > First of all, thanks for looping me Bjorn! Much appreciated. > I'm also CCing Ben and Gavin, that know a lot of PPC PCI stuff. > > > On 09/06/2022 16:34, Bjorn Helgaas wrote: >> [...] >>> Upgrading powerpc kernels from LTS 4.4 version (which does not >>> contain mentioned commit) to new LTS versions brings a >>> regression in domain assignment. >> >> Can you elaborate why it is a regression ? >> 63a72284b159 That commit says 'no functionnal changes', I'm >> having hard time understanding how a nochange can be a >> regression. > > It is not 'no functional change'. That commit completely changed > PCI domain assignment in a way that is incompatible with other > architectures and also incompatible with the way how it was done > prior that commit. I agree that the "no functional change" statement is incorrect. However, for most powerpc platforms it ended up being simply a cosmetic behavior change. As far as I can tell there is nothing requiring domain ids to increase montonically from zero or that each architecture is required to use the same domain numbering scheme. > > Strongly agree here in both points: first, this was not a "no functional > change" thing, and I apologize for adding this in the commit message. > What I meant is that: despite changing the numbering, (as Tyrel said) > nothing should require increasing monotonic mutable PCI domains. At > least, I'm not aware of such requirement in any spec or even in the > kernel and adjacent tooling. > > >>> [...] We could properly limit it to powernv and pseries by using ibm,fw-phb-id instead of reg property in the look up that follows a failed ibm,opal-phbid lookup. I think this is acceptable as long as no other powerpc platforms have started using this behavior for persistent naming. >>> >>> And what about setting that new config option to enabled by default >>> for those series? > > I don't remember all the details about PPC dt, but it should already be > restricted to pseries/powernv, right? At least, the commit has a comment: > > /* If not pseries nor powernv, or if fixed PHB numbering tried to add > * the same PHB number twice, then fallback to dynamic PHB numbering.*/ > > If this is *not* restricted to these 2 platforms, I agree with Pali's > approach, although I'd consider the correct is to keep the persistent > domain scheme for both pnv and pseries, as it's working like this for 5 > years and counting, and this *does* prevent a lot of issues with PCI > hotplugging in PPC servers. I mentioned this in a previous post, but it is clear the Author's intent was for this only to apply to powernv and pseries platforms. However, it only really checks for powernv, and if that fails it does a read the reg property for the domain which works for and PPC platform. If we really only want this on powernv and pseries and revert all other PPC platforms back we can fix this with a pseries check instead of a config property. Using ibm,fw-phb-id instead of reg property if ibm,opal-phbid lookup fails does the trick. -Tyrel > > >>> [...] I forgot to ask before about the actual regression here. The commit log says domain numbers are different, but I don't know the connection from there to something failing. I assume there's some script or config file that depends on specific domain numbers? And that dependency is (hopefully) powerpc-specific? >>> >>> You assume correct. For example this is the way how OpenWRT handles PCI >>> devices (but not only OpenWRT). This OpenWRT case is not >>> powerpc-specific but generic to all architectures. This is just one >>> example. >> >> So basically everybody uses D/b/d/f for persistent names. That's ... >> well, somewhat stable for things soldered down or in a motherboard >> slot, but a terrible idea for things that can be hot-plugged. >> >> Even for more core things, it's possible for firmware to change bus >> numbering between boots. For example, if a complicated hierarchy is >> cold-plugged into one slot, firmware is likely to assign different bus >> numbers on the next boot to make room for it. Obviously this can also >> happen as a hot-add, and Linux needs the flexibility to do similar >> renumbering then, although we don't support it yet. >> >> It looks like 63a72284b159 was intended to make domain numbers *more* >> consistent, so it's ironic that this actually broke something by >> changing domain numbers. Maybe there's a way to limit the scope of >> 63a72284b159 so it avoids the breakage. I don't know enough about the >> powerpc landscape to even guess at how. > > I don't considereit breaks the userspace since this is definitely no > stable ABI (or if it is, I'm not aware and my apologies). If scripts > rely on
Re: [PATCH] powerpc/pci: Add config option for using OF 'reg' for PCI domain
First of all, thanks for looping me Bjorn! Much appreciated. I'm also CCing Ben and Gavin, that know a lot of PPC PCI stuff. On 09/06/2022 16:34, Bjorn Helgaas wrote: > [...] >> Upgrading powerpc kernels from LTS 4.4 version (which does not >> contain mentioned commit) to new LTS versions brings a >> regression in domain assignment. > > Can you elaborate why it is a regression ? > 63a72284b159 That commit says 'no functionnal changes', I'm > having hard time understanding how a nochange can be a > regression. It is not 'no functional change'. That commit completely changed PCI domain assignment in a way that is incompatible with other architectures and also incompatible with the way how it was done prior that commit. >>> >>> I agree that the "no functional change" statement is incorrect. >>> However, for most powerpc platforms it ended up being simply a >>> cosmetic behavior change. As far as I can tell there is nothing >>> requiring domain ids to increase montonically from zero or that >>> each architecture is required to use the same domain numbering >>> scheme. Strongly agree here in both points: first, this was not a "no functional change" thing, and I apologize for adding this in the commit message. What I meant is that: despite changing the numbering, (as Tyrel said) nothing should require increasing monotonic mutable PCI domains. At least, I'm not aware of such requirement in any spec or even in the kernel and adjacent tooling. >> [...] >>> We could properly limit it to powernv and pseries by using >>> ibm,fw-phb-id instead of reg property in the look up that follows >>> a failed ibm,opal-phbid lookup. I think this is acceptable as long >>> as no other powerpc platforms have started using this behavior for >>> persistent naming. >> >> And what about setting that new config option to enabled by default >> for those series? I don't remember all the details about PPC dt, but it should already be restricted to pseries/powernv, right? At least, the commit has a comment: /* If not pseries nor powernv, or if fixed PHB numbering tried to add * the same PHB number twice, then fallback to dynamic PHB numbering.*/ If this is *not* restricted to these 2 platforms, I agree with Pali's approach, although I'd consider the correct is to keep the persistent domain scheme for both pnv and pseries, as it's working like this for 5 years and counting, and this *does* prevent a lot of issues with PCI hotplugging in PPC servers. >> [...] >>> I forgot to ask before about the actual regression here. The commit >>> log says domain numbers are different, but I don't know the connection >>> from there to something failing. I assume there's some script or >>> config file that depends on specific domain numbers? And that >>> dependency is (hopefully) powerpc-specific? >> >> You assume correct. For example this is the way how OpenWRT handles PCI >> devices (but not only OpenWRT). This OpenWRT case is not >> powerpc-specific but generic to all architectures. This is just one >> example. > > So basically everybody uses D/b/d/f for persistent names. That's ... > well, somewhat stable for things soldered down or in a motherboard > slot, but a terrible idea for things that can be hot-plugged. > > Even for more core things, it's possible for firmware to change bus > numbering between boots. For example, if a complicated hierarchy is > cold-plugged into one slot, firmware is likely to assign different bus > numbers on the next boot to make room for it. Obviously this can also > happen as a hot-add, and Linux needs the flexibility to do similar > renumbering then, although we don't support it yet. > > It looks like 63a72284b159 was intended to make domain numbers *more* > consistent, so it's ironic that this actually broke something by > changing domain numbers. Maybe there's a way to limit the scope of > 63a72284b159 so it avoids the breakage. I don't know enough about the > powerpc landscape to even guess at how. I don't considereit breaks the userspace since this is definitely no stable ABI (or if it is, I'm not aware and my apologies). If scripts rely on that, they are doing the wrong thing it seems. With that said, I'm definitely not against improving the situation with Pali's KConfig - just think that we somehow should keep the persistent behavior for powernv and pseries. Hopefully PPC folks has more to say on that! Cheers, Guilherme
Re: [PATCH] powerpc/pci: Add config option for using OF 'reg' for PCI domain
On Thursday 09 June 2022 14:34:51 Bjorn Helgaas wrote: > On Thu, Jun 09, 2022 at 08:05:26PM +0200, Pali Rohár wrote: > > On Thursday 09 June 2022 12:10:22 Bjorn Helgaas wrote: > > > On Thu, Jun 09, 2022 at 06:27:25PM +0200, Pali Rohár wrote: > > > > On Thursday 09 June 2022 11:22:55 Bjorn Helgaas wrote: > > > > > [+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), > > > > > thread: > > > > > https://lore.kernel.org/r/20220504175718.29011-1-p...@kernel.org] > > > > > > > > > > On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote: > > > > > > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote: > > > > > > > On 5/5/22 02:31, Pali Rohár wrote: > > > > > > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote: > > > > > > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit : > > > > > > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB > > > > > > > >>> number based on device-tree properties"), powerpc kernel > > > > > > > >>> always fallback to PCI domain assignment from OF / Device Tree > > > > > > > >>> 'reg' property of the PCI controller. > > > > > > > >>> > > > > > > > >>> PCI code for other Linux architectures use increasing > > > > > > > >>> assignment of the PCI domain for individual controllers > > > > > > > >>> (assign the first free number), like it was also for powerpc > > > > > > > >>> prior mentioned commit. > > > > > > > >>> > > > > > > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not > > > > > > > >>> contain mentioned commit) to new LTS versions brings a > > > > > > > >>> regression in domain assignment. > > > > > > > >> > > > > > > > >> Can you elaborate why it is a regression ? > > > > > > > >> 63a72284b159 That commit says 'no functionnal changes', I'm > > > > > > > >> having hard time understanding how a nochange can be a > > > > > > > >> regression. > > > > > > > > > > > > > > > > It is not 'no functional change'. That commit completely changed > > > > > > > > PCI domain assignment in a way that is incompatible with other > > > > > > > > architectures and also incompatible with the way how it was done > > > > > > > > prior that commit. > > > > > > > > > > > > > > I agree that the "no functional change" statement is incorrect. > > > > > > > However, for most powerpc platforms it ended up being simply a > > > > > > > cosmetic behavior change. As far as I can tell there is nothing > > > > > > > requiring domain ids to increase montonically from zero or that > > > > > > > each architecture is required to use the same domain numbering > > > > > > > scheme. > > > > > > > > > > > > That is truth. But it looks really suspicious why domains are not > > > > > > assigned monotonically. Some scripts / applications are using PCI > > > > > > location (domain:bus:dev:func) for remembering PCI device and domain > > > > > > change can cause issue for config files. And some (older) > > > > > > applications > > > > > > expects existence of domain zero. In systems without hot plug > > > > > > support > > > > > > with small number of domains (e.g. 3) it means that there are always > > > > > > domains 0, 1 and 2. > > > > > > > > > > > > > Its hard to call this a true regression unless it actually broke > > > > > > > something. The commit in question has been in the kernel since 4.8 > > > > > > > which was released over 5 1/2 years ago. > > > > > > > > > > > > I agree, it really depends on how you look at it. > > > > > > > > > > > > The important is that lot of people are using LTS versions and are > > > > > > doing upgrades when LTS support is dropped. Which for 4.4 now > > > > > > happened. So not all smaller or "cosmetic" changes could be detected > > > > > > until longer LTS period pass. > > > > > > > > > > > > > With all that said looking closer at the code in question I think > > > > > > > it is fair to assume that the author only intended this change for > > > > > > > powernv and pseries platforms and not every powerpc platform. That > > > > > > > change was done to make persistent naming easier to manage in > > > > > > > userspace. > > > > > > > > > > > > I agree that this behavior change may be useful in some situations > > > > > > and I do not object this need. > > > > > > > > > > > > > Your change defaults back to the old behavior which will now break > > > > > > > both powernv and pseries platforms with regard to hotplugging and > > > > > > > persistent naming. > > > > > > > > > > > > I was aware of it, that change could cause issues. And that is why I > > > > > > added config option for choosing behavior. So users would be able to > > > > > > choose what they need. > > > > > > > > > > > > > We could properly limit it to powernv and pseries by using > > > > > > > ibm,fw-phb-id instead of reg property in the look up that follows > > > > > > > a failed ibm,opal-phbid lookup. I think this is acceptable as long > > > > > > > as no other powerpc platforms have started using this behavior for > > > > > > > persistent naming. > >
Re: [PATCH] powerpc/pci: Add config option for using OF 'reg' for PCI domain
On Thu, Jun 09, 2022 at 08:05:26PM +0200, Pali Rohár wrote: > On Thursday 09 June 2022 12:10:22 Bjorn Helgaas wrote: > > On Thu, Jun 09, 2022 at 06:27:25PM +0200, Pali Rohár wrote: > > > On Thursday 09 June 2022 11:22:55 Bjorn Helgaas wrote: > > > > [+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), > > > > thread: > > > > https://lore.kernel.org/r/20220504175718.29011-1-p...@kernel.org] > > > > > > > > On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote: > > > > > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote: > > > > > > On 5/5/22 02:31, Pali Rohár wrote: > > > > > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote: > > > > > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit : > > > > > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB > > > > > > >>> number based on device-tree properties"), powerpc kernel > > > > > > >>> always fallback to PCI domain assignment from OF / Device Tree > > > > > > >>> 'reg' property of the PCI controller. > > > > > > >>> > > > > > > >>> PCI code for other Linux architectures use increasing > > > > > > >>> assignment of the PCI domain for individual controllers > > > > > > >>> (assign the first free number), like it was also for powerpc > > > > > > >>> prior mentioned commit. > > > > > > >>> > > > > > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not > > > > > > >>> contain mentioned commit) to new LTS versions brings a > > > > > > >>> regression in domain assignment. > > > > > > >> > > > > > > >> Can you elaborate why it is a regression ? > > > > > > >> 63a72284b159 That commit says 'no functionnal changes', I'm > > > > > > >> having hard time understanding how a nochange can be a > > > > > > >> regression. > > > > > > > > > > > > > > It is not 'no functional change'. That commit completely changed > > > > > > > PCI domain assignment in a way that is incompatible with other > > > > > > > architectures and also incompatible with the way how it was done > > > > > > > prior that commit. > > > > > > > > > > > > I agree that the "no functional change" statement is incorrect. > > > > > > However, for most powerpc platforms it ended up being simply a > > > > > > cosmetic behavior change. As far as I can tell there is nothing > > > > > > requiring domain ids to increase montonically from zero or that > > > > > > each architecture is required to use the same domain numbering > > > > > > scheme. > > > > > > > > > > That is truth. But it looks really suspicious why domains are not > > > > > assigned monotonically. Some scripts / applications are using PCI > > > > > location (domain:bus:dev:func) for remembering PCI device and domain > > > > > change can cause issue for config files. And some (older) applications > > > > > expects existence of domain zero. In systems without hot plug support > > > > > with small number of domains (e.g. 3) it means that there are always > > > > > domains 0, 1 and 2. > > > > > > > > > > > Its hard to call this a true regression unless it actually broke > > > > > > something. The commit in question has been in the kernel since 4.8 > > > > > > which was released over 5 1/2 years ago. > > > > > > > > > > I agree, it really depends on how you look at it. > > > > > > > > > > The important is that lot of people are using LTS versions and are > > > > > doing upgrades when LTS support is dropped. Which for 4.4 now > > > > > happened. So not all smaller or "cosmetic" changes could be detected > > > > > until longer LTS period pass. > > > > > > > > > > > With all that said looking closer at the code in question I think > > > > > > it is fair to assume that the author only intended this change for > > > > > > powernv and pseries platforms and not every powerpc platform. That > > > > > > change was done to make persistent naming easier to manage in > > > > > > userspace. > > > > > > > > > > I agree that this behavior change may be useful in some situations > > > > > and I do not object this need. > > > > > > > > > > > Your change defaults back to the old behavior which will now break > > > > > > both powernv and pseries platforms with regard to hotplugging and > > > > > > persistent naming. > > > > > > > > > > I was aware of it, that change could cause issues. And that is why I > > > > > added config option for choosing behavior. So users would be able to > > > > > choose what they need. > > > > > > > > > > > We could properly limit it to powernv and pseries by using > > > > > > ibm,fw-phb-id instead of reg property in the look up that follows > > > > > > a failed ibm,opal-phbid lookup. I think this is acceptable as long > > > > > > as no other powerpc platforms have started using this behavior for > > > > > > persistent naming. > > > > > > > > > > And what about setting that new config option to enabled by default > > > > > for those series? > > > > > > > > > > Or is there issue with introduction of the new config option? > > > > > > > > > > One of the point is that it is
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.19-2 tag
The pull request you sent on Fri, 10 Jun 2022 00:59:45 +1000: > https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git > tags/powerpc-5.19-2 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/95fc76c81b9270a9ab38f4947fe5cb786c8c79cc Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
Re: [PATCH] powerpc/pci: Add config option for using OF 'reg' for PCI domain
On Thursday 09 June 2022 12:10:22 Bjorn Helgaas wrote: > On Thu, Jun 09, 2022 at 06:27:25PM +0200, Pali Rohár wrote: > > On Thursday 09 June 2022 11:22:55 Bjorn Helgaas wrote: > > > [+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), > > > thread: > > > https://lore.kernel.org/r/20220504175718.29011-1-p...@kernel.org] > > > > > > On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote: > > > > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote: > > > > > On 5/5/22 02:31, Pali Rohár wrote: > > > > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote: > > > > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit : > > > > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB > > > > > >>> number based on device-tree properties"), powerpc kernel > > > > > >>> always fallback to PCI domain assignment from OF / Device Tree > > > > > >>> 'reg' property of the PCI controller. > > > > > >>> > > > > > >>> PCI code for other Linux architectures use increasing > > > > > >>> assignment of the PCI domain for individual controllers > > > > > >>> (assign the first free number), like it was also for powerpc > > > > > >>> prior mentioned commit. > > > > > >>> > > > > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not > > > > > >>> contain mentioned commit) to new LTS versions brings a > > > > > >>> regression in domain assignment. > > > > > >> > > > > > >> Can you elaborate why it is a regression ? > > > > > >> 63a72284b159 That commit says 'no functionnal changes', I'm > > > > > >> having hard time understanding how a nochange can be a > > > > > >> regression. > > > > > > > > > > > > It is not 'no functional change'. That commit completely changed > > > > > > PCI domain assignment in a way that is incompatible with other > > > > > > architectures and also incompatible with the way how it was done > > > > > > prior that commit. > > > > > > > > > > I agree that the "no functional change" statement is incorrect. > > > > > However, for most powerpc platforms it ended up being simply a > > > > > cosmetic behavior change. As far as I can tell there is nothing > > > > > requiring domain ids to increase montonically from zero or that > > > > > each architecture is required to use the same domain numbering > > > > > scheme. > > > > > > > > That is truth. But it looks really suspicious why domains are not > > > > assigned monotonically. Some scripts / applications are using PCI > > > > location (domain:bus:dev:func) for remembering PCI device and domain > > > > change can cause issue for config files. And some (older) applications > > > > expects existence of domain zero. In systems without hot plug support > > > > with small number of domains (e.g. 3) it means that there are always > > > > domains 0, 1 and 2. > > > > > > > > > Its hard to call this a true regression unless it actually broke > > > > > something. The commit in question has been in the kernel since 4.8 > > > > > which was released over 5 1/2 years ago. > > > > > > > > I agree, it really depends on how you look at it. > > > > > > > > The important is that lot of people are using LTS versions and are > > > > doing upgrades when LTS support is dropped. Which for 4.4 now > > > > happened. So not all smaller or "cosmetic" changes could be detected > > > > until longer LTS period pass. > > > > > > > > > With all that said looking closer at the code in question I think > > > > > it is fair to assume that the author only intended this change for > > > > > powernv and pseries platforms and not every powerpc platform. That > > > > > change was done to make persistent naming easier to manage in > > > > > userspace. > > > > > > > > I agree that this behavior change may be useful in some situations > > > > and I do not object this need. > > > > > > > > > Your change defaults back to the old behavior which will now break > > > > > both powernv and pseries platforms with regard to hotplugging and > > > > > persistent naming. > > > > > > > > I was aware of it, that change could cause issues. And that is why I > > > > added config option for choosing behavior. So users would be able to > > > > choose what they need. > > > > > > > > > We could properly limit it to powernv and pseries by using > > > > > ibm,fw-phb-id instead of reg property in the look up that follows > > > > > a failed ibm,opal-phbid lookup. I think this is acceptable as long > > > > > as no other powerpc platforms have started using this behavior for > > > > > persistent naming. > > > > > > > > And what about setting that new config option to enabled by default > > > > for those series? > > > > > > > > Or is there issue with introduction of the new config option? > > > > > > > > One of the point is that it is really a good idea to have > > > > similar/same behavior for all linux platforms. And if it cannot be > > > > enabled by default (for backward compatibility) add at least some > > > > option, so new platforms can start using it or users can
[FSL P50x0] Keyboard and mouse don't work anymore after the devicetree updates for 5.19
On 06 June 2022 at 07:06 pm, Rob Herring wrote: On Mon, Jun 6, 2022 at 11:14 AM Christian Zigotzky wrote: On 06 June 2022 at 04:58 pm, Rob Herring wrote: On Fri, May 27, 2022 at 9:23 AM Rob Herring wrote: On Fri, May 27, 2022 at 3:33 AM Christian Zigotzky wrote: On 27 May 2022 at 10:14 am, Prabhakar Mahadev Lad wrote: Hi, -Original Message- From: Christian Zigotzky On 27 May 2022 at 09:56 am, Prabhakar Mahadev Lad wrote: Hi, -Original Message- From: Christophe Leroy [...] Looks like the driver which you are using has not been converted to use platform_get_irq(), could you please check that. Cheers, Prabhakar Do you mean the mouse and keyboard driver? No it could be your gpio/pinctrl driver assuming the keyboard/mouse are using GPIO's. If you are using interrupts then it might be some hierarchal irqc driver in drivers/irqchip/. Cheers, Prabhakar Good to know. I only use unmodified drivers from the official Linux kernel so it's not an issue of the Cyrus+ board. The issue is in drivers/usb/host/fsl-mph-dr-of.c which copies the resources to a child platform device. Can you try the following change: diff --git a/drivers/usb/host/fsl-mph-dr-of.c b/drivers/usb/host/fsl-mph-dr-of.c index 44a7e58a26e3..47d9b7be60da 100644 --- a/drivers/usb/host/fsl-mph-dr-of.c +++ b/drivers/usb/host/fsl-mph-dr-of.c @@ -80,8 +80,6 @@ static struct platform_device *fsl_usb2_device_register( const char *name, int id) { struct platform_device *pdev; - const struct resource *res = ofdev->resource; - unsigned int num = ofdev->num_resources; int retval; pdev = platform_device_alloc(name, id); @@ -106,11 +104,7 @@ static struct platform_device *fsl_usb2_device_register( if (retval) goto error; - if (num) { - retval = platform_device_add_resources(pdev, res, num); - if (retval) - goto error; - } + pdev->dev.of_node = ofdev->dev.of_node; >From the log, I think you also need to add this line: pdev->dev.of_node_reused = true; retval = platform_device_add(pdev); if (retval) Hello Rob, Thanks a lot for your answer. Is the following patch correct? Yes --- a/drivers/usb/host/fsl-mph-dr-of.c2022-05-28 09:10:26.797688422 +0200 +++ b/drivers/usb/host/fsl-mph-dr-of.c2022-05-28 09:15:01.668594809 +0200 @@ -80,8 +80,6 @@ static struct platform_device *fsl_usb2_ const char *name, int id) { struct platform_device *pdev; -const struct resource *res = ofdev->resource; -unsigned int num = ofdev->num_resources; int retval; pdev = platform_device_alloc(name, id); @@ -106,11 +104,7 @@ static struct platform_device *fsl_usb2_ if (retval) goto error; -if (num) { -retval = platform_device_add_resources(pdev, res, num); -if (retval) -goto error; -} +pdev->dev.of_node = ofdev->dev.of_node; +pdev->dev.of_node_reused = true; retval = platform_device_add(pdev); if (retval) --- Thanks, Christian Hello Rob, I tested this patch today and unfortunately the issue still exists. Do you have another idea? Thanks, Christian Updated patch: --- a/drivers/usb/host/fsl-mph-dr-of.c 2022-06-06 02:18:54.0 +0200 +++ b/drivers/usb/host/fsl-mph-dr-of.c 2022-06-09 19:31:50.135472793 +0200 @@ -80,8 +80,6 @@ static struct platform_device *fsl_usb2_ const char *name, int id) { struct platform_device *pdev; - const struct resource *res = ofdev->resource; - unsigned int num = ofdev->num_resources; int retval; pdev = platform_device_alloc(name, id); @@ -105,12 +103,8 @@ static struct platform_device *fsl_usb2_ retval = platform_device_add_data(pdev, pdata, sizeof(*pdata)); if (retval) goto error; - - if (num) { - retval = platform_device_add_resources(pdev, res, num); - if (retval) - goto error; - } + pdev->dev.of_node = ofdev->dev.of_node; + pdev->dev.of_node_reused = true; retval = platform_device_add(pdev); if (retval)
Re: [PATCH] powerpc/pci: Add config option for using OF 'reg' for PCI domain
On Thu, Jun 09, 2022 at 06:27:25PM +0200, Pali Rohár wrote: > On Thursday 09 June 2022 11:22:55 Bjorn Helgaas wrote: > > [+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), thread: > > https://lore.kernel.org/r/20220504175718.29011-1-p...@kernel.org] > > > > On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote: > > > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote: > > > > On 5/5/22 02:31, Pali Rohár wrote: > > > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote: > > > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit : > > > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB > > > > >>> number based on device-tree properties"), powerpc kernel > > > > >>> always fallback to PCI domain assignment from OF / Device Tree > > > > >>> 'reg' property of the PCI controller. > > > > >>> > > > > >>> PCI code for other Linux architectures use increasing > > > > >>> assignment of the PCI domain for individual controllers > > > > >>> (assign the first free number), like it was also for powerpc > > > > >>> prior mentioned commit. > > > > >>> > > > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not > > > > >>> contain mentioned commit) to new LTS versions brings a > > > > >>> regression in domain assignment. > > > > >> > > > > >> Can you elaborate why it is a regression ? > > > > >> 63a72284b159 That commit says 'no functionnal changes', I'm > > > > >> having hard time understanding how a nochange can be a > > > > >> regression. > > > > > > > > > > It is not 'no functional change'. That commit completely changed > > > > > PCI domain assignment in a way that is incompatible with other > > > > > architectures and also incompatible with the way how it was done > > > > > prior that commit. > > > > > > > > I agree that the "no functional change" statement is incorrect. > > > > However, for most powerpc platforms it ended up being simply a > > > > cosmetic behavior change. As far as I can tell there is nothing > > > > requiring domain ids to increase montonically from zero or that > > > > each architecture is required to use the same domain numbering > > > > scheme. > > > > > > That is truth. But it looks really suspicious why domains are not > > > assigned monotonically. Some scripts / applications are using PCI > > > location (domain:bus:dev:func) for remembering PCI device and domain > > > change can cause issue for config files. And some (older) applications > > > expects existence of domain zero. In systems without hot plug support > > > with small number of domains (e.g. 3) it means that there are always > > > domains 0, 1 and 2. > > > > > > > Its hard to call this a true regression unless it actually broke > > > > something. The commit in question has been in the kernel since 4.8 > > > > which was released over 5 1/2 years ago. > > > > > > I agree, it really depends on how you look at it. > > > > > > The important is that lot of people are using LTS versions and are > > > doing upgrades when LTS support is dropped. Which for 4.4 now > > > happened. So not all smaller or "cosmetic" changes could be detected > > > until longer LTS period pass. > > > > > > > With all that said looking closer at the code in question I think > > > > it is fair to assume that the author only intended this change for > > > > powernv and pseries platforms and not every powerpc platform. That > > > > change was done to make persistent naming easier to manage in > > > > userspace. > > > > > > I agree that this behavior change may be useful in some situations > > > and I do not object this need. > > > > > > > Your change defaults back to the old behavior which will now break > > > > both powernv and pseries platforms with regard to hotplugging and > > > > persistent naming. > > > > > > I was aware of it, that change could cause issues. And that is why I > > > added config option for choosing behavior. So users would be able to > > > choose what they need. > > > > > > > We could properly limit it to powernv and pseries by using > > > > ibm,fw-phb-id instead of reg property in the look up that follows > > > > a failed ibm,opal-phbid lookup. I think this is acceptable as long > > > > as no other powerpc platforms have started using this behavior for > > > > persistent naming. > > > > > > And what about setting that new config option to enabled by default > > > for those series? > > > > > > Or is there issue with introduction of the new config option? > > > > > > One of the point is that it is really a good idea to have > > > similar/same behavior for all linux platforms. And if it cannot be > > > enabled by default (for backward compatibility) add at least some > > > option, so new platforms can start using it or users can decide to > > > switch behavior. > > > > This is a powerpc thing so I'm just kibbitzing a little. > > > > This basically looks like a new config option to selectively revert > > 63a72284b159. That seems hard to maintain and doesn't seem like > >
Re: [PATCH] powerpc/pci: Add config option for using OF 'reg' for PCI domain
On Thursday 09 June 2022 11:22:55 Bjorn Helgaas wrote: > [+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), thread: > https://lore.kernel.org/r/20220504175718.29011-1-p...@kernel.org] > > On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote: > > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote: > > > On 5/5/22 02:31, Pali Rohár wrote: > > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote: > > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit : > > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB > > > >>> number based on device-tree properties"), powerpc kernel > > > >>> always fallback to PCI domain assignment from OF / Device Tree > > > >>> 'reg' property of the PCI controller. > > > >>> > > > >>> PCI code for other Linux architectures use increasing > > > >>> assignment of the PCI domain for individual controllers > > > >>> (assign the first free number), like it was also for powerpc > > > >>> prior mentioned commit. > > > >>> > > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not > > > >>> contain mentioned commit) to new LTS versions brings a > > > >>> regression in domain assignment. > > > >> > > > >> Can you elaborate why it is a regression ? > > > >> 63a72284b159 That commit says 'no functionnal changes', I'm > > > >> having hard time understanding how a nochange can be a > > > >> regression. > > > > > > > > It is not 'no functional change'. That commit completely changed > > > > PCI domain assignment in a way that is incompatible with other > > > > architectures and also incompatible with the way how it was done > > > > prior that commit. > > > > > > I agree that the "no functional change" statement is incorrect. > > > However, for most powerpc platforms it ended up being simply a > > > cosmetic behavior change. As far as I can tell there is nothing > > > requiring domain ids to increase montonically from zero or that > > > each architecture is required to use the same domain numbering > > > scheme. > > > > That is truth. But it looks really suspicious why domains are not > > assigned monotonically. Some scripts / applications are using PCI > > location (domain:bus:dev:func) for remembering PCI device and domain > > change can cause issue for config files. And some (older) applications > > expects existence of domain zero. In systems without hot plug support > > with small number of domains (e.g. 3) it means that there are always > > domains 0, 1 and 2. > > > > > Its hard to call this a true regression unless it actually broke > > > something. The commit in question has been in the kernel since 4.8 > > > which was released over 5 1/2 years ago. > > > > I agree, it really depends on how you look at it. > > > > The important is that lot of people are using LTS versions and are > > doing upgrades when LTS support is dropped. Which for 4.4 now > > happened. So not all smaller or "cosmetic" changes could be detected > > until longer LTS period pass. > > > > > With all that said looking closer at the code in question I think > > > it is fair to assume that the author only intended this change for > > > powernv and pseries platforms and not every powerpc platform. That > > > change was done to make persistent naming easier to manage in > > > userspace. > > > > I agree that this behavior change may be useful in some situations > > and I do not object this need. > > > > > Your change defaults back to the old behavior which will now break > > > both powernv and pseries platforms with regard to hotplugging and > > > persistent naming. > > > > I was aware of it, that change could cause issues. And that is why I > > added config option for choosing behavior. So users would be able to > > choose what they need. > > > > > We could properly limit it to powernv and pseries by using > > > ibm,fw-phb-id instead of reg property in the look up that follows > > > a failed ibm,opal-phbid lookup. I think this is acceptable as long > > > as no other powerpc platforms have started using this behavior for > > > persistent naming. > > > > And what about setting that new config option to enabled by default > > for those series? > > > > Or is there issue with introduction of the new config option? > > > > One of the point is that it is really a good idea to have > > similar/same behavior for all linux platforms. And if it cannot be > > enabled by default (for backward compatibility) add at least some > > option, so new platforms can start using it or users can decide to > > switch behavior. > > This is a powerpc thing so I'm just kibbitzing a little. > > This basically looks like a new config option to selectively revert > 63a72284b159. That seems hard to maintain and doesn't seem like > something that needs to be baked into the kernel at compile-time. > > The 63a72284b159 commit log says persistent NIC names are tied to PCI > domain/bus/dev/fn addresses, which seems like something we should > discourage because we can't predict PCI
Re: [PATCH] powerpc/pci: Add config option for using OF 'reg' for PCI domain
[+cc Guilherme, Michael, Ben (author of 63a72284b159 and PPC folks), thread: https://lore.kernel.org/r/20220504175718.29011-1-p...@kernel.org] On Fri, May 06, 2022 at 12:33:02AM +0200, Pali Rohár wrote: > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote: > > On 5/5/22 02:31, Pali Rohár wrote: > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote: > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit : > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB > > >>> number based on device-tree properties"), powerpc kernel > > >>> always fallback to PCI domain assignment from OF / Device Tree > > >>> 'reg' property of the PCI controller. > > >>> > > >>> PCI code for other Linux architectures use increasing > > >>> assignment of the PCI domain for individual controllers > > >>> (assign the first free number), like it was also for powerpc > > >>> prior mentioned commit. > > >>> > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not > > >>> contain mentioned commit) to new LTS versions brings a > > >>> regression in domain assignment. > > >> > > >> Can you elaborate why it is a regression ? > > >> 63a72284b159 That commit says 'no functionnal changes', I'm > > >> having hard time understanding how a nochange can be a > > >> regression. > > > > > > It is not 'no functional change'. That commit completely changed > > > PCI domain assignment in a way that is incompatible with other > > > architectures and also incompatible with the way how it was done > > > prior that commit. > > > > I agree that the "no functional change" statement is incorrect. > > However, for most powerpc platforms it ended up being simply a > > cosmetic behavior change. As far as I can tell there is nothing > > requiring domain ids to increase montonically from zero or that > > each architecture is required to use the same domain numbering > > scheme. > > That is truth. But it looks really suspicious why domains are not > assigned monotonically. Some scripts / applications are using PCI > location (domain:bus:dev:func) for remembering PCI device and domain > change can cause issue for config files. And some (older) applications > expects existence of domain zero. In systems without hot plug support > with small number of domains (e.g. 3) it means that there are always > domains 0, 1 and 2. > > > Its hard to call this a true regression unless it actually broke > > something. The commit in question has been in the kernel since 4.8 > > which was released over 5 1/2 years ago. > > I agree, it really depends on how you look at it. > > The important is that lot of people are using LTS versions and are > doing upgrades when LTS support is dropped. Which for 4.4 now > happened. So not all smaller or "cosmetic" changes could be detected > until longer LTS period pass. > > > With all that said looking closer at the code in question I think > > it is fair to assume that the author only intended this change for > > powernv and pseries platforms and not every powerpc platform. That > > change was done to make persistent naming easier to manage in > > userspace. > > I agree that this behavior change may be useful in some situations > and I do not object this need. > > > Your change defaults back to the old behavior which will now break > > both powernv and pseries platforms with regard to hotplugging and > > persistent naming. > > I was aware of it, that change could cause issues. And that is why I > added config option for choosing behavior. So users would be able to > choose what they need. > > > We could properly limit it to powernv and pseries by using > > ibm,fw-phb-id instead of reg property in the look up that follows > > a failed ibm,opal-phbid lookup. I think this is acceptable as long > > as no other powerpc platforms have started using this behavior for > > persistent naming. > > And what about setting that new config option to enabled by default > for those series? > > Or is there issue with introduction of the new config option? > > One of the point is that it is really a good idea to have > similar/same behavior for all linux platforms. And if it cannot be > enabled by default (for backward compatibility) add at least some > option, so new platforms can start using it or users can decide to > switch behavior. This is a powerpc thing so I'm just kibbitzing a little. This basically looks like a new config option to selectively revert 63a72284b159. That seems hard to maintain and doesn't seem like something that needs to be baked into the kernel at compile-time. The 63a72284b159 commit log says persistent NIC names are tied to PCI domain/bus/dev/fn addresses, which seems like something we should discourage because we can't predict PCI addresses in general. I assume other platforms typically use udev with MAC addresses or something? > > > For example, prior that commit on P2020 RDB board were PCI > > > domains 0, 1 and 2. > > > > > > $ lspci > > > :00:00.0 PCI bridge: Freescale
RE: [PATCH 09/10] scsi/ibmvscsi: Replace srp tasklet with work
From: Sebastian Andrzej Siewior > Sent: 09 June 2022 16:03 > > On 2022-05-30 16:15:11 [-0700], Davidlohr Bueso wrote: > > Tasklets have long been deprecated as being too heavy on the system > > by running in irq context - and this is not a performance critical > > path. If a higher priority process wants to run, it must wait for > > the tasklet to finish before doing so. > > > > Process srps asynchronously in process context in a dedicated > > single threaded workqueue. > > I would suggest threaded interrupts instead. The pattern here is the > same as in the previous driver except here is less locking. How long do these actions runs for, and what is waiting for them to finish? These changes seem to drop the priority from above that of the highest priority RT process down to that of a default priority user process. There is no real guarantee that the latter will run 'any time soon'. Consider some workloads I'm setting up where most of the cpu are likely to spend 90%+ of the time running processes under the RT scheduler that are processing audio. It is quite likely that a non-RT thread (especially one bound to a specific cpu) won't run for several milliseconds. (We have to go through 'hoops' to avoid dropping ethernet frames.) I'd have thought that some of these kernel threads really need to run at a 'middling' RT priority. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH 09/10] scsi/ibmvscsi: Replace srp tasklet with work
On 2022-05-30 16:15:11 [-0700], Davidlohr Bueso wrote: > Tasklets have long been deprecated as being too heavy on the system > by running in irq context - and this is not a performance critical > path. If a higher priority process wants to run, it must wait for > the tasklet to finish before doing so. > > Process srps asynchronously in process context in a dedicated > single threaded workqueue. I would suggest threaded interrupts instead. The pattern here is the same as in the previous driver except here is less locking. Sebastian
[GIT PULL] Please pull powerpc/linux.git powerpc-5.19-2 tag
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hi Linus, Please pull powerpc fixes for 5.19: The following changes since commit 6112bd00e84e5dbffebc3c1e908cbe914ca772ee: Merge tag 'powerpc-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (2022-05-28 11:27:17 -0700) are available in the git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.19-2 for you to fetch changes up to 8e127846fc97778a5e5c99bca1ce0bbc5ec9: powerpc/32: Fix overread/overwrite of thread_struct via ptrace (2022-06-09 23:32:56 +1000) - -- powerpc fixes for 5.19 #2 - On 32-bit fix overread/overwrite of thread_struct via ptrace PEEK/POKE. - Fix softirqs not switching to the softirq stack since we moved irq_exit(). - Force thread size increase when KASAN is enabled to avoid stack overflows. - On Book3s 64 mark more code as not to be instrumented by KASAN to avoid crashes. - Exempt __get_wchan() from KASAN checking, as it's inherently racy. - Fix a recently introduced crash in the papr_scm driver in some configurations. - Remove include of which is forbidden. Thanks to: Ariel Miculas, Chen Jingwen, Christophe Leroy, Erhard Furtner, He Ying, Kees Cook, Masahiro Yamada, Nageswara R Sastry, Paul Mackerras, Sachin Sant, Vaibhav Jain, Wanming Hu. - -- He Ying (1): powerpc/kasan: Silence KASAN warnings in __get_wchan() Masahiro Yamada (1): powerpc/book3e: get rid of #include Michael Ellerman (3): powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK powerpc/kasan: Force thread size increase with KASAN powerpc/32: Fix overread/overwrite of thread_struct via ptrace Paul Mackerras (1): powerpc/kasan: Mark more real-mode code as not to be instrumented Vaibhav Jain (1): powerpc/papr_scm: don't requests stats with '0' sized stats buffer arch/powerpc/Kconfig | 2 -- arch/powerpc/include/asm/thread_info.h| 10 -- arch/powerpc/kernel/Makefile | 2 ++ arch/powerpc/kernel/process.c | 4 ++-- arch/powerpc/kernel/ptrace/ptrace-fpu.c | 20 ++-- arch/powerpc/kernel/ptrace/ptrace.c | 3 +++ arch/powerpc/kernel/rtas.c| 4 ++-- arch/powerpc/kexec/crash.c| 2 +- arch/powerpc/mm/nohash/kaslr_booke.c | 8 ++-- arch/powerpc/platforms/powernv/Makefile | 1 + arch/powerpc/platforms/pseries/papr_scm.c | 3 +++ 11 files changed, 38 insertions(+), 21 deletions(-) -BEGIN PGP SIGNATURE- iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmKiClMACgkQUevqPMjh pYBl2w//cMfBNA1wR3b+r9+Ro9SlFMzfCtk0ue+A44QLo0Va71H9eZLmZ7qM6LK4 B37Am/bNZPnI0DM1pjTXKq0aEt7KRC9VyEiS9Gl4w0mf5bRjZy8Q8A+YA/CDoKrr Ctd1iAZzk1mAbegdSeS66DlmGVGvDwzLK8jkJDPjOhaaXZMLZLRx+j4USeLuTCMS Ub7LXpLcewa7E7xOXshoFIFd6wCSEpsiayCEAlCd4yICKeDfNdWc/9GBd3e6z+1e 7cSr7E59TY1cJ5WZ00p1UPigImNroYBEPmZj+F3vSt/MQwb/VXZYQuFojytk126z sl69pzyghp4oCdvNvVVxHTHjDYZhnoPEKSoPpgD+CegrrdgKBeXweGE8T8JHKr4i 1FSFrX+zwXoTRRpwyDVEtDu1ld/AMHMQO8NiFFDSuYDdgrgYCyqk1Rg3Yb6aYpLQ +b8uwX9b1lUZyrJz/Whf4PpTW0TmU9eyNPyFScpyojX1HMcu+VDBtowuaS1FkzIb 2ft6XsQUV0f4EIrJTsgWWyIw32kE9TiCPKwOX3wMQfdx5j6YuQJ7N2ALZVoQnQbu aUvpGu6Fv/tasSaboThHWpAsRZLMGjFD94WwvX1Kn0NtvWpoeocA6barup0Nosey 3avApSIyPG5Q9RA3m1SjM0dLJdpyUmt2hDuvuFitRmeqePOoVxc= =s319 -END PGP SIGNATURE-
Re: [PATCH] powerpc: get rid of #include
On Sat, 4 Jun 2022 17:50:50 +0900, Masahiro Yamada wrote: > You cannot include here because it is generated > in init/Makefile but there is no guarantee that it happens before > arch/powerpc/mm/nohash/kaslr_booke.c is compiled for parallel builds. > > The places where you can reliably include are: > > - init/ (because init/Makefile can specify the dependency) > - arch/*/boot/ (because it is compiled after vmlinux) > > [...] Applied to powerpc/fixes. [1/1] powerpc: get rid of #include https://git.kernel.org/powerpc/c/7ad4bd887d27c6b6ffbef216f19c19f8fe2b8f52 cheers
Re: [PATCH -v2] powerpc/process, kasan: Silence KASAN warnings in __get_wchan()
On Thu, 20 Jan 2022 20:44:18 -0500, He Ying wrote: > The following KASAN warning was reported in our kernel. > > BUG: KASAN: stack-out-of-bounds in get_wchan+0x188/0x250 > Read of size 4 at addr d216f958 by task ps/14437 > > CPU: 3 PID: 14437 Comm: ps Tainted: G O 5.10.0 #1 > Call Trace: > [daa63858] [c0654348] dump_stack+0x9c/0xe4 (unreliable) > [daa63888] [c035cf0c] print_address_description.constprop.3+0x8c/0x570 > [daa63908] [c035d6bc] kasan_report+0x1ac/0x218 > [daa63948] [c00496e8] get_wchan+0x188/0x250 > [daa63978] [c0461ec8] do_task_stat+0xce8/0xe60 > [daa63b98] [c0455ac8] proc_single_show+0x98/0x170 > [daa63bc8] [c03cab8c] seq_read_iter+0x1ec/0x900 > [daa63c38] [c03cb47c] seq_read+0x1dc/0x290 > [daa63d68] [c037fc94] vfs_read+0x164/0x510 > [daa63ea8] [c03808e4] ksys_read+0x144/0x1d0 > [daa63f38] [c005b1dc] ret_from_syscall+0x0/0x38 > --- interrupt: c00 at 0x8fa8f4 > LR = 0x8fa8cc > > [...] Applied to powerpc/fixes. [1/1] powerpc/process, kasan: Silence KASAN warnings in __get_wchan() https://git.kernel.org/powerpc/c/a1b29ba2f2c171b9bea73be993bfdf0a62d37d15 cheers
Re: [PATCH] powerpc/kasan: Mark more real-mode code as not to be instrumented
On Thu, 19 May 2022 17:45:21 +1000, Paul Mackerras wrote: > This marks more files and functions that can possibly be called in > real mode as not to be instrumented by KASAN. Most were found by > inspection, except for get_pseries_errorlog() which was reported as > causing a crash in testing. > > Applied to powerpc/fixes. [1/1] powerpc/kasan: Mark more real-mode code as not to be instrumented https://git.kernel.org/powerpc/c/743cdb7bd0f1cb32c03680c8b38257957db2e692 cheers
Re: [PATCH] powerpc/kasan: Force thread size increase with KASAN
On Thu, 2 Jun 2022 00:31:14 +1000, Michael Ellerman wrote: > KASAN causes increased stack usage, which can lead to stack overflows. > > The logic in Kconfig to suggest a larger default doesn't work if a user > has CONFIG_EXPERT enabled and has an existing .config with a smaller > value. > > Follow the lead of x86 and arm64, and force the thread size to be > increased when KASAN is enabled. > > [...] Applied to powerpc/fixes. [1/1] powerpc/kasan: Force thread size increase with KASAN https://git.kernel.org/powerpc/c/3e8635fb2e072672cbc650989ffedf8300ad67fb cheers
Re: [PATCH] powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK
On Wed, 25 May 2022 13:26:39 +1000, Michael Ellerman wrote: > The HAVE_IRQ_EXIT_ON_IRQ_STACK option tells generic code that irq_exit() > is called while still running on the hard irq stack (hardirq_ctx[] in > the powerpc code). > > Selecting the option means the generic code will *not* switch to the > softirq stack before running softirqs, because the code is already > running on the (mostly empty) hard irq stack. > > [...] Applied to powerpc/fixes. [1/1] powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK https://git.kernel.org/powerpc/c/1346d00e1bdfd4067f92bc14e8a6131a01de4190 cheers
Re: [PATCH] powerpc/papr_scm: don't requests stats with '0' sized stats buffer
On Tue, 24 May 2022 16:53:53 +0530, Vaibhav Jain wrote: > Sachin reported [1] that on a POWER-10 lpar he is seeing a kernel panic being > reported with vPMEM when papr_scm probe is being called. The panic is of the > form below and is observed only with following option disabled(profile) for > the > said LPAR 'Enable Performance Information Collection' in the HMC: > > Kernel attempted to write user page (1c) - exploit attempt? (uid: 0) > BUG: Kernel NULL pointer dereference on write at 0x001c > Faulting instruction address: 0xc00801b90844 > Oops: Kernel access of bad area, sig: 11 [#1] > > NIP [c00801b90844] drc_pmem_query_stats+0x5c/0x270 [papr_scm] > LR [c00801b92794] papr_scm_probe+0x2ac/0x6ec [papr_scm] > Call Trace: >0xc941bca0 (unreliable) >papr_scm_probe+0x2ac/0x6ec [papr_scm] >platform_probe+0x98/0x150 >really_probe+0xfc/0x510 >__driver_probe_device+0x17c/0x230 > > ---[ end trace ]--- > Kernel panic - not syncing: Fatal exception > > [...] Applied to powerpc/fixes. [1/1] powerpc/papr_scm: don't requests stats with '0' sized stats buffer https://git.kernel.org/powerpc/c/07bf9431b1590d1cd7a8d62075d0b50b073f0495 cheers
Re: [PATCH] powerpc/32: Fix overread/overwrite of thread_struct via ptrace
On Thu, 9 Jun 2022 23:32:45 +1000, Michael Ellerman wrote: > The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process > to read/write registers of another process. > > To get/set a register, the API takes an index into an imaginary address > space called the "USER area", where the registers of the process are > laid out in some fashion. > > [...] Applied to powerpc/fixes. [1/1] powerpc/32: Fix overread/overwrite of thread_struct via ptrace https://git.kernel.org/powerpc/c/8e127846fc97778a5e5c99bca1ce0bbc5ec9 cheers
Re: [PATCH V2] ASoC: imx-audmux: remove unnecessary check of clk_disable_unprepare/clk_prepare_enable
On Mon, 6 Jun 2022 03:37:05 +, cgel@gmail.com wrote: > From: Minghao Chi > > Because clk_disable_unprepare/clk_prepare_enable already checked NULL clock > parameter, so the additional checks are unnecessary, just remove them. > > Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/1] ASoC: imx-audmux: remove unnecessary check of clk_disable_unprepare/clk_prepare_enable commit: 142d456204cf4dabe18be59e043d806440f609d4 All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
[PATCH] powerpc/32: Fix overread/overwrite of thread_struct via ptrace
The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process to read/write registers of another process. To get/set a register, the API takes an index into an imaginary address space called the "USER area", where the registers of the process are laid out in some fashion. The kernel then maps that index to a particular register in its own data structures and gets/sets the value. The API only allows a single machine-word to be read/written at a time. So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels. The way floating point registers (FPRs) are addressed is somewhat complicated, because double precision float values are 64-bit even on 32-bit CPUs. That means on 32-bit kernels each FPR occupies two word-sized locations in the USER area. On 64-bit kernels each FPR occupies one word-sized location in the USER area. Internally the kernel stores the FPRs in an array of u64s, or if VSX is enabled, an array of pairs of u64s where one half of each pair stores the FPR. Which half of the pair stores the FPR depends on the kernel's endianness. To handle the different layouts of the FPRs depending on VSX/no-VSX and big/little endian, the TS_FPR() macro was introduced. Unfortunately the TS_FPR() macro does not take into account the fact that the addressing of each FPR differs between 32-bit and 64-bit kernels. It just takes the index into the "USER area" passed from userspace and indexes into the fp_state.fpr array. On 32-bit there are 64 indexes that address FPRs, but only 32 entries in the fp_state.fpr array, meaning the user can read/write 256 bytes past the end of the array. Because the fp_state sits in the middle of the thread_struct there are various fields than can be overwritten, including some pointers. As such it may be exploitable. It has also been observed to cause systems to hang or otherwise misbehave when using gdbserver, and is probably the root cause of this report which could not be easily reproduced: https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40...@keymile.com/ Rather than trying to make the TS_FPR() macro even more complicated to fix the bug, or add more macros, instead add a special-case for 32-bit kernels. This is more obvious and hopefully avoids a similar bug happening again in future. Note that because 32-bit kernels never have VSX enabled the code doesn't need to consider TS_FPRWIDTH/OFFSET at all. Add a BUILD_BUG_ON() to ensure that 32-bit && VSX is never enabled. Fixes: 87fec0514f61 ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR registers in little endian builds") Cc: sta...@vger.kernel.org # v3.13+ Reported-by: Ariel Miculas Tested-by: Christophe Leroy Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/ptrace/ptrace-fpu.c | 20 ++-- arch/powerpc/kernel/ptrace/ptrace.c | 3 +++ 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/ptrace/ptrace-fpu.c b/arch/powerpc/kernel/ptrace/ptrace-fpu.c index 5dca19361316..09c49632bfe5 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-fpu.c +++ b/arch/powerpc/kernel/ptrace/ptrace-fpu.c @@ -17,9 +17,13 @@ int ptrace_get_fpr(struct task_struct *child, int index, unsigned long *data) #ifdef CONFIG_PPC_FPU_REGS flush_fp_to_thread(child); - if (fpidx < (PT_FPSCR - PT_FPR0)) - memcpy(data, >thread.TS_FPR(fpidx), sizeof(long)); - else + if (fpidx < (PT_FPSCR - PT_FPR0)) { + if (IS_ENABLED(CONFIG_PPC32)) + // On 32-bit the index we are passed refers to 32-bit words + *data = ((u32 *)child->thread.fp_state.fpr)[fpidx]; + else + memcpy(data, >thread.TS_FPR(fpidx), sizeof(long)); + } else *data = child->thread.fp_state.fpscr; #else *data = 0; @@ -39,9 +43,13 @@ int ptrace_put_fpr(struct task_struct *child, int index, unsigned long data) #ifdef CONFIG_PPC_FPU_REGS flush_fp_to_thread(child); - if (fpidx < (PT_FPSCR - PT_FPR0)) - memcpy(>thread.TS_FPR(fpidx), , sizeof(long)); - else + if (fpidx < (PT_FPSCR - PT_FPR0)) { + if (IS_ENABLED(CONFIG_PPC32)) + // On 32-bit the index we are passed refers to 32-bit words + ((u32 *)child->thread.fp_state.fpr)[fpidx] = data; + else + memcpy(>thread.TS_FPR(fpidx), , sizeof(long)); + } else child->thread.fp_state.fpscr = data; #endif diff --git a/arch/powerpc/kernel/ptrace/ptrace.c b/arch/powerpc/kernel/ptrace/ptrace.c index 4d2dc22d4a2d..5d7a72b41ae7 100644 --- a/arch/powerpc/kernel/ptrace/ptrace.c +++ b/arch/powerpc/kernel/ptrace/ptrace.c @@ -444,4 +444,7 @@ void __init pt_regs_check(void) * real registers. */ BUILD_BUG_ON(PT_DSCR < sizeof(struct user_pt_regs) / sizeof(unsigned long)); + + // ptrace_get/put_fpr() rely on PPC32 and VSX being
Re: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation
On Thu, Jun 09, 2022 at 03:28:55AM -0700, Wang Wenhu wrote: > The l2-cache could be optionally configured as SRAM partly or fully. > Users can make use of it as a block of independent memory that offers > special usage, such as for debuging or other cratical status info > storage which keeps consistently even when the whole system crashed. > > The hardware related configuration process utilized the work of the > earlier implementation, which has been removed now. > See: > https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=dc21ed2aef4150fc2fcf58227a4ff24502015c03 > > Cc: Christophe Leroy > Signed-off-by: Wang Wenhu > --- > drivers/uio/Kconfig | 10 + > drivers/uio/Makefile | 1 + > drivers/uio/uio_fsl_85xx_cache_sram.c | 286 ++ > 3 files changed, 297 insertions(+) > create mode 100644 drivers/uio/uio_fsl_85xx_cache_sram.c > > diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig > index 2e16c5338e5b..9199ced03880 100644 > --- a/drivers/uio/Kconfig > +++ b/drivers/uio/Kconfig > @@ -105,6 +105,16 @@ config UIO_NETX > To compile this driver as a module, choose M here; the module > will be called uio_netx. > > +config UIO_FSL_85XX_CACHE_SRAM > + tristate "Freescale 85xx Cache-Sram driver" > + depends on FSL_SOC_BOOKE && PPC32 > + help > + Generic driver for accessing the Cache-Sram form user level. This > + is extremely helpful for some user-space applications that require > + high performance memory accesses. > + > + If you don't know what to do here, say N. Module name information? > + > config UIO_FSL_ELBC_GPCM > tristate "eLBC/GPCM driver" > depends on FSL_LBC > diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile > index f2f416a14228..1ba07d92a1b1 100644 > --- a/drivers/uio/Makefile > +++ b/drivers/uio/Makefile > @@ -12,3 +12,4 @@ obj-$(CONFIG_UIO_MF624) += uio_mf624.o > obj-$(CONFIG_UIO_FSL_ELBC_GPCM) += uio_fsl_elbc_gpcm.o > obj-$(CONFIG_UIO_HV_GENERIC) += uio_hv_generic.o > obj-$(CONFIG_UIO_DFL)+= uio_dfl.o > +obj-$(CONFIG_UIO_FSL_85XX_CACHE_SRAM)+= uio_fsl_85xx_cache_sram.o > diff --git a/drivers/uio/uio_fsl_85xx_cache_sram.c > b/drivers/uio/uio_fsl_85xx_cache_sram.c > new file mode 100644 > index ..d363f9d2b179 > --- /dev/null > +++ b/drivers/uio/uio_fsl_85xx_cache_sram.c > @@ -0,0 +1,286 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2022 Wang Wenhu > + * All rights reserved. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define DRIVER_NAME "uio_mpc85xx_cache_sram" > +#define UIO_INFO_VER "0.0.1" > +#define UIO_NAME "uio_cache_sram" > + > +#define L2CR_L2FI0x4000 /* L2 flash > invalidate */ > +#define L2CR_L2IO0x0020 /* L2 > instruction only */ > +#define L2CR_SRAM_ZERO 0x /* L2SRAM zero > size */ > +#define L2CR_SRAM_FULL 0x0001 /* L2SRAM full > size */ > +#define L2CR_SRAM_HALF 0x0002 /* L2SRAM half > size */ > +#define L2CR_SRAM_TWO_HALFS 0x0003 /* L2SRAM two half > sizes */ > +#define L2CR_SRAM_QUART 0x0004 /* L2SRAM one > quarter size */ > +#define L2CR_SRAM_TWO_QUARTS 0x0005 /* L2SRAM two quarter size */ > +#define L2CR_SRAM_EIGHTH 0x0006 /* L2SRAM one eighth > size */ > +#define L2CR_SRAM_TWO_EIGHTH 0x0007 /* L2SRAM two eighth size */ > + > +#define L2SRAM_OPTIMAL_SZ_SHIFT 0x0003 /* Optimum size for > L2SRAM */ > + > +#define L2SRAM_BAR_MSK_LO18 0xC000 /* Lower 18 bits */ > +#define L2SRAM_BARE_MSK_HI4 0x000F /* Upper 4 bits */ > + > +enum cache_sram_lock_ways { > + LOCK_WAYS_ZERO, > + LOCK_WAYS_EIGHTH, > + LOCK_WAYS_TWO_EIGHTH, Why not have values for these? > + LOCK_WAYS_HALF = 4, > + LOCK_WAYS_FULL = 8, > +}; > + > +struct mpc85xx_l2ctlr { > + u32 ctl;/* 0x000 - L2 control */ What is the endian of these u32 values? You map them directly to memory, so they must be specified some way, right? Please make it obvious what they are. > + u8 res1[0xC]; > + u32 ewar0; /* 0x010 - External write address 0 */ > + u32 ewarea0;/* 0x014 - External write address extended 0 */ > + u32 ewcr0; /* 0x018 - External write ctrl */ > + u8 res2[4]; > + u32 ewar1; /* 0x020 - External write address 1 */ > + u32 ewarea1;/* 0x024 - External write address extended 1 */ > + u32 ewcr1; /* 0x028 - External write ctrl 1 */ > + u8 res3[4]; > + u32 ewar2; /* 0x030 - External write address 2 */ > + u32
Re: [PATCH 2/6] powerpc: Provide syscall wrapper
Le 01/06/2022 à 10:29, Christophe Leroy a écrit : Le 01/06/2022 à 07:48, Rohan McLure a écrit : [Vous ne recevez pas souvent de courriers de la part de rmcl...@linux.ibm.com. Découvrez pourquoi cela peut être important à l'adresse https://aka.ms/LearnAboutSenderIdentification.] Syscall wrapper implemented as per s390, x86, arm64, providing the option for gprs to be cleared on entry to the kernel, reducing caller influence influence on speculation within syscall routine. The wrapper is a macro that emits syscall handler implementations with parameters passed by stack pointer. Passing parameters by stack is going to be sub-optimal. Did you make any measurement of the implied performance degradation ? We usually use the null_syscall selftest for that everytime we touch syscall entries/exits. I did a test with null_syscall on an 8xx. Surprisingly I get more than 20% improvement with your series. Looking at the generated code in more details, we see that system_call_exception() is lighter as now no stack frame is needed, the compiler has enough registers available. Before the patch: c000c9ec : c000c9ec: 94 21 ff f0 stwur1,-16(r1) c000c9f0: 93 e1 00 0c stw r31,12(r1) c000c9f4: 7d 5f 53 78 mr r31,r10 c000c9f8: 81 4a 00 84 lwz r10,132(r10) c000c9fc: 90 7f 00 88 stw r3,136(r31) c000ca00: 71 4b 00 02 andi. r11,r10,2 c000ca04: 41 82 00 4c beq c000ca50 c000ca08: 71 4b 40 00 andi. r11,r10,16384 c000ca0c: 41 82 00 50 beq c000ca5c c000ca10: 71 4a 80 00 andi. r10,r10,32768 c000ca14: 41 82 00 54 beq c000ca68 c000ca18: 7c 50 13 a6 mtspr 80,r2 c000ca1c: 81 42 00 4c lwz r10,76(r2) c000ca20: 71 4a 84 91 andi. r10,r10,33937 c000ca24: 40 82 00 50 bne c000ca74 c000ca28: 28 09 01 c2 cmplwi r9,450 c000ca2c: 41 81 00 88 bgt c000cab4 c000ca30: 3d 40 c0 6f lis r10,-16273 c000ca34: 55 29 10 3a rlwinm r9,r9,2,0,29 c000ca38: 39 4a c1 c5 addir10,r10,-15931 c000ca3c: 7d 2a 48 2e lwzxr9,r10,r9 c000ca40: 83 e1 00 0c lwz r31,12(r1) c000ca44: 7d 29 03 a6 mtctr r9 c000ca48: 38 21 00 10 addir1,r1,16 c000ca4c: 4e 80 04 20 bctr ... After the patch: c000cc94 : c000cc94: 81 24 00 84 lwz r9,132(r4) c000cc98: 81 44 00 0c lwz r10,12(r4) c000cc9c: 71 28 00 02 andi. r8,r9,2 c000cca0: 91 44 00 88 stw r10,136(r4) c000cca4: 41 82 00 48 beq c000ccec c000cca8: 71 2a 40 00 andi. r10,r9,16384 c000ccac: 41 82 00 44 beq c000ccf0 c000ccb0: 71 29 80 00 andi. r9,r9,32768 c000ccb4: 41 82 00 40 beq c000ccf4 c000ccb8: 7c 50 13 a6 mtspr 80,r2 c000ccbc: 81 22 00 4c lwz r9,76(r2) c000ccc0: 71 29 84 91 andi. r9,r9,33937 c000ccc4: 40 82 00 34 bne c000ccf8 c000ccc8: 28 03 01 c2 cmplwi r3,450 c000: 41 81 00 78 bgt c000cd44 c000ccd0: 3d 20 c0 70 lis r9,-16272 c000ccd4: 54 63 10 3a rlwinm r3,r3,2,0,29 c000ccd8: 39 29 81 c5 addir9,r9,-32315 c000ccdc: 7d 29 18 2e lwzxr9,r9,r3 c000cce0: 7c 83 23 78 mr r3,r4 c000cce4: 7d 29 03 a6 mtctr r9 c000cce8: 4e 80 04 20 bctr ... Why going via stack ? The main advantage of a RISC processor like powerpc is that, unlike x86, there are enough registers to avoid going through memory. RISC processors are optimised with three operands operations and many registers, and usually have slow memory in return. Well, thinking about it once more. In fact registers are saved to the stack anyway. At the start of syscall functions they are likely to still be hot in the cache, so reading them back is just a few cycles. And it eventually provide the compiler the opportunity to organise stuff better. For platforms supporting this syscall wrapper, emit symbols with usual in-register parameters (`sys...`) to support calls to syscall handlers from within the kernel. Syscalls are wrapped on all platforms except Cell processor. SPUs require access syscall prototypes which are omitted with ARCH_HAS_SYSCALL_WRAPPER enabled. This commit message isn't very clear, please describe in more details what is done, how and why. Christophe
Re: [PATCH 08/10] scsi/ibmvfc: Replace tasklet with work
On 2022-05-30 16:15:10 [-0700], Davidlohr Bueso wrote: > diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c > index d0eab5700dc5..31b1900489e7 100644 > --- a/drivers/scsi/ibmvscsi/ibmvfc.c > +++ b/drivers/scsi/ibmvscsi/ibmvfc.c > @@ -891,7 +891,7 @@ static void ibmvfc_release_crq_queue(struct ibmvfc_host > *vhost) > > ibmvfc_dbg(vhost, "Releasing CRQ\n"); > free_irq(vdev->irq, vhost); > - tasklet_kill(>tasklet); > +cancel_work_sync(>work); s/ {8}/\t/ is there a reason not to use threaded interrupts? The workqueue _might_ migrate to another CPU. The locking ensures that nothing bad happens but ibmvfc_tasklet() has this piece: | spin_lock_irqsave(vhost->host->host_lock, flags); … | while ((async = ibmvfc_next_async_crq(vhost)) != NULL) { | ibmvfc_handle_async(async, vhost); | async->valid = 0; | wmb(); | } … | vio_enable_interrupts(vdev); potentially enables interrupts which fires right away. | if ((async = ibmvfc_next_async_crq(vhost)) != NULL) { | vio_disable_interrupts(vdev); disables it again. | } | | spin_unlock(vhost->crq.q_lock); | spin_unlock_irqrestore(vhost->host->host_lock, flags); If the worker runs on another CPU then the CPU servicing the interrupt will be blocked on the lock which is not nice. My guess is that you could enable interrupts right before unlocking but is a different story. > do { > if (rc) > msleep(100); Sebastian
[PATCH] powerpc: Enable execve syscall exit tracepoint
On execve[at], we are zero'ing out most of the thread register state including gpr[0], which contains the syscall number. Due to this, we fail to trigger the syscall exit tracepoint properly. Fix this by retaining gpr[0] in the thread register state. Before this patch: # tail /sys/kernel/debug/tracing/trace cat-123 [000] .61.449351: sys_execve(filename: 7fffa6b23448, argv: 7fffa6b233e0, envp: 7fffa6b233f8) cat-124 [000] .62.428481: sys_execve(filename: 7fffa6b23448, argv: 7fffa6b233e0, envp: 7fffa6b233f8) echo-125 [000] .65.813702: sys_execve(filename: 7fffa6b23378, argv: 7fffa6b233a0, envp: 7fffa6b233b0) echo-125 [000] .65.822214: sys_execveat(fd: 0, filename: 1009ac48, argv: 765d0c98, envp: 765d0ca8, flags: 0) After this patch: # tail /sys/kernel/debug/tracing/trace cat-127 [000] . 100.416262: sys_execve(filename: 7fffa41b3448, argv: 7fffa41b33e0, envp: 7fffa41b33f8) cat-127 [000] . 100.418203: sys_execve -> 0x0 echo-128 [000] . 103.873968: sys_execve(filename: 7fffa41b3378, argv: 7fffa41b33a0, envp: 7fffa41b33b0) echo-128 [000] . 103.875102: sys_execve -> 0x0 echo-128 [000] . 103.882097: sys_execveat(fd: 0, filename: 1009ac48, argv: 7fffd10d2148, envp: 7fffd10d2158, flags: 0) echo-128 [000] . 103.883225: sys_execveat -> 0x0 Cc: sta...@vger.kernel.org Signed-off-by: Naveen N. Rao --- arch/powerpc/kernel/process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index d00b20c6596671..bb4da23ecdd7c2 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1854,7 +1854,7 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp) tm_reclaim_current(0); #endif - memset(regs->gpr, 0, sizeof(regs->gpr)); + memset(>gpr[1], 0, sizeof(regs->gpr) - sizeof(regs->gpr[0])); regs->ctr = 0; regs->link = 0; regs->xer = 0; base-commit: 16332b7fbbe46581ddac80c6d32834c1269bc450 -- 2.36.1
Re: [PATCH] powerpc/pci: Add config option for using OF 'reg' for PCI domain
On Tuesday 24 May 2022 11:17:56 Pali Rohár wrote: > On Friday 06 May 2022 00:33:02 Pali Rohár wrote: > > On Thursday 05 May 2022 15:10:01 Tyrel Datwyler wrote: > > > On 5/5/22 02:31, Pali Rohár wrote: > > > > Hello! > > > > > > > > On Thursday 05 May 2022 07:16:40 Christophe Leroy wrote: > > > >> Le 04/05/2022 à 19:57, Pali Rohár a écrit : > > > >>> Since commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number > > > >>> based on > > > >>> device-tree properties"), powerpc kernel always fallback to PCI domain > > > >>> assignment from OF / Device Tree 'reg' property of the PCI controller. > > > >>> > > > >>> PCI code for other Linux architectures use increasing assignment of > > > >>> the PCI > > > >>> domain for individual controllers (assign the first free number), > > > >>> like it > > > >>> was also for powerpc prior mentioned commit. > > > >>> > > > >>> Upgrading powerpc kernels from LTS 4.4 version (which does not contain > > > >>> mentioned commit) to new LTS versions brings a regression in domain > > > >>> assignment. > > > >> > > > >> Can you elaborate why it is a regression ? > > > >>63a72284b159 > > > >> That commit says 'no functionnal changes', I'm having hard time > > > >> understanding how a nochange can be a regression. > > > > > > > > It is not 'no functional change'. That commit completely changed PCI > > > > domain assignment in a way that is incompatible with other architectures > > > > and also incompatible with the way how it was done prior that commit. > > > > > > I agree that the "no functional change" statement is incorrect. However, > > > for > > > most powerpc platforms it ended up being simply a cosmetic behavior > > > change. As > > > far as I can tell there is nothing requiring domain ids to increase > > > montonically > > > from zero or that each architecture is required to use the same domain > > > numbering > > > scheme. > > > > That is truth. But it looks really suspicious why domains are not > > assigned monotonically. Some scripts / applications are using PCI > > location (domain:bus:dev:func) for remembering PCI device and domain > > change can cause issue for config files. And some (older) applications > > expects existence of domain zero. In systems without hot plug support > > with small number of domains (e.g. 3) it means that there are always > > domains 0, 1 and 2. > > > > > Its hard to call this a true regression unless it actually broke > > > something. The commit in question has been in the kernel since 4.8 which > > > was > > > released over 5 1/2 years ago. > > > > I agree, it really depends on how you look at it. > > > > The important is that lot of people are using LTS versions and are doing > > upgrades when LTS support is dropped. Which for 4.4 now happened. So not > > all smaller or "cosmetic" changes could be detected until longer LTS > > period pass. > > > > > With all that said looking closer at the code in question I think it is > > > fair to > > > assume that the author only intended this change for powernv and pseries > > > platforms and not every powerpc platform. That change was done to make > > > persistent naming easier to manage in userspace. > > > > I agree that this behavior change may be useful in some situations and I > > do not object this need. > > > > > Your change defaults back to > > > the old behavior which will now break both powernv and pseries platforms > > > with > > > regard to hotplugging and persistent naming. > > > > I was aware of it, that change could cause issues. And that is why I > > added config option for choosing behavior. So users would be able to > > choose what they need. > > > > > We could properly limit it to powernv and pseries by using ibm,fw-phb-id > > > instead > > > of reg property in the look up that follows a failed ibm,opal-phbid > > > lookup. I > > > think this is acceptable as long as no other powerpc platforms have > > > started > > > using this behavior for persistent naming. > > > > And what about setting that new config option to enabled by default for > > those series? > > > > Or is there issue with introduction of the new config option? > > PING? Any opinion? PING? > > One of the point is that it is really a good idea to have similar/same > > behavior for all linux platforms. And if it cannot be enabled by default > > (for backward compatibility) add at least some option, so new platforms > > can start using it or users can decide to switch behavior. > > > > > -Tyrel > > > > > > > For example, prior that commit on P2020 RDB board were PCI domains 0, 1 > > > > and 2. > > > > > > > > $ lspci > > > > :00:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21) > > > > :01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB > > > > 3.0 xHCI Host Controller (rev 02) > > > > 0001:02:00.0 PCI bridge: Freescale Semiconductor Inc P2020E (rev 21) > > > > 0001:03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless > > > > Network Adapter
Re: [PATCH] powerpc: dts: Add DTS file for CZ.NIC Turris 1.x routers
On Tuesday 24 May 2022 11:23:32 Pali Rohár wrote: > On Wednesday 11 May 2022 16:37:12 Pali Rohár wrote: > > CZ.NIC Turris 1.0 and 1.1 are open source routers, they have dual-core > > PowerPC Freescale P2020 CPU and are based on Freescale P2020RDB-PC-A board. > > Hardware design is fully open source, all firmware and hardware design > > files are available at Turris project website: > > > > https://docs.turris.cz/hw/turris-1x/turris-1x/ > > https://project.turris.cz/en/hardware.html > > > > Signed-off-by: Pali Rohár > > --- > > arch/powerpc/boot/dts/turris1x.dts | 470 + > > 1 file changed, 470 insertions(+) > > create mode 100644 arch/powerpc/boot/dts/turris1x.dts > > Michael, Rob: PING? PING? > > diff --git a/arch/powerpc/boot/dts/turris1x.dts > > b/arch/powerpc/boot/dts/turris1x.dts > > new file mode 100644 > > index ..2a624f117586 > > --- /dev/null > > +++ b/arch/powerpc/boot/dts/turris1x.dts > > @@ -0,0 +1,470 @@ > > +// SPDX-License-Identifier: GPL-2.0+ > > +/* > > + * Turris 1.x Device Tree Source > > + * > > + * Copyright 2013 - 2022 CZ.NIC z.s.p.o. (http://www.nic.cz/) > > + * > > + * Pinout, Schematics and Altium hardware design files are open source > > + * and available at: https://docs.turris.cz/hw/turris-1x/turris-1x/ > > + */ > > + > > +#include > > +#include > > +#include > > +/include/ "fsl/p2020si-pre.dtsi" > > + > > +/ { > > + model = "Turris 1.x"; > > + compatible = "cznic,turris1x", "fsl,P2020RDB-PC"; /* fsl,P2020RDB-PC is > > required for booting Linux */ > > + > > + aliases { > > + ethernet0 = > > + ethernet1 = > > + ethernet2 = > > + serial0 = > > + serial1 = > > + pci0 = > > + pci1 = > > + pci2 = > > + spi0 = > > + }; > > + > > + memory { > > + device_type = "memory"; > > + }; > > + > > + soc: soc@ffe0 { > > + ranges = <0x0 0x0 0xffe0 0x0010>; > > + > > + i2c@3000 { > > + /* PCA9557PW GPIO controller for boot config */ > > + gpio-controller@18 { > > + compatible = "nxp,pca9557"; > > + label = "bootcfg"; > > + reg = <0x18>; > > + #gpio-cells = <2>; > > + gpio-controller; > > + polarity = <0x00>; > > + }; > > + > > + /* STM32F030R8T6 MCU for power control */ > > + power-control@32 { > > + /* > > +* Turris Power Control firmware runs on > > STM32F0 MCU. > > +* This firmware is open source and available > > at: > > +* > > https://gitlab.nic.cz/turris/hw/turris_power_control > > +*/ > > + reg = <0x32>; > > + }; > > + > > + /* SA56004ED temperature control */ > > + temperature-sensor@4c { > > + compatible = "nxp,sa56004"; > > + reg = <0x4c>; > > + interrupt-parent = <>; > > + interrupts = <12 IRQ_TYPE_LEVEL_LOW>, /* GPIO12 > > - ALERT pin */ > > +<13 IRQ_TYPE_LEVEL_LOW>; /* GPIO13 > > - CRIT pin */ > > + }; > > + > > + /* DDR3 SPD/EEPROM */ > > + eeprom@52 { > > + compatible = "atmel,spd"; > > + reg = <0x52>; > > + }; > > + > > + /* ATSHA204-TH-DA-T crypto module */ > > + crypto@64 { > > + compatible = "atmel,atsha204"; > > + reg = <0x64>; > > + }; > > + > > + /* IDT6V49205BNLGI clock generator */ > > + clock-generator@69 { > > + compatible = "idt,6v49205b"; > > + reg = <0x69>; > > + }; > > + > > + /* MCP79402-I/ST Protected EEPROM */ > > + eeprom@57 { > > + reg = <0x57>; > > + }; > > + > > + /* MCP79402-I/ST RTC */ > > + rtc@6f { > > + compatible = "microchip,mcp7940x"; > > + reg = <0x6f>; > > + interrupt-parent = <>; > > + interrupts = <14 0>; /* GPIO14 - MFP pin */ > > + }; > > + }; > > + > > + /* SPI on connector P1 */ > > + spi0: spi@7000 { > > + }; > > + > > + gpio: gpio-controller@fc00 { > > + #interrupt-cells = <2>; > > +
[PATCH v2 2/3] powerpc/irq: Perform stack_overflow detection after switching to IRQ stack
When KASAN is enabled, as shown by the Oops below, the 2k limit is not enough to allow stack dump after a stack overflow detection when CONFIG_DEBUG_STACKOVERFLOW is selected: do_IRQ: stack overflow: 1984 CPU: 0 PID: 126 Comm: systemd-udevd Not tainted 5.18.0-gentoo-PMacG4 #1 Call Trace: Oops: Kernel stack overflow, sig: 11 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac Modules linked in: sr_mod cdrom radeon(+) ohci_pci(+) hwmon i2c_algo_bit drm_ttm_helper ttm drm_dp_helper snd_aoa_i2sbus snd_aoa_soundbus snd_pcm ehci_pci snd_timer ohci_hcd snd ssb ehci_hcd 8250_pci soundcore drm_kms_helper pcmcia 8250 pcmcia_core syscopyarea usbcore sysfillrect 8250_base sysimgblt serial_mctrl_gpio fb_sys_fops usb_common pkcs8_key_parser fuse drm drm_panel_orientation_quirks configfs CPU: 0 PID: 126 Comm: systemd-udevd Not tainted 5.18.0-gentoo-PMacG4 #1 NIP: c02e5558 LR: c07eb3bc CTR: c07f46a8 REGS: e7fe9f50 TRAP: Not tainted (5.18.0-gentoo-PMacG4) MSR: 1032 CR: 44a14824 XER: 2000 GPR00: c07eb3bc eaa1c000 c26baea0 eaa1c0a0 0008 c07eb3bc eaa1c010 GPR08: eaa1c0a8 04f3f3f3 f1f1f1f1 c07f4c84 44a14824 0080f7e4 0005 0010 GPR16: 0025 eaa1c154 eaa1c158 c0dbad64 0020 fd543810 eaa1c0a0 eaa1c29e GPR24: c0dbad44 c0db8740 05ff fd543802 eaa1c150 c0c9a3c0 eaa1c0a0 c0c9a3c0 NIP [c02e5558] kasan_check_range+0xc/0x2b4 LR [c07eb3bc] format_decode+0x80/0x604 Call Trace: [eaa1c000] [c07eb3bc] format_decode+0x80/0x604 (unreliable) [eaa1c070] [c07f4dac] vsnprintf+0x128/0x938 [eaa1c110] [c07f5788] sprintf+0xa0/0xc0 [eaa1c180] [c0154c1c] __sprint_symbol.constprop.0+0x170/0x198 [eaa1c230] [c07ee71c] symbol_string+0xf8/0x260 [eaa1c430] [c07f46d0] pointer+0x15c/0x710 [eaa1c4b0] [c07f4fbc] vsnprintf+0x338/0x938 [eaa1c550] [c00e8fa0] vprintk_store+0x2a8/0x678 [eaa1c690] [c00e94e4] vprintk_emit+0x174/0x378 [eaa1c6d0] [c00ea008] _printk+0x9c/0xc0 [eaa1c750] [c000ca94] show_stack+0x21c/0x260 [eaa1c7a0] [c07d0bd4] dump_stack_lvl+0x60/0x90 [eaa1c7c0] [c0009234] __do_IRQ+0x170/0x174 [eaa1c800] [c0009258] do_IRQ+0x20/0x34 [eaa1c820] [c00045b4] HardwareInterrupt_virt+0x108/0x10c ... As the detection is asynchronously performed at IRQs, we could be long after the limit has been crossed, so increasing the limit would not solve the problem completely. In order to be sure that there is enough stack space for the stack dump, do it after the switch to the IRQ stack. That way it is sure that the stack is large enough, unless the IRQ stack has been overfilled in which case the end of life is close. Reported-by: Erhard Furtner Signed-off-by: Christophe Leroy --- v2: Use provided 'sp' instead of overwritting it with current stack pointer --- arch/powerpc/kernel/irq.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 370434f6c316..e5b7fb5282ee 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -184,14 +184,12 @@ u64 arch_irq_stat_cpu(unsigned int cpu) return sum; } -static inline void check_stack_overflow(void) +static inline void check_stack_overflow(unsigned long sp) { - long sp; - if (!IS_ENABLED(CONFIG_DEBUG_STACKOVERFLOW)) return; - sp = current_stack_pointer & (THREAD_SIZE - 1); + sp &= THREAD_SIZE - 1; /* check for stack overflow: is there less than 2KB free? */ if (unlikely(sp < 2048)) { @@ -221,12 +219,14 @@ static __always_inline void call_do_softirq(const void *sp) DEFINE_STATIC_CALL_RET0(ppc_get_irq, *ppc_md.get_irq); -static void __do_irq(struct pt_regs *regs) +static void __do_irq(struct pt_regs *regs, unsigned long oldsp) { unsigned int irq; trace_irq_entry(regs); + check_stack_overflow(oldsp); + /* * Query the platform PIC for the interrupt & ack it. * @@ -254,6 +254,7 @@ static __always_inline void call_do_irq(struct pt_regs *regs, void *sp) /* Temporarily switch r1 to sp, call __do_irq() then restore r1. */ asm volatile ( PPC_STLU " %%r1, %[offset](%[sp]) ;" + "mr %%r4, %%r1 ;" "mr %%r1, %[sp] ;" "bl %[callee] ;" PPC_LL " %%r1, 0(%%r1) ;" @@ -279,11 +280,9 @@ void __do_IRQ(struct pt_regs *regs) irqsp = hardirq_ctx[raw_smp_processor_id()]; sirqsp = softirq_ctx[raw_smp_processor_id()]; - check_stack_overflow(); - /* Already there ? */ if (unlikely(cursp == irqsp || cursp == sirqsp)) { - __do_irq(regs); + __do_irq(regs,
[PATCH v2 1/3] powerpc/irq: Make __do_irq() static
Since commit 48cf12d88969 ("powerpc/irq: Inline call_do_irq() and call_do_softirq()"), __do_irq() is not used outside irq.c Reorder functions and make __do_irq() static and drop the declaration in irq.h. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/irq.h | 1 - arch/powerpc/kernel/irq.c | 46 +- 2 files changed, 23 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h index 13f0409dd617..5c1516a5ba8f 100644 --- a/arch/powerpc/include/asm/irq.h +++ b/arch/powerpc/include/asm/irq.h @@ -54,7 +54,6 @@ extern void *softirq_ctx[NR_CPUS]; void __do_IRQ(struct pt_regs *regs); extern void __init init_IRQ(void); -extern void __do_irq(struct pt_regs *regs); int irq_choose_cpu(const struct cpumask *mask); diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 873e6dffb868..370434f6c316 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -219,31 +219,9 @@ static __always_inline void call_do_softirq(const void *sp) ); } -static __always_inline void call_do_irq(struct pt_regs *regs, void *sp) -{ - register unsigned long r3 asm("r3") = (unsigned long)regs; - - /* Temporarily switch r1 to sp, call __do_irq() then restore r1. */ - asm volatile ( -PPC_STLU " %%r1, %[offset](%[sp]) ;" - "mr %%r1, %[sp] ;" - "bl %[callee] ;" -PPC_LL " %%r1, 0(%%r1) ;" -: // Outputs - "+r" (r3) -: // Inputs - [sp] "b" (sp), [offset] "i" (THREAD_SIZE - STACK_FRAME_OVERHEAD), - [callee] "i" (__do_irq) -: // Clobbers - "lr", "xer", "ctr", "memory", "cr0", "cr1", "cr5", "cr6", - "cr7", "r0", "r4", "r5", "r6", "r7", "r8", "r9", "r10", - "r11", "r12" - ); -} - DEFINE_STATIC_CALL_RET0(ppc_get_irq, *ppc_md.get_irq); -void __do_irq(struct pt_regs *regs) +static void __do_irq(struct pt_regs *regs) { unsigned int irq; @@ -269,6 +247,28 @@ void __do_irq(struct pt_regs *regs) trace_irq_exit(regs); } +static __always_inline void call_do_irq(struct pt_regs *regs, void *sp) +{ + register unsigned long r3 asm("r3") = (unsigned long)regs; + + /* Temporarily switch r1 to sp, call __do_irq() then restore r1. */ + asm volatile ( +PPC_STLU " %%r1, %[offset](%[sp]) ;" + "mr %%r1, %[sp] ;" + "bl %[callee] ;" +PPC_LL " %%r1, 0(%%r1) ;" +: // Outputs + "+r" (r3) +: // Inputs + [sp] "b" (sp), [offset] "i" (THREAD_SIZE - STACK_FRAME_OVERHEAD), + [callee] "i" (__do_irq) +: // Clobbers + "lr", "xer", "ctr", "memory", "cr0", "cr1", "cr5", "cr6", + "cr7", "r0", "r4", "r5", "r6", "r7", "r8", "r9", "r10", + "r11", "r12" + ); +} + void __do_IRQ(struct pt_regs *regs) { struct pt_regs *old_regs = set_irq_regs(regs); -- 2.35.3
[PATCH v2 3/3] powerpc/irq: Simplify __do_irq()
Remove duplicated code by implementing a proper if/else. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/irq.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index e5b7fb5282ee..b007f16ccbd3 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -280,14 +280,11 @@ void __do_IRQ(struct pt_regs *regs) irqsp = hardirq_ctx[raw_smp_processor_id()]; sirqsp = softirq_ctx[raw_smp_processor_id()]; - /* Already there ? */ - if (unlikely(cursp == irqsp || cursp == sirqsp)) { + /* Already there ? If not switch stack and call */ + if (unlikely(cursp == irqsp || cursp == sirqsp)) __do_irq(regs, current_stack_pointer); - set_irq_regs(old_regs); - return; - } - /* Switch stack and call */ - call_do_irq(regs, irqsp); + else + call_do_irq(regs, irqsp); set_irq_regs(old_regs); } -- 2.35.3
Re: [PATCH] powerpc/ptrace: Fix buffer overflow when handling PTRACE_PEEKUSER and PTRACE_POKEUSER
Hi Michael, Le 06/06/2022 à 16:45, Michael Ellerman a écrit : > Hi Ariel, > > I've added Christophe to Cc who works on ppc32. I've added powerpc list > > I haven't actually reproduced the crash with gdbserver, but I have a > test case which shows the bug, so I've been able to confirm it and > test a fix. > > Thanks for your patch, but I wanted to fix it differently. Can you try > the patch below and make sure it fixes the bug for you? > > I've also attached the test case I've been using. > > Christophe are you able to test these on some 32-bit machines? I've > tested it in qemu and on one 32-bit machine I have here, but some more > real testing would be good. Yes I will test it but my CPUs have no FPU so it will be with the kernel software Math emulation. But it should make no difference I guess ? Christophe > > If the patch works then I'll need to do manual back ports for several of > the stable kernels, and then once those are ready I will publish the > patch. > > cheers > > > ---8<--- > From eaa9a32fe38d8722d2da8773965309365805d66d Mon Sep 17 00:00:00 2001 > From: Michael Ellerman > Date: Tue, 7 Jun 2022 00:34:56 +1000 > Subject: [PATCH] powerpc/32: Fix overread/overwrite of thread_struct via > ptrace > > The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process > to read/write registers of another process. > > To get/set a register, the API takes an index into an imaginary address > space called the "USER area", where the registers of the process are > laid out in some fashion. > > The kernel then maps that index to a particular register in its own data > structures and gets/sets the value. > > The API only allows a single machine-word to be read/written at a time. > So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels. > > The way floating point registers (FPRs) are addressed is somewhat > complicated, because double precision float values are 64-bit even on > 32-bit CPUs. That means on 32-bit kernels each FPR occupies two > word-sized locations in the USER area. On 64-bit kernels each FPR > occupies one word-sized location in the USER area. > > Internally the kernel stores the FPRs in an array of u64s, or if VSX is > enabled, an array of pairs of u64s where one half of each pair stores > the FPR. Which half of the pair stores the FPR depends on the kernel's > endianness. > > To handle the different layouts of the FPRs depending on VSX/no-VSX and > big/little endian, the TS_FPR() macro was introduced. > > Unfortunately the TS_FPR() macro does not take into account the fact > that the addressing of each FPR differs between 32-bit and 64-bit > kernels. It just takes the index into the "USER area" passed from > userspace and indexes into the fp_state.fpr array. > > On 32-bit there are 64 indexes that address FPRs, but only 32 entries in > the fp_state.fpr array, meaning the user can read/write 256 bytes past > the end of the array. Because the fp_state sits in the middle of the > thread_struct there are various fields than can be overwritten, > including some pointers. As such it is probably exploitable. > > It has also been observed to cause systems to hang or otherwise > misbehave when using gdbserver, and is probably the root cause of this > report which could not be easily reproduced: > > https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40...@keymile.com/ > > Rather than trying to make the TS_FPR() macro even more complicated to > fix the bug, or add more macros, instead add a special-case for 32-bit > kernels. This is more obvious and hopefully avoids a similar bug > happening again in future. Note that because 32-bit kernels never have > VSX enabled the code doesn't need to consider TS_FPRWIDTH/OFFSET at all. > > Fixes: 87fec0514f61 ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR > registers in little endian builds") > Cc: sta...@vger.kernel.org # v3.13+ > Reported-by: Ariel Miculas > Signed-off-by: Michael Ellerman > --- > arch/powerpc/kernel/ptrace/ptrace-fpu.c | 20 ++-- > 1 file changed, 14 insertions(+), 6 deletions(-) > > diff --git a/arch/powerpc/kernel/ptrace/ptrace-fpu.c > b/arch/powerpc/kernel/ptrace/ptrace-fpu.c > index 5dca19361316..f406436a0f6c 100644 > --- a/arch/powerpc/kernel/ptrace/ptrace-fpu.c > +++ b/arch/powerpc/kernel/ptrace/ptrace-fpu.c > @@ -17,9 +17,13 @@ int ptrace_get_fpr(struct task_struct *child, int index, > unsigned long *data) > > #ifdef CONFIG_PPC_FPU_REGS > flush_fp_to_thread(child); > - if (fpidx < (PT_FPSCR - PT_FPR0)) > - memcpy(data, >thread.TS_FPR(fpidx), sizeof(long)); > - else > + if (fpidx < (PT_FPSCR - PT_FPR0)) { > + if (IS_ENABLED(CONFIG_PPC32)) > + // The 32-bit ptrace API addresses the FPRs as 32-bit > words > + *data = ((u32 *)child->thread.fp_state.fpr)[fpidx]; > + else > + memcpy(data, >thread.TS_FPR(fpidx), > sizeof(long)); > +
Re: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
On 09/06/2022, 09:45:49, Michael Ellerman wrote: > Nathan Lynch writes: >> Laurent Dufour writes: > ... >> >>> There are ongoing investigations to clarify where and how this latency is >>> happening. I'm not excluding any other issue in the Linux kernel, but right >>> now, this looks to be the best option to prevent system crash during >>> LPM. >> >> It will prevent the likely crash mode for enterprise distros with >> default watchdog tunables that our internal test environments happen to >> use. But if someone were to run the same scenario with softlockup_panic >> enabled, or with the RCU stall timeout lower than the watchdog >> threshold, the failure mode would be different. >> >> Basically I'm saying: >> * Some users may actually want the OS to panic when it's in this state, >> because their applications can't work correctly. >> * But if we're going to inhibit one watchdog, we should inhibit them >> all. > > I'm sympathetic to both of your arguments. > > But I think there is a key difference between the NMI watchdog and other > watchdogs, which is that the NMI watchdog will use the unsafe NMI to > interrupt other CPUs, and that can cause the system to crash when other > watchdogs would just print a backtrace. > > We had the same problem with the rcu_sched stall detector until we > changed it to use the "safe" NMI, see: > 5cc05910f26e ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()") > > > So even if the NMI watchdog is disabled there are still the other > watchdogs enabled, which should print backtraces by default, and if > desired can also be configured to cause a panic. > > Instead of disabling the NMI watchdog, can we instead increase the > timeout (by how much?) during LPM, so that it is less likely to fire in > normal usage, but is still there as a backup if the system is completely > clogged. That's probably doable, tweaking wd_smp_panic_timeout_tb and wd_panic_timeout_tb when the LPM is in progress. I'll add a new sysctl value, so administrator will have the capability to change that and also fully disable the NMI watchdog during LPM if he want. I've no idea what should be the default factor, I guess this will be a bit empiric. I'll rework my patch in that way. cheers, Laurent.
Re: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
Nathan Lynch writes: > Laurent Dufour writes: ... > >> There are ongoing investigations to clarify where and how this latency is >> happening. I'm not excluding any other issue in the Linux kernel, but right >> now, this looks to be the best option to prevent system crash during >> LPM. > > It will prevent the likely crash mode for enterprise distros with > default watchdog tunables that our internal test environments happen to > use. But if someone were to run the same scenario with softlockup_panic > enabled, or with the RCU stall timeout lower than the watchdog > threshold, the failure mode would be different. > > Basically I'm saying: > * Some users may actually want the OS to panic when it's in this state, > because their applications can't work correctly. > * But if we're going to inhibit one watchdog, we should inhibit them > all. I'm sympathetic to both of your arguments. But I think there is a key difference between the NMI watchdog and other watchdogs, which is that the NMI watchdog will use the unsafe NMI to interrupt other CPUs, and that can cause the system to crash when other watchdogs would just print a backtrace. We had the same problem with the rcu_sched stall detector until we changed it to use the "safe" NMI, see: 5cc05910f26e ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()") So even if the NMI watchdog is disabled there are still the other watchdogs enabled, which should print backtraces by default, and if desired can also be configured to cause a panic. Instead of disabling the NMI watchdog, can we instead increase the timeout (by how much?) during LPM, so that it is less likely to fire in normal usage, but is still there as a backup if the system is completely clogged. cheers