[PATCH] powerpc: add PPC_FEATURE userspace bits for SCV and DARN instructions
Providing "scv" support to userspace requires kernel support, so it must be advertised as independently to the base ISA 3 instruction set. The darn instruction relies on firmware enablement, so it has been decided to split this out from the core ISA 3 feature as well. Signed-off-by: Nicholas Piggin--- These uapi changes have been agreed by powerpc toolchain and firmware teams. I believe this completes our anticipated requirements for user feature advertisement for ISA v3.0B. arch/powerpc/include/uapi/asm/cputable.h | 2 ++ arch/powerpc/kernel/cputable.c | 3 ++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/uapi/asm/cputable.h b/arch/powerpc/include/uapi/asm/cputable.h index 3e7ce86d5c13..4d877144f377 100644 --- a/arch/powerpc/include/uapi/asm/cputable.h +++ b/arch/powerpc/include/uapi/asm/cputable.h @@ -46,6 +46,8 @@ #define PPC_FEATURE2_HTM_NOSC 0x0100 #define PPC_FEATURE2_ARCH_3_00 0x0080 /* ISA 3.00 */ #define PPC_FEATURE2_HAS_IEEE128 0x0040 /* VSX IEEE Binary Float 128-bit */ +#define PPC_FEATURE2_DARN 0x0020 /* darn random number insn */ +#define PPC_FEATURE2_SCV 0x0010 /* scv syscall */ /* * IMPORTANT! diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index 9b3e88b1a9c8..6f849832a669 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -124,7 +124,8 @@ extern void __restore_cpu_e6500(void); #define COMMON_USER_POWER9 COMMON_USER_POWER8 #define COMMON_USER2_POWER9(COMMON_USER2_POWER8 | \ PPC_FEATURE2_ARCH_3_00 | \ -PPC_FEATURE2_HAS_IEEE128) +PPC_FEATURE2_HAS_IEEE128 | \ +PPC_FEATURE2_DARN ) #ifdef CONFIG_PPC_BOOK3E_64 #define COMMON_USER_BOOKE (COMMON_USER_PPC64 | PPC_FEATURE_BOOKE) -- 2.11.0
Re: [RFC] arch hardlockup detector interfaces improvement
On Fri, 19 May 2017 15:43:45 -0400 Don Zickuswrote: > On Sat, May 20, 2017 at 12:53:06AM +1000, Nicholas Piggin wrote: > > > I am curious to know what IBM thinks there. Currently the HARDLOCKUP > > > detector sits on top of perf. I get the impression, you are removing that > > > dependency. Is that a permanent thing or are you thinking of switching > > > back > > > and forth depending on if SOFTLOCKUP is enabled or not? > > > > We want to get away from perf permanently. > > > > The PMU interrupts are not specially non-maskable from a hardware > > POV, everything gets masked when you turn off interrupts in hardware. > > powerpc arch code implements a software disable layer, and PMU > > interrupts are differentiated there by being allowed to run even > > under local_irq_disable(); > > > > We have a few issues with using perf for it. We disable it by > > default because using it for that breaks another PMU feature. > > > > But PMU interupts are not special, so it would be possible to e.g., > > take the timer interrupt before soft disable and have it touch the > > watchdog if it fires while under local_irq_disable(). That give > > exact same kind of pseudo-NMI as perf interrupts, without using PMU. > > > > Further, we now want to introduce a local_pmu_disable() type of > > interface that extends this soft disable layer to perf interrupts > > as well for some cases. Once we start doing that, more code will > > be exempt from the hardlockup watchdog, whereas a watchdog specific > > hook from the timer interrupt would still cover it. > > Interesting. Thanks for that info. > > Do you think you can split the patch in half for me? You are shuffling code > around and making subtle changes in the shuffled code. It is hard to double > check everything. Namely watchdog_nmi_reconfigure and watchdog_update_cpus > suddenly appear. > > The first patch would be straight code movement, the second the real > changes. I almost don't care if you throw in the LOCKUP->SOFTLOCKUP changes > in the first patch either. > > I am really interested in the subtle changes to make sure you covered the > various race conditions that we have uncovered over the years. Yes that makes sense, I'll do that. > I also would like to see a sample of the watchdog_nmi_reconfigure() you had > in mind. Usually we hide all the nmi stuff underneath the watchdog > functions. But those are threaded, which is what you are trying to avoid, > so the placement of the function makes sense but looks odd. I'm just calling it after any sysctl changes or suspend/resume etc, and the lockup detector I have basically just shuts down everything and then re-starts with the new parameters (could be made much fancier but I didn't see a need. I will split out this patch and re-post to you with the powerpc patch in the series that you can take a look at. Thanks, Nick
[PATCH] powernv/npu-dma.c: Fix opal_npu_destroy_context call
opal_npu_destroy_context() should be called with the NPU PHB, not the PCIe PHB. Signed-off-by: Alistair Popple--- arch/powerpc/platforms/powernv/npu-dma.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c index 067defe..78fa939 100644 --- a/arch/powerpc/platforms/powernv/npu-dma.c +++ b/arch/powerpc/platforms/powernv/npu-dma.c @@ -714,7 +714,7 @@ static void pnv_npu2_release_context(struct kref *kref) void pnv_npu2_destroy_context(struct npu_context *npu_context, struct pci_dev *gpdev) { - struct pnv_phb *nphb, *phb; + struct pnv_phb *nphb; struct npu *npu; struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0); struct device_node *nvlink_dn; @@ -728,13 +728,12 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context, nphb = pci_bus_to_host(npdev->bus)->private_data; npu = >npu; - phb = pci_bus_to_host(gpdev->bus)->private_data; nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0); if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index", _index))) return; npu_context->npdev[npu->index][nvlink_index] = NULL; - opal_npu_destroy_context(phb->opal_id, npu_context->mm->context.id, + opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id, PCI_DEVID(gpdev->bus->number, gpdev->devfn)); kref_put(_context->kref, pnv_npu2_release_context); } -- 2.1.4
Re: [RFC] arch hardlockup detector interfaces improvement
On Sat, May 20, 2017 at 12:53:06AM +1000, Nicholas Piggin wrote: > > I am curious to know what IBM thinks there. Currently the HARDLOCKUP > > detector sits on top of perf. I get the impression, you are removing that > > dependency. Is that a permanent thing or are you thinking of switching back > > and forth depending on if SOFTLOCKUP is enabled or not? > > We want to get away from perf permanently. > > The PMU interrupts are not specially non-maskable from a hardware > POV, everything gets masked when you turn off interrupts in hardware. > powerpc arch code implements a software disable layer, and PMU > interrupts are differentiated there by being allowed to run even > under local_irq_disable(); > > We have a few issues with using perf for it. We disable it by > default because using it for that breaks another PMU feature. > > But PMU interupts are not special, so it would be possible to e.g., > take the timer interrupt before soft disable and have it touch the > watchdog if it fires while under local_irq_disable(). That give > exact same kind of pseudo-NMI as perf interrupts, without using PMU. > > Further, we now want to introduce a local_pmu_disable() type of > interface that extends this soft disable layer to perf interrupts > as well for some cases. Once we start doing that, more code will > be exempt from the hardlockup watchdog, whereas a watchdog specific > hook from the timer interrupt would still cover it. Interesting. Thanks for that info. Do you think you can split the patch in half for me? You are shuffling code around and making subtle changes in the shuffled code. It is hard to double check everything. Namely watchdog_nmi_reconfigure and watchdog_update_cpus suddenly appear. The first patch would be straight code movement, the second the real changes. I almost don't care if you throw in the LOCKUP->SOFTLOCKUP changes in the first patch either. I am really interested in the subtle changes to make sure you covered the various race conditions that we have uncovered over the years. I also would like to see a sample of the watchdog_nmi_reconfigure() you had in mind. Usually we hide all the nmi stuff underneath the watchdog functions. But those are threaded, which is what you are trying to avoid, so the placement of the function makes sense but looks odd. Thanks! Cheers, Don
[PATCH] drivers/rtc/interface.c: Validate alarm-time before handling rollover
In function __rtc_read_alarm() its possible for an alarm time-stamp to be invalid even after replacing missing components with current time-stamp. The condition 'alarm->time.tm_year < 70' will trigger this case and will cause the call to 'rtc_tm_to_time64(>time)' return a negative value for variable t_alm. While handling alarm rollover this negative t_alm (assumed to seconds offset from '1970-01-01 00:00:00') is converted back to rtc_time via rtc_time64_to_tm() which results in this error log with seemingly garbage values: "rtc rtc0: invalid alarm value: -2-1--1041528741 200557:71582844:32" This error was generated when the rtc driver (rtc-opal in this case) returned an alarm time-stamp of '00-00-00 00:00:00' to indicate that the alarm is disabled. Though I have submitted a separate fix for the rtc-opal driver, this issue may potentially impact other existing/future rtc drivers. To fix this issue the patch validates the alarm time-stamp just after filling up the missing datetime components and if rtc_valid_tm() still reports it to be invalid then bails out of the function without handling the rollover. Reported-by: Steve BestSigned-off-by: Vaibhav Jain --- drivers/rtc/interface.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c index fc0fa75..8cec9a0 100644 --- a/drivers/rtc/interface.c +++ b/drivers/rtc/interface.c @@ -227,6 +227,13 @@ int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm) missing = year; } + /* Can't proceed if alarm is still invalid after replacing +* missing fields. +*/ + err = rtc_valid_tm(>time); + if (err) + goto done; + /* with luck, no rollover is needed */ t_now = rtc_tm_to_time64(); t_alm = rtc_tm_to_time64(>time); @@ -278,9 +285,9 @@ int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm) dev_warn(>dev, "alarm rollover not handled\n"); } -done: err = rtc_valid_tm(>time); +done: if (err) { dev_warn(>dev, "invalid alarm value: %d-%d-%d %d:%d:%d\n", alarm->time.tm_year + 1900, alarm->time.tm_mon + 1, -- 2.9.3
Re: [PATCH 4/4] arch/powerpc/44x/fsp2: wdt tcr update instead of whole rewrite
Hi Michael, >> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c >> index 2b33cfa..f75e512 100644 >> --- a/arch/powerpc/kernel/time.c >> +++ b/arch/powerpc/kernel/time.c >> @@ -738,12 +738,28 @@ static int __init get_freq(char *name, int cells, >> unsigned long *val) >> >> static void start_cpu_decrementer(void) >> { >> +unsigned int tcr; >> #if defined(CONFIG_BOOKE) || defined(CONFIG_40x) >> /* Clear any pending timer interrupts */ >> mtspr(SPRN_TSR, TSR_ENW | TSR_WIS | TSR_DIS | TSR_FIS); >> >> +#ifdef CONFIG_FSP2 >> +/* >> + * Prevent a kernel panic caused by unintentionally clearing TCR >> + * watchdog bits. At this point in the kernel boot, the watchdog has >> + * already been enabled by u-boot. The original code's attempt to > > Don't refer to "the original code", as it doesn't exist anymore now that > we've patched it. Just say something like ".. so we must not clear the > watchdog configuration bits". Ok, got it. >That breaks the build for other platforms: > > arch/powerpc/kernel/time.c: In function ‘start_cpu_decrementer’: > arch/powerpc/kernel/time.c:741:15: error: unused variable ‘tcr’ > [-Werror=unused-variable] > Oops, didn't notice, my fault. >Or you could possibly just always leave TCR[WP], is there any case where >it would be correct to clear that? > >cheers >From my point of view it's possible. I've checked docu and on idea it should be possible cause WP is only affecting watchdog ping time. Which in case of '00' is very small, around ~5 ms. Ben also in next message said about get rid of ifdef for FSP2. And now patch looks like this, What do you think Michael, Ben? arch/powerpc/kernel/time.c | 15 +-- 1 files changed, 13 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c index bc2e08d..2411c49 100644 --- a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -718,11 +718,22 @@ static int __init get_freq(char *name, int cells, unsigned long *val) static void start_cpu_decrementer(void) { #if defined(CONFIG_BOOKE) || defined(CONFIG_40x) + unsigned int tcr; /* Clear any pending timer interrupts */ mtspr(SPRN_TSR, TSR_ENW | TSR_WIS | TSR_DIS | TSR_FIS); - /* Enable decrementer interrupt */ - mtspr(SPRN_TCR, TCR_DIE); + tcr = mfspr(SPRN_TCR); + /* +* At this point in the kernel boot, the watchdog has already +* been enabled by u-boot. If we set it this to '00' it may +* trigger watchdog earlier than needed which will cause +* inattentional kernel panic. In this case we leaving TCR[WP] +* bit setting from uboot/bootstrap. +*/ + tcr &= TCR_WP_MASK; /* clear all bits except for TCR[WP] */ + tcr |= TCR_DIE; /* enable decrementer */ + mtspr(SPRN_TCR, tcr); + #endif /* defined(CONFIG_BOOKE) || defined(CONFIG_40x) */ } -- 1.7.1
Re: [PATCH 3/3] EDAC: mv64x60: replace in_le32/out_le32 with ioread32/iowrite32
On Fri, May 19, 2017 at 4:01 PM, David Laightwrote: > From: Arnd Bergmann >> Sent: 17 May 2017 22:40 >> >> On Wed, May 17, 2017 at 11:16 PM, Chris Packham >> wrote: >> > On 18/05/17 06:18, Borislav Petkov wrote: >> > One thing I would like confirmation on is is in_le32 -> ioread32 the >> > correct change? I tossed up between ioread32 and readl. Looking at >> > mv643xx_eth.c which supports both the MV643XX and Orion it's using readl >> > so perhaps I should be using that. >> >> There is no easy answer: on powerpc, readl is used for PCI, >> while in_le32 is used for on-chip devices, so in_le32 is the >> right one in principle. The main difference is that readl can >> work with CONFIG_EEH on pseries, but in_le32 is cheaper. >> >> On ARM and most other architectures, readl is used for both >> PCI and on-chip devices, so that's what portable code tends >> to use. >> >> ioread32 is required to behave the same way as readl >> on all __iomem pointers returned from ioremap(), but >> is an extern function on powerpc and can be more >> expensive when CONFIG_GENERIC_IOMAP is set. > > What about x86? > Isn't ioread32() an extern function that checks for 'io' addresses > than need 'inb' (etc) instructions rather than memory ones. Right, x86 uses CONFIG_GENERIC_IOMAP and has that extra check. > If we know a PCI slave isn't 'io' should be be using ioread32() or readl()? Generally speaking yes, though on most architectures that don't use CONFIG_GENERIC_IOMAP there is no practical difference. > Don't some architectures have different enforced barriers in both these? I'm not aware of any architecture that always adds barriers to ioread32 compared to readl, though I guess that may be a reasonable implementation depending on the architecture. PowerPC doesn't do it, though one could argue that it would be required to guarantee non-posted I/O accesses on PCI I/O space that gets mapped with pci_iomap(). On ARM, no extra barrier is needed because the difference is in the page table flags. Arnd
Re: [RFC] arch hardlockup detector interfaces improvement
On Fri, 19 May 2017 09:17:53 -0400 Don Zickuswrote: > On Fri, May 19, 2017 at 09:07:31AM +1000, Nicholas Piggin wrote: > > On Thu, 18 May 2017 12:30:28 -0400 > > Don Zickus wrote: > > > > > (adding Uli) > > > > > > On Fri, May 19, 2017 at 01:50:26AM +1000, Nicholas Piggin wrote: > > > > I'd like to make it easier for architectures that have their own NMI / > > > > hard lockup detector to reuse various configuration interfaces that are > > > > provided by generic detectors (cmdline, sysctl, suspend/resume calls). > > > > > > > > I'd also like to remove the dependency of arch hard lockup detectors > > > > on the softlockup detector. The reason being these watchdogs can be > > > > very small (sparc's is like a page of core code that does not use any > > > > big subsystem like kthreads or timers). > > > > > > > > So I do this by adding a separate CONFIG_SOFTLOCKUP_DETECTOR, and > > > > juggling around what goes under config options. HAVE_NMI_WATCHDOG > > > > continues to be the config for arch to override the hard lockup > > > > detector, which is expanded to cover a few more cases. > > > > > > Basically you are trying to remove the heavy HARDLOCKUP pieces to minimize > > > the SOFTLOCKUP piece and use your own NMI detector, right? > > > > > > I am guessing you would then disable SOFTLOCKUP to remove all the kthread > > > and timer stuff but continue to use the generic infrastructure to help > > > manager your own NMI detector? > > > > Yes that's right. > > > > > A lot of the code is just re-organizing things and adding an explicit > > > ifdef on SOFTLOCKUP, which seems fine to me. > > > > > > I just need to spend some time on some of your #else clauses to see what > > > functionality is dropped when you use your approach. > > > > Okay, appreciated. I can trim down cc lists and send you my powerpc > > WIP if you'd like to have a look. > > I am curious to know what IBM thinks there. Currently the HARDLOCKUP > detector sits on top of perf. I get the impression, you are removing that > dependency. Is that a permanent thing or are you thinking of switching back > and forth depending on if SOFTLOCKUP is enabled or not? We want to get away from perf permanently. The PMU interrupts are not specially non-maskable from a hardware POV, everything gets masked when you turn off interrupts in hardware. powerpc arch code implements a software disable layer, and PMU interrupts are differentiated there by being allowed to run even under local_irq_disable(); We have a few issues with using perf for it. We disable it by default because using it for that breaks another PMU feature. But PMU interupts are not special, so it would be possible to e.g., take the timer interrupt before soft disable and have it touch the watchdog if it fires while under local_irq_disable(). That give exact same kind of pseudo-NMI as perf interrupts, without using PMU. Further, we now want to introduce a local_pmu_disable() type of interface that extends this soft disable layer to perf interrupts as well for some cases. Once we start doing that, more code will be exempt from the hardlockup watchdog, whereas a watchdog specific hook from the timer interrupt would still cover it. Thanks, Nick
[PATCH] powerpc/64: Use tick accounting by default
From: Anton Blanchardppc64 is the only architecture that turns on VIRT_CPU_ACCOUNTING_NATIVE by default. The overhead of this option is extremely high - a context switch microbenchmark using sched_yield() is almost 20% slower. To get finer grained user/hardirq/softirq statitics, the IRQ_TIME_ACCOUNTING option can be used instead, which has much lower overhead. As such, disable this option by default. If a user really wants it, they can still enable it manually. Signed-off-by: Anton Blanchard --- init/Kconfig | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index 1d3475fc9496..a5c30acc1ede 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -342,8 +342,7 @@ config VIRT_CPU_ACCOUNTING choice prompt "Cputime accounting" - default TICK_CPU_ACCOUNTING if !PPC64 - default VIRT_CPU_ACCOUNTING_NATIVE if PPC64 + default TICK_CPU_ACCOUNTING # Kind of a stub config for the pure tick based cputime accounting config TICK_CPU_ACCOUNTING -- 2.11.0
RE: [PATCH 3/3] EDAC: mv64x60: replace in_le32/out_le32 with ioread32/iowrite32
From: Arnd Bergmann > Sent: 17 May 2017 22:40 > > On Wed, May 17, 2017 at 11:16 PM, Chris Packham >wrote: > > On 18/05/17 06:18, Borislav Petkov wrote: > > One thing I would like confirmation on is is in_le32 -> ioread32 the > > correct change? I tossed up between ioread32 and readl. Looking at > > mv643xx_eth.c which supports both the MV643XX and Orion it's using readl > > so perhaps I should be using that. > > There is no easy answer: on powerpc, readl is used for PCI, > while in_le32 is used for on-chip devices, so in_le32 is the > right one in principle. The main difference is that readl can > work with CONFIG_EEH on pseries, but in_le32 is cheaper. > > On ARM and most other architectures, readl is used for both > PCI and on-chip devices, so that's what portable code tends > to use. > > ioread32 is required to behave the same way as readl > on all __iomem pointers returned from ioremap(), but > is an extern function on powerpc and can be more > expensive when CONFIG_GENERIC_IOMAP is set. What about x86? Isn't ioread32() an extern function that checks for 'io' addresses than need 'inb' (etc) instructions rather than memory ones. If we know a PCI slave isn't 'io' should be be using ioread32() or readl()? Don't some architectures have different enforced barriers in both these? David
Re: [RFC] arch hardlockup detector interfaces improvement
On Fri, May 19, 2017 at 09:07:31AM +1000, Nicholas Piggin wrote: > On Thu, 18 May 2017 12:30:28 -0400 > Don Zickuswrote: > > > (adding Uli) > > > > On Fri, May 19, 2017 at 01:50:26AM +1000, Nicholas Piggin wrote: > > > I'd like to make it easier for architectures that have their own NMI / > > > hard lockup detector to reuse various configuration interfaces that are > > > provided by generic detectors (cmdline, sysctl, suspend/resume calls). > > > > > > I'd also like to remove the dependency of arch hard lockup detectors > > > on the softlockup detector. The reason being these watchdogs can be > > > very small (sparc's is like a page of core code that does not use any > > > big subsystem like kthreads or timers). > > > > > > So I do this by adding a separate CONFIG_SOFTLOCKUP_DETECTOR, and > > > juggling around what goes under config options. HAVE_NMI_WATCHDOG > > > continues to be the config for arch to override the hard lockup > > > detector, which is expanded to cover a few more cases. > > > > Basically you are trying to remove the heavy HARDLOCKUP pieces to minimize > > the SOFTLOCKUP piece and use your own NMI detector, right? > > > > I am guessing you would then disable SOFTLOCKUP to remove all the kthread > > and timer stuff but continue to use the generic infrastructure to help > > manager your own NMI detector? > > Yes that's right. > > > A lot of the code is just re-organizing things and adding an explicit > > ifdef on SOFTLOCKUP, which seems fine to me. > > > > I just need to spend some time on some of your #else clauses to see what > > functionality is dropped when you use your approach. > > Okay, appreciated. I can trim down cc lists and send you my powerpc > WIP if you'd like to have a look. I am curious to know what IBM thinks there. Currently the HARDLOCKUP detector sits on top of perf. I get the impression, you are removing that dependency. Is that a permanent thing or are you thinking of switching back and forth depending on if SOFTLOCKUP is enabled or not? Cheers, Don
Re: [PATCH] powerpc/sequoia: fix NAND partitions not to overlap
Hi! > >>> > > Signed-off-by: Pavel Machek> >>> > > >>> > Ping? Two partitions at same place are bad news... > >>> > >>> Please expand on "bad news"? What are the runtime effects of this > >>> change? Decisions about which kernel(s) to patch depend on this info. > >> > >> Well... two partitions at same place. If you use one, you will corrupt > >> information on the other one. > >> > >> OTOH this moves partition around (so that they don't overlap) so it is > >> probably not stable candidate. > >> > >> I guess this is not huge issue; people using these boards probably > >> have custom dts changes, anyway... > > > > Or no one's even using it anymore. > > > > I can take this via powerpc, I won't mark it for stable etc. > > It became: > > Author: Pavel Machek > > powerpc/sequoia: Fix NAND partitions not to overlap > > Currently the DTS defines two partitions at the same addresses, if you > use one, you will corrupt information on the other one. Fix it by > shifting the second partition. > > Signed-off-by: Pavel Machek > [mpe: Reconstruct change log from email thread] > Signed-off-by: Michael Ellerman > > > Minor nit, your From: address doesn't match your Signed-off-by: address, > which trips my "check patch is signed-off-by author" script. I can > just ignore it, but I think it's preferable if they match. Sorry about that (I have automatic scripts changing From:). It is still me, but I guess denx should get the credit here, so you can use. Signed-off-by: Pavel Machek Thanks, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
RE: [PATCH 2/9] timers: provide a "modern" variant of timers
From: Christoph Hellwig > Sent: 16 May 2017 12:48 > > The new callback gets a pointer to the timer_list itself, which can > then be used to get the containing structure using container_of > instead of casting from and to unsigned long all the time. What about sensible drivers that put some other value in the 'data' field? Perhaps it ought to have been 'void *data'. Seems retrograde to be passing the address of the timer structure (which, in principle, the callers no nothing about). So I wouldn't call it 'modern', just different. David
Memory hotplug broken on powerkvm/pseries with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
Hi, Memory hotplug for a pseries guest on PowerKVM fails with the following error when CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y in the guest kernel. (I am on commit b23afd384801) [root@localhost ~]# pseries-hotplug-mem: Attempting to hot-add 4 LMB(s) at index 8020 removing memory fails, because memory [0x0002-0x00020fff] is onlined [ cut here ] kernel BUG at mm/memory_hotplug.c:2189! Oops: Exception in kernel mode, sig: 5 [#1] Call Trace: remove_memory+0x70/0x100 (unreliable) dlpar_add_lmb+0x370/0x3f0 dlpar_memory+0x7bc/0xcd0 handle_dlpar_errorlog+0xf8/0x160 pseries_hp_work_fn+0x94/0xa0 process_one_work+0x240/0x510 worker_thread+0x9c/0x550 kthread+0x164/0x1b0 ret_from_kernel_thread+0x5c/0x7c This failure occurs because the current code results in double onlining as memhp_auto_online gets set with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y. It appears that by reverting commit 943db62c316c5 (powerpc/pseries: Revert 'Auto-online hotplugged memory'), hotplug is working again. Apparently that commit itself is a revert of other two commits which caused "the pseries memory DLPAR code to fail when trying to remove an LMB that was previously removed and added back". However I am able to do hotplug, unplug, plug and unplug of the same DIMM device (set of LMBs) most of the times with the above commit (943db62c316c5) reverted. The only occasional failure to unplug occurs when guest refuses to offline an LMB (with -EBUSY) possibly because it is in use, which I think is a valid failure. Nathan - Could you please let me know the exact way to reproduce the problem that led to this commit (943db62c316c5) ? Regards, Bharata. -- http://raobharata.wordpress.com/
Re: [PATCH 4/4] arch/powerpc/44x/fsp2: wdt tcr update instead of whole rewrite
On Mon, 2017-05-15 at 16:07 +0300, Ivan Mikhaylov wrote: > +#ifdef CONFIG_FSP2 > + /* > + * Prevent a kernel panic caused by unintentionally clearing TCR > + * watchdog bits. At this point in the kernel boot, the watchdog has > + * already been enabled by u-boot. The original code's attempt to > + * write to the TCR register results in an inadvertent clearing of the > + * watchdog configuration bits, causing the 440 to reset. > + */ > + tcr = mfspr(SPRN_TCR); > + tcr &= TCR_WP_MASK; /* clear all bits except for TCR[WP] */ > + tcr |= TCR_DIE; /* enable decrementer */ > + mtspr(SPRN_TCR, tcr); > +#else This should be a runtime test, not a compile time option. Cheers, Ben.
Re: [PATCH 4/4] arch/powerpc/44x/fsp2: wdt tcr update instead of whole rewrite
Hi Ivan, Ivan Mikhaylovwrites: > Prevent a kernel panic caused by unintentionally clearing TCR > watchdog bits. At this point in the kernel boot, the watchdog has > already been enabled by u-boot. The original code's attempt to > write to the TCR register results in an inadvertent clearing of the > watchdog configuration bits, causing the 476 to reset. > Panic happens in case of error as silently reboot without any outputs > on serial. That sounds reasonable. > diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c > index 2b33cfa..f75e512 100644 > --- a/arch/powerpc/kernel/time.c > +++ b/arch/powerpc/kernel/time.c > @@ -738,12 +738,28 @@ static int __init get_freq(char *name, int cells, > unsigned long *val) > > static void start_cpu_decrementer(void) > { > + unsigned int tcr; > #if defined(CONFIG_BOOKE) || defined(CONFIG_40x) > /* Clear any pending timer interrupts */ > mtspr(SPRN_TSR, TSR_ENW | TSR_WIS | TSR_DIS | TSR_FIS); > > +#ifdef CONFIG_FSP2 > + /* > + * Prevent a kernel panic caused by unintentionally clearing TCR > + * watchdog bits. At this point in the kernel boot, the watchdog has > + * already been enabled by u-boot. The original code's attempt to Don't refer to "the original code", as it doesn't exist anymore now that we've patched it. Just say something like ".. so we must not clear the watchdog configuration bits". > + * write to the TCR register results in an inadvertent clearing of the > + * watchdog configuration bits, causing the 440 to reset. > + */ > + tcr = mfspr(SPRN_TCR); > + tcr &= TCR_WP_MASK; /* clear all bits except for TCR[WP] */ > + tcr |= TCR_DIE; /* enable decrementer */ > + mtspr(SPRN_TCR, tcr); > +#else > /* Enable decrementer interrupt */ > mtspr(SPRN_TCR, TCR_DIE); > +#endif > + > #endif /* defined(CONFIG_BOOKE) || defined(CONFIG_40x) */ That breaks the build for other platforms: arch/powerpc/kernel/time.c: In function ‘start_cpu_decrementer’: arch/powerpc/kernel/time.c:741:15: error: unused variable ‘tcr’ [-Werror=unused-variable] You can just do something like: #if defined(CONFIG_BOOKE) || defined(CONFIG_40x) unsigned int tcr; #ifdef CONFIG_FSP2 tcr = mfspr(SPRN_TCR); tcr &= TCR_WP_MASK; /* clear all bits except for TCR[WP] */ #else tcr = 0; #endif tcr |= TCR_DIE; /* enable decrementer */ mtspr(SPRN_TCR, TCR_DIE); #endif /* defined(CONFIG_BOOKE) || defined(CONFIG_40x) */ Or you could possibly just always leave TCR[WP], is there any case where it would be correct to clear that? cheers
[PATCH] rtc/tpo: Implement rtc_class_ops.alarm_irq_enable callback
Provide an implementation of the callback rtc_class_ops.alarm_irq_enable callback for rtc-opal driver. This callback is called when the wake alarm is disabled via the command: 'echo 0 > /sys/class/rtc/rtc0/wakealarm' Without this the Timed-Power-On(TPO) config remains set even when its disabled by the above command and the FSP will force machine boot at previously configured time. The callback is implemented as function opal_tpo_alarm_irq_enable() which calls opal_set_tpo_time() with alarm.enabled == 0. A branch has been added to opal_set_tpo_time() to handle this case by passing y_m_d == h_m_s_ms == 0 to opal as arguments for opal_tpo_write() call. Patch-based-on: https://patchwork.ozlabs.org/patch/764559/ Signed-off-by: Vaibhav Jain--- Corresponding opal side changes posted at https://patchwork.ozlabs.org/patch/764555/ "fsp/tpo: Provide support for disabling TPO alarm" --- drivers/rtc/rtc-opal.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c index 2d84e2a..e2a946c 100644 --- a/drivers/rtc/rtc-opal.c +++ b/drivers/rtc/rtc-opal.c @@ -167,7 +167,14 @@ static int opal_set_tpo_time(struct device *dev, struct rtc_wkalrm *alarm) u32 y_m_d = 0; int token, rc; - tm_to_opal(>time, _m_d, _m_s_ms); + /* if alarm is enabled */ + if (alarm->enabled) { + tm_to_opal(>time, _m_d, _m_s_ms); + pr_debug("Alarm set to %x %llx\n", y_m_d, h_m_s_ms); + + } else { + pr_debug("Alarm getting disabled\n"); + } token = opal_async_get_token_interruptible(); if (token < 0) { @@ -200,6 +207,18 @@ static int opal_set_tpo_time(struct device *dev, struct rtc_wkalrm *alarm) return rc; } +int opal_tpo_alarm_irq_enable(struct device *dev, unsigned int enabled) +{ + struct rtc_wkalrm alarm = { .enabled = 0 }; + + /* +* TPO is automatically enabled when opal_set_tpo_time() is called with +* non-zero rtc-time. We only handle disable case which needs to be +* explicitly told to opal. +*/ + return enabled ? 0 : opal_set_tpo_time(dev, ); +} + static struct rtc_class_ops opal_rtc_ops = { .read_time = opal_get_rtc_time, .set_time = opal_set_rtc_time, @@ -215,6 +234,7 @@ static int opal_rtc_probe(struct platform_device *pdev) device_set_wakeup_capable(>dev, true); opal_rtc_ops.read_alarm = opal_get_tpo_time; opal_rtc_ops.set_alarm = opal_set_tpo_time; + opal_rtc_ops.alarm_irq_enable = opal_tpo_alarm_irq_enable; } rtc = devm_rtc_device_register(>dev, DRVNAME, _rtc_ops, -- 2.9.3
[PATCH] rtc/tpo: Handle disabled TPO in opal_get_tpo_time()
On PowerNV platform when Timed-Power-On(TPO) is disabled, read of stored TPO yields value with all date components set to '0' inside opal_get_tpo_time(). The function opal_to_tm() then converts it to an offset from year 1900 yielding alarm-time == "1900-00-01 00:00:00". This causes problems with __rtc_read_alarm() that expecting an offset from "1970-00-01 00:00:00" and returned alarm-time results in a -ve value for time64_t. Which ultimately results in this error reported in kernel logs with a seemingly garbage value: "rtc rtc0: invalid alarm value: -2-1--1041528741 200557:71582844:32" We fix this by explicitly handling the case of all alarm date-time components being '0' inside opal_get_tpo_time() and returning -ENOENT in such a case. This signals generic rtc that no alarm is set and it bails out from the alarm initialization flow without reporting the above error. Signed-off-by: Vaibhav JainReported-by: Steve Best --- drivers/rtc/rtc-opal.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c index ea20f62..2d84e2a 100644 --- a/drivers/rtc/rtc-opal.c +++ b/drivers/rtc/rtc-opal.c @@ -142,6 +142,16 @@ static int opal_get_tpo_time(struct device *dev, struct rtc_wkalrm *alarm) y_m_d = be32_to_cpu(__y_m_d); h_m_s_ms = ((u64)be32_to_cpu(__h_m) << 32); + + /* check if no alarm is set */ + if (y_m_d == 0 && h_m_s_ms == 0) { + pr_debug("No alarm is set\n"); + rc = -ENOENT; + goto exit; + } else { + pr_debug("Alarm set to %x %llx\n", y_m_d, h_m_s_ms); + } + opal_to_tm(y_m_d, h_m_s_ms, >time); exit: -- 2.9.3
Re: [PATCH] powerpc/sequoia: fix NAND partitions not to overlap
Michael Ellermanwrites: > Pavel Machek writes: > >> On Wed 2017-05-17 14:37:17, Andrew Morton wrote: >>> On Wed, 17 May 2017 14:06:13 +0200 Pavel Machek wrote: >>> >>> > On Sun 2017-04-02 12:05:36, Pavel Machek wrote: >>> > > Fix overlapping NAND partitions. >>> > > >>> > > Signed-off-by: Pavel Machek >>> > >>> > Ping? Two partitions at same place are bad news... >>> >>> Please expand on "bad news"? What are the runtime effects of this >>> change? Decisions about which kernel(s) to patch depend on this info. >> >> Well... two partitions at same place. If you use one, you will corrupt >> information on the other one. >> >> OTOH this moves partition around (so that they don't overlap) so it is >> probably not stable candidate. >> >> I guess this is not huge issue; people using these boards probably >> have custom dts changes, anyway... > > Or no one's even using it anymore. > > I can take this via powerpc, I won't mark it for stable etc. It became: Author: Pavel Machek powerpc/sequoia: Fix NAND partitions not to overlap Currently the DTS defines two partitions at the same addresses, if you use one, you will corrupt information on the other one. Fix it by shifting the second partition. Signed-off-by: Pavel Machek [mpe: Reconstruct change log from email thread] Signed-off-by: Michael Ellerman Minor nit, your From: address doesn't match your Signed-off-by: address, which trips my "check patch is signed-off-by author" script. I can just ignore it, but I think it's preferable if they match. cheers
Re: [PATCH 3/3] EDAC: mv64x60: replace in_le32/out_le32 with ioread32/iowrite32
Arnd Bergmannwrites: > On Thu, May 18, 2017 at 7:36 AM, Michael Ellerman wrote: >> Borislav Petkov writes: >> >>> Top-posting so that the PPC list can see the whole patch below. >>> >>> Since I don't know PPC, let me add PPC ML to CC for a confirmation this >>> change is correct. >>> >>> Which brings me to the tangential: this driver is from 2006-ish and >>> is for some "Marvell MV64x60 Memory Controller kernel module for PPC >>> platforms". If you're going to touch it, then you should test on the PPC >>> hardware too, so that you don't break the driver there. >>> >>> Unless that hardware is obsolete now and we don't care and, and ..? >>> >>> Maybe someone has some insights... >> >> Not really sorry. >> >> I don't have one of those boards, so I can't test. Maybe someone else on >> the list does? >> >> I'd err on the side of the PPC hardware being obsolete and no one really >> caring. If the driver is helpful on ARM then we may as well use it >> there, if we can also avoid breaking it on PPC then great. > > I never had one myself, but tried to figure out what is still there to be > supported. In 2014, we removed one platform (PrPMC2800) that was > obsolete. There were eight boards that didn't make the cut from > arch/ppc32 to arch/powerpc. The C2K is the last one and it was > added in 2008 with this comment: > > Support for the C2K cPCI Single Board Computer from GEFanuc > (PowerPC MPC7448 with a Marvell MV64460 chipset). > All features of the board are not supported yet, but the board > boots, flash works, all Ethernet ports are working and PCI > devices are all found (USB and SATA on PCI1 do not work yet). > > Part 3 of 5: driver for the board. At this time it is very generic > and similar to its original, the driver for the prpmc2800. > > The original submitter never followed up on it and neither the > board code not the DTS was ever updated to include additional > features, so I assume it only got worse from there. > > According to https://www.abaco.com/products/c2ka-compactpci-sbc > the end-of-life date for the product was in 2015, presumably 10 years > after it got introduced. > > This is definitely obsolete by now, and given the missing features > I would assume that nobody is running mainline kernels on it and > it can be removed. Great, thanks for doing all that digging. I'll cook up a patch to remove c2k next week. Then we can drop CONFIG_MV64X60 from PPC and see where that leaves things, depending on what's useful on ARM. cheers
Re: powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash
On Thu, 2017-05-18 at 10:37:31 UTC, Michael Ellerman wrote: > virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on > an address. What this means in practice is that it should only return true for > addresses in the linear mapping which are backed by a valid PFN. > > We are failing to properly check that the address is in the linear mapping, > because virt_to_pfn() will return a valid looking PFN for more or less any > address. That bug is actually caused by __pa(), used in virt_to_pfn(). > > eg: __pa(0xc001) = 0x1 # Good > __pa(0xd001) = 0x1 # Bad! > __pa(0x0001) = 0x1 # Bad! > > This started happening after commit bdbc29c19b26 ("powerpc: Work around gcc > miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the > definition > of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET > from > the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give > you something bogus back. > > Until we can verify if that GCC bug is no longer an issue, or come up with > another solution, this commit does the minimal fix to make virt_addr_valid() > work, by explicitly checking that the address is in the linear mapping region. > > Fixes: bdbc29c19b26 ("powerpc: Work around gcc miscompilation of __pa() on > 64-bit") > Signed-off-by: Michael Ellerman> Reviewed-by: Paul Mackerras > Reviewed-by: Balbir Singh > Tested-by: Breno Leitao Applied to powerpc fixes. https://git.kernel.org/powerpc/c/e41e53cd4fe331d0d1f06f8e4ed7e2 cheers
[GIT PULL] Please pull powerpc/linux.git powerpc-4.12-3 tag
Hi Linus, Please pull some powerpc fixes for 4.12: The following changes since commit 2ea659a9ef488125eb46da6eb571de5eae5c43f6: Linux 4.12-rc1 (2017-05-13 13:19:49 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.12-3 for you to fetch changes up to e41e53cd4fe331d0d1f06f8e4ed7e2cc63ee2c34: powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash (2017-05-19 13:04:35 +1000) powerpc fixes for 4.12 #3 The headline is a fix for FP/VMX register corruption when using transactional memory, and a new selftest to go with it. Then there's the virt_addr_valid() fix, currently HARDENDED_USERCOPY is tripping on that causing some machines to crash. A few other fairly minor fixes for long tail things, and a couple of fixes for code we just merged. Thanks to: Breno Leitao, Gautham R. Shenoy, Michael Neuling, Naveen N. Rao. Nicholas Piggin, Paul Mackerras. Gautham R. Shenoy (1): powerpc/powernv: Set NAPSTATELOST after recovering paca on P9 DD1 Michael Ellerman (3): powerpc/modules: If mprofile-kernel is enabled add it to vermagic powerpc/mm: Fix crash in page table dump with huge pages powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash Michael Neuling (2): powerpc/tm: Fix FP and VMX register corruption selftests/powerpc: Test TM and VMX register state Naveen N. Rao (1): powerpc/kprobes: Fix handling of instruction emulation on probe re-entry arch/powerpc/include/asm/module.h | 4 + arch/powerpc/include/asm/page.h| 12 +++ arch/powerpc/kernel/idle_book3s.S | 2 +- arch/powerpc/kernel/kprobes.c | 3 +- arch/powerpc/kernel/process.c | 19 arch/powerpc/mm/dump_linuxpagetables.c | 7 +- tools/testing/selftests/powerpc/tm/.gitignore | 1 + tools/testing/selftests/powerpc/tm/Makefile| 4 +- .../testing/selftests/powerpc/tm/tm-vmx-unavail.c | 118 + 9 files changed, 164 insertions(+), 6 deletions(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c signature.asc Description: PGP signature
[PATCH net-next v2] ibmveth: Support to enable LSO/CSO for Trunk VEA.
Current largesend and checksum offload feature in ibmveth driver, - Source VM sends the TCP packets with ip_summed field set as CHECKSUM_PARTIAL and TCP pseudo header checksum is placed in checksum field - CHECKSUM_PARTIAL flag in SKB will enable ibmveth driver to mark "no checksum" and "checksum good" bits in transmit buffer descriptor before the packet is delivered to pseries PowerVM Hypervisor - If ibmveth has largesend capability enabled, transmit buffer descriptors are market accordingly before packet is delivered to Hypervisor (along with mss value for packets with length > MSS) - Destination VM's ibmveth driver receives the packet with "checksum good" bit set and so, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY - If "largesend" bit was on, mss value is copied from receive descriptor into SKB's gso_size and other flags are appropriately set for packets > MSS size - The packet is now successfully delivered up the stack in destination VM The offloads described above works fine for TCP communication among VMs in the same pseries server ( VM A <=> PowerVM Hypervisor <=> VM B ) We are now enabling support for OVS in pseries PowerVM environment. One of our requirements is to have ibmveth driver configured in "Trunk" mode, when they are used with OVS. This is because, PowerVM Hypervisor will no more bridge the packets between VMs, instead the packets are delivered to IO Server which hosts OVS to bridge them between VMs or to external networks (flow shown below), VM A <=> PowerVM Hypervisor <=> IO Server(OVS) <=> PowerVM Hypervisor <=> VM B In "IO server" the packet is received by inbound Trunk ibmveth and then delivered to OVS, which is then bridged to outbound Trunk ibmveth (shown below), Inbound Trunk ibmveth <=> OVS <=> Outbound Trunk ibmveth In this model, we hit the following issues which impacted the VM communication performance, - Issue 1: ibmveth doesn't support largesend and checksum offload features when configured as "Trunk". Driver has explicit checks to prevent enabling these offloads. - Issue 2: SYN packet drops seen at destination VM. When the packet originates, it has CHECKSUM_PARTIAL flag set and as it gets delivered to IO server's inbound Trunk ibmveth, on validating "checksum good" bits in ibmveth receive routine, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY flag. This packet is then bridged by OVS (or Linux Bridge) and delivered to outbound Trunk ibmveth. At this point the outbound ibmveth transmit routine will not set "no checksum" and "checksum good" bits in transmit buffer descriptor, as it does so only when the ip_summed field is CHECKSUM_PARTIAL. When this packet gets delivered to destination VM, TCP layer receives the packet with checksum value of 0 and with no checksum related flags in ip_summed field. This leads to packet drops. So, TCP connections never goes through fine. - Issue 3: First packet of a TCP connection will be dropped, if there is no OVS flow cached in datapath. OVS while trying to identify the flow, computes the checksum. The computed checksum will be invalid at the receiving end, as ibmveth transmit routine zeroes out the pseudo checksum value in the packet. This leads to packet drop. - Issue 4: ibmveth driver doesn't have support for SKB's with frag_list. When Physical NIC has GRO enabled and when OVS bridges these packets, OVS vport send code will end up calling dev_queue_xmit, which in turn calls validate_xmit_skb. In validate_xmit_skb routine, the larger packets will get segmented into MSS sized segments, if SKB has a frag_list and if the driver to which they are delivered to doesn't support NETIF_F_FRAGLIST feature. This patch addresses the above four issues, thereby enabling end to end largesend and checksum offload support for better performance. - Fix for Issue 1 : Remove checks which prevent enabling TCP largesend and checksum offloads. - Fix for Issue 2 : When ibmveth receives a packet with "checksum good" bit set and if its configured in Trunk mode, set appropriate SKB fields using skb_partial_csum_set (ip_summed field is set with CHECKSUM_PARTIAL) - Fix for Issue 3: Recompute the pseudo header checksum before sending the SKB up the stack. - Fix for Issue 4: Linearize the SKBs with frag_list. Though we end up allocating buffers and copying data, this fix gives upto 4X throughput increase. Note: All these fixes need to be dropped together as fixing just one of them will lead to other issues immediately (especially for Issues 1,2 & 3). Signed-off-by: Sivakumar Krishnasamy--- drivers/net/ethernet/ibm/ibmveth.c | 107 ++--- drivers/net/ethernet/ibm/ibmveth.h | 1 + 2 files changed, 90 insertions(+), 18 deletions(-) diff --git
Re: [PATCH] powerpc/sequoia: fix NAND partitions not to overlap
Hi! > > Well... two partitions at same place. If you use one, you will corrupt > > information on the other one. > > > > OTOH this moves partition around (so that they don't overlap) so it is > > probably not stable candidate. > > > > I guess this is not huge issue; people using these boards probably > > have custom dts changes, anyway... > > Or no one's even using it anymore. > > I can take this via powerpc, I won't mark it for stable etc. That sounds good. Thanks, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH] perf: libdw support for powerpc
On 18/05/2017 20:48, Paolo Bonzini wrote: > > > On 18/05/2017 20:19, Naveen N. Rao wrote: >> Paolo Bonzini wrote: >>> The ARM and x86 architectures already use libdw, and it is useful to >>> have as much common code for the unwinder as possible. Porting PPC >>> to libdw only needs an architecture-specific hook to move the register >>> state from perf to libdw. >> >> Thanks. Ravi has had a similar patch locally, but from what I >> understand, there are issues with libdw based unwinding on powerpc64. I >> gave this a quick test and I don't see the user-space call trace being >> unwinded properly with libdw. > > I don't see that problem: > > - 99,98% 0,00% dd libc-2.17.so[.] > generic_start_main.isra.0 > generic_start_main.isra.0 >- main > - 99,97% iread > - 97,82% sys_read > - 96,97% extract_entropy_user > 89,44% powernv_get_random_long > 4,63% sha_transform > 2,07% extract_buf >1,15% _raw_spin_lock_irqsave >0,51% extract_buf > > (This is "perf record dd if=/dev/urandom of=/dev/null bs=512"). > > Can you copy the contents of tools/perf/.config-detected here? FWIW it works fine for me here with latest Linus tree and your suggested change to fix compilation: --- a/tools/perf/arch/powerpc/util/unwind-libdw.c +++ b/tools/perf/arch/powerpc/util/unwind-libdw.c @@ -1,7 +1,7 @@ #include -#include "../../util/util.h" #include "../../util/unwind-libdw.h" #include "../../util/perf_regs.h" +#include "../../util/event.h" /* See backends/ppc_initreg.c and backends/ppc_regs.c in elfutils. */ static const int special_regs[3][2] = { Thanks, Paolo