Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported
On Fri, 13 May 2016 02:33:18 + "Tian, Kevin"wrote: > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > Sent: Friday, May 13, 2016 1:48 AM > > > > On Thu, 12 May 2016 04:53:19 + > > "Tian, Kevin" wrote: > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > Sent: Thursday, May 12, 2016 10:21 AM > > > > > > > > On Thu, 12 May 2016 01:19:44 + > > > > "Tian, Kevin" wrote: > > > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > > Sent: Wednesday, May 11, 2016 11:54 PM > > > > > > > > > > > > On Wed, 11 May 2016 06:29:06 + > > > > > > "Tian, Kevin" wrote: > > > > > > > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > > > > Sent: Thursday, May 05, 2016 11:05 PM > > > > > > > > > > > > > > > > On Thu, 5 May 2016 12:15:46 + > > > > > > > > "Tian, Kevin" wrote: > > > > > > > > > > > > > > > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com] > > > > > > > > > > Sent: Thursday, May 05, 2016 7:43 PM > > > > > > > > > > > > > > > > > > > > Hi David and Kevin, > > > > > > > > > > > > > > > > > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > > > > > > > > > > > > > > > > > From: Tian, Kevin > > > > > > > > > > >> Sent: 05 May 2016 10:37 > > > > > > > > > > > ... > > > > > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table > > > > > > > > > > >>> from > > > > > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table > > > > > > > > > > >>> if we > > > > > > > > > > >>> can make sure guest kernel would not touch MSI-X table > > > > > > > > > > >>> in > > > > > > > > > > >>> normal code path such as para-virtualized guest kernel > > > > > > > > > > >>> on PPC64. > > > > > > > > > > >>> > > > > > > > > > > >> Then how do you prevent malicious guest kernel accessing > > > > > > > > > > >> it? > > > > > > > > > > > Or a malicious guest driver for an ethernet card setting > > > > > > > > > > > up > > > > > > > > > > > the receive buffer ring to contain a single word entry > > > > > > > > > > > that > > > > > > > > > > > contains the address associated with an MSI-X interrupt > > > > > > > > > > > and > > > > > > > > > > > then using a loopback mode to cause a specific packet be > > > > > > > > > > > received that writes the required word through that > > > > > > > > > > > address. > > > > > > > > > > > > > > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal > > > > > > > > > > > memory write > > > > > > > > > > > cycle. > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we have enough permission to load a malicious driver or > > > > > > > > > > kernel, we can easily break the guest without exposed > > > > > > > > > > MSI-X table. > > > > > > > > > > > > > > > > > > > > I think it should be safe to expose MSI-X table if we can > > > > > > > > > > make sure that malicious guest driver/kernel can't use > > > > > > > > > > the MSI-X table to break other guest or host. The > > > > > > > > > > capability of IRQ remapping could provide this > > > > > > > > > > kind of protection. > > > > > > > > > > > > > > > > > > > > > > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > > > > > > > > > structure to guest. I know actual IRQ remapping might be > > > > > > > > > platform > > > > > > > > > specific, but at least for Intel VT-d specification, MSI-X > > > > > > > > > entry must > > > > > > > > > be configured with a remappable format by host kernel which > > > > > > > > > contains an index into IRQ remapping table. The index will > > > > > > > > > find a > > > > > > > > > IRQ remapping entry which controls interrupt routing for a > > > > > > > > > specific > > > > > > > > > device. If you allow a malicious program random index into > > > > > > > > > MSI-X > > > > > > > > > entry of assigned device, the hole is obvious... > > > > > > > > > > > > > > > > > > Above might make sense only for a IRQ remapping implementation > > > > > > > > > which doesn't rely on extended MSI-X format (e.g. simply > > > > > > > > > based on > > > > > > > > > BDF). If that's the case for PPC, then you should build MSI-X > > > > > > > > > passthrough based on this fact instead of general IRQ > > > > > > > > > remapping > > > > > > > > > enabled or not. > > > > > > > > > > > > > > > > I don't think anyone is expecting that we can expose the MSI-X > > > > > > > > vector > > > > > > > > table to the guest and the guest can make direct use of it. > > > > > > > > The end > > > > > > > > goal here is that the guest on a power system is already > > > > > > > > paravirtualized to not program the device MSI-X by directly > > > > > > > > writing to > > > > > > > > the MSI-X vector table. They have hypercalls for this since > > > > > > >
Re: [PATCH] powerpc: Use privileged SPR number for MMCR2
On Thu, May 12, 2016 at 01:29:11PM +0200, Thomas Huth wrote: > We are already using the privileged versions of MMCR0, MMCR1 > and MMCRA in the kernel, so for MMCR2, we should better use > the privileged versions, too, to be consistent. > > Suggested-by: Paul Mackerras> Signed-off-by: Thomas Huth Acked-by: Paul Mackerras ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Fix definition of SIAR and SDAR registers
On Thu, May 12, 2016 at 01:26:44PM +0200, Thomas Huth wrote: > The SIAR and SDAR registers are available twice, one time as SPRs > 780 / 781 (unprivileged, but read-only), and one time as the SPRs > 796 / 797 (privileged, but read and write). The Linux kernel code > currently uses the unprivileged SPRs - while this is OK for reading, > writing to that register of course does not work. > Since the KVM code tries to write to this register, too (see the mtspr > in book3s_hv_rmhandlers.S), the contents of this register sometimes get > lost for the guests, e.g. during migration of a VM. > To fix this issue, simply switch to the privileged SPR numbers instead. > > Signed-off-by: Thomas HuthAcked-by: Paul Mackerras ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported
> From: Tian, Kevin > Sent: Friday, May 13, 2016 10:33 AM > > > means. The MSI-X vector table of a device is always considered > > untrusted which is why we require user opt-ins to subvert that > > protection. Thanks, > > > > I only partially agree with this statement since there is different > trust level for kernel space driver and user space driver. > > OS may choose to trust kernel space driver then it can enable IRQ > remapping only for load balance purpose w/o source id verfification. > In such case MSI-X vector table is trusted and fully under kernel > control. Allowing to mmap MSI-X table in user space definitely > breaks that boundary. > > Anyway my point is simple. Let's ignore how Linux kernel implements > IRQ remapping on x86 (which may change time to time), and just > focus on architectural possibility. Non-x86 platform may implement > IRQ remapping completely separate from device side, then checking > availability of IRQ remapping is enough to decide whether mmap > MSI-X table is safe. x86 with VT-d can be configured to a mode > requiring host control of both MSI-X entry and IRQ remapping hardware > (without source id check). In such case it's insufficient to make > decision simply based on IRQ remapping availability. We need a way > to query from IRQ remapping module whether it's actually safe to > mmap MSI-X. > Or another way is to have VFIO explicitly request intel-iommu to enforce sid check when IRQ remapping is enabled... Thanks Kevin ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported
> From: Alex Williamson [mailto:alex.william...@redhat.com] > Sent: Friday, May 13, 2016 1:48 AM > > On Thu, 12 May 2016 04:53:19 + > "Tian, Kevin"wrote: > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > Sent: Thursday, May 12, 2016 10:21 AM > > > > > > On Thu, 12 May 2016 01:19:44 + > > > "Tian, Kevin" wrote: > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > Sent: Wednesday, May 11, 2016 11:54 PM > > > > > > > > > > On Wed, 11 May 2016 06:29:06 + > > > > > "Tian, Kevin" wrote: > > > > > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > > > Sent: Thursday, May 05, 2016 11:05 PM > > > > > > > > > > > > > > On Thu, 5 May 2016 12:15:46 + > > > > > > > "Tian, Kevin" wrote: > > > > > > > > > > > > > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com] > > > > > > > > > Sent: Thursday, May 05, 2016 7:43 PM > > > > > > > > > > > > > > > > > > Hi David and Kevin, > > > > > > > > > > > > > > > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > > > > > > > > > > > > > > > From: Tian, Kevin > > > > > > > > > >> Sent: 05 May 2016 10:37 > > > > > > > > > > ... > > > > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from > > > > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if > > > > > > > > > >>> we > > > > > > > > > >>> can make sure guest kernel would not touch MSI-X table in > > > > > > > > > >>> normal code path such as para-virtualized guest kernel on > > > > > > > > > >>> PPC64. > > > > > > > > > >>> > > > > > > > > > >> Then how do you prevent malicious guest kernel accessing > > > > > > > > > >> it? > > > > > > > > > > Or a malicious guest driver for an ethernet card setting up > > > > > > > > > > the receive buffer ring to contain a single word entry that > > > > > > > > > > contains the address associated with an MSI-X interrupt and > > > > > > > > > > then using a loopback mode to cause a specific packet be > > > > > > > > > > received that writes the required word through that address. > > > > > > > > > > > > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory > > > > > > > > > > write > > > > > > > > > > cycle. > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we have enough permission to load a malicious driver or > > > > > > > > > kernel, we can easily break the guest without exposed > > > > > > > > > MSI-X table. > > > > > > > > > > > > > > > > > > I think it should be safe to expose MSI-X table if we can > > > > > > > > > make sure that malicious guest driver/kernel can't use > > > > > > > > > the MSI-X table to break other guest or host. The > > > > > > > > > capability of IRQ remapping could provide this > > > > > > > > > kind of protection. > > > > > > > > > > > > > > > > > > > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > > > > > > > > structure to guest. I know actual IRQ remapping might be > > > > > > > > platform > > > > > > > > specific, but at least for Intel VT-d specification, MSI-X > > > > > > > > entry must > > > > > > > > be configured with a remappable format by host kernel which > > > > > > > > contains an index into IRQ remapping table. The index will find > > > > > > > > a > > > > > > > > IRQ remapping entry which controls interrupt routing for a > > > > > > > > specific > > > > > > > > device. If you allow a malicious program random index into MSI-X > > > > > > > > entry of assigned device, the hole is obvious... > > > > > > > > > > > > > > > > Above might make sense only for a IRQ remapping implementation > > > > > > > > which doesn't rely on extended MSI-X format (e.g. simply based > > > > > > > > on > > > > > > > > BDF). If that's the case for PPC, then you should build MSI-X > > > > > > > > passthrough based on this fact instead of general IRQ remapping > > > > > > > > enabled or not. > > > > > > > > > > > > > > I don't think anyone is expecting that we can expose the MSI-X > > > > > > > vector > > > > > > > table to the guest and the guest can make direct use of it. The > > > > > > > end > > > > > > > goal here is that the guest on a power system is already > > > > > > > paravirtualized to not program the device MSI-X by directly > > > > > > > writing to > > > > > > > the MSI-X vector table. They have hypercalls for this since they > > > > > > > always run virtualized. Therefore a) they never intend to touch > > > > > > > the > > > > > > > MSI-X vector table and b) they have sufficient isolation that a > > > > > > > guest > > > > > > > can only hurt itself by doing so. > > > > > > > > > > > > > > On x86 we don't have a), our method of programming the MSI-X > > > > > > > vector > > > > > > > table is to directly write to it. Therefore we will always > > > > > > > require QEMU > > > > > > > to place a
Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4
On Thu, 2016-05-12 at 11:58 +0200, Christian Lamparter wrote: > > > http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-ref-big-little-endian-herrenschmidt.odp > > > > but there are at least two more twists that you completely missed here: > > > > - Some architectures (e.g. ARMv5 "BE32" mode in IXP4xx, surely some others) > > do not implement big-endian mode by wiring up the data lines between the > > bus and the CPU differently between big- and little-endian mode like > > powerpc and armv7 "BE8" do, but instead they swizzle the *address* lines > > on 8-bit and 16-bit addresses. The effect of that is that normal RAM > > accesses work as expected both ways, and devices that are accessed using > > 32-bit MMIO ops never need any byteswap (you actually get "native > > endian") while MMIO with 8 and 16 bit width does something completely > > unexpected and touches the wrong register. Having an explicit byteswap > > on the PCI host bridge gets you the expected addresses again for 8-bit > > cycles but it also means that readl()/writel() again need to swap the > > data. Right. Old PowerPC did that too and it's completely stupid. Thankfully most vendors grew a clue since then and this practice has slowly fallen into oblivion. > > - Some other architectures (e.g. Broadcom MIPS) apparently are even fancier > > and use a strapping pin on the SoC flips the endianess of the CPU core > > at the same time as all the peripheral MMIO registers, with the intention > > of never requiring any byte swaps. I believe they are implemented careful > > enough to actually get this right, but it confuses the heck out of > > Linux drivers that don't expect this. Right. Drivers like that will need an explicit test in the accessors. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3] usb: dwc2: fix regression on big-endian PowerPC/ARM systems
A patch that went into Linux-4.4 to fix big-endian mode on a Lantiq MIPS system unfortunately broke big-endian operation on PowerPC APM82181 as reported by Christian Lamparter, and likely other systems. It actually introduced multiple issues: - it broke big-endian ARM kernels: any machine that was working correctly with a little-endian kernel is no longer using byteswaps on big-endian kernels, which clearly breaks them. - On PowerPC the same thing must be true: if it was working before, using big-endian kernels is now broken. Unlike ARM, 32-bit PowerPC usually uses big-endian kernels, so they are likely all broken. - The barrier for dwc2_writel is on the wrong side of the __raw_writel(), so the MMIO no longer synchronizes with DMA operations. - On architectures that require specific CPU instructions for MMIO access, using the __raw_ variant may turn this into a pointer dereference that does not have the same effect as the readl/writel. This patch is a simple revert for all architectures other than MIPS, in the hope that we can more easily backport it to fix the regression on PowerPC and ARM systems without breaking the Lantiq system again. We should follow this up with a more elaborate change to add runtime detection of endianess, to make sure it also works on all other combinations of architectures and implementations of the usb-dwc2 device. That patch however will be fairly large and not appropriate for backports to stable kernels. Felipe suggested a different approach, using an endianess switching register to always put the device into LE mode, but unfortunately the dwc2 hardware does not provide a generic way to do that. Also, I see no practical way of addressing the problem more generally by patching architecture specific code on MIPS. Signed-off-by: Arnd BergmannFixes: 95c8bc360944 ("usb: dwc2: Use platform endianness when accessing registers") --- v3: reverted the accidental changes that slipped into the patch, resending as requested by Christian. drivers/usb/dwc2/core.h | 29 + 1 file changed, 29 insertions(+) diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h index 3c58d633ce80..74ed2ee881cd 100644 --- a/drivers/usb/dwc2/core.h +++ b/drivers/usb/dwc2/core.h @@ -64,6 +64,18 @@ DWC2_TRACE_SCHEDULER_VB(pr_fmt("%s: SCH: " fmt),\ dev_name(hsotg->dev), ##__VA_ARGS__) + +#ifdef CONFIG_MIPS +/* + * There are some MIPS machines that can run in either big-endian + * or little-endian mode and that use the dwc2 register without + * a byteswap in both ways. + * Unlike other architectures, MIPS does not require a barrier + * before the __raw_writel() to synchronize with DMA but does + * require the barrier after the writel() to serialize a series + * of writes. This set of operations was added specifically for + * MIPS and should only be used there. + */ static inline u32 dwc2_readl(const void __iomem *addr) { u32 value = __raw_readl(addr); @@ -90,6 +102,23 @@ static inline void dwc2_writel(u32 value, void __iomem *addr) pr_info("INFO:: wrote %08x to %p\n", value, addr); #endif } +#else +/* Normal architectures just use readl/write */ +static inline u32 dwc2_readl(const void __iomem *addr) +{ + u32 value = readl(addr); + return value; +} + +static inline void dwc2_writel(u32 value, void __iomem *addr) +{ + writel(value, addr); + +#ifdef dwc2_log_writes + pr_info("info:: wrote %08x to %p\n", value, addr); +#endif +} +#endif /* Maximum number of Endpoints/HostChannels */ #define MAX_EPS_CHANNELS 16 -- 2.7.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4
On 5/12/2016 1:39 PM, Christian Lamparter wrote: > On Thursday, May 12, 2016 11:40:28 AM John Youn wrote: >> On 5/12/2016 6:30 AM, Christian Lamparter wrote: >>> On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote: On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote: Detecting the endianess of the device is probably the best future-proof solution, but it's also considerably more work to do in the driver, and comes with a tiny runtime overhead. >>> >>> The runtime overhead is probably non-measurable compared with the cost >>> of the actual MMIOs. >> >> Right. The code size increase is probably measurable (but still small), >> the runtime overhead is not. > > Ok, so no rebuts or complains have been posted. > > I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354 > and it works: > > Tested-by: Christian Lamparter> > So, how do we go from here? There is are two small issues with the > original patch (#ifdef DWC2_LOG_WRITES got converted to lower case: > #ifdef dwc2_log_writes) and I guess a proper subject would be nice. > > Arnd, can you please respin and post it (cc'd stable as well)? > So this is can be picked up? Or what's your plan? (I just realized my reply was stuck in my outbox, so the patch went out first) If I recall correctly, the rough consensus was to go with your longer patch in the future (fixed up for the comments that BenH and I sent), and I'd suggest basing it on top of a fixed version of my patch. >>> Well, but it comes with the "overhead"! So this was just as I said: >>> "Let's look at it and see if it's any good"... And I think it isn't >>> since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN >>> archs etc... >> >> I slightly prefer the more general patch for future kernel versions. >> The overhead will probably be negligible, but we can perform some >> testing to make sure. >> >> Can you resubmit with all gathered feedback? > Yes I think I can do that. But I would really like to get the > regression out of the way. So for that: I back Arnd's patch. > It explains the problem much better and doesn't kill MIPS > like the revert I was doing in my initial post to the MLs. > Also, another bonus: his patch is suited to port to stable. > > The auto-detection approach is not that easy to get right, > given all the stuff that's going on with BE8, LE4, ... So > can we have your "blessing" for Arnd's patch for now? since > that way, I can base my patch on top of his work about the > issues of endiannes? (Just say: ACK :) ) > I agree Arnd's patch is best for stable. We can also apply it to mainline until we get the autodection working as well. Unless Felipe has objections. > Arnd: do you have a version with the #ifdef lower/uppercase > fix? Or should I give it a try (and fail in a different way ;) ) > Felipe just had another idea, to change the endianess of the dwc2 block by setting a registers (if that exists). That would indeed be preferable, then we can just revert the broken change that went into 4.4 and backport that fix instead. >>> Just a quick reply. I have the docs for the thing. There's something >>> like that in GAHBCFG at Bit 24... BUT it only switches the endiannes >>> for the DMA descriptors (which is not always used, there are devices >>> with PIO only)! It doesn't deal with the MMIO access at all. >> >> That's correct. It only affects descriptor endianness for DMA >> descriptor mode of operation. > > Ok. The funny thing is that for the MyBook Live Duo this setting might > be important since the PLB_DMA engine is not part of the DWC library... > Instead it's from IBM and operates in: Big Endian :-D. > Are you sure the controller is using descriptor DMA? It's more likely using buffer DMA which this setting doesn't affect. DWC2 doesn't support Descriptor DMA in device mode on mainline yet. If it's a host then it might be. Regards, John ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4
On Thursday 12 May 2016 22:39:36 Christian Lamparter wrote: > On Thursday, May 12, 2016 11:40:28 AM John Youn wrote: > > On 5/12/2016 6:30 AM, Christian Lamparter wrote: > > > On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote: > > >> > > >> If I recall correctly, the rough consensus was to go with your longer > > >> patch in the future (fixed up for the comments that BenH and > > >> I sent), and I'd suggest basing it on top of a fixed version of > > >> my patch. > > > Well, but it comes with the "overhead"! So this was just as I said: > > > "Let's look at it and see if it's any good"... And I think it isn't > > > since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN > > > archs etc... > > > > I slightly prefer the more general patch for future kernel versions. > > The overhead will probably be negligible, but we can perform some > > testing to make sure. > > > > Can you resubmit with all gathered feedback? > Yes I think I can do that. But I would really like to get the > regression out of the way. So for that: I back Arnd's patch. > It explains the problem much better and doesn't kill MIPS > like the revert I was doing in my initial post to the MLs. > Also, another bonus: his patch is suited to port to stable. > > The auto-detection approach is not that easy to get right, > given all the stuff that's going on with BE8, LE4, ... So > can we have your "blessing" for Arnd's patch for now? since > that way, I can base my patch on top of his work about the > issues of endiannes? (Just say: ACK ) > > Arnd: do you have a version with the #ifdef lower/uppercase > fix? Or should I give it a try (and fail in a different way ) I've already fixed it up locally, will send the latest version so it's out there, whether Felipe takes it or not. Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4
On Thursday, May 12, 2016 11:40:28 AM John Youn wrote: > On 5/12/2016 6:30 AM, Christian Lamparter wrote: > > On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote: > >> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote: > >> Detecting the endianess of the > >> device is probably the best future-proof solution, but it's also > >> considerably more work to do in the driver, and comes with a > >> tiny runtime overhead. > > > > The runtime overhead is probably non-measurable compared with the cost > > of the actual MMIOs. > > Right. The code size increase is probably measurable (but still small), > the runtime overhead is not. > >>> > >>> Ok, so no rebuts or complains have been posted. > >>> > >>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354 > >>> and it works: > >>> > >>> Tested-by: Christian Lamparter> >>> > >>> So, how do we go from here? There is are two small issues with the > >>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case: > >>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice. > >>> > >>> Arnd, can you please respin and post it (cc'd stable as well)? > >>> So this is can be picked up? Or what's your plan? > >> > >> (I just realized my reply was stuck in my outbox, so the patch > >> went out first) > >> > >> If I recall correctly, the rough consensus was to go with your longer > >> patch in the future (fixed up for the comments that BenH and > >> I sent), and I'd suggest basing it on top of a fixed version of > >> my patch. > > Well, but it comes with the "overhead"! So this was just as I said: > > "Let's look at it and see if it's any good"... And I think it isn't > > since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN > > archs etc... > > I slightly prefer the more general patch for future kernel versions. > The overhead will probably be negligible, but we can perform some > testing to make sure. > > Can you resubmit with all gathered feedback? Yes I think I can do that. But I would really like to get the regression out of the way. So for that: I back Arnd's patch. It explains the problem much better and doesn't kill MIPS like the revert I was doing in my initial post to the MLs. Also, another bonus: his patch is suited to port to stable. The auto-detection approach is not that easy to get right, given all the stuff that's going on with BE8, LE4, ... So can we have your "blessing" for Arnd's patch for now? since that way, I can base my patch on top of his work about the issues of endiannes? (Just say: ACK :) ) Arnd: do you have a version with the #ifdef lower/uppercase fix? Or should I give it a try (and fail in a different way ;) ) > >> Felipe just had another idea, to change the endianess of the dwc2 > >> block by setting a registers (if that exists). That would indeed > >> be preferable, then we can just revert the broken change that > >> went into 4.4 and backport that fix instead. > > Just a quick reply. I have the docs for the thing. There's something > > like that in GAHBCFG at Bit 24... BUT it only switches the endiannes > > for the DMA descriptors (which is not always used, there are devices > > with PIO only)! It doesn't deal with the MMIO access at all. > > That's correct. It only affects descriptor endianness for DMA > descriptor mode of operation. Ok. The funny thing is that for the MyBook Live Duo this setting might be important since the PLB_DMA engine is not part of the DWC library... Instead it's from IBM and operates in: Big Endian :-D. Regards, Christian ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4
On 5/12/2016 6:30 AM, Christian Lamparter wrote: > On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote: >> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote: >> Detecting the endianess of the >> device is probably the best future-proof solution, but it's also >> considerably more work to do in the driver, and comes with a >> tiny runtime overhead. > > The runtime overhead is probably non-measurable compared with the cost > of the actual MMIOs. Right. The code size increase is probably measurable (but still small), the runtime overhead is not. >>> >>> Ok, so no rebuts or complains have been posted. >>> >>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354 >>> and it works: >>> >>> Tested-by: Christian Lamparter>>> >>> So, how do we go from here? There is are two small issues with the >>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case: >>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice. >>> >>> Arnd, can you please respin and post it (cc'd stable as well)? >>> So this is can be picked up? Or what's your plan? >> >> (I just realized my reply was stuck in my outbox, so the patch >> went out first) >> >> If I recall correctly, the rough consensus was to go with your longer >> patch in the future (fixed up for the comments that BenH and >> I sent), and I'd suggest basing it on top of a fixed version of >> my patch. > Well, but it comes with the "overhead"! So this was just as I said: > "Let's look at it and see if it's any good"... And I think it isn't > since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN > archs etc... I slightly prefer the more general patch for future kernel versions. The overhead will probably be negligible, but we can perform some testing to make sure. Can you resubmit with all gathered feedback? > >> Felipe just had another idea, to change the endianess of the dwc2 >> block by setting a registers (if that exists). That would indeed >> be preferable, then we can just revert the broken change that >> went into 4.4 and backport that fix instead. > Just a quick reply. I have the docs for the thing. There's something > like that in GAHBCFG at Bit 24... BUT it only switches the endiannes > for the DMA descriptors (which is not always used, there are devices > with PIO only)! It doesn't deal with the MMIO access at all. That's correct. It only affects descriptor endianness for DMA descriptor mode of operation. Regards, John ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported
On Thu, 12 May 2016 04:53:19 + "Tian, Kevin"wrote: > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > Sent: Thursday, May 12, 2016 10:21 AM > > > > On Thu, 12 May 2016 01:19:44 + > > "Tian, Kevin" wrote: > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > Sent: Wednesday, May 11, 2016 11:54 PM > > > > > > > > On Wed, 11 May 2016 06:29:06 + > > > > "Tian, Kevin" wrote: > > > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > > Sent: Thursday, May 05, 2016 11:05 PM > > > > > > > > > > > > On Thu, 5 May 2016 12:15:46 + > > > > > > "Tian, Kevin" wrote: > > > > > > > > > > > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com] > > > > > > > > Sent: Thursday, May 05, 2016 7:43 PM > > > > > > > > > > > > > > > > Hi David and Kevin, > > > > > > > > > > > > > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > > > > > > > > > > > > > From: Tian, Kevin > > > > > > > > >> Sent: 05 May 2016 10:37 > > > > > > > > > ... > > > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from > > > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we > > > > > > > > >>> can make sure guest kernel would not touch MSI-X table in > > > > > > > > >>> normal code path such as para-virtualized guest kernel on > > > > > > > > >>> PPC64. > > > > > > > > >>> > > > > > > > > >> Then how do you prevent malicious guest kernel accessing it? > > > > > > > > >> > > > > > > > > > Or a malicious guest driver for an ethernet card setting up > > > > > > > > > the receive buffer ring to contain a single word entry that > > > > > > > > > contains the address associated with an MSI-X interrupt and > > > > > > > > > then using a loopback mode to cause a specific packet be > > > > > > > > > received that writes the required word through that address. > > > > > > > > > > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory > > > > > > > > > write > > > > > > > > > cycle. > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > If we have enough permission to load a malicious driver or > > > > > > > > kernel, we can easily break the guest without exposed > > > > > > > > MSI-X table. > > > > > > > > > > > > > > > > I think it should be safe to expose MSI-X table if we can > > > > > > > > make sure that malicious guest driver/kernel can't use > > > > > > > > the MSI-X table to break other guest or host. The > > > > > > > > capability of IRQ remapping could provide this > > > > > > > > kind of protection. > > > > > > > > > > > > > > > > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > > > > > > > structure to guest. I know actual IRQ remapping might be platform > > > > > > > specific, but at least for Intel VT-d specification, MSI-X entry > > > > > > > must > > > > > > > be configured with a remappable format by host kernel which > > > > > > > contains an index into IRQ remapping table. The index will find a > > > > > > > IRQ remapping entry which controls interrupt routing for a > > > > > > > specific > > > > > > > device. If you allow a malicious program random index into MSI-X > > > > > > > entry of assigned device, the hole is obvious... > > > > > > > > > > > > > > Above might make sense only for a IRQ remapping implementation > > > > > > > which doesn't rely on extended MSI-X format (e.g. simply based on > > > > > > > BDF). If that's the case for PPC, then you should build MSI-X > > > > > > > passthrough based on this fact instead of general IRQ remapping > > > > > > > enabled or not. > > > > > > > > > > > > I don't think anyone is expecting that we can expose the MSI-X > > > > > > vector > > > > > > table to the guest and the guest can make direct use of it. The end > > > > > > goal here is that the guest on a power system is already > > > > > > paravirtualized to not program the device MSI-X by directly writing > > > > > > to > > > > > > the MSI-X vector table. They have hypercalls for this since they > > > > > > always run virtualized. Therefore a) they never intend to touch the > > > > > > MSI-X vector table and b) they have sufficient isolation that a > > > > > > guest > > > > > > can only hurt itself by doing so. > > > > > > > > > > > > On x86 we don't have a), our method of programming the MSI-X vector > > > > > > table is to directly write to it. Therefore we will always require > > > > > > QEMU > > > > > > to place a MemoryRegion over the vector table to intercept those > > > > > > accesses. However with interrupt remapping, we do have b) on x86, > > > > > > which > > > > > > means that we don't need to be so strict in disallowing user > > > > > > accesses > > > > > > to the MSI-X vector table. It's not useful for configuring MSI-X on > > > > > > the device, but the user should only be able
[PATCH] powerpc: Discard ffs() function and use builtin_ffs instead
With the ffs() function as defined in arch/powerpc/include/asm/bitops.h GCC will not optimise the code in case of constant parameter, as shown by the small exemple below. int ffs_test(void) { return 4 << ffs(31); } c0012334 : c0012334: 39 20 00 01 li r9,1 c0012338: 38 60 00 04 li r3,4 c001233c: 7d 29 00 34 cntlzw r9,r9 c0012340: 21 29 00 20 subfic r9,r9,32 c0012344: 7c 63 48 30 slw r3,r3,r9 c0012348: 4e 80 00 20 blr With this patch, the same function will compile as follows: c0012334 : c0012334: 38 60 00 08 li r3,8 c0012338: 4e 80 00 20 blr Signed-off-by: Christophe Leroy--- arch/powerpc/include/asm/bitops.h | 11 +-- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h index 59abc62..75e3ebb 100644 --- a/arch/powerpc/include/asm/bitops.h +++ b/arch/powerpc/include/asm/bitops.h @@ -218,16 +218,7 @@ static __inline__ unsigned long __ffs(unsigned long x) return __ilog2(x & -x); } -/* - * ffs: find first bit set. This is defined the same way as - * the libc and compiler builtin ffs routines, therefore - * differs in spirit from the above ffz (man ffs). - */ -static __inline__ int ffs(int x) -{ - unsigned long i = (unsigned long)x; - return __ilog2(i & -i) + 1; -} +#include /* * fls: find last (most-significant) bit set. -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] kvm-pr: manage illegal instructions
On 12/05/2016 11:27, Alexander Graf wrote: > On 05/12/2016 11:10 AM, Laurent Vivier wrote: >> >> On 11/05/2016 13:49, Alexander Graf wrote: >>> On 05/11/2016 01:14 PM, Laurent Vivier wrote: On 11/05/2016 12:35, Alexander Graf wrote: > On 03/15/2016 09:18 PM, Laurent Vivier wrote: >> While writing some instruction tests for kvm-unit-tests for powerpc, >> I've found that illegal instructions are not managed correctly with >> kvm-pr, >> while it is fine with kvm-hv. >> >> When an illegal instruction (like ".long 0") is processed by kvm-pr, >> the kernel logs are filled with: >> >> Couldn't emulate instruction 0x (op 0 xop 0) >> kvmppc_handle_exit_pr: emulation at 700 failed () >> >> While the exception handler receives an interrupt for each >> instruction >> executed after the illegal instruction. >> >> Signed-off-by: Laurent Vivier>> --- >> arch/powerpc/kvm/book3s_emulate.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/kvm/book3s_emulate.c >> b/arch/powerpc/kvm/book3s_emulate.c >> index 2afdb9c..4ee969d 100644 >> --- a/arch/powerpc/kvm/book3s_emulate.c >> +++ b/arch/powerpc/kvm/book3s_emulate.c >> @@ -99,7 +99,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, >> struct kvm_vcpu *vcpu, >> switch (get_op(inst)) { >> case 0: >> -emulated = EMULATE_FAIL; >> if ((kvmppc_get_msr(vcpu) & MSR_LE) && >> (inst == swab32(inst_sc))) { >> /* >> @@ -112,6 +111,9 @@ int kvmppc_core_emulate_op_pr(struct kvm_run >> *run, >> struct kvm_vcpu *vcpu, >> kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED); >> kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4); >> emulated = EMULATE_DONE; >> +} else { >> +kvmppc_core_queue_program(vcpu, SRR1_PROGILL); > But isn't that exactly what the semantic of EMULATE_FAIL is? Fixing it > up in book3s_emulate.c is definitely the wrong spot. > > So what is the problem you're trying to solve? Is the SRR0 at the > wrong > spot or are the log messages the problem? No, the problem is the host kernel logs are filled by the message and the execution hangs. And the host becomes unresponsiveness, even after the end of the tests. Please, try to run kvm-unit-tests (the emulator test) on a KVM-PR host, and check the kernel logs (dmesg), then try to ssh to the host... >>> Ok, so the log messages are the problem. Please fix the message output >>> then - or remove it altogether. Or if you like, create a module >>> parameter that allows you to emit them. >>> >>> I personally think the best solution would be to just convert the >>> message into a trace point. >>> >>> While at it, please see whether the guest can trigger similar host log >>> output excess in other code paths. >> The problem is not really with the log messages: they are consequence of >> the bug I try to fix. >> >> What happens is once kvm_pr decodes an invalid instruction all the valid >> following instructions trigger a Program exception to the guest (but are >> executed correctly). It has no real consequence on big machine like >> POWER8, except that the guest become very slow and the log files of the >> host are filled with messages (and qemu uses 100% of the CPU). On a >> smaller machine like a PowerMac G5, the machine becomes simply unusable. > > It's probably more related to your verbosity level of kernel messages. > If you pass loglevel=0 (or quiet) to you kernel cmdline you won't get > the messages printed to serial which is what's slowing you down. > > The other problem sounds pretty severe, but the only thing your patch > does any different from the current code flow would be the patch below. > Or did I miss anything? > > diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c > index 5cc2e7a..4672bc2 100644 > --- a/arch/powerpc/kvm/emulate.c > +++ b/arch/powerpc/kvm/emulate.c > @@ -302,7 +302,11 @@ int kvmppc_emulate_instruction(struct kvm_run *run, > struct kvm_vcpu *vcpu) > advance = 0; > printk(KERN_ERR "Couldn't emulate instruction > 0x%08x " >"(op %d xop %d)\n", inst, get_op(inst), > get_xop(inst)); > +#ifdef CONFIG_PPC_BOOK3S > + kvmppc_core_queue_program(vcpu, SRR1_PROGILL); > +#else > kvmppc_core_queue_program(vcpu, 0); > +#endif > } > } > OK, that fixes the problem. Thanks. :) Could you apply it? Laurent ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
ppc64 sbrk returns executable heap in 32-bit emulation mode
We noticed that on ppc64, the sbrk system call in the 32-bit subsystem returns executable memory. I assume it is related to this, in arch/powerpc/include/asm/page.h: /* * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI, * and needs to be executable. This means the whole heap ends * up being executable. */ #define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) What is the rationale for this? This comment must be *really* old, because ld.so in glibc should make sure that the PLT is executable. And for current binaries, .bss is *not* executable, contrary to what the comment suggests. Is this comment about pre-ELF binaries? If yes, would it possible to change the default for ELF binaries? Thanks, Florian ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[FIX PATCH v2 1/2] powerpc, numa: Fix whitespace in hot_add_drconf_memory_max()
Signed-off-by: Bharata B Rao--- arch/powerpc/mm/numa.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 669a15e..4a87ccb 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1163,18 +1163,18 @@ int hot_add_scn_to_nid(unsigned long scn_addr) static u64 hot_add_drconf_memory_max(void) { -struct device_node *memory = NULL; -unsigned int drconf_cell_cnt = 0; -u64 lmb_size = 0; + struct device_node *memory = NULL; + unsigned int drconf_cell_cnt = 0; + u64 lmb_size = 0; const __be32 *dm = NULL; -memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory"); -if (memory) { -drconf_cell_cnt = of_get_drconf_memory(memory, ); -lmb_size = of_get_lmb_size(memory); -of_node_put(memory); -} -return lmb_size * drconf_cell_cnt; + memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory"); + if (memory) { + drconf_cell_cnt = of_get_drconf_memory(memory, ); + lmb_size = of_get_lmb_size(memory); + of_node_put(memory); + } + return lmb_size * drconf_cell_cnt; } /* -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[FIX PATCH v2 2/2] powerpc,numa: Fix memory_hotplug_max()
memory_hotplug_max() uses hot_add_drconf_memory_max() to get maxmimum addressable memory by referring to ibm,dyanamic-memory property. There are three problems with the current approach: 1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes all the LMBs of the guest, but that is not true for PowerKVM which populates only DR LMBs (LMBs that can be hotplugged/removed) in that property. 2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive at the max possible address. Since ibm,dynamic-memory doesn't include RMA LMBs, the address thus obtained will be less than the actual max address. For example, if max possible memory size is 32G, with lmb-size of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA which won't be present here). hot_add_drconf_memory_max() would then return the max addressable memory as 127 * 256MB = 31.75GB, the max address should have been 32G which is what ibm,lrdr-capacity shows. 3 In PowerKVM, there can be a gap between the end of boot time RAM and beginning of hotplug RAM area. So just multiplying lmb-count with lmb-size will not provide the correct max possible address for PowerKVM. This patch fixes 1 by using ibm,lrdr-capacity property to return the max addressable memory whenever the property is present. Then it fixes 2 & 3 by fetching the address of the last LMB in ibm,dynamic-memory property. Signed-off-by: Bharata B Rao--- arch/powerpc/mm/numa.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 4a87ccb..f8b1da7 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1164,17 +1164,33 @@ int hot_add_scn_to_nid(unsigned long scn_addr) static u64 hot_add_drconf_memory_max(void) { struct device_node *memory = NULL; + struct device_node *dn = NULL; unsigned int drconf_cell_cnt = 0; u64 lmb_size = 0; const __be32 *dm = NULL; + const __be64 *lrdr = NULL; + struct of_drconf_cell drmem; + + dn = of_find_node_by_path("/rtas"); + if (dn) { + lrdr = of_get_property(dn, "ibm,lrdr-capacity", NULL); + of_node_put(dn); + if (lrdr) + return be64_to_cpup(lrdr); + } memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory"); if (memory) { drconf_cell_cnt = of_get_drconf_memory(memory, ); lmb_size = of_get_lmb_size(memory); + + /* Advance to the last cell, each cell has 6 32 bit integers */ + dm += (drconf_cell_cnt - 1) * 6; + read_drconf_cell(, ); of_node_put(memory); + return drmem.base_addr + lmb_size; } - return lmb_size * drconf_cell_cnt; + return 0; } /* -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[FIX PATCH v2 0/2] Fix powerpc,numa: Fix memory_hotplug_max()
This patchset fixes memory_hotplug_max() routine to return correct value of maximum hotpluggable address. In this version, whitespace fixes are separated into a different patch. v2: https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg103342.html Bharata B Rao (2): powerpc,numa: Fix whitespace in hot_add_drconf_memory_max() powerpc,numa: Fix memory_hotplug_max() arch/powerpc/mm/numa.c | 36 ++-- 1 file changed, 26 insertions(+), 10 deletions(-) -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4
On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote: > On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote: > > > > > Detecting the endianess of the > > > > > device is probably the best future-proof solution, but it's also > > > > > considerably more work to do in the driver, and comes with a > > > > > tiny runtime overhead. > > > > > > > > The runtime overhead is probably non-measurable compared with the cost > > > > of the actual MMIOs. > > > > > > Right. The code size increase is probably measurable (but still small), > > > the runtime overhead is not. > > > > Ok, so no rebuts or complains have been posted. > > > > I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354 > > and it works: > > > > Tested-by: Christian Lamparter> > > > So, how do we go from here? There is are two small issues with the > > original patch (#ifdef DWC2_LOG_WRITES got converted to lower case: > > #ifdef dwc2_log_writes) and I guess a proper subject would be nice. > > > > Arnd, can you please respin and post it (cc'd stable as well)? > > So this is can be picked up? Or what's your plan? > > (I just realized my reply was stuck in my outbox, so the patch > went out first) > > If I recall correctly, the rough consensus was to go with your longer > patch in the future (fixed up for the comments that BenH and > I sent), and I'd suggest basing it on top of a fixed version of > my patch. Well, but it comes with the "overhead"! So this was just as I said: "Let's look at it and see if it's any good"... And I think it isn't since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN archs etc... > Felipe just had another idea, to change the endianess of the dwc2 > block by setting a registers (if that exists). That would indeed > be preferable, then we can just revert the broken change that > went into 4.4 and backport that fix instead. Just a quick reply. I have the docs for the thing. There's something like that in GAHBCFG at Bit 24... BUT it only switches the endiannes for the DMA descriptors (which is not always used, there are devices with PIO only)! It doesn't deal with the MMIO access at all. The pin that would select which endian the device uses is probably connected to a DCR or GPIO but I don't know which or where so this is more or less useless. (Or the selectable endianness was dropped during synth). Regards, Christian ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 2/2] powerpc/fadump: add support to parse size based on memory range
Currently, memory for fadump can be specified with fadump_reserve_mem=size, where only a fixed size can be specified. Add the below syntax as well, to support conditional reservation based on system memory size: fadump_reserve_mem=:[,:,...] This syntax helps using the same commandline parameter for different system memory sizes. Signed-off-by: Hari Bathini--- Changes from v1: 1. Changed subject from "powerpc/fadump: add support to specify memory range based size" 2. Reused crashkernel parsing code that was moved to kernel/params.c (see patch 1/2) arch/powerpc/kernel/fadump.c | 64 -- 1 file changed, 55 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index d0af58b..a868281 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -193,6 +193,56 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm, return addr; } +/* + * This function parses command line for fadump_reserve_mem= + * + * Supports the below two syntaxes: + *1. fadump_reserve_mem=size + *2. fadump_reserve_mem=ramsize-range:size[,...] + * + * Sets fw_dump.reserve_bootvar with the memory size + * provided, 0 otherwise + * + * The function returns -EINVAL on failure, 0 otherwise. + */ +static int __init parse_fadump_reserve_mem(void) +{ + char *name = "fadump_reserve_mem="; + char *fadump_cmdline = NULL, *cur; + + fw_dump.reserve_bootvar = 0; + + /* find fadump_reserve_mem and use the last one if there are many */ + cur = strstr(boot_command_line, name); + while (cur) { + fadump_cmdline = cur; + cur = strstr(cur+1, name); + } + + /* when no fadump_reserve_mem= cmdline option is provided */ + if (!fadump_cmdline) + return 0; + + fadump_cmdline += strlen(name); + + /* for fadump_reserve_mem=size cmdline syntax */ + if (!is_param_range_based(fadump_cmdline)) { + fw_dump.reserve_bootvar = memparse(fadump_cmdline, NULL); + return 0; + } + + /* for fadump_reserve_mem=ramsize-range:size[,...] cmdline syntax */ + cur = fadump_cmdline; + fw_dump.reserve_bootvar = parse_mem_range_size("fadump_reserve_mem", + , memblock_phys_mem_size()); + if (cur == fadump_cmdline) { + printk(KERN_INFO "fadump_reserve_mem: Invaild syntax!\n"); + return -EINVAL; + } + + return 0; +} + /** * fadump_calculate_reserve_size(): reserve variable boot area 5% of System RAM * @@ -212,12 +262,17 @@ static inline unsigned long fadump_calculate_reserve_size(void) { unsigned long size; + /* sets fw_dump.reserve_bootvar */ + parse_fadump_reserve_mem(); + /* * Check if the size is specified through fadump_reserve_mem= cmdline * option. If yes, then use that. */ if (fw_dump.reserve_bootvar) return fw_dump.reserve_bootvar; + else + printk(KERN_INFO "fadump: calculating default boot size\n"); /* divide by 20 to get 5% of value */ size = memblock_end_of_DRAM() / 20; @@ -352,15 +407,6 @@ static int __init early_fadump_param(char *p) } early_param("fadump", early_fadump_param); -/* Look for fadump_reserve_mem= cmdline option */ -static int __init early_fadump_reserve_mem(char *p) -{ - if (p) - fw_dump.reserve_bootvar = memparse(p, ); - return 0; -} -early_param("fadump_reserve_mem", early_fadump_reserve_mem); - static void register_fw_dump(struct fadump_mem_struct *fdm) { int rc; ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/2] Refactor code parsing size based on memory range
Currently, crashkernel parameter supports the below syntax to parse size based on memory range: crashkernel=:[,:,...] While such parsing is implemented for crashkernel parameter, it applies to other parameters with similar syntax. So, move this code to a more generic place for code reuse. Signed-off-by: Hari Bathini--- While this patch in itself has nothing to do with powerpc, the powerpc patch (2/2) depends on this patch.. include/linux/kernel.h |5 +++ kernel/kexec_core.c| 63 +++- kernel/params.c| 96 3 files changed, 106 insertions(+), 58 deletions(-) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 2f7775e..e755ed1 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -429,6 +429,11 @@ extern char *get_options(const char *str, int nints, int *ints); extern unsigned long long memparse(const char *ptr, char **retptr); extern bool parse_option_str(const char *str, const char *option); +extern bool __init is_param_range_based(const char *cmdline); +extern unsigned long long __init parse_mem_range_size(const char *param, + char **str, + unsigned long long system_ram); + extern int core_kernel_text(unsigned long addr); extern int core_kernel_data(unsigned long addr); extern int __kernel_text_address(unsigned long addr); diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 1391d3e..71e92b2 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1084,59 +1084,9 @@ static int __init parse_crashkernel_mem(char *cmdline, char *cur = cmdline, *tmp; /* for each entry of the comma-separated list */ - do { - unsigned long long start, end = ULLONG_MAX, size; - - /* get the start of the range */ - start = memparse(cur, ); - if (cur == tmp) { - pr_warn("crashkernel: Memory value expected\n"); - return -EINVAL; - } - cur = tmp; - if (*cur != '-') { - pr_warn("crashkernel: '-' expected\n"); - return -EINVAL; - } - cur++; - - /* if no ':' is here, than we read the end */ - if (*cur != ':') { - end = memparse(cur, ); - if (cur == tmp) { - pr_warn("crashkernel: Memory value expected\n"); - return -EINVAL; - } - cur = tmp; - if (end <= start) { - pr_warn("crashkernel: end <= start\n"); - return -EINVAL; - } - } - - if (*cur != ':') { - pr_warn("crashkernel: ':' expected\n"); - return -EINVAL; - } - cur++; - - size = memparse(cur, ); - if (cur == tmp) { - pr_warn("Memory value expected\n"); - return -EINVAL; - } - cur = tmp; - if (size >= system_ram) { - pr_warn("crashkernel: invalid size\n"); - return -EINVAL; - } - - /* match ? */ - if (system_ram >= start && system_ram < end) { - *crash_size = size; - break; - } - } while (*cur++ == ','); + *crash_size = parse_mem_range_size("crashkernel", , system_ram); + if (cur == cmdline) + return -EINVAL; if (*crash_size > 0) { while (*cur && *cur != ' ' && *cur != '@') @@ -1273,7 +1223,6 @@ static int __init __parse_crashkernel(char *cmdline, const char *name, const char *suffix) { - char*first_colon, *first_space; char*ck_cmdline; BUG_ON(!crash_size || !crash_base); @@ -1291,12 +1240,10 @@ static int __init __parse_crashkernel(char *cmdline, return parse_crashkernel_suffix(ck_cmdline, crash_size, suffix); /* -* if the commandline contains a ':', then that's the extended +* if the parameter is range based, then that's the extended * syntax -- if not, it must be the classic syntax */ - first_colon = strchr(ck_cmdline, ':'); - first_space = strchr(ck_cmdline, ' '); - if (first_colon && (!first_space || first_colon < first_space)) + if (is_param_range_based(ck_cmdline)) return parse_crashkernel_mem(ck_cmdline, system_ram,
Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4
On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote: > > > > Detecting the endianess of the > > > > device is probably the best future-proof solution, but it's also > > > > considerably more work to do in the driver, and comes with a > > > > tiny runtime overhead. > > > > > > The runtime overhead is probably non-measurable compared with the cost > > > of the actual MMIOs. > > > > Right. The code size increase is probably measurable (but still small), > > the runtime overhead is not. > > Ok, so no rebuts or complains have been posted. > > I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354 > and it works: > > Tested-by: Christian Lamparter> > So, how do we go from here? There is are two small issues with the > original patch (#ifdef DWC2_LOG_WRITES got converted to lower case: > #ifdef dwc2_log_writes) and I guess a proper subject would be nice. > > Arnd, can you please respin and post it (cc'd stable as well)? > So this is can be picked up? Or what's your plan? (I just realized my reply was stuck in my outbox, so the patch went out first) If I recall correctly, the rough consensus was to go with your longer patch in the future (fixed up for the comments that BenH and I sent), and I'd suggest basing it on top of a fixed version of my patch. Felipe just had another idea, to change the endianess of the dwc2 block by setting a registers (if that exists). That would indeed be preferable, then we can just revert the broken change that went into 4.4 and backport that fix instead. Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] usb: dwc2: fix regression on big-endian PowerPC/ARM systems
On Thursday 12 May 2016 14:43:43 Felipe Balbi wrote: > >> How many more drivers will we have to 'fix' like this ? > > > > Endianess problems will keep coming up, and we have hundreds or thousands > > of drivers that are written with a particular design in mind that could > > be wrong as soon as someone chooses to build an SoC that does things > > differently. Once that happens, we'll fix them. > > > > Also, Christian has already posted a better version of the patch > > that fixes this driver in an architecture independent way, but we still > > need a workaround for the stable backports. > > hmmm, at least dwc3 (also from SNPS) has a couple bits where we can > choose endianess for registers and DMA descriptors. John, do we have the > same for dwc2 ? Wouldn't that be a better way to solve the problem ? Yes, I think that would be the best solution (provided it works correctly). My understanding is that the descriptors don't need to change for the particular MIPS machine, only the registers do. If we have another machine that requires the descriptor endianess to be flipped from the default, we probably need a DT property or platform_data flag to encode that. We can do the register endianess detection from Christian's patch to flip it around if necessary, and then revert back to the previous state of always using readl/writel. Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] usb: dwc2: fix regression on big-endian PowerPC/ARM systems
Hi, (Arnd, you didn't Cc dwc2's maintainer. I'm also not part of TI anymore) Arnd Bergmannwrites: > On Thursday 12 May 2016 14:25:49 Felipe Balbi wrote: >> > { >> > u32 value = __raw_readl(addr); >> > >> > - /* In order to preserve endianness __raw_* operation is used. >> > Therefore >> > - * a barrier is needed to ensure IO access is not re-ordered across >> > + /* in order to preserve endianness __raw_* operation is used. >> > therefore >> > + * a barrier is needed to ensure io access is not re-ordered across >> >* reads or writes >> >*/ >> > mb(); >> > @@ -81,15 +93,32 @@ static inline void dwc2_writel(u32 value, void __iomem >> > *addr) >> > __raw_writel(value, addr); >> > >> > /* >> > - * In order to preserve endianness __raw_* operation is used. >> > Therefore >> > - * a barrier is needed to ensure IO access is not re-ordered across >> > + * in order to preserve endianness __raw_* operation is used. >> > therefore >> > + * a barrier is needed to ensure io access is not re-ordered across >> >* reads or writes >> >*/ >> > mb(); >> > -#ifdef DWC2_LOG_WRITES >> > - pr_info("INFO:: wrote %08x to %p\n", value, addr); >> > +#ifdef dwc2_log_writes >> > + pr_info("info:: wrote %08x to %p\n", value, addr); >> > #endif >> > } >> > +#else > > Oops, the accidental lowercase conversion is still in here, I'll fix it > up once we agree on the approach. > >> I still think this is something that should be handled at MIPS side, no ? > > As I explained, there isn't really anything we can do in MIPS code > because of the way they have to handle PCI. > >> How many more drivers will we have to 'fix' like this ? > > Endianess problems will keep coming up, and we have hundreds or thousands > of drivers that are written with a particular design in mind that could > be wrong as soon as someone chooses to build an SoC that does things > differently. Once that happens, we'll fix them. > > Also, Christian has already posted a better version of the patch > that fixes this driver in an architecture independent way, but we still > need a workaround for the stable backports. hmmm, at least dwc3 (also from SNPS) has a couple bits where we can choose endianess for registers and DMA descriptors. John, do we have the same for dwc2 ? Wouldn't that be a better way to solve the problem ? -- balbi ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] usb: dwc2: fix regression on big-endian PowerPC/ARM systems
On Thursday 12 May 2016 14:25:49 Felipe Balbi wrote: > > { > > u32 value = __raw_readl(addr); > > > > - /* In order to preserve endianness __raw_* operation is used. > > Therefore > > - * a barrier is needed to ensure IO access is not re-ordered across > > + /* in order to preserve endianness __raw_* operation is used. > > therefore > > + * a barrier is needed to ensure io access is not re-ordered across > >* reads or writes > >*/ > > mb(); > > @@ -81,15 +93,32 @@ static inline void dwc2_writel(u32 value, void __iomem > > *addr) > > __raw_writel(value, addr); > > > > /* > > - * In order to preserve endianness __raw_* operation is used. > > Therefore > > - * a barrier is needed to ensure IO access is not re-ordered across > > + * in order to preserve endianness __raw_* operation is used. > > therefore > > + * a barrier is needed to ensure io access is not re-ordered across > >* reads or writes > >*/ > > mb(); > > -#ifdef DWC2_LOG_WRITES > > - pr_info("INFO:: wrote %08x to %p\n", value, addr); > > +#ifdef dwc2_log_writes > > + pr_info("info:: wrote %08x to %p\n", value, addr); > > #endif > > } > > +#else Oops, the accidental lowercase conversion is still in here, I'll fix it up once we agree on the approach. > I still think this is something that should be handled at MIPS side, no ? As I explained, there isn't really anything we can do in MIPS code because of the way they have to handle PCI. > How many more drivers will we have to 'fix' like this ? Endianess problems will keep coming up, and we have hundreds or thousands of drivers that are written with a particular design in mind that could be wrong as soon as someone chooses to build an SoC that does things differently. Once that happens, we'll fix them. Also, Christian has already posted a better version of the patch that fixes this driver in an architecture independent way, but we still need a workaround for the stable backports. Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v9 26/26] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()
On Thu, 2016-05-12 at 13:48 +1000, Gavin Shan wrote: > On Tue, May 03, 2016 at 03:41:45PM +1000, Gavin Shan wrote: > > The function pnv_pci_reset_secondary_bus() is called like below. > > It's impossible for call the function on root bus. So it's safe > > to remove the root bus case in the function. No functional changes > > introduced. > > > > pci_parent_bus_reset() / pci_bus_reset() / pci_try_reset_bus() > > pci_reset_bridge_secondary_bus() > > pcibios_reset_secondary_bus() > > pnv_pci_reset_secondary_bus() > > > > Signed-off-by: Gavin Shan> > Reviewed-by: Daniel Axtens > > Reviewed-by: Alexey Kardashevskiy > > --- > > arch/powerpc/platforms/powernv/eeh-powernv.c | 12 ++-- > > 1 file changed, 2 insertions(+), 10 deletions(-) > > > > diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c > > b/arch/powerpc/platforms/powernv/eeh-powernv.c > > index 9226df1..593b8dc 100644 > > --- a/arch/powerpc/platforms/powernv/eeh-powernv.c > > +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c > > @@ -868,16 +868,8 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, > > int option) > > > > void pnv_pci_reset_secondary_bus(struct pci_dev *dev) > > { > > - struct pci_controller *hose; > > - > > - if (pci_is_root_bus(dev->bus)) { > > - hose = pci_bus_to_host(dev->bus); > > - pnv_eeh_root_reset(hose, EEH_RESET_HOT); > > - pnv_eeh_root_reset(hose, EEH_RESET_DEACTIVATE); > > - } else { > > - pnv_eeh_bridge_reset(dev, EEH_RESET_HOT); > > - pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE); > > - } > > + pnv_eeh_bridge_reset(dev, EEH_RESET_HOT); > > + pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE); > > } > > Michael, please revert this one as it is already in linux-ppc-next > branch. Sorry for the overhead. Done. https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/commit/?h=next=848912e547c4569445a61203a7df402646a88c25 cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [kernel,2/2] powerpc/powernv/npu: Add PE to PHB's list
On Thu, 2016-12-05 at 05:47:10 UTC, Alexey Kardashevskiy wrote: > Before 3e68dc57 "powerpc/powernv: Remove DMA32 PE list", NPU PEs > were linked to the NPU PHB via phb->ioda.pe_dma_list; after that fix, > the phb->ioda.pe_list is used. > > During the pe_dma_list removal, list_add_tail(>ioda.pe_dma_list) > was removed, however no list_add() was added so does this patch. > > Signed-off-by: Alexey Kardashevskiy> Reviewed-by: Gavin Shan Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/1d4e89cf0a4e819fbde428e9e5 cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [kernel,1/2] powerpc/powernv: Fix insufficient memory allocation
On Thu, 2016-12-05 at 05:47:09 UTC, Alexey Kardashevskiy wrote: > The pnv_pci_init_ioda_phb() helper allocates a blob to store auxilary > data such PE and M32/M64 segment allocation maps; this single blob has few > partitions, size of each is derived from the PE number - > phb->ioda.total_pe_num. > > It was assumed that the minimum PE number is 8, however it is 4 for NPU > so the pe_alloc part was missing in the allocated blob. > It was invisible till recently as we were not tracking used M64 segments > and NPUs do not use M32 segments so the phb->ioda.m32_segmap > (which was pointing to the same address as phb->ioda.pe_alloc) > has never been written to leaving the pe_alloc memory intact. > > After 401203ac2d "powerpc/powernv: Track M64 segment consumption" > the pe_alloc gets corrupted and PE allocation cannot work. > This fixes the issue by enforcing the minimum PE number to 8. > > Signed-off-by: Alexey Kardashevskiy> Reviewed-by: Gavin Shan Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/92a86756904b127a3450262b33 cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [2/2, v3] powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism
On Mon, 2016-11-04 at 19:17:23 UTC, "Guilherme G. Piccoli" wrote: > Commit 39baadbf36ce ("powerpc/eeh: Remove eeh information from pci_dn") > changed the pci_dn struct by removing its EEH-related members. > As part of this clean-up, DDW mechanism was modified to read the device > configuration address from eeh_dev struct. > > As a consequence, now if we disable EEH mechanism on kernel command-line > for example, the DDW mechanism will fail, generating a kernel oops by > dereferencing a NULL pointer (which turns to be the eeh_dev pointer). > > This patch just changes the configuration address calculation on DDW > functions to a manual calculation based on pci_dn members instead of > using eeh_dev-based address. > > No functional changes were made. This was tested on pSeries, both > in PHyp and qemu guest. > > Fixes: 39baadbf36ce ("powerpc/eeh: Remove eeh information from pci_dn") > > Cc: sta...@vger.kernel.org > Reviewed-by: Gavin Shan> Signed-off-by: Guilherme G. Piccoli Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/8445a87f7092bc8336ea1305be cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [1/2, v3] Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"
On Mon, 2016-11-04 at 19:17:22 UTC, "Guilherme G. Piccoli" wrote: > This reverts commit 89a51df5ab1d38b257300b8ac940bbac3bb0eb9b. > > The function eeh_add_device_early() is used to perform EEH initialization in > devices added later on the system, like in hotplug/DLPAR scenarios. Since the > commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on > Cell") > a new check was introduced in this function - Cell has no EEH capabilities > which led to kernel oops if hotplug was performed, so checking for > eeh_enabled() was introduced to avoid the issue. > > However, in architectures that EEH is present like pSeries or PowerNV, we > might > reach a case in which no PCI devices are present on boot time and so EEH is > not > initialized. Then, if a device is added via DLPAR for example, > eeh_add_device_early() fails because eeh_enabled() is false, and EEH end up > not being enabled at all. > > This reverts the aforementioned patch since a new verification was introduced > by > the commit d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during > hotplug") > and so the original Cell issue does not happen anymore. > > Cc: sta...@vger.kernel.org > Reviewed-by: Gavin Shan> Signed-off-by: Guilherme G. Piccoli Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/c2078d9ef600bdbe568c89e5dd cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v3, 1/4] powerpc/eeh: Don't report error in eeh_pe_reset_and_recover()
On Wed, 2016-27-04 at 01:14:50 UTC, Gavin Shan wrote: > The function eeh_pe_reset_and_recover() is used to recover EEH > error when the passthrough device are transferred to guest and > backwards, meaning the device's driver is vfio-pci or none. > When the driver is vfio-pci that provides error_detected() error > handler only, the handler simply stops the guest and it's not > expected behaviour. On the other hand, no error handlers will > be called if we don't have a bound driver. > > This ignores the error handler in eeh_pe_reset_and_recover() > that reports the error to device driver to avoid the exceptional > behaviour. > > Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices") > Cc: sta...@vger.kernel.org #v3.18+ > Signed-off-by: Gavin Shan> Reviewed-by: Russell Currey Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/affeb0f2d3a9af419ad7ef4ac7 cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: Use privileged SPR number for MMCR2
We are already using the privileged versions of MMCR0, MMCR1 and MMCRA in the kernel, so for MMCR2, we should better use the privileged versions, too, to be consistent. Suggested-by: Paul MackerrasSigned-off-by: Thomas Huth --- arch/powerpc/include/asm/reg.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index ce3e1b7..166d863 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -715,7 +715,7 @@ #define MMCR0_FCWAIT 0x0002UL /* freeze counter in WAIT state */ #define MMCR0_FCHV 0x0001UL /* freeze conditions in hypervisor mode */ #define SPRN_MMCR1 798 -#define SPRN_MMCR2 769 +#define SPRN_MMCR2 785 #define SPRN_MMCRA 0x312 #define MMCRA_SDSYNC 0x8000UL /* SDAR synced with SIAR */ #define MMCRA_SDAR_DCACHE_MISS 0x4000UL -- 1.8.3.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] usb: dwc2: fix regression on big-endian PowerPC/ARM systems
Hi, Arnd Bergmannwrites: > A patch that went into Linux-4.4 to fix big-endian mode on a Lantiq > MIPS system unfortunately broke big-endian operation on PowerPC > APM82181 as reported by Christian Lamparter, and likely other > systems. > > It actually introduced multiple issues: > > - it broke big-endian ARM kernels: any machine that was working > correctly with a little-endian kernel is no longer using byteswaps > on big-endian kernels, which clearly breaks them. > - On PowerPC the same thing must be true: if it was working before, > using big-endian kernels is now broken. Unlike ARM, 32-bit PowerPC > usually uses big-endian kernels, so they are likely all broken. > - The barrier for dwc2_writel is on the wrong side of the __raw_writel(), > so the MMIO no longer synchronizes with DMA operations. > - On architectures that require specific CPU instructions for MMIO > access, using the __raw_ variant may turn this into a pointer > dereference that does not have the same effect as the readl/writel. > > This patch is a simple revert for all architectures other than MIPS, > in the hope that we can more easily backport it to fix the regression > on PowerPC and ARM systems without breaking the Lantiq system again. > > We should follow this up with a more elaborate change to add runtime > detection of endianess, to make sure it also works on all other > combinations of architectures and implementations of the usb-dwc2 > device. That patch however will be fairly large and not appropriate > for backports to stable kernels. > > Signed-off-by: Arnd Bergmann > Fixes: 95c8bc360944 ("usb: dwc2: Use platform endianness when accessing > registers") > --- > drivers/usb/dwc2/core.h | 41 +++-- > 1 file changed, 35 insertions(+), 6 deletions(-) > > diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h > index 3c58d633ce80..1f8ed149a40f 100644 > --- a/drivers/usb/dwc2/core.h > +++ b/drivers/usb/dwc2/core.h > @@ -64,12 +64,24 @@ > DWC2_TRACE_SCHEDULER_VB(pr_fmt("%s: SCH: " fmt),\ > dev_name(hsotg->dev), ##__VA_ARGS__) > > + > +#ifdef CONFIG_MIPS > +/* > + * There are some MIPS machines that can run in either big-endian > + * or little-endian mode and that use the dwc2 register without > + * a byteswap in both ways. > + * Unlike other architectures, MIPS does not require a barrier > + * before the __raw_writel() to synchronize with DMA but does > + * require the barrier after the writel() to serialize a series > + * of writes. This set of operations was added specifically for > + * MIPS and should only be used there. > + */ > static inline u32 dwc2_readl(const void __iomem *addr) > { > u32 value = __raw_readl(addr); > > - /* In order to preserve endianness __raw_* operation is used. Therefore > - * a barrier is needed to ensure IO access is not re-ordered across > + /* in order to preserve endianness __raw_* operation is used. therefore > + * a barrier is needed to ensure io access is not re-ordered across >* reads or writes >*/ > mb(); > @@ -81,15 +93,32 @@ static inline void dwc2_writel(u32 value, void __iomem > *addr) > __raw_writel(value, addr); > > /* > - * In order to preserve endianness __raw_* operation is used. Therefore > - * a barrier is needed to ensure IO access is not re-ordered across > + * in order to preserve endianness __raw_* operation is used. therefore > + * a barrier is needed to ensure io access is not re-ordered across >* reads or writes >*/ > mb(); > -#ifdef DWC2_LOG_WRITES > - pr_info("INFO:: wrote %08x to %p\n", value, addr); > +#ifdef dwc2_log_writes > + pr_info("info:: wrote %08x to %p\n", value, addr); > #endif > } > +#else I still think this is something that should be handled at MIPS side, no ? How many more drivers will we have to 'fix' like this ? -- balbi ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: Fix definition of SIAR and SDAR registers
The SIAR and SDAR registers are available twice, one time as SPRs 780 / 781 (unprivileged, but read-only), and one time as the SPRs 796 / 797 (privileged, but read and write). The Linux kernel code currently uses the unprivileged SPRs - while this is OK for reading, writing to that register of course does not work. Since the KVM code tries to write to this register, too (see the mtspr in book3s_hv_rmhandlers.S), the contents of this register sometimes get lost for the guests, e.g. during migration of a VM. To fix this issue, simply switch to the privileged SPR numbers instead. Signed-off-by: Thomas Huth--- arch/powerpc/include/asm/reg.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index f5f4c66..ce3e1b7 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -752,13 +752,13 @@ #define SPRN_PMC6 792 #define SPRN_PMC7 793 #define SPRN_PMC8 794 -#define SPRN_SIAR 780 -#define SPRN_SDAR 781 #define SPRN_SIER 784 #define SIER_SIPR0x200 /* Sampled MSR_PR */ #define SIER_SIHV0x100 /* Sampled MSR_HV */ #define SIER_SIAR_VALID 0x040 /* SIAR contents valid */ #define SIER_SDAR_VALID 0x020 /* SDAR contents valid */ +#define SPRN_SIAR 796 +#define SPRN_SDAR 797 #define SPRN_TACR 888 #define SPRN_TCSCR 889 #define SPRN_CSIGR 890 -- 1.8.3.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] usb: dwc2: fix regression on big-endian PowerPC/ARM systems
A patch that went into Linux-4.4 to fix big-endian mode on a Lantiq MIPS system unfortunately broke big-endian operation on PowerPC APM82181 as reported by Christian Lamparter, and likely other systems. It actually introduced multiple issues: - it broke big-endian ARM kernels: any machine that was working correctly with a little-endian kernel is no longer using byteswaps on big-endian kernels, which clearly breaks them. - On PowerPC the same thing must be true: if it was working before, using big-endian kernels is now broken. Unlike ARM, 32-bit PowerPC usually uses big-endian kernels, so they are likely all broken. - The barrier for dwc2_writel is on the wrong side of the __raw_writel(), so the MMIO no longer synchronizes with DMA operations. - On architectures that require specific CPU instructions for MMIO access, using the __raw_ variant may turn this into a pointer dereference that does not have the same effect as the readl/writel. This patch is a simple revert for all architectures other than MIPS, in the hope that we can more easily backport it to fix the regression on PowerPC and ARM systems without breaking the Lantiq system again. We should follow this up with a more elaborate change to add runtime detection of endianess, to make sure it also works on all other combinations of architectures and implementations of the usb-dwc2 device. That patch however will be fairly large and not appropriate for backports to stable kernels. Signed-off-by: Arnd BergmannFixes: 95c8bc360944 ("usb: dwc2: Use platform endianness when accessing registers") --- drivers/usb/dwc2/core.h | 41 +++-- 1 file changed, 35 insertions(+), 6 deletions(-) diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h index 3c58d633ce80..1f8ed149a40f 100644 --- a/drivers/usb/dwc2/core.h +++ b/drivers/usb/dwc2/core.h @@ -64,12 +64,24 @@ DWC2_TRACE_SCHEDULER_VB(pr_fmt("%s: SCH: " fmt),\ dev_name(hsotg->dev), ##__VA_ARGS__) + +#ifdef CONFIG_MIPS +/* + * There are some MIPS machines that can run in either big-endian + * or little-endian mode and that use the dwc2 register without + * a byteswap in both ways. + * Unlike other architectures, MIPS does not require a barrier + * before the __raw_writel() to synchronize with DMA but does + * require the barrier after the writel() to serialize a series + * of writes. This set of operations was added specifically for + * MIPS and should only be used there. + */ static inline u32 dwc2_readl(const void __iomem *addr) { u32 value = __raw_readl(addr); - /* In order to preserve endianness __raw_* operation is used. Therefore -* a barrier is needed to ensure IO access is not re-ordered across + /* in order to preserve endianness __raw_* operation is used. therefore +* a barrier is needed to ensure io access is not re-ordered across * reads or writes */ mb(); @@ -81,15 +93,32 @@ static inline void dwc2_writel(u32 value, void __iomem *addr) __raw_writel(value, addr); /* -* In order to preserve endianness __raw_* operation is used. Therefore -* a barrier is needed to ensure IO access is not re-ordered across +* in order to preserve endianness __raw_* operation is used. therefore +* a barrier is needed to ensure io access is not re-ordered across * reads or writes */ mb(); -#ifdef DWC2_LOG_WRITES - pr_info("INFO:: wrote %08x to %p\n", value, addr); +#ifdef dwc2_log_writes + pr_info("info:: wrote %08x to %p\n", value, addr); #endif } +#else +/* Normal architectures just use readl/write */ +static inline u32 dwc2_readl(const void __iomem *addr) +{ + u32 value = readl(addr); + return value; +} + +static inline void dwc2_writel(u32 value, void __iomem *addr) +{ + writel(value, addr); + +#ifdef dwc2_log_writes + pr_info("info:: wrote %08x to %p\n", value, addr); +#endif +} +#endif /* Maximum number of Endpoints/HostChannels */ #define MAX_EPS_CHANNELS 16 -- 2.7.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Fix definition of SIAR register
On Thu, May 12, 2016 at 09:27:40AM +0200, Thomas Huth wrote: > On 12.05.2016 06:57, Paul Mackerras wrote: > > On Mon, Apr 25, 2016 at 10:15:47AM +0200, Alexander Graf wrote: > >> > +#define SPRN_SIAR796 > >> > >> I'm sure there's a reason (iSeries?) we used the r/o version before. > >> Better introduce a new constant that gives us rw access and use that in > >> the kvm entry/exit code. > > > > I don't think we ever did any performance monitoring on iSeries, and > > in any case, we have removed the iSeries code. I think we should just > > use the privileged, read/write number (i.e. 796). > > Ok, then I'll simply add the fix for SDAR to my patch as well, test it > and submit it again. Great! Could you switch MMCR2 to the privileged number as well while you're at it? Thanks, Paul. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [FIX,v1] powerpc,numa: Fix memory_hotplug_max()
On Thu, 2016-12-05 at 08:26:36 UTC, Bharata B Rao wrote: > memory_hotplug_max() uses hot_add_drconf_memory_max() to get maxmimum > addressable memory by referring to ibm,dyanamic-memory property. There > are three problems with the current approach: ... > > NOTE: There are some unnecessary changes in the patch because of converting > spaces to tabs w/o which checkpatch.pl complains. OK. You don't always have to listen to checkpatch :) Please split this into two patches, one that fixes the whitespace problems in the function, and then a second one which makes your changes. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4
On Tuesday, May 10, 2016 09:23:59 AM Arnd Bergmann wrote: > On Tuesday 10 May 2016 08:37:52 Benjamin Herrenschmidt wrote: > > On Mon, 2016-05-09 at 17:08 +0200, Arnd Bergmann wrote: > > > > > > Unfortunately, I don't see any way this could be done in MIPS specific > > > code: There is typically a byteswap between the internal bus and the PCI > > > bus on big-endian MIPS systems, so the PCI MMIO ends up being > > > little-endian, > > > > Ugh ... not exactly, re-watch my talk on the matter :-) While there is > > a specific lane wiring to preserve byte addresss, in the end it's the > > end device itself that is either BE or LE. Regardless of any "bus > > endianness". > > I found your slides on > > http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-ref-big-little-endian-herrenschmidt.odp > > but there are at least two more twists that you completely missed here: > > - Some architectures (e.g. ARMv5 "BE32" mode in IXP4xx, surely some others) > do not implement big-endian mode by wiring up the data lines between the > bus and the CPU differently between big- and little-endian mode like > powerpc and armv7 "BE8" do, but instead they swizzle the *address* lines > on 8-bit and 16-bit addresses. The effect of that is that normal RAM > accesses work as expected both ways, and devices that are accessed using > 32-bit MMIO ops never need any byteswap (you actually get "native > endian") while MMIO with 8 and 16 bit width does something completely > unexpected and touches the wrong register. Having an explicit byteswap > on the PCI host bridge gets you the expected addresses again for 8-bit > cycles but it also means that readl()/writel() again need to swap the > data. > > - Some other architectures (e.g. Broadcom MIPS) apparently are even fancier > and use a strapping pin on the SoC flips the endianess of the CPU core > at the same time as all the peripheral MMIO registers, with the intention > of never requiring any byte swaps. I believe they are implemented careful > enough to actually get this right, but it confuses the heck out of > Linux drivers that don't expect this. > > > > which matches the expected behavior of readl/writel. However, drivers > > > for non-PCI devices often use the same readl/writel accessors because > > > that is how it's done on ARMv6/ARMv7. > > > > Even then, you can have on-SoC (non-PCI) devices that also have a > > different endianness from the main CPU. How does it work on ARM for > > example ? The device endianness should be fixed, regardless of the > > endianness of the core, no ? > > ARMv6/v7 is uses BE8 mode like powerpc: each peripheral is fixed-endian > and you have to know what it is. Only Freescale managed to put identical > IP blocks on various (powerpc-derived) SoCs and have a subset of them > treat the access as little-endian while others remain big-endian, so all > those drivers now require runtime detection. > > > > Doing it hardcoded by architecture is just the simplest way to deal > > > with it, working on the assumption that nothing actually needs the > > > runtime detection that Ben suggested. > > > > No, it's not an archicture problem. It's a problem specific to that one > > SoC that the device was synthetized to be a certain endian while it was > > synthetized differently on another SoC... that also happens to be a > > different architecture. But doesn't have to. > > > > For example, we had in the past cases of both LE and BE EHCI > > implementations on the same architecture (PowerPC). > > I understand this, but from what I see in this history of this particular > driver, all ARM and PowerPC implementations chose to use LE registers for > DWC2 because the normal approach for these is to not mess with endianess, > while presumably all MIPS users of the same block wired up the endian-select > line of the IP block to match that of the CPU core, again because it's > what you are expected to do on a MIPS based SoC. > > So hardcoding it per architecture would make an assumption based on > the mindset of the SoC designers rather than strict technical differences, > and that can fail as soon as someone does things differently on any of > them (see the Freescale example), but I still think it's the easiest > workaround for backporting to stable kernels. A revert of the original > patch would be even easier, but that would break the one big-endian > MIPS machine we know about. > > > > Detecting the endianess of the > > > device is probably the best future-proof solution, but it's also > > > considerably more work to do in the driver, and comes with a > > > tiny runtime overhead. > > > > The runtime overhead is probably non-measurable compared with the cost > > of the actual MMIOs. > > Right. The code size increase is probably measurable (but still small), > the runtime overhead is not. Ok, so no rebuts or complains have been posted. I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
Re: [PATCH] kvm-pr: manage illegal instructions
On 05/12/2016 11:10 AM, Laurent Vivier wrote: On 11/05/2016 13:49, Alexander Graf wrote: On 05/11/2016 01:14 PM, Laurent Vivier wrote: On 11/05/2016 12:35, Alexander Graf wrote: On 03/15/2016 09:18 PM, Laurent Vivier wrote: While writing some instruction tests for kvm-unit-tests for powerpc, I've found that illegal instructions are not managed correctly with kvm-pr, while it is fine with kvm-hv. When an illegal instruction (like ".long 0") is processed by kvm-pr, the kernel logs are filled with: Couldn't emulate instruction 0x (op 0 xop 0) kvmppc_handle_exit_pr: emulation at 700 failed () While the exception handler receives an interrupt for each instruction executed after the illegal instruction. Signed-off-by: Laurent Vivier--- arch/powerpc/kvm/book3s_emulate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 2afdb9c..4ee969d 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -99,7 +99,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, switch (get_op(inst)) { case 0: -emulated = EMULATE_FAIL; if ((kvmppc_get_msr(vcpu) & MSR_LE) && (inst == swab32(inst_sc))) { /* @@ -112,6 +111,9 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED); kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4); emulated = EMULATE_DONE; +} else { +kvmppc_core_queue_program(vcpu, SRR1_PROGILL); But isn't that exactly what the semantic of EMULATE_FAIL is? Fixing it up in book3s_emulate.c is definitely the wrong spot. So what is the problem you're trying to solve? Is the SRR0 at the wrong spot or are the log messages the problem? No, the problem is the host kernel logs are filled by the message and the execution hangs. And the host becomes unresponsiveness, even after the end of the tests. Please, try to run kvm-unit-tests (the emulator test) on a KVM-PR host, and check the kernel logs (dmesg), then try to ssh to the host... Ok, so the log messages are the problem. Please fix the message output then - or remove it altogether. Or if you like, create a module parameter that allows you to emit them. I personally think the best solution would be to just convert the message into a trace point. While at it, please see whether the guest can trigger similar host log output excess in other code paths. The problem is not really with the log messages: they are consequence of the bug I try to fix. What happens is once kvm_pr decodes an invalid instruction all the valid following instructions trigger a Program exception to the guest (but are executed correctly). It has no real consequence on big machine like POWER8, except that the guest become very slow and the log files of the host are filled with messages (and qemu uses 100% of the CPU). On a smaller machine like a PowerMac G5, the machine becomes simply unusable. It's probably more related to your verbosity level of kernel messages. If you pass loglevel=0 (or quiet) to you kernel cmdline you won't get the messages printed to serial which is what's slowing you down. The other problem sounds pretty severe, but the only thing your patch does any different from the current code flow would be the patch below. Or did I miss anything? diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c index 5cc2e7a..4672bc2 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -302,7 +302,11 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) advance = 0; printk(KERN_ERR "Couldn't emulate instruction 0x%08x " "(op %d xop %d)\n", inst, get_op(inst), get_xop(inst)); +#ifdef CONFIG_PPC_BOOK3S + kvmppc_core_queue_program(vcpu, SRR1_PROGILL); +#else kvmppc_core_queue_program(vcpu, 0); +#endif } } ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] kvm-pr: manage illegal instructions
On 11/05/2016 13:49, Alexander Graf wrote: > On 05/11/2016 01:14 PM, Laurent Vivier wrote: >> >> On 11/05/2016 12:35, Alexander Graf wrote: >>> On 03/15/2016 09:18 PM, Laurent Vivier wrote: While writing some instruction tests for kvm-unit-tests for powerpc, I've found that illegal instructions are not managed correctly with kvm-pr, while it is fine with kvm-hv. When an illegal instruction (like ".long 0") is processed by kvm-pr, the kernel logs are filled with: Couldn't emulate instruction 0x (op 0 xop 0) kvmppc_handle_exit_pr: emulation at 700 failed () While the exception handler receives an interrupt for each instruction executed after the illegal instruction. Signed-off-by: Laurent Vivier--- arch/powerpc/kvm/book3s_emulate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 2afdb9c..4ee969d 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -99,7 +99,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, switch (get_op(inst)) { case 0: -emulated = EMULATE_FAIL; if ((kvmppc_get_msr(vcpu) & MSR_LE) && (inst == swab32(inst_sc))) { /* @@ -112,6 +111,9 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED); kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4); emulated = EMULATE_DONE; +} else { +kvmppc_core_queue_program(vcpu, SRR1_PROGILL); >>> But isn't that exactly what the semantic of EMULATE_FAIL is? Fixing it >>> up in book3s_emulate.c is definitely the wrong spot. >>> >>> So what is the problem you're trying to solve? Is the SRR0 at the wrong >>> spot or are the log messages the problem? >> No, the problem is the host kernel logs are filled by the message and >> the execution hangs. And the host becomes unresponsiveness, even after >> the end of the tests. >> >> Please, try to run kvm-unit-tests (the emulator test) on a KVM-PR host, >> and check the kernel logs (dmesg), then try to ssh to the host... > > Ok, so the log messages are the problem. Please fix the message output > then - or remove it altogether. Or if you like, create a module > parameter that allows you to emit them. > > I personally think the best solution would be to just convert the > message into a trace point. > > While at it, please see whether the guest can trigger similar host log > output excess in other code paths. The problem is not really with the log messages: they are consequence of the bug I try to fix. What happens is once kvm_pr decodes an invalid instruction all the valid following instructions trigger a Program exception to the guest (but are executed correctly). It has no real consequence on big machine like POWER8, except that the guest become very slow and the log files of the host are filled with messages (and qemu uses 100% of the CPU). On a smaller machine like a PowerMac G5, the machine becomes simply unusable. Please try kvm-unit-tests to see what happens: qemu-system-ppc64 -machine pseries,accel=kvm,kvm-type=PR -bios powerpc/boot_rom.bin -display none -serial stdio -kernel powerpc/emulator.elf -smp 1 --append "-v" Thanks, Laurent ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[FIX PATCH v1] powerpc,numa: Fix memory_hotplug_max()
memory_hotplug_max() uses hot_add_drconf_memory_max() to get maxmimum addressable memory by referring to ibm,dyanamic-memory property. There are three problems with the current approach: 1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes all the LMBs of the guest, but that is not true for PowerKVM which populates only DR LMBs (LMBs that can be hotplugged/removed) in that property. 2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive at the max possible address. Since ibm,dynamic-memory doesn't include RMA LMBs, the address thus obtained will be less than the actual max address. For example, if max possible memory size is 32G, with lmb-size of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA which won't be present here). hot_add_drconf_memory_max() would then return the max addressable memory as 127 * 256MB = 31.75GB, the max address should have been 32G which is what ibm,lrdr-capacity shows. 3 In PowerKVM, there can be a gap between the end of boot time RAM and beginning of hotplug RAM area. So just multiplying lmb-count with lmb-size will not provide the correct max possible address for PowerKVM. This patch fixes 1 by using ibm,lrdr-capacity property to return the max addressable memory whenever the property is present. Then it fixes 2 & 3 by fetching the address of the last LMB in ibm,dynamic-memory property. NOTE: There are some unnecessary changes in the patch because of converting spaces to tabs w/o which checkpatch.pl complains. Signed-off-by: Bharata B Rao--- Changes in v1 - - Return 0 from hot_add_drconf_memory_max() when ibm,dynamic-reconfiguration-memory node isn't available. This restores the existing behaviour. - Rebase against next branch of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git arch/powerpc/mm/numa.c | 34 +- 1 file changed, 25 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 669a15e..6a26ce8 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1164,17 +1164,33 @@ int hot_add_scn_to_nid(unsigned long scn_addr) static u64 hot_add_drconf_memory_max(void) { struct device_node *memory = NULL; -unsigned int drconf_cell_cnt = 0; -u64 lmb_size = 0; + struct device_node *dn = NULL; + unsigned int drconf_cell_cnt = 0; + u64 lmb_size = 0; const __be32 *dm = NULL; + const __be64 *lrdr = NULL; + struct of_drconf_cell drmem; + + dn = of_find_node_by_path("/rtas"); + if (dn) { + lrdr = of_get_property(dn, "ibm,lrdr-capacity", NULL); + of_node_put(dn); + if (lrdr) + return be64_to_cpup(lrdr); + } + + memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory"); + if (memory) { + drconf_cell_cnt = of_get_drconf_memory(memory, ); + lmb_size = of_get_lmb_size(memory); -memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory"); -if (memory) { -drconf_cell_cnt = of_get_drconf_memory(memory, ); -lmb_size = of_get_lmb_size(memory); -of_node_put(memory); -} -return lmb_size * drconf_cell_cnt; + /* Advance to the last cell, each cell has 6 32 bit integers */ + dm += (drconf_cell_cnt - 1) * 6; + read_drconf_cell(, ); + of_node_put(memory); + return drmem.base_addr + lmb_size; + } + return 0; } /* -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH kernel 1/2] powerpc/powernv: Fix insufficient memory allocation
On Thu, May 12, 2016 at 04:39:57PM +1000, Alexey Kardashevskiy wrote: >On 05/12/2016 04:09 PM, Gavin Shan wrote: >>On Thu, May 12, 2016 at 03:47:09PM +1000, Alexey Kardashevskiy wrote: >>>The pnv_pci_init_ioda_phb() helper allocates a blob to store auxilary >>>data such PE and M32/M64 segment allocation maps; this single blob has few >>>partitions, size of each is derived from the PE number - >>>phb->ioda.total_pe_num. >>> >>>It was assumed that the minimum PE number is 8, however it is 4 for NPU >>>so the pe_alloc part was missing in the allocated blob. >>>It was invisible till recently as we were not tracking used M64 segments >>>and NPUs do not use M32 segments so the phb->ioda.m32_segmap >>>(which was pointing to the same address as phb->ioda.pe_alloc) >>>has never been written to leaving the pe_alloc memory intact. >>> >>>After 401203ac2d "powerpc/powernv: Track M64 segment consumption" >>>the pe_alloc gets corrupted and PE allocation cannot work. >>>This fixes the issue by enforcing the minimum PE number to 8. >>> >> >>As I said offline yesterday, the issue exists from day-1 when NPU PHB >>is supported. I don't think it's related to 401203ac2d. Without the logic >>tracking M64 segments, the PE# bitmap still can be corrupted when writting >>to M32 segment map. It's confusing to mention a unrelated commit in the >>changelog. > > >The bug exists from day 1. The actual corruption started happening from >401203ac2d, it could be happening before but it was not. > well, Are you sure it starts from 401203ac2d? I would believe it starts from commit 3fa23ff ("powerpc/powernv: Fix initial IO and M32 segmap"). Please fix the commit log. >> >>Since the issue exists from day-1 when NPU PHB is supported, is a stable >>tag needed? >> >>>Signed-off-by: Alexey Kardashevskiy>> >>With above parts fixed: >> >>Reviewed-by: Gavin Shan >> >>>--- >>>arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++- >>>1 file changed, 2 insertions(+), 1 deletion(-) >>> >>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >>>b/arch/powerpc/platforms/powernv/pci-ioda.c >>>index 6cda2a8..d0d32c2 100644 >>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c >>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >>>@@ -3507,7 +3507,8 @@ static void __init pnv_pci_init_ioda_phb(struct >>>device_node *np, >>> PNV_IODA1_DMA32_SEGSIZE; >>> >>> /* Allocate aux data & arrays. We don't have IO ports on PHB3 */ >>>-size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long)); >>>+size = _ALIGN_UP(max_t(unsigned, phb->ioda.total_pe_num, 8) / 8, >>>+sizeof(unsigned long)); >>> m64map_off = size; >>> size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]); >>> m32map_off = size; >>>-- >>>2.5.0.rc3 >>> >> > > >-- >Alexey > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Fix definition of SIAR register
On 12.05.2016 06:57, Paul Mackerras wrote: > On Mon, Apr 25, 2016 at 10:15:47AM +0200, Alexander Graf wrote: >> +#define SPRN_SIAR796 >> >> I'm sure there's a reason (iSeries?) we used the r/o version before. Better >> introduce a new constant that gives us rw access and use that in the kvm >> entry/exit code. > > I don't think we ever did any performance monitoring on iSeries, and > in any case, we have removed the iSeries code. I think we should just > use the privileged, read/write number (i.e. 796). Ok, then I'll simply add the fix for SDAR to my patch as well, test it and submit it again. Thomas ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH kernel 1/2] powerpc/powernv: Fix insufficient memory allocation
On 05/12/2016 04:09 PM, Gavin Shan wrote: On Thu, May 12, 2016 at 03:47:09PM +1000, Alexey Kardashevskiy wrote: The pnv_pci_init_ioda_phb() helper allocates a blob to store auxilary data such PE and M32/M64 segment allocation maps; this single blob has few partitions, size of each is derived from the PE number - phb->ioda.total_pe_num. It was assumed that the minimum PE number is 8, however it is 4 for NPU so the pe_alloc part was missing in the allocated blob. It was invisible till recently as we were not tracking used M64 segments and NPUs do not use M32 segments so the phb->ioda.m32_segmap (which was pointing to the same address as phb->ioda.pe_alloc) has never been written to leaving the pe_alloc memory intact. After 401203ac2d "powerpc/powernv: Track M64 segment consumption" the pe_alloc gets corrupted and PE allocation cannot work. This fixes the issue by enforcing the minimum PE number to 8. As I said offline yesterday, the issue exists from day-1 when NPU PHB is supported. I don't think it's related to 401203ac2d. Without the logic tracking M64 segments, the PE# bitmap still can be corrupted when writting to M32 segment map. It's confusing to mention a unrelated commit in the changelog. The bug exists from day 1. The actual corruption started happening from 401203ac2d, it could be happening before but it was not. Since the issue exists from day-1 when NPU PHB is supported, is a stable tag needed? Signed-off-by: Alexey KardashevskiyWith above parts fixed: Reviewed-by: Gavin Shan --- arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 6cda2a8..d0d32c2 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -3507,7 +3507,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np, PNV_IODA1_DMA32_SEGSIZE; /* Allocate aux data & arrays. We don't have IO ports on PHB3 */ - size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long)); + size = _ALIGN_UP(max_t(unsigned, phb->ioda.total_pe_num, 8) / 8, + sizeof(unsigned long)); m64map_off = size; size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]); m32map_off = size; -- 2.5.0.rc3 -- Alexey ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH kernel 2/2] powerpc/powernv/npu: Add PE to PHB's list
On Thu, May 12, 2016 at 03:47:10PM +1000, Alexey Kardashevskiy wrote: >Before 3e68dc57 "powerpc/powernv: Remove DMA32 PE list", NPU PEs >were linked to the NPU PHB via phb->ioda.pe_dma_list; after that fix, >the phb->ioda.pe_list is used. > >During the pe_dma_list removal, list_add_tail(>ioda.pe_dma_list) >was removed, however no list_add() was added so does this patch. > >Signed-off-by: Alexey KardashevskiyReviewed-by: Gavin Shan >--- > arch/powerpc/platforms/powernv/pci-ioda.c | 3 +++ > 1 file changed, 3 insertions(+) > >diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >b/arch/powerpc/platforms/powernv/pci-ioda.c >index d0d32c2..4f9e73a 100644 >--- a/arch/powerpc/platforms/powernv/pci-ioda.c >+++ b/arch/powerpc/platforms/powernv/pci-ioda.c >@@ -1009,6 +1009,9 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct >pci_dev *dev) > return NULL; > } > >+ /* Put PE to the list */ >+ list_add_tail(>list, >ioda.pe_list); >+ > return pe; > } > >-- >2.5.0.rc3 > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev