Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-12 Thread Alex Williamson
On Fri, 13 May 2016 02:33:18 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Friday, May 13, 2016 1:48 AM
> > 
> > On Thu, 12 May 2016 04:53:19 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Thursday, May 12, 2016 10:21 AM
> > > >
> > > > On Thu, 12 May 2016 01:19:44 +
> > > > "Tian, Kevin"  wrote:
> > > >  
> > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > Sent: Wednesday, May 11, 2016 11:54 PM
> > > > > >
> > > > > > On Wed, 11 May 2016 06:29:06 +
> > > > > > "Tian, Kevin"  wrote:
> > > > > >  
> > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > > > Sent: Thursday, May 05, 2016 11:05 PM
> > > > > > > >
> > > > > > > > On Thu, 5 May 2016 12:15:46 +
> > > > > > > > "Tian, Kevin"  wrote:
> > > > > > > >  
> > > > > > > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> > > > > > > > > > Sent: Thursday, May 05, 2016 7:43 PM
> > > > > > > > > >
> > > > > > > > > > Hi David and Kevin,
> > > > > > > > > >
> > > > > > > > > > On 2016/5/5 17:54, David Laight wrote:
> > > > > > > > > >  
> > > > > > > > > > > From: Tian, Kevin  
> > > > > > > > > > >> Sent: 05 May 2016 10:37  
> > > > > > > > > > > ...  
> > > > > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table 
> > > > > > > > > > >>> from
> > > > > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table 
> > > > > > > > > > >>> if we
> > > > > > > > > > >>> can make sure guest kernel would not touch MSI-X table 
> > > > > > > > > > >>> in
> > > > > > > > > > >>> normal code path such as para-virtualized guest kernel 
> > > > > > > > > > >>> on PPC64.
> > > > > > > > > > >>>  
> > > > > > > > > > >> Then how do you prevent malicious guest kernel accessing 
> > > > > > > > > > >> it?  
> > > > > > > > > > > Or a malicious guest driver for an ethernet card setting 
> > > > > > > > > > > up
> > > > > > > > > > > the receive buffer ring to contain a single word entry 
> > > > > > > > > > > that
> > > > > > > > > > > contains the address associated with an MSI-X interrupt 
> > > > > > > > > > > and
> > > > > > > > > > > then using a loopback mode to cause a specific packet be
> > > > > > > > > > > received that writes the required word through that 
> > > > > > > > > > > address.
> > > > > > > > > > >
> > > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal 
> > > > > > > > > > > memory write
> > > > > > > > > > > cycle.
> > > > > > > > > > >
> > > > > > > > > > >   David
> > > > > > > > > > >  
> > > > > > > > > >
> > > > > > > > > > If we have enough permission to load a malicious driver or
> > > > > > > > > > kernel, we can easily break the guest without exposed
> > > > > > > > > > MSI-X table.
> > > > > > > > > >
> > > > > > > > > > I think it should be safe to expose MSI-X table if we can
> > > > > > > > > > make sure that malicious guest driver/kernel can't use
> > > > > > > > > > the MSI-X table to break other guest or host. The
> > > > > > > > > > capability of IRQ remapping could provide this
> > > > > > > > > > kind of protection.
> > > > > > > > > >  
> > > > > > > > >
> > > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X
> > > > > > > > > structure to guest. I know actual IRQ remapping might be 
> > > > > > > > > platform
> > > > > > > > > specific, but at least for Intel VT-d specification, MSI-X 
> > > > > > > > > entry must
> > > > > > > > > be configured with a remappable format by host kernel which
> > > > > > > > > contains an index into IRQ remapping table. The index will 
> > > > > > > > > find a
> > > > > > > > > IRQ remapping entry which controls interrupt routing for a 
> > > > > > > > > specific
> > > > > > > > > device. If you allow a malicious program random index into 
> > > > > > > > > MSI-X
> > > > > > > > > entry of assigned device, the hole is obvious...
> > > > > > > > >
> > > > > > > > > Above might make sense only for a IRQ remapping implementation
> > > > > > > > > which doesn't rely on extended MSI-X format (e.g. simply 
> > > > > > > > > based on
> > > > > > > > > BDF). If that's the case for PPC, then you should build MSI-X
> > > > > > > > > passthrough based on this fact instead of general IRQ 
> > > > > > > > > remapping
> > > > > > > > > enabled or not.  
> > > > > > > >
> > > > > > > > I don't think anyone is expecting that we can expose the MSI-X 
> > > > > > > > vector
> > > > > > > > table to the guest and the guest can make direct use of it.  
> > > > > > > > The end
> > > > > > > > goal here is that the guest on a power system is already
> > > > > > > > paravirtualized to not program the device MSI-X by directly 
> > > > > > > > writing to
> > > > > > > > the MSI-X vector table.  They have hypercalls for this since 
> > > > > > > 

Re: [PATCH] powerpc: Use privileged SPR number for MMCR2

2016-05-12 Thread Paul Mackerras
On Thu, May 12, 2016 at 01:29:11PM +0200, Thomas Huth wrote:
> We are already using the privileged versions of MMCR0, MMCR1
> and MMCRA in the kernel, so for MMCR2, we should better use
> the privileged versions, too, to be consistent.
> 
> Suggested-by: Paul Mackerras 
> Signed-off-by: Thomas Huth 

Acked-by: Paul Mackerras 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: Fix definition of SIAR and SDAR registers

2016-05-12 Thread Paul Mackerras
On Thu, May 12, 2016 at 01:26:44PM +0200, Thomas Huth wrote:
> The SIAR and SDAR registers are available twice, one time as SPRs
> 780 / 781 (unprivileged, but read-only), and one time as the SPRs
> 796 / 797 (privileged, but read and write). The Linux kernel code
> currently uses the unprivileged  SPRs - while this is OK for reading,
> writing to that register of course does not work.
> Since the KVM code tries to write to this register, too (see the mtspr
> in book3s_hv_rmhandlers.S), the contents of this register sometimes get
> lost for the guests, e.g. during migration of a VM.
> To fix this issue, simply switch to the privileged SPR numbers instead.
> 
> Signed-off-by: Thomas Huth 

Acked-by: Paul Mackerras 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-12 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Friday, May 13, 2016 10:33 AM
> 
> > means.  The MSI-X vector table of a device is always considered
> > untrusted which is why we require user opt-ins to subvert that
> > protection.  Thanks,
> >
> 
> I only partially agree with this statement since there is different
> trust level for kernel space driver and user space driver.
> 
> OS may choose to trust kernel space driver then it can enable IRQ
> remapping only for load balance purpose w/o source id verfification.
> In such case MSI-X vector table is trusted and fully under kernel
> control. Allowing to mmap MSI-X table in user space definitely
> breaks that boundary.
> 
> Anyway my point is simple. Let's ignore how Linux kernel implements
> IRQ remapping on x86 (which may change time to time), and just
> focus on architectural possibility. Non-x86 platform may implement
> IRQ remapping completely separate from device side, then checking
> availability of IRQ remapping is enough to decide whether mmap
> MSI-X table is safe. x86 with VT-d can be configured to a mode
> requiring host control of both MSI-X entry and IRQ remapping hardware
> (without source id check). In such case it's insufficient to make
> decision simply based on IRQ remapping availability. We need a way
> to query from IRQ remapping module whether it's actually safe to
> mmap MSI-X.
> 

Or another way is to have VFIO explicitly request intel-iommu to
enforce sid check when IRQ remapping is enabled...

Thanks
Kevin
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-12 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Friday, May 13, 2016 1:48 AM
> 
> On Thu, 12 May 2016 04:53:19 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Thursday, May 12, 2016 10:21 AM
> > >
> > > On Thu, 12 May 2016 01:19:44 +
> > > "Tian, Kevin"  wrote:
> > >
> > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > Sent: Wednesday, May 11, 2016 11:54 PM
> > > > >
> > > > > On Wed, 11 May 2016 06:29:06 +
> > > > > "Tian, Kevin"  wrote:
> > > > >
> > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > > Sent: Thursday, May 05, 2016 11:05 PM
> > > > > > >
> > > > > > > On Thu, 5 May 2016 12:15:46 +
> > > > > > > "Tian, Kevin"  wrote:
> > > > > > >
> > > > > > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> > > > > > > > > Sent: Thursday, May 05, 2016 7:43 PM
> > > > > > > > >
> > > > > > > > > Hi David and Kevin,
> > > > > > > > >
> > > > > > > > > On 2016/5/5 17:54, David Laight wrote:
> > > > > > > > >
> > > > > > > > > > From: Tian, Kevin
> > > > > > > > > >> Sent: 05 May 2016 10:37
> > > > > > > > > > ...
> > > > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from
> > > > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if 
> > > > > > > > > >>> we
> > > > > > > > > >>> can make sure guest kernel would not touch MSI-X table in
> > > > > > > > > >>> normal code path such as para-virtualized guest kernel on 
> > > > > > > > > >>> PPC64.
> > > > > > > > > >>>
> > > > > > > > > >> Then how do you prevent malicious guest kernel accessing 
> > > > > > > > > >> it?
> > > > > > > > > > Or a malicious guest driver for an ethernet card setting up
> > > > > > > > > > the receive buffer ring to contain a single word entry that
> > > > > > > > > > contains the address associated with an MSI-X interrupt and
> > > > > > > > > > then using a loopback mode to cause a specific packet be
> > > > > > > > > > received that writes the required word through that address.
> > > > > > > > > >
> > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory 
> > > > > > > > > > write
> > > > > > > > > > cycle.
> > > > > > > > > >
> > > > > > > > > > David
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > If we have enough permission to load a malicious driver or
> > > > > > > > > kernel, we can easily break the guest without exposed
> > > > > > > > > MSI-X table.
> > > > > > > > >
> > > > > > > > > I think it should be safe to expose MSI-X table if we can
> > > > > > > > > make sure that malicious guest driver/kernel can't use
> > > > > > > > > the MSI-X table to break other guest or host. The
> > > > > > > > > capability of IRQ remapping could provide this
> > > > > > > > > kind of protection.
> > > > > > > > >
> > > > > > > >
> > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X
> > > > > > > > structure to guest. I know actual IRQ remapping might be 
> > > > > > > > platform
> > > > > > > > specific, but at least for Intel VT-d specification, MSI-X 
> > > > > > > > entry must
> > > > > > > > be configured with a remappable format by host kernel which
> > > > > > > > contains an index into IRQ remapping table. The index will find 
> > > > > > > > a
> > > > > > > > IRQ remapping entry which controls interrupt routing for a 
> > > > > > > > specific
> > > > > > > > device. If you allow a malicious program random index into MSI-X
> > > > > > > > entry of assigned device, the hole is obvious...
> > > > > > > >
> > > > > > > > Above might make sense only for a IRQ remapping implementation
> > > > > > > > which doesn't rely on extended MSI-X format (e.g. simply based 
> > > > > > > > on
> > > > > > > > BDF). If that's the case for PPC, then you should build MSI-X
> > > > > > > > passthrough based on this fact instead of general IRQ remapping
> > > > > > > > enabled or not.
> > > > > > >
> > > > > > > I don't think anyone is expecting that we can expose the MSI-X 
> > > > > > > vector
> > > > > > > table to the guest and the guest can make direct use of it.  The 
> > > > > > > end
> > > > > > > goal here is that the guest on a power system is already
> > > > > > > paravirtualized to not program the device MSI-X by directly 
> > > > > > > writing to
> > > > > > > the MSI-X vector table.  They have hypercalls for this since they
> > > > > > > always run virtualized.  Therefore a) they never intend to touch 
> > > > > > > the
> > > > > > > MSI-X vector table and b) they have sufficient isolation that a 
> > > > > > > guest
> > > > > > > can only hurt itself by doing so.
> > > > > > >
> > > > > > > On x86 we don't have a), our method of programming the MSI-X 
> > > > > > > vector
> > > > > > > table is to directly write to it. Therefore we will always 
> > > > > > > require QEMU
> > > > > > > to place a 

Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

2016-05-12 Thread Benjamin Herrenschmidt
On Thu, 2016-05-12 at 11:58 +0200, Christian Lamparter wrote:
> 
> > http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-ref-big-little-endian-herrenschmidt.odp
> > 
> > but there are at least two more twists that you completely missed here:
> > 
> > - Some architectures (e.g. ARMv5 "BE32" mode in IXP4xx, surely some others)
> >   do not implement big-endian mode by wiring up the data lines between the
> >   bus and the CPU differently between big- and little-endian mode like
> >   powerpc and armv7 "BE8" do, but instead they swizzle the *address* lines
> >   on 8-bit and 16-bit addresses. The effect of that is that normal RAM
> >   accesses work as expected both ways, and devices that are accessed using
> >   32-bit MMIO ops never need any byteswap (you actually get "native
> >   endian") while MMIO with 8 and 16 bit width does something completely
> >   unexpected and touches the wrong register. Having an explicit byteswap
> >   on the PCI host bridge gets you the expected addresses again for 8-bit
> >   cycles but it also means that readl()/writel() again need to swap the
> >   data.

Right. Old PowerPC did that too and it's completely stupid. Thankfully
most vendors grew a clue since then and this practice has slowly fallen
into oblivion.

> > - Some other architectures (e.g. Broadcom MIPS) apparently are even fancier
> >   and use a strapping pin on the SoC flips the endianess of the CPU core
> >   at the same time as all the peripheral MMIO registers, with the intention
> >   of never requiring any byte swaps. I believe they are implemented careful
> >   enough to actually get this right, but it confuses the heck out of
> >   Linux drivers that don't expect this.

Right. Drivers like that will need an explicit test in the accessors.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3] usb: dwc2: fix regression on big-endian PowerPC/ARM systems

2016-05-12 Thread Arnd Bergmann
A patch that went into Linux-4.4 to fix big-endian mode on a Lantiq
MIPS system unfortunately broke big-endian operation on PowerPC
APM82181 as reported by Christian Lamparter, and likely other
systems.

It actually introduced multiple issues:

- it broke big-endian ARM kernels: any machine that was working
  correctly with a little-endian kernel is no longer using byteswaps
  on big-endian kernels, which clearly breaks them.
- On PowerPC the same thing must be true: if it was working before,
  using big-endian kernels is now broken. Unlike ARM, 32-bit PowerPC
  usually uses big-endian kernels, so they are likely all broken.
- The barrier for dwc2_writel is on the wrong side of the __raw_writel(),
  so the MMIO no longer synchronizes with DMA operations.
- On architectures that require specific CPU instructions for MMIO
  access, using the __raw_ variant may turn this into a pointer
  dereference that does not have the same effect as the readl/writel.

This patch is a simple revert for all architectures other than MIPS,
in the hope that we can more easily backport it to fix the regression
on PowerPC and ARM systems without breaking the Lantiq system again.

We should follow this up with a more elaborate change to add runtime
detection of endianess, to make sure it also works on all other
combinations of architectures and implementations of the usb-dwc2
device. That patch however will be fairly large and not appropriate
for backports to stable kernels.

Felipe suggested a different approach, using an endianess switching
register to always put the device into LE mode, but unfortunately
the dwc2 hardware does not provide a generic way to do that. Also,
I see no practical way of addressing the problem more generally by
patching architecture specific code on MIPS.

Signed-off-by: Arnd Bergmann 
Fixes: 95c8bc360944 ("usb: dwc2: Use platform endianness when accessing 
registers")
---
v3: reverted the accidental changes that slipped into the patch,
resending as requested by Christian.

 drivers/usb/dwc2/core.h | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 3c58d633ce80..74ed2ee881cd 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -64,6 +64,18 @@
DWC2_TRACE_SCHEDULER_VB(pr_fmt("%s: SCH: " fmt),\
dev_name(hsotg->dev), ##__VA_ARGS__)
 
+
+#ifdef CONFIG_MIPS
+/*
+ * There are some MIPS machines that can run in either big-endian
+ * or little-endian mode and that use the dwc2 register without
+ * a byteswap in both ways.
+ * Unlike other architectures, MIPS does not require a barrier
+ * before the __raw_writel() to synchronize with DMA but does
+ * require the barrier after the writel() to serialize a series
+ * of writes. This set of operations was added specifically for
+ * MIPS and should only be used there.
+ */
 static inline u32 dwc2_readl(const void __iomem *addr)
 {
u32 value = __raw_readl(addr);
@@ -90,6 +102,23 @@ static inline void dwc2_writel(u32 value, void __iomem 
*addr)
pr_info("INFO:: wrote %08x to %p\n", value, addr);
 #endif
 }
+#else
+/* Normal architectures just use readl/write */
+static inline u32 dwc2_readl(const void __iomem *addr)
+{
+   u32 value = readl(addr);
+   return value;
+}
+
+static inline void dwc2_writel(u32 value, void __iomem *addr)
+{
+   writel(value, addr);
+
+#ifdef dwc2_log_writes
+   pr_info("info:: wrote %08x to %p\n", value, addr);
+#endif
+}
+#endif
 
 /* Maximum number of Endpoints/HostChannels */
 #define MAX_EPS_CHANNELS   16
-- 
2.7.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

2016-05-12 Thread John Youn
On 5/12/2016 1:39 PM, Christian Lamparter wrote:
> On Thursday, May 12, 2016 11:40:28 AM John Youn wrote:
>> On 5/12/2016 6:30 AM, Christian Lamparter wrote:
>>> On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
 On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
 Detecting the endianess of the
 device is probably the best future-proof solution, but it's also
 considerably more work to do in the driver, and comes with a
 tiny runtime overhead.
>>>
>>> The runtime overhead is probably non-measurable compared with the cost
>>> of the actual MMIOs.
>>
>> Right. The code size increase is probably measurable (but still small),
>> the runtime overhead is not.
>
> Ok, so no rebuts or complains have been posted.
>
> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
> and it works: 
>
> Tested-by: Christian Lamparter 
>
> So, how do we go from here? There is are two small issues with the
> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
>
> Arnd, can you please respin and post it (cc'd stable as well)?
> So this is can be picked up? Or what's your plan?

 (I just realized my reply was stuck in my outbox, so the patch
 went out first)

 If I recall correctly, the rough consensus was to go with your longer
 patch in the future (fixed up for the comments that BenH and
 I sent), and I'd suggest basing it on top of a fixed version of
 my patch.
>>> Well, but it comes with the "overhead"! So this was just as I said:
>>> "Let's look at it and see if it's any good"... And I think it isn't
>>> since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
>>> archs etc...
>>
>> I slightly prefer the more general patch for future kernel versions.
>> The overhead will probably be negligible, but we can perform some
>> testing to make sure.
>>
>> Can you resubmit with all gathered feedback?
> Yes I think I can do that. But I would really like to get the
> regression out of the way. So for that: I back Arnd's patch.
> It explains the problem much better and doesn't kill MIPS 
> like the revert I was doing in my initial post to the MLs.
> Also, another bonus: his patch is suited to port to stable.
> 
> The auto-detection approach is not that easy to get right,
> given all the stuff that's going on with BE8, LE4, ... So
> can we have your "blessing" for Arnd's patch for now? since
> that way, I can base my patch on top of his work about the
> issues of endiannes? (Just say: ACK :) )
> 

I agree Arnd's patch is best for stable. We can also apply it to
mainline until we get the autodection working as well.

Unless Felipe has objections.


> Arnd: do you have a version with the #ifdef lower/uppercase
> fix? Or should I give it a try (and fail in a different way ;) )
>  
 Felipe just had another idea, to change the endianess of the dwc2
 block by setting a registers (if that exists). That would indeed
 be preferable, then we can just revert the broken change that
 went into 4.4 and backport that fix instead.
>>> Just a quick reply. I have the docs for the thing. There's something
>>> like that in GAHBCFG at Bit 24... BUT it only switches the endiannes
>>> for the DMA descriptors (which is not always used, there are devices
>>> with PIO only)! It doesn't deal with the MMIO access at all. 
>>
>> That's correct. It only affects descriptor endianness for DMA
>> descriptor mode of operation.
> 
> Ok. The funny thing is that for the MyBook Live Duo this setting might
> be important since the PLB_DMA engine is not part of the DWC library...
> Instead it's from IBM and operates in: Big Endian :-D.
> 

Are you sure the controller is using descriptor DMA? It's more likely
using buffer DMA which this setting doesn't affect. DWC2 doesn't
support Descriptor DMA in device mode on mainline yet. If it's a host
then it might be.

Regards,
John
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

2016-05-12 Thread Arnd Bergmann
On Thursday 12 May 2016 22:39:36 Christian Lamparter wrote:
> On Thursday, May 12, 2016 11:40:28 AM John Youn wrote:
> > On 5/12/2016 6:30 AM, Christian Lamparter wrote:
> > > On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
> > >>
> > >> If I recall correctly, the rough consensus was to go with your longer
> > >> patch in the future (fixed up for the comments that BenH and
> > >> I sent), and I'd suggest basing it on top of a fixed version of
> > >> my patch.
> > > Well, but it comes with the "overhead"! So this was just as I said:
> > > "Let's look at it and see if it's any good"... And I think it isn't
> > > since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
> > > archs etc...
> > 
> > I slightly prefer the more general patch for future kernel versions.
> > The overhead will probably be negligible, but we can perform some
> > testing to make sure.
> > 
> > Can you resubmit with all gathered feedback?
> Yes I think I can do that. But I would really like to get the
> regression out of the way. So for that: I back Arnd's patch.
> It explains the problem much better and doesn't kill MIPS 
> like the revert I was doing in my initial post to the MLs.
> Also, another bonus: his patch is suited to port to stable.
> 
> The auto-detection approach is not that easy to get right,
> given all the stuff that's going on with BE8, LE4, ... So
> can we have your "blessing" for Arnd's patch for now? since
> that way, I can base my patch on top of his work about the
> issues of endiannes? (Just say: ACK  )
> 
> Arnd: do you have a version with the #ifdef lower/uppercase
> fix? Or should I give it a try (and fail in a different way  )

I've already fixed it up locally, will send the latest version
so it's out there, whether Felipe takes it or not.

Arnd
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

2016-05-12 Thread Christian Lamparter via Linuxppc-dev
On Thursday, May 12, 2016 11:40:28 AM John Youn wrote:
> On 5/12/2016 6:30 AM, Christian Lamparter wrote:
> > On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
> >> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
> >> Detecting the endianess of the
> >> device is probably the best future-proof solution, but it's also
> >> considerably more work to do in the driver, and comes with a
> >> tiny runtime overhead.
> >
> > The runtime overhead is probably non-measurable compared with the cost
> > of the actual MMIOs.
> 
>  Right. The code size increase is probably measurable (but still small),
>  the runtime overhead is not.
> >>>
> >>> Ok, so no rebuts or complains have been posted.
> >>>
> >>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
> >>> and it works: 
> >>>
> >>> Tested-by: Christian Lamparter 
> >>>
> >>> So, how do we go from here? There is are two small issues with the
> >>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
> >>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
> >>>
> >>> Arnd, can you please respin and post it (cc'd stable as well)?
> >>> So this is can be picked up? Or what's your plan?
> >>
> >> (I just realized my reply was stuck in my outbox, so the patch
> >> went out first)
> >>
> >> If I recall correctly, the rough consensus was to go with your longer
> >> patch in the future (fixed up for the comments that BenH and
> >> I sent), and I'd suggest basing it on top of a fixed version of
> >> my patch.
> > Well, but it comes with the "overhead"! So this was just as I said:
> > "Let's look at it and see if it's any good"... And I think it isn't
> > since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
> > archs etc...
> 
> I slightly prefer the more general patch for future kernel versions.
> The overhead will probably be negligible, but we can perform some
> testing to make sure.
> 
> Can you resubmit with all gathered feedback?
Yes I think I can do that. But I would really like to get the
regression out of the way. So for that: I back Arnd's patch.
It explains the problem much better and doesn't kill MIPS 
like the revert I was doing in my initial post to the MLs.
Also, another bonus: his patch is suited to port to stable.

The auto-detection approach is not that easy to get right,
given all the stuff that's going on with BE8, LE4, ... So
can we have your "blessing" for Arnd's patch for now? since
that way, I can base my patch on top of his work about the
issues of endiannes? (Just say: ACK :) )

Arnd: do you have a version with the #ifdef lower/uppercase
fix? Or should I give it a try (and fail in a different way ;) )
 
> >> Felipe just had another idea, to change the endianess of the dwc2
> >> block by setting a registers (if that exists). That would indeed
> >> be preferable, then we can just revert the broken change that
> >> went into 4.4 and backport that fix instead.
> > Just a quick reply. I have the docs for the thing. There's something
> > like that in GAHBCFG at Bit 24... BUT it only switches the endiannes
> > for the DMA descriptors (which is not always used, there are devices
> > with PIO only)! It doesn't deal with the MMIO access at all. 
> 
> That's correct. It only affects descriptor endianness for DMA
> descriptor mode of operation.

Ok. The funny thing is that for the MyBook Live Duo this setting might
be important since the PLB_DMA engine is not part of the DWC library...
Instead it's from IBM and operates in: Big Endian :-D.

Regards,
Christian
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

2016-05-12 Thread John Youn
On 5/12/2016 6:30 AM, Christian Lamparter wrote:
> On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
>> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
>> Detecting the endianess of the
>> device is probably the best future-proof solution, but it's also
>> considerably more work to do in the driver, and comes with a
>> tiny runtime overhead.
>
> The runtime overhead is probably non-measurable compared with the cost
> of the actual MMIOs.

 Right. The code size increase is probably measurable (but still small),
 the runtime overhead is not.
>>>
>>> Ok, so no rebuts or complains have been posted.
>>>
>>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
>>> and it works: 
>>>
>>> Tested-by: Christian Lamparter 
>>>
>>> So, how do we go from here? There is are two small issues with the
>>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
>>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
>>>
>>> Arnd, can you please respin and post it (cc'd stable as well)?
>>> So this is can be picked up? Or what's your plan?
>>
>> (I just realized my reply was stuck in my outbox, so the patch
>> went out first)
>>
>> If I recall correctly, the rough consensus was to go with your longer
>> patch in the future (fixed up for the comments that BenH and
>> I sent), and I'd suggest basing it on top of a fixed version of
>> my patch.
> Well, but it comes with the "overhead"! So this was just as I said:
> "Let's look at it and see if it's any good"... And I think it isn't
> since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
> archs etc...

I slightly prefer the more general patch for future kernel versions.
The overhead will probably be negligible, but we can perform some
testing to make sure.

Can you resubmit with all gathered feedback?

>  
>> Felipe just had another idea, to change the endianess of the dwc2
>> block by setting a registers (if that exists). That would indeed
>> be preferable, then we can just revert the broken change that
>> went into 4.4 and backport that fix instead.
> Just a quick reply. I have the docs for the thing. There's something
> like that in GAHBCFG at Bit 24... BUT it only switches the endiannes
> for the DMA descriptors (which is not always used, there are devices
> with PIO only)! It doesn't deal with the MMIO access at all. 

That's correct. It only affects descriptor endianness for DMA
descriptor mode of operation.

Regards,
John
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-12 Thread Alex Williamson
On Thu, 12 May 2016 04:53:19 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Thursday, May 12, 2016 10:21 AM
> > 
> > On Thu, 12 May 2016 01:19:44 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Wednesday, May 11, 2016 11:54 PM
> > > >
> > > > On Wed, 11 May 2016 06:29:06 +
> > > > "Tian, Kevin"  wrote:
> > > >  
> > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > Sent: Thursday, May 05, 2016 11:05 PM
> > > > > >
> > > > > > On Thu, 5 May 2016 12:15:46 +
> > > > > > "Tian, Kevin"  wrote:
> > > > > >  
> > > > > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> > > > > > > > Sent: Thursday, May 05, 2016 7:43 PM
> > > > > > > >
> > > > > > > > Hi David and Kevin,
> > > > > > > >
> > > > > > > > On 2016/5/5 17:54, David Laight wrote:
> > > > > > > >  
> > > > > > > > > From: Tian, Kevin  
> > > > > > > > >> Sent: 05 May 2016 10:37  
> > > > > > > > > ...  
> > > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from
> > > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we
> > > > > > > > >>> can make sure guest kernel would not touch MSI-X table in
> > > > > > > > >>> normal code path such as para-virtualized guest kernel on 
> > > > > > > > >>> PPC64.
> > > > > > > > >>>  
> > > > > > > > >> Then how do you prevent malicious guest kernel accessing it? 
> > > > > > > > >>  
> > > > > > > > > Or a malicious guest driver for an ethernet card setting up
> > > > > > > > > the receive buffer ring to contain a single word entry that
> > > > > > > > > contains the address associated with an MSI-X interrupt and
> > > > > > > > > then using a loopback mode to cause a specific packet be
> > > > > > > > > received that writes the required word through that address.
> > > > > > > > >
> > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory 
> > > > > > > > > write
> > > > > > > > > cycle.
> > > > > > > > >
> > > > > > > > >   David
> > > > > > > > >  
> > > > > > > >
> > > > > > > > If we have enough permission to load a malicious driver or
> > > > > > > > kernel, we can easily break the guest without exposed
> > > > > > > > MSI-X table.
> > > > > > > >
> > > > > > > > I think it should be safe to expose MSI-X table if we can
> > > > > > > > make sure that malicious guest driver/kernel can't use
> > > > > > > > the MSI-X table to break other guest or host. The
> > > > > > > > capability of IRQ remapping could provide this
> > > > > > > > kind of protection.
> > > > > > > >  
> > > > > > >
> > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X
> > > > > > > structure to guest. I know actual IRQ remapping might be platform
> > > > > > > specific, but at least for Intel VT-d specification, MSI-X entry 
> > > > > > > must
> > > > > > > be configured with a remappable format by host kernel which
> > > > > > > contains an index into IRQ remapping table. The index will find a
> > > > > > > IRQ remapping entry which controls interrupt routing for a 
> > > > > > > specific
> > > > > > > device. If you allow a malicious program random index into MSI-X
> > > > > > > entry of assigned device, the hole is obvious...
> > > > > > >
> > > > > > > Above might make sense only for a IRQ remapping implementation
> > > > > > > which doesn't rely on extended MSI-X format (e.g. simply based on
> > > > > > > BDF). If that's the case for PPC, then you should build MSI-X
> > > > > > > passthrough based on this fact instead of general IRQ remapping
> > > > > > > enabled or not.  
> > > > > >
> > > > > > I don't think anyone is expecting that we can expose the MSI-X 
> > > > > > vector
> > > > > > table to the guest and the guest can make direct use of it.  The end
> > > > > > goal here is that the guest on a power system is already
> > > > > > paravirtualized to not program the device MSI-X by directly writing 
> > > > > > to
> > > > > > the MSI-X vector table.  They have hypercalls for this since they
> > > > > > always run virtualized.  Therefore a) they never intend to touch the
> > > > > > MSI-X vector table and b) they have sufficient isolation that a 
> > > > > > guest
> > > > > > can only hurt itself by doing so.
> > > > > >
> > > > > > On x86 we don't have a), our method of programming the MSI-X vector
> > > > > > table is to directly write to it. Therefore we will always require 
> > > > > > QEMU
> > > > > > to place a MemoryRegion over the vector table to intercept those
> > > > > > accesses.  However with interrupt remapping, we do have b) on x86, 
> > > > > > which
> > > > > > means that we don't need to be so strict in disallowing user 
> > > > > > accesses
> > > > > > to the MSI-X vector table.  It's not useful for configuring MSI-X on
> > > > > > the device, but the user should only be able 

[PATCH] powerpc: Discard ffs() function and use builtin_ffs instead

2016-05-12 Thread Christophe Leroy
With the ffs() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter, as shown
by the small exemple below.

int ffs_test(void)
{
return 4 << ffs(31);
}

c0012334 :
c0012334:   39 20 00 01 li  r9,1
c0012338:   38 60 00 04 li  r3,4
c001233c:   7d 29 00 34 cntlzw  r9,r9
c0012340:   21 29 00 20 subfic  r9,r9,32
c0012344:   7c 63 48 30 slw r3,r3,r9
c0012348:   4e 80 00 20 blr

With this patch, the same function will compile as follows:

c0012334 :
c0012334:   38 60 00 08 li  r3,8
c0012338:   4e 80 00 20 blr

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/bitops.h | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/bitops.h 
b/arch/powerpc/include/asm/bitops.h
index 59abc62..75e3ebb 100644
--- a/arch/powerpc/include/asm/bitops.h
+++ b/arch/powerpc/include/asm/bitops.h
@@ -218,16 +218,7 @@ static __inline__ unsigned long __ffs(unsigned long x)
return __ilog2(x & -x);
 }
 
-/*
- * ffs: find first bit set. This is defined the same way as
- * the libc and compiler builtin ffs routines, therefore
- * differs in spirit from the above ffz (man ffs).
- */
-static __inline__ int ffs(int x)
-{
-   unsigned long i = (unsigned long)x;
-   return __ilog2(i & -i) + 1;
-}
+#include 
 
 /*
  * fls: find last (most-significant) bit set.
-- 
2.1.0
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] kvm-pr: manage illegal instructions

2016-05-12 Thread Laurent Vivier


On 12/05/2016 11:27, Alexander Graf wrote:
> On 05/12/2016 11:10 AM, Laurent Vivier wrote:
>>
>> On 11/05/2016 13:49, Alexander Graf wrote:
>>> On 05/11/2016 01:14 PM, Laurent Vivier wrote:
 On 11/05/2016 12:35, Alexander Graf wrote:
> On 03/15/2016 09:18 PM, Laurent Vivier wrote:
>> While writing some instruction tests for kvm-unit-tests for powerpc,
>> I've found that illegal instructions are not managed correctly with
>> kvm-pr,
>> while it is fine with kvm-hv.
>>
>> When an illegal instruction (like ".long 0") is processed by kvm-pr,
>> the kernel logs are filled with:
>>
>> Couldn't emulate instruction 0x (op 0 xop 0)
>> kvmppc_handle_exit_pr: emulation at 700 failed ()
>>
>> While the exception handler receives an interrupt for each
>> instruction
>> executed after the illegal instruction.
>>
>> Signed-off-by: Laurent Vivier 
>> ---
>> arch/powerpc/kvm/book3s_emulate.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_emulate.c
>> b/arch/powerpc/kvm/book3s_emulate.c
>> index 2afdb9c..4ee969d 100644
>> --- a/arch/powerpc/kvm/book3s_emulate.c
>> +++ b/arch/powerpc/kvm/book3s_emulate.c
>> @@ -99,7 +99,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run,
>> struct kvm_vcpu *vcpu,
>>   switch (get_op(inst)) {
>> case 0:
>> -emulated = EMULATE_FAIL;
>> if ((kvmppc_get_msr(vcpu) & MSR_LE) &&
>> (inst == swab32(inst_sc))) {
>> /*
>> @@ -112,6 +111,9 @@ int kvmppc_core_emulate_op_pr(struct kvm_run
>> *run,
>> struct kvm_vcpu *vcpu,
>> kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED);
>> kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
>> emulated = EMULATE_DONE;
>> +} else {
>> +kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
> But isn't that exactly what the semantic of EMULATE_FAIL is? Fixing it
> up in book3s_emulate.c is definitely the wrong spot.
>
> So what is the problem you're trying to solve? Is the SRR0 at the
> wrong
> spot or are the log messages the problem?
 No, the problem is the host kernel logs are filled by the message and
 the execution hangs. And the host becomes unresponsiveness, even after
 the end of the tests.

 Please, try to run kvm-unit-tests (the emulator test) on a KVM-PR host,
 and check the kernel logs (dmesg), then try to ssh to the host...
>>> Ok, so the log messages are the problem. Please fix the message output
>>> then - or remove it altogether. Or if you like, create a module
>>> parameter that allows you to emit them.
>>>
>>> I personally think the best solution would be to just convert the
>>> message into a trace point.
>>>
>>> While at it, please see whether the guest can trigger similar host log
>>> output excess in other code paths.
>> The problem is not really with the log messages: they are consequence of
>> the bug I try to fix.
>>
>> What happens is once kvm_pr decodes an invalid instruction all the valid
>> following instructions trigger a Program exception to the guest (but are
>> executed correctly). It has no real consequence on big machine like
>> POWER8, except that the guest become very slow and the log files of the
>> host are filled with messages (and qemu uses 100% of the CPU). On a
>> smaller machine like a  PowerMac G5, the machine becomes simply unusable.
> 
> It's probably more related to your verbosity level of kernel messages.
> If you pass loglevel=0 (or quiet) to you kernel cmdline you won't get
> the messages printed to serial which is what's slowing you down.
> 
> The other problem sounds pretty severe, but the only thing your patch
> does any different from the current code flow would be the patch below.
> Or did I miss anything?
> 
> diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
> index 5cc2e7a..4672bc2 100644
> --- a/arch/powerpc/kvm/emulate.c
> +++ b/arch/powerpc/kvm/emulate.c
> @@ -302,7 +302,11 @@ int kvmppc_emulate_instruction(struct kvm_run *run,
> struct kvm_vcpu *vcpu)
> advance = 0;
> printk(KERN_ERR "Couldn't emulate instruction
> 0x%08x "
>"(op %d xop %d)\n", inst, get_op(inst),
> get_xop(inst));
> +#ifdef CONFIG_PPC_BOOK3S
> +   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
> +#else
> kvmppc_core_queue_program(vcpu, 0);
> +#endif
> }
> }
> 

OK, that fixes the problem. Thanks. :)

Could you apply it?

Laurent
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

ppc64 sbrk returns executable heap in 32-bit emulation mode

2016-05-12 Thread Florian Weimer
We noticed that on ppc64, the sbrk system call in the 32-bit subsystem 
returns executable memory.  I assume it is related to this, in 
arch/powerpc/include/asm/page.h:


/*
 * Unfortunately the PLT is in the BSS in the PPC32 ELF ABI,
 * and needs to be executable.  This means the whole heap ends
 * up being executable.
 */
#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \
 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)


What is the rationale for this?  This comment must be *really* old, 
because ld.so in glibc should make sure that the PLT is executable.  And 
for current binaries, .bss is *not* executable, contrary to what the 
comment suggests.


Is this comment about pre-ELF binaries?  If yes, would it possible to 
change the default for ELF binaries?


Thanks,
Florian
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[FIX PATCH v2 1/2] powerpc, numa: Fix whitespace in hot_add_drconf_memory_max()

2016-05-12 Thread Bharata B Rao
Signed-off-by: Bharata B Rao 
---
 arch/powerpc/mm/numa.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 669a15e..4a87ccb 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1163,18 +1163,18 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
 
 static u64 hot_add_drconf_memory_max(void)
 {
-struct device_node *memory = NULL;
-unsigned int drconf_cell_cnt = 0;
-u64 lmb_size = 0;
+   struct device_node *memory = NULL;
+   unsigned int drconf_cell_cnt = 0;
+   u64 lmb_size = 0;
const __be32 *dm = NULL;
 
-memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
-if (memory) {
-drconf_cell_cnt = of_get_drconf_memory(memory, );
-lmb_size = of_get_lmb_size(memory);
-of_node_put(memory);
-}
-return lmb_size * drconf_cell_cnt;
+   memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+   if (memory) {
+   drconf_cell_cnt = of_get_drconf_memory(memory, );
+   lmb_size = of_get_lmb_size(memory);
+   of_node_put(memory);
+   }
+   return lmb_size * drconf_cell_cnt;
 }
 
 /*
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[FIX PATCH v2 2/2] powerpc,numa: Fix memory_hotplug_max()

2016-05-12 Thread Bharata B Rao
memory_hotplug_max() uses hot_add_drconf_memory_max() to get maxmimum
addressable memory by referring to ibm,dyanamic-memory property. There
are three problems with the current approach:

1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes
  all the LMBs of the guest, but that is not true for PowerKVM which
  populates only DR LMBs (LMBs that can be hotplugged/removed) in that
  property.
2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive
  at the max possible address. Since ibm,dynamic-memory doesn't include
  RMA LMBs, the address thus obtained will be less than the actual max
  address. For example, if max possible memory size is 32G, with lmb-size
  of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA
  which won't be present here).  hot_add_drconf_memory_max() would then
  return the max addressable memory as 127 * 256MB = 31.75GB, the max
  address should have been 32G which is what ibm,lrdr-capacity shows.
3 In PowerKVM, there can be a gap between the end of boot time RAM and
  beginning of hotplug RAM area. So just multiplying lmb-count with
  lmb-size will not provide the correct max possible address for PowerKVM.

This patch fixes 1 by using ibm,lrdr-capacity property to return the max
addressable memory whenever the property is present. Then it fixes 2 & 3
by fetching the address of the last LMB in ibm,dynamic-memory property.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/mm/numa.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 4a87ccb..f8b1da7 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1164,17 +1164,33 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
 static u64 hot_add_drconf_memory_max(void)
 {
struct device_node *memory = NULL;
+   struct device_node *dn = NULL;
unsigned int drconf_cell_cnt = 0;
u64 lmb_size = 0;
const __be32 *dm = NULL;
+   const __be64 *lrdr = NULL;
+   struct of_drconf_cell drmem;
+
+   dn = of_find_node_by_path("/rtas");
+   if (dn) {
+   lrdr = of_get_property(dn, "ibm,lrdr-capacity", NULL);
+   of_node_put(dn);
+   if (lrdr)
+   return be64_to_cpup(lrdr);
+   }
 
memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
if (memory) {
drconf_cell_cnt = of_get_drconf_memory(memory, );
lmb_size = of_get_lmb_size(memory);
+
+   /* Advance to the last cell, each cell has 6 32 bit integers */
+   dm += (drconf_cell_cnt - 1) * 6;
+   read_drconf_cell(, );
of_node_put(memory);
+   return drmem.base_addr + lmb_size;
}
-   return lmb_size * drconf_cell_cnt;
+   return 0;
 }
 
 /*
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[FIX PATCH v2 0/2] Fix powerpc,numa: Fix memory_hotplug_max()

2016-05-12 Thread Bharata B Rao
This patchset fixes memory_hotplug_max() routine to return correct
value of maximum hotpluggable address.

In this version, whitespace fixes are separated into a different patch.

v2: https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg103342.html

Bharata B Rao (2):
  powerpc,numa: Fix whitespace in hot_add_drconf_memory_max()
  powerpc,numa: Fix memory_hotplug_max()

 arch/powerpc/mm/numa.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

2016-05-12 Thread Christian Lamparter via Linuxppc-dev
On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
> > > > > Detecting the endianess of the
> > > > > device is probably the best future-proof solution, but it's also
> > > > > considerably more work to do in the driver, and comes with a
> > > > > tiny runtime overhead.
> > > > 
> > > > The runtime overhead is probably non-measurable compared with the cost
> > > > of the actual MMIOs.
> > > 
> > > Right. The code size increase is probably measurable (but still small),
> > > the runtime overhead is not.
> > 
> > Ok, so no rebuts or complains have been posted.
> > 
> > I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
> > and it works: 
> > 
> > Tested-by: Christian Lamparter 
> > 
> > So, how do we go from here? There is are two small issues with the
> > original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
> > #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
> > 
> > Arnd, can you please respin and post it (cc'd stable as well)?
> > So this is can be picked up? Or what's your plan?
> 
> (I just realized my reply was stuck in my outbox, so the patch
> went out first)
> 
> If I recall correctly, the rough consensus was to go with your longer
> patch in the future (fixed up for the comments that BenH and
> I sent), and I'd suggest basing it on top of a fixed version of
> my patch.
Well, but it comes with the "overhead"! So this was just as I said:
"Let's look at it and see if it's any good"... And I think it isn't
since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
archs etc...
 
> Felipe just had another idea, to change the endianess of the dwc2
> block by setting a registers (if that exists). That would indeed
> be preferable, then we can just revert the broken change that
> went into 4.4 and backport that fix instead.
Just a quick reply. I have the docs for the thing. There's something
like that in GAHBCFG at Bit 24... BUT it only switches the endiannes
for the DMA descriptors (which is not always used, there are devices
with PIO only)! It doesn't deal with the MMIO access at all. 

The pin that would select which endian the device uses is probably
connected to a DCR or GPIO but I don't know which or where so this
is more or less useless. (Or the selectable endianness was dropped
during synth).

Regards,
Christian
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 2/2] powerpc/fadump: add support to parse size based on memory range

2016-05-12 Thread Hari Bathini
Currently, memory for fadump can be specified with fadump_reserve_mem=size,
where only a fixed size can be specified. Add the below syntax as well, to
support conditional reservation based on system memory size:

fadump_reserve_mem=:[,:,...]

This syntax helps using the same commandline parameter for different system
memory sizes.

Signed-off-by: Hari Bathini 
---
Changes from v1:
1. Changed subject from "powerpc/fadump: add support to specify memory range 
based size"
2. Reused crashkernel parsing code that was moved to kernel/params.c (see patch 
1/2)

 arch/powerpc/kernel/fadump.c |   64 --
 1 file changed, 55 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index d0af58b..a868281 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -193,6 +193,56 @@ static unsigned long init_fadump_mem_struct(struct 
fadump_mem_struct *fdm,
return addr;
 }
 
+/*
+ * This function parses command line for fadump_reserve_mem=
+ *
+ * Supports the below two syntaxes:
+ *1. fadump_reserve_mem=size
+ *2. fadump_reserve_mem=ramsize-range:size[,...]
+ *
+ * Sets fw_dump.reserve_bootvar with the memory size
+ * provided, 0 otherwise
+ *
+ * The function returns -EINVAL on failure, 0 otherwise.
+ */
+static int __init parse_fadump_reserve_mem(void)
+{
+   char *name = "fadump_reserve_mem=";
+   char *fadump_cmdline = NULL, *cur;
+
+   fw_dump.reserve_bootvar = 0;
+
+   /* find fadump_reserve_mem and use the last one if there are many */
+   cur = strstr(boot_command_line, name);
+   while (cur) {
+   fadump_cmdline = cur;
+   cur = strstr(cur+1, name);
+   }
+
+   /* when no fadump_reserve_mem= cmdline option is provided */
+   if (!fadump_cmdline)
+   return 0;
+
+   fadump_cmdline += strlen(name);
+
+   /* for fadump_reserve_mem=size cmdline syntax */
+   if (!is_param_range_based(fadump_cmdline)) {
+   fw_dump.reserve_bootvar = memparse(fadump_cmdline, NULL);
+   return 0;
+   }
+
+   /* for fadump_reserve_mem=ramsize-range:size[,...] cmdline syntax */
+   cur = fadump_cmdline;
+   fw_dump.reserve_bootvar = parse_mem_range_size("fadump_reserve_mem",
+   , memblock_phys_mem_size());
+   if (cur == fadump_cmdline) {
+   printk(KERN_INFO "fadump_reserve_mem: Invaild syntax!\n");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 /**
  * fadump_calculate_reserve_size(): reserve variable boot area 5% of System RAM
  *
@@ -212,12 +262,17 @@ static inline unsigned long 
fadump_calculate_reserve_size(void)
 {
unsigned long size;
 
+   /* sets fw_dump.reserve_bootvar */
+   parse_fadump_reserve_mem();
+
/*
 * Check if the size is specified through fadump_reserve_mem= cmdline
 * option. If yes, then use that.
 */
if (fw_dump.reserve_bootvar)
return fw_dump.reserve_bootvar;
+   else
+   printk(KERN_INFO "fadump: calculating default boot size\n");
 
/* divide by 20 to get 5% of value */
size = memblock_end_of_DRAM() / 20;
@@ -352,15 +407,6 @@ static int __init early_fadump_param(char *p)
 }
 early_param("fadump", early_fadump_param);
 
-/* Look for fadump_reserve_mem= cmdline option */
-static int __init early_fadump_reserve_mem(char *p)
-{
-   if (p)
-   fw_dump.reserve_bootvar = memparse(p, );
-   return 0;
-}
-early_param("fadump_reserve_mem", early_fadump_reserve_mem);
-
 static void register_fw_dump(struct fadump_mem_struct *fdm)
 {
int rc;

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/2] Refactor code parsing size based on memory range

2016-05-12 Thread Hari Bathini
Currently, crashkernel parameter supports the below syntax to parse size
based on memory range:

crashkernel=:[,:,...]

While such parsing is implemented for crashkernel parameter, it applies to
other parameters with similar syntax. So, move this code to a more generic
place for code reuse.

Signed-off-by: Hari Bathini 
---
While this patch in itself has nothing to do with powerpc,
the powerpc patch (2/2) depends on this patch..

 include/linux/kernel.h |5 +++
 kernel/kexec_core.c|   63 +++-
 kernel/params.c|   96 
 3 files changed, 106 insertions(+), 58 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2f7775e..e755ed1 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -429,6 +429,11 @@ extern char *get_options(const char *str, int nints, int 
*ints);
 extern unsigned long long memparse(const char *ptr, char **retptr);
 extern bool parse_option_str(const char *str, const char *option);
 
+extern bool __init is_param_range_based(const char *cmdline);
+extern unsigned long long __init parse_mem_range_size(const char *param,
+ char **str,
+ unsigned long long 
system_ram);
+
 extern int core_kernel_text(unsigned long addr);
 extern int core_kernel_data(unsigned long addr);
 extern int __kernel_text_address(unsigned long addr);
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 1391d3e..71e92b2 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1084,59 +1084,9 @@ static int __init parse_crashkernel_mem(char *cmdline,
char *cur = cmdline, *tmp;
 
/* for each entry of the comma-separated list */
-   do {
-   unsigned long long start, end = ULLONG_MAX, size;
-
-   /* get the start of the range */
-   start = memparse(cur, );
-   if (cur == tmp) {
-   pr_warn("crashkernel: Memory value expected\n");
-   return -EINVAL;
-   }
-   cur = tmp;
-   if (*cur != '-') {
-   pr_warn("crashkernel: '-' expected\n");
-   return -EINVAL;
-   }
-   cur++;
-
-   /* if no ':' is here, than we read the end */
-   if (*cur != ':') {
-   end = memparse(cur, );
-   if (cur == tmp) {
-   pr_warn("crashkernel: Memory value expected\n");
-   return -EINVAL;
-   }
-   cur = tmp;
-   if (end <= start) {
-   pr_warn("crashkernel: end <= start\n");
-   return -EINVAL;
-   }
-   }
-
-   if (*cur != ':') {
-   pr_warn("crashkernel: ':' expected\n");
-   return -EINVAL;
-   }
-   cur++;
-
-   size = memparse(cur, );
-   if (cur == tmp) {
-   pr_warn("Memory value expected\n");
-   return -EINVAL;
-   }
-   cur = tmp;
-   if (size >= system_ram) {
-   pr_warn("crashkernel: invalid size\n");
-   return -EINVAL;
-   }
-
-   /* match ? */
-   if (system_ram >= start && system_ram < end) {
-   *crash_size = size;
-   break;
-   }
-   } while (*cur++ == ',');
+   *crash_size = parse_mem_range_size("crashkernel", , system_ram);
+   if (cur == cmdline)
+   return -EINVAL;
 
if (*crash_size > 0) {
while (*cur && *cur != ' ' && *cur != '@')
@@ -1273,7 +1223,6 @@ static int __init __parse_crashkernel(char *cmdline,
 const char *name,
 const char *suffix)
 {
-   char*first_colon, *first_space;
char*ck_cmdline;
 
BUG_ON(!crash_size || !crash_base);
@@ -1291,12 +1240,10 @@ static int __init __parse_crashkernel(char *cmdline,
return parse_crashkernel_suffix(ck_cmdline, crash_size,
suffix);
/*
-* if the commandline contains a ':', then that's the extended
+* if the parameter is range based, then that's the extended
 * syntax -- if not, it must be the classic syntax
 */
-   first_colon = strchr(ck_cmdline, ':');
-   first_space = strchr(ck_cmdline, ' ');
-   if (first_colon && (!first_space || first_colon < first_space))
+   if (is_param_range_based(ck_cmdline))
return parse_crashkernel_mem(ck_cmdline, system_ram,

Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

2016-05-12 Thread Arnd Bergmann
On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
> > > > Detecting the endianess of the
> > > > device is probably the best future-proof solution, but it's also
> > > > considerably more work to do in the driver, and comes with a
> > > > tiny runtime overhead.
> > > 
> > > The runtime overhead is probably non-measurable compared with the cost
> > > of the actual MMIOs.
> > 
> > Right. The code size increase is probably measurable (but still small),
> > the runtime overhead is not.
> 
> Ok, so no rebuts or complains have been posted.
> 
> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
> and it works: 
> 
> Tested-by: Christian Lamparter 
> 
> So, how do we go from here? There is are two small issues with the
> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
> 
> Arnd, can you please respin and post it (cc'd stable as well)?
> So this is can be picked up? Or what's your plan?

(I just realized my reply was stuck in my outbox, so the patch
went out first)

If I recall correctly, the rough consensus was to go with your longer
patch in the future (fixed up for the comments that BenH and
I sent), and I'd suggest basing it on top of a fixed version of
my patch.

Felipe just had another idea, to change the endianess of the dwc2
block by setting a registers (if that exists). That would indeed
be preferable, then we can just revert the broken change that
went into 4.4 and backport that fix instead.

Arnd
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] usb: dwc2: fix regression on big-endian PowerPC/ARM systems

2016-05-12 Thread Arnd Bergmann
On Thursday 12 May 2016 14:43:43 Felipe Balbi wrote:
> >> How many more drivers will we have to 'fix' like this ?
> >
> > Endianess problems will keep coming up, and we have hundreds or thousands
> > of drivers that are written with a particular design in mind that could
> > be wrong as soon as someone chooses to build an SoC that does things
> > differently. Once that happens, we'll fix them.
> >
> > Also, Christian has already posted a better version of the patch
> > that fixes this driver in an architecture independent way, but we still
> > need a workaround for the stable backports.
> 
> hmmm, at least dwc3 (also from SNPS) has a couple bits where we can
> choose endianess for registers and DMA descriptors. John, do we have the
> same for dwc2 ? Wouldn't that be a better way to solve the problem ?

Yes, I think that would be the best solution (provided it works correctly).

My understanding is that the descriptors don't need to change
for the particular MIPS machine, only the registers do. If we have
another machine that requires the descriptor endianess to be flipped
from the default, we probably need a DT property or platform_data
flag to encode that.

We can do the register endianess detection from Christian's patch to
flip it around if necessary, and then revert back to the previous
state of always using readl/writel.

Arnd
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] usb: dwc2: fix regression on big-endian PowerPC/ARM systems

2016-05-12 Thread Felipe Balbi

Hi,

(Arnd, you didn't Cc dwc2's maintainer. I'm also not part of TI anymore)

Arnd Bergmann  writes:
> On Thursday 12 May 2016 14:25:49 Felipe Balbi wrote:
>> >  {
>> >   u32 value = __raw_readl(addr);
>> >  
>> > - /* In order to preserve endianness __raw_* operation is used. 
>> > Therefore
>> > -  * a barrier is needed to ensure IO access is not re-ordered across
>> > + /* in order to preserve endianness __raw_* operation is used. 
>> > therefore
>> > +  * a barrier is needed to ensure io access is not re-ordered across
>> >* reads or writes
>> >*/
>> >   mb();
>> > @@ -81,15 +93,32 @@ static inline void dwc2_writel(u32 value, void __iomem 
>> > *addr)
>> >   __raw_writel(value, addr);
>> >  
>> >   /*
>> > -  * In order to preserve endianness __raw_* operation is used. 
>> > Therefore
>> > -  * a barrier is needed to ensure IO access is not re-ordered across
>> > +  * in order to preserve endianness __raw_* operation is used. 
>> > therefore
>> > +  * a barrier is needed to ensure io access is not re-ordered across
>> >* reads or writes
>> >*/
>> >   mb();
>> > -#ifdef DWC2_LOG_WRITES
>> > - pr_info("INFO:: wrote %08x to %p\n", value, addr);
>> > +#ifdef dwc2_log_writes
>> > + pr_info("info:: wrote %08x to %p\n", value, addr);
>> >  #endif
>> >  }
>> > +#else
>
> Oops, the accidental lowercase conversion is still in here, I'll fix it
> up once we agree on the approach.
>
>> I still think this is something that should be handled at MIPS side, no ?
>
> As I explained, there isn't really anything we can do in MIPS code
> because of the way they have to handle PCI.
>
>> How many more drivers will we have to 'fix' like this ?
>
> Endianess problems will keep coming up, and we have hundreds or thousands
> of drivers that are written with a particular design in mind that could
> be wrong as soon as someone chooses to build an SoC that does things
> differently. Once that happens, we'll fix them.
>
> Also, Christian has already posted a better version of the patch
> that fixes this driver in an architecture independent way, but we still
> need a workaround for the stable backports.

hmmm, at least dwc3 (also from SNPS) has a couple bits where we can
choose endianess for registers and DMA descriptors. John, do we have the
same for dwc2 ? Wouldn't that be a better way to solve the problem ?

-- 
balbi
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] usb: dwc2: fix regression on big-endian PowerPC/ARM systems

2016-05-12 Thread Arnd Bergmann
On Thursday 12 May 2016 14:25:49 Felipe Balbi wrote:
> >  {
> >   u32 value = __raw_readl(addr);
> >  
> > - /* In order to preserve endianness __raw_* operation is used. 
> > Therefore
> > -  * a barrier is needed to ensure IO access is not re-ordered across
> > + /* in order to preserve endianness __raw_* operation is used. 
> > therefore
> > +  * a barrier is needed to ensure io access is not re-ordered across
> >* reads or writes
> >*/
> >   mb();
> > @@ -81,15 +93,32 @@ static inline void dwc2_writel(u32 value, void __iomem 
> > *addr)
> >   __raw_writel(value, addr);
> >  
> >   /*
> > -  * In order to preserve endianness __raw_* operation is used. 
> > Therefore
> > -  * a barrier is needed to ensure IO access is not re-ordered across
> > +  * in order to preserve endianness __raw_* operation is used. 
> > therefore
> > +  * a barrier is needed to ensure io access is not re-ordered across
> >* reads or writes
> >*/
> >   mb();
> > -#ifdef DWC2_LOG_WRITES
> > - pr_info("INFO:: wrote %08x to %p\n", value, addr);
> > +#ifdef dwc2_log_writes
> > + pr_info("info:: wrote %08x to %p\n", value, addr);
> >  #endif
> >  }
> > +#else

Oops, the accidental lowercase conversion is still in here, I'll fix it
up once we agree on the approach.

> I still think this is something that should be handled at MIPS side, no ?

As I explained, there isn't really anything we can do in MIPS code
because of the way they have to handle PCI.

> How many more drivers will we have to 'fix' like this ?

Endianess problems will keep coming up, and we have hundreds or thousands
of drivers that are written with a particular design in mind that could
be wrong as soon as someone chooses to build an SoC that does things
differently. Once that happens, we'll fix them.

Also, Christian has already posted a better version of the patch
that fixes this driver in an architecture independent way, but we still
need a workaround for the stable backports.

Arnd
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v9 26/26] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()

2016-05-12 Thread Michael Ellerman
On Thu, 2016-05-12 at 13:48 +1000, Gavin Shan wrote:

> On Tue, May 03, 2016 at 03:41:45PM +1000, Gavin Shan wrote:

> > The function pnv_pci_reset_secondary_bus() is called like below.
> > It's impossible for call the function on root bus. So it's safe
> > to remove the root bus case in the function. No functional changes
> > introduced.
> > 
> >   pci_parent_bus_reset() / pci_bus_reset() / pci_try_reset_bus()
> >   pci_reset_bridge_secondary_bus()
> >   pcibios_reset_secondary_bus()
> >   pnv_pci_reset_secondary_bus()
> > 
> > Signed-off-by: Gavin Shan 
> > Reviewed-by: Daniel Axtens 
> > Reviewed-by: Alexey Kardashevskiy 
> > ---
> > arch/powerpc/platforms/powernv/eeh-powernv.c | 12 ++--
> > 1 file changed, 2 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
> > b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > index 9226df1..593b8dc 100644
> > --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> > +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > @@ -868,16 +868,8 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, 
> > int option)
> > 
> > void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
> > {
> > -   struct pci_controller *hose;
> > -
> > -   if (pci_is_root_bus(dev->bus)) {
> > -   hose = pci_bus_to_host(dev->bus);
> > -   pnv_eeh_root_reset(hose, EEH_RESET_HOT);
> > -   pnv_eeh_root_reset(hose, EEH_RESET_DEACTIVATE);
> > -   } else {
> > -   pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
> > -   pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
> > -   }
> > +   pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
> > +   pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
> > }
> 
> Michael, please revert this one as it is already in linux-ppc-next
> branch. Sorry for the overhead.

Done.

https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/commit/?h=next=848912e547c4569445a61203a7df402646a88c25

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [kernel,2/2] powerpc/powernv/npu: Add PE to PHB's list

2016-05-12 Thread Michael Ellerman
On Thu, 2016-12-05 at 05:47:10 UTC, Alexey Kardashevskiy wrote:
> Before 3e68dc57 "powerpc/powernv: Remove DMA32 PE list", NPU PEs
> were linked to the NPU PHB via phb->ioda.pe_dma_list; after that fix,
> the phb->ioda.pe_list is used.
> 
> During the pe_dma_list removal, list_add_tail(>ioda.pe_dma_list)
> was removed, however no list_add() was added so does this patch.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: Gavin Shan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1d4e89cf0a4e819fbde428e9e5

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [kernel,1/2] powerpc/powernv: Fix insufficient memory allocation

2016-05-12 Thread Michael Ellerman
On Thu, 2016-12-05 at 05:47:09 UTC, Alexey Kardashevskiy wrote:
> The pnv_pci_init_ioda_phb() helper allocates a blob to store auxilary
> data such PE and M32/M64 segment allocation maps; this single blob has few
> partitions, size of each is derived from the PE number -
> phb->ioda.total_pe_num.
> 
> It was assumed that the minimum PE number is 8, however it is 4 for NPU
> so the pe_alloc part was missing in the allocated blob.
> It was invisible till recently as we were not tracking used M64 segments
> and NPUs do not use M32 segments so the phb->ioda.m32_segmap
> (which was pointing to the same address as phb->ioda.pe_alloc)
> has never been written to leaving the pe_alloc memory intact.
> 
> After 401203ac2d "powerpc/powernv: Track M64 segment consumption"
> the pe_alloc gets corrupted and PE allocation cannot work.
> This fixes the issue by enforcing the minimum PE number to 8.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: Gavin Shan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/92a86756904b127a3450262b33

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [2/2, v3] powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism

2016-05-12 Thread Michael Ellerman
On Mon, 2016-11-04 at 19:17:23 UTC, "Guilherme G. Piccoli" wrote:
> Commit 39baadbf36ce ("powerpc/eeh: Remove eeh information from pci_dn")
> changed the pci_dn struct by removing its EEH-related members.
> As part of this clean-up, DDW mechanism was modified to read the device
> configuration address from eeh_dev struct.
> 
> As a consequence, now if we disable EEH mechanism on kernel command-line
> for example, the DDW mechanism will fail, generating a kernel oops by
> dereferencing a NULL pointer (which turns to be the eeh_dev pointer).
> 
> This patch just changes the configuration address calculation on DDW
> functions to a manual calculation based on pci_dn members instead of
> using eeh_dev-based address.
> 
> No functional changes were made. This was tested on pSeries, both
> in PHyp and qemu guest.
> 
> Fixes: 39baadbf36ce ("powerpc/eeh: Remove eeh information from pci_dn")
> 
> Cc: sta...@vger.kernel.org
> Reviewed-by: Gavin Shan 
> Signed-off-by: Guilherme G. Piccoli 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/8445a87f7092bc8336ea1305be

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [1/2, v3] Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"

2016-05-12 Thread Michael Ellerman
On Mon, 2016-11-04 at 19:17:22 UTC, "Guilherme G. Piccoli" wrote:
> This reverts commit 89a51df5ab1d38b257300b8ac940bbac3bb0eb9b.
> 
> The function eeh_add_device_early() is used to perform EEH initialization in
> devices added later on the system, like in hotplug/DLPAR scenarios. Since the
> commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on 
> Cell")
> a new check was introduced in this function - Cell has no EEH capabilities
> which led to kernel oops if hotplug was performed, so checking for
> eeh_enabled() was introduced to avoid the issue.
> 
> However, in architectures that EEH is present like pSeries or PowerNV, we 
> might
> reach a case in which no PCI devices are present on boot time and so EEH is 
> not
> initialized. Then, if a device is added via DLPAR for example,
> eeh_add_device_early() fails because eeh_enabled() is false, and EEH end up
> not being enabled at all.
> 
> This reverts the aforementioned patch since a new verification was introduced 
> by
> the commit d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during 
> hotplug")
> and so the original Cell issue does not happen anymore.
> 
> Cc: sta...@vger.kernel.org
> Reviewed-by: Gavin Shan 
> Signed-off-by: Guilherme G. Piccoli 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c2078d9ef600bdbe568c89e5dd

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [v3, 1/4] powerpc/eeh: Don't report error in eeh_pe_reset_and_recover()

2016-05-12 Thread Michael Ellerman
On Wed, 2016-27-04 at 01:14:50 UTC, Gavin Shan wrote:
> The function eeh_pe_reset_and_recover() is used to recover EEH
> error when the passthrough device are transferred to guest and
> backwards, meaning the device's driver is vfio-pci or none.
> When the driver is vfio-pci that provides error_detected() error
> handler only, the handler simply stops the guest and it's not
> expected behaviour. On the other hand, no error handlers will
> be called if we don't have a bound driver.
> 
> This ignores the error handler in eeh_pe_reset_and_recover()
> that reports the error to device driver to avoid the exceptional
> behaviour.
> 
> Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices")
> Cc: sta...@vger.kernel.org #v3.18+
> Signed-off-by: Gavin Shan 
> Reviewed-by: Russell Currey 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/affeb0f2d3a9af419ad7ef4ac7

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc: Use privileged SPR number for MMCR2

2016-05-12 Thread Thomas Huth
We are already using the privileged versions of MMCR0, MMCR1
and MMCRA in the kernel, so for MMCR2, we should better use
the privileged versions, too, to be consistent.

Suggested-by: Paul Mackerras 
Signed-off-by: Thomas Huth 
---
 arch/powerpc/include/asm/reg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index ce3e1b7..166d863 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -715,7 +715,7 @@
 #define   MMCR0_FCWAIT 0x0002UL /* freeze counter in WAIT state */
 #define   MMCR0_FCHV   0x0001UL /* freeze conditions in hypervisor mode */
 #define SPRN_MMCR1 798
-#define SPRN_MMCR2 769
+#define SPRN_MMCR2 785
 #define SPRN_MMCRA 0x312
 #define   MMCRA_SDSYNC 0x8000UL /* SDAR synced with SIAR */
 #define   MMCRA_SDAR_DCACHE_MISS 0x4000UL
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] usb: dwc2: fix regression on big-endian PowerPC/ARM systems

2016-05-12 Thread Felipe Balbi

Hi,

Arnd Bergmann  writes:
> A patch that went into Linux-4.4 to fix big-endian mode on a Lantiq
> MIPS system unfortunately broke big-endian operation on PowerPC
> APM82181 as reported by Christian Lamparter, and likely other
> systems.
>
> It actually introduced multiple issues:
>
> - it broke big-endian ARM kernels: any machine that was working
>   correctly with a little-endian kernel is no longer using byteswaps
>   on big-endian kernels, which clearly breaks them.
> - On PowerPC the same thing must be true: if it was working before,
>   using big-endian kernels is now broken. Unlike ARM, 32-bit PowerPC
>   usually uses big-endian kernels, so they are likely all broken.
> - The barrier for dwc2_writel is on the wrong side of the __raw_writel(),
>   so the MMIO no longer synchronizes with DMA operations.
> - On architectures that require specific CPU instructions for MMIO
>   access, using the __raw_ variant may turn this into a pointer
>   dereference that does not have the same effect as the readl/writel.
>
> This patch is a simple revert for all architectures other than MIPS,
> in the hope that we can more easily backport it to fix the regression
> on PowerPC and ARM systems without breaking the Lantiq system again.
>
> We should follow this up with a more elaborate change to add runtime
> detection of endianess, to make sure it also works on all other
> combinations of architectures and implementations of the usb-dwc2
> device. That patch however will be fairly large and not appropriate
> for backports to stable kernels.
>
> Signed-off-by: Arnd Bergmann 
> Fixes: 95c8bc360944 ("usb: dwc2: Use platform endianness when accessing 
> registers")
> ---
>  drivers/usb/dwc2/core.h | 41 +++--
>  1 file changed, 35 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
> index 3c58d633ce80..1f8ed149a40f 100644
> --- a/drivers/usb/dwc2/core.h
> +++ b/drivers/usb/dwc2/core.h
> @@ -64,12 +64,24 @@
>   DWC2_TRACE_SCHEDULER_VB(pr_fmt("%s: SCH: " fmt),\
>   dev_name(hsotg->dev), ##__VA_ARGS__)
>  
> +
> +#ifdef CONFIG_MIPS
> +/*
> + * There are some MIPS machines that can run in either big-endian
> + * or little-endian mode and that use the dwc2 register without
> + * a byteswap in both ways.
> + * Unlike other architectures, MIPS does not require a barrier
> + * before the __raw_writel() to synchronize with DMA but does
> + * require the barrier after the writel() to serialize a series
> + * of writes. This set of operations was added specifically for
> + * MIPS and should only be used there.
> + */
>  static inline u32 dwc2_readl(const void __iomem *addr)
>  {
>   u32 value = __raw_readl(addr);
>  
> - /* In order to preserve endianness __raw_* operation is used. Therefore
> -  * a barrier is needed to ensure IO access is not re-ordered across
> + /* in order to preserve endianness __raw_* operation is used. therefore
> +  * a barrier is needed to ensure io access is not re-ordered across
>* reads or writes
>*/
>   mb();
> @@ -81,15 +93,32 @@ static inline void dwc2_writel(u32 value, void __iomem 
> *addr)
>   __raw_writel(value, addr);
>  
>   /*
> -  * In order to preserve endianness __raw_* operation is used. Therefore
> -  * a barrier is needed to ensure IO access is not re-ordered across
> +  * in order to preserve endianness __raw_* operation is used. therefore
> +  * a barrier is needed to ensure io access is not re-ordered across
>* reads or writes
>*/
>   mb();
> -#ifdef DWC2_LOG_WRITES
> - pr_info("INFO:: wrote %08x to %p\n", value, addr);
> +#ifdef dwc2_log_writes
> + pr_info("info:: wrote %08x to %p\n", value, addr);
>  #endif
>  }
> +#else

I still think this is something that should be handled at MIPS side, no ?

How many more drivers will we have to 'fix' like this ?

-- 
balbi
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc: Fix definition of SIAR and SDAR registers

2016-05-12 Thread Thomas Huth
The SIAR and SDAR registers are available twice, one time as SPRs
780 / 781 (unprivileged, but read-only), and one time as the SPRs
796 / 797 (privileged, but read and write). The Linux kernel code
currently uses the unprivileged  SPRs - while this is OK for reading,
writing to that register of course does not work.
Since the KVM code tries to write to this register, too (see the mtspr
in book3s_hv_rmhandlers.S), the contents of this register sometimes get
lost for the guests, e.g. during migration of a VM.
To fix this issue, simply switch to the privileged SPR numbers instead.

Signed-off-by: Thomas Huth 
---
 arch/powerpc/include/asm/reg.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index f5f4c66..ce3e1b7 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -752,13 +752,13 @@
 #define SPRN_PMC6  792
 #define SPRN_PMC7  793
 #define SPRN_PMC8  794
-#define SPRN_SIAR  780
-#define SPRN_SDAR  781
 #define SPRN_SIER  784
 #define   SIER_SIPR0x200   /* Sampled MSR_PR */
 #define   SIER_SIHV0x100   /* Sampled MSR_HV */
 #define   SIER_SIAR_VALID  0x040   /* SIAR contents valid */
 #define   SIER_SDAR_VALID  0x020   /* SDAR contents valid */
+#define SPRN_SIAR  796
+#define SPRN_SDAR  797
 #define SPRN_TACR  888
 #define SPRN_TCSCR 889
 #define SPRN_CSIGR 890
-- 
1.8.3.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] usb: dwc2: fix regression on big-endian PowerPC/ARM systems

2016-05-12 Thread Arnd Bergmann
A patch that went into Linux-4.4 to fix big-endian mode on a Lantiq
MIPS system unfortunately broke big-endian operation on PowerPC
APM82181 as reported by Christian Lamparter, and likely other
systems.

It actually introduced multiple issues:

- it broke big-endian ARM kernels: any machine that was working
  correctly with a little-endian kernel is no longer using byteswaps
  on big-endian kernels, which clearly breaks them.
- On PowerPC the same thing must be true: if it was working before,
  using big-endian kernels is now broken. Unlike ARM, 32-bit PowerPC
  usually uses big-endian kernels, so they are likely all broken.
- The barrier for dwc2_writel is on the wrong side of the __raw_writel(),
  so the MMIO no longer synchronizes with DMA operations.
- On architectures that require specific CPU instructions for MMIO
  access, using the __raw_ variant may turn this into a pointer
  dereference that does not have the same effect as the readl/writel.

This patch is a simple revert for all architectures other than MIPS,
in the hope that we can more easily backport it to fix the regression
on PowerPC and ARM systems without breaking the Lantiq system again.

We should follow this up with a more elaborate change to add runtime
detection of endianess, to make sure it also works on all other
combinations of architectures and implementations of the usb-dwc2
device. That patch however will be fairly large and not appropriate
for backports to stable kernels.

Signed-off-by: Arnd Bergmann 
Fixes: 95c8bc360944 ("usb: dwc2: Use platform endianness when accessing 
registers")
---
 drivers/usb/dwc2/core.h | 41 +++--
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 3c58d633ce80..1f8ed149a40f 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -64,12 +64,24 @@
DWC2_TRACE_SCHEDULER_VB(pr_fmt("%s: SCH: " fmt),\
dev_name(hsotg->dev), ##__VA_ARGS__)
 
+
+#ifdef CONFIG_MIPS
+/*
+ * There are some MIPS machines that can run in either big-endian
+ * or little-endian mode and that use the dwc2 register without
+ * a byteswap in both ways.
+ * Unlike other architectures, MIPS does not require a barrier
+ * before the __raw_writel() to synchronize with DMA but does
+ * require the barrier after the writel() to serialize a series
+ * of writes. This set of operations was added specifically for
+ * MIPS and should only be used there.
+ */
 static inline u32 dwc2_readl(const void __iomem *addr)
 {
u32 value = __raw_readl(addr);
 
-   /* In order to preserve endianness __raw_* operation is used. Therefore
-* a barrier is needed to ensure IO access is not re-ordered across
+   /* in order to preserve endianness __raw_* operation is used. therefore
+* a barrier is needed to ensure io access is not re-ordered across
 * reads or writes
 */
mb();
@@ -81,15 +93,32 @@ static inline void dwc2_writel(u32 value, void __iomem 
*addr)
__raw_writel(value, addr);
 
/*
-* In order to preserve endianness __raw_* operation is used. Therefore
-* a barrier is needed to ensure IO access is not re-ordered across
+* in order to preserve endianness __raw_* operation is used. therefore
+* a barrier is needed to ensure io access is not re-ordered across
 * reads or writes
 */
mb();
-#ifdef DWC2_LOG_WRITES
-   pr_info("INFO:: wrote %08x to %p\n", value, addr);
+#ifdef dwc2_log_writes
+   pr_info("info:: wrote %08x to %p\n", value, addr);
 #endif
 }
+#else
+/* Normal architectures just use readl/write */
+static inline u32 dwc2_readl(const void __iomem *addr)
+{
+   u32 value = readl(addr);
+   return value;
+}
+
+static inline void dwc2_writel(u32 value, void __iomem *addr)
+{
+   writel(value, addr);
+
+#ifdef dwc2_log_writes
+   pr_info("info:: wrote %08x to %p\n", value, addr);
+#endif
+}
+#endif
 
 /* Maximum number of Endpoints/HostChannels */
 #define MAX_EPS_CHANNELS   16
-- 
2.7.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: Fix definition of SIAR register

2016-05-12 Thread Paul Mackerras
On Thu, May 12, 2016 at 09:27:40AM +0200, Thomas Huth wrote:
> On 12.05.2016 06:57, Paul Mackerras wrote:
> > On Mon, Apr 25, 2016 at 10:15:47AM +0200, Alexander Graf wrote:
> >>
>  +#define SPRN_SIAR796
> >>
> >> I'm sure there's a reason (iSeries?) we used the r/o version before. 
> >> Better introduce a new constant that gives us rw access and use that in 
> >> the kvm entry/exit code.
> > 
> > I don't think we ever did any performance monitoring on iSeries, and
> > in any case, we have removed the iSeries code.  I think we should just
> > use the privileged, read/write number (i.e. 796).
> 
> Ok, then I'll simply add the fix for SDAR to my patch as well, test it
> and submit it again.

Great!  Could you switch MMCR2 to the privileged number as well while
you're at it?

Thanks,
Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [FIX,v1] powerpc,numa: Fix memory_hotplug_max()

2016-05-12 Thread Michael Ellerman
On Thu, 2016-12-05 at 08:26:36 UTC, Bharata B Rao wrote:
> memory_hotplug_max() uses hot_add_drconf_memory_max() to get maxmimum
> addressable memory by referring to ibm,dyanamic-memory property. There
> are three problems with the current approach:
...
> 
> NOTE: There are some unnecessary changes in the patch because of converting
> spaces to tabs w/o which checkpatch.pl complains.

OK. You don't always have to listen to checkpatch :)

Please split this into two patches, one that fixes the whitespace problems in
the function, and then a second one which makes your changes.

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

2016-05-12 Thread Christian Lamparter via Linuxppc-dev
On Tuesday, May 10, 2016 09:23:59 AM Arnd Bergmann wrote:
> On Tuesday 10 May 2016 08:37:52 Benjamin Herrenschmidt wrote:
> > On Mon, 2016-05-09 at 17:08 +0200, Arnd Bergmann wrote:
> > > 
> > > Unfortunately, I don't see any way this could be done in MIPS specific
> > > code: There is typically a byteswap between the internal bus and the PCI
> > > bus on big-endian MIPS systems, so the PCI MMIO ends up being 
> > > little-endian,
> > 
> > Ugh ... not exactly, re-watch my talk on the matter :-) While there is
> > a specific lane wiring to preserve byte addresss, in the end it's the
> > end device itself that is either BE or LE. Regardless of any "bus
> > endianness".
> 
> I found your slides on
> 
> http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-ref-big-little-endian-herrenschmidt.odp
> 
> but there are at least two more twists that you completely missed here:
> 
> - Some architectures (e.g. ARMv5 "BE32" mode in IXP4xx, surely some others)
>   do not implement big-endian mode by wiring up the data lines between the
>   bus and the CPU differently between big- and little-endian mode like
>   powerpc and armv7 "BE8" do, but instead they swizzle the *address* lines
>   on 8-bit and 16-bit addresses. The effect of that is that normal RAM
>   accesses work as expected both ways, and devices that are accessed using
>   32-bit MMIO ops never need any byteswap (you actually get "native
>   endian") while MMIO with 8 and 16 bit width does something completely
>   unexpected and touches the wrong register. Having an explicit byteswap
>   on the PCI host bridge gets you the expected addresses again for 8-bit
>   cycles but it also means that readl()/writel() again need to swap the
>   data.
> 
> - Some other architectures (e.g. Broadcom MIPS) apparently are even fancier
>   and use a strapping pin on the SoC flips the endianess of the CPU core
>   at the same time as all the peripheral MMIO registers, with the intention
>   of never requiring any byte swaps. I believe they are implemented careful
>   enough to actually get this right, but it confuses the heck out of
>   Linux drivers that don't expect this.
> 
> > > which matches the expected behavior of readl/writel. However, drivers
> > > for non-PCI devices often use the same readl/writel accessors because
> > > that is how it's done on ARMv6/ARMv7.
> > 
> > Even then, you can have on-SoC (non-PCI) devices that also have a
> > different endianness from the main CPU. How does it work on ARM for
> > example ? The device endianness should be fixed, regardless of the
> > endianness of the core, no ?
> 
> ARMv6/v7 is uses BE8 mode like powerpc: each peripheral is fixed-endian
> and you have to know what it is. Only Freescale managed to put identical
> IP blocks on various (powerpc-derived) SoCs and have a subset of them
> treat the access as little-endian while others remain big-endian, so all
> those drivers now require runtime detection.
> 
> > > Doing it hardcoded by architecture is just the simplest way to deal
> > > with it, working on the assumption that nothing actually needs the
> > > runtime detection that Ben suggested. 
> > 
> > No, it's not an archicture problem. It's a problem specific to that one
> > SoC that the device was synthetized to be a certain endian while it was
> > synthetized differently on another SoC... that also happens to be a
> > different architecture. But doesn't have to.
> > 
> > For example, we had in the past cases of both LE and BE EHCI
> > implementations on the same architecture (PowerPC).
> 
> I understand this, but from what I see in this history of this particular
> driver, all ARM and PowerPC implementations chose to use LE registers for
> DWC2 because the normal approach for these is to not mess with endianess,
> while presumably all MIPS users of the same block wired up the endian-select
> line of the IP block to match that of the CPU core, again because it's
> what you are expected to do on a MIPS based SoC.
> 
> So hardcoding it per architecture would make an assumption based on
> the mindset of the SoC designers rather than strict technical differences,
> and that can fail as soon as someone does things differently on any of
> them (see the Freescale example), but I still think it's the easiest
> workaround for backporting to stable kernels. A revert of the original
> patch would be even easier, but that would break the one big-endian
> MIPS machine we know about.
> 
> > > Detecting the endianess of the
> > > device is probably the best future-proof solution, but it's also
> > > considerably more work to do in the driver, and comes with a
> > > tiny runtime overhead.
> > 
> > The runtime overhead is probably non-measurable compared with the cost
> > of the actual MMIOs.
> 
> Right. The code size increase is probably measurable (but still small),
> the runtime overhead is not.

Ok, so no rebuts or complains have been posted.

I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354

Re: [PATCH] kvm-pr: manage illegal instructions

2016-05-12 Thread Alexander Graf

On 05/12/2016 11:10 AM, Laurent Vivier wrote:


On 11/05/2016 13:49, Alexander Graf wrote:

On 05/11/2016 01:14 PM, Laurent Vivier wrote:

On 11/05/2016 12:35, Alexander Graf wrote:

On 03/15/2016 09:18 PM, Laurent Vivier wrote:

While writing some instruction tests for kvm-unit-tests for powerpc,
I've found that illegal instructions are not managed correctly with
kvm-pr,
while it is fine with kvm-hv.

When an illegal instruction (like ".long 0") is processed by kvm-pr,
the kernel logs are filled with:

Couldn't emulate instruction 0x (op 0 xop 0)
kvmppc_handle_exit_pr: emulation at 700 failed ()

While the exception handler receives an interrupt for each instruction
executed after the illegal instruction.

Signed-off-by: Laurent Vivier 
---
arch/powerpc/kvm/book3s_emulate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_emulate.c
b/arch/powerpc/kvm/book3s_emulate.c
index 2afdb9c..4ee969d 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -99,7 +99,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run,
struct kvm_vcpu *vcpu,
  switch (get_op(inst)) {
case 0:
-emulated = EMULATE_FAIL;
if ((kvmppc_get_msr(vcpu) & MSR_LE) &&
(inst == swab32(inst_sc))) {
/*
@@ -112,6 +111,9 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run,
struct kvm_vcpu *vcpu,
kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED);
kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
emulated = EMULATE_DONE;
+} else {
+kvmppc_core_queue_program(vcpu, SRR1_PROGILL);

But isn't that exactly what the semantic of EMULATE_FAIL is? Fixing it
up in book3s_emulate.c is definitely the wrong spot.

So what is the problem you're trying to solve? Is the SRR0 at the wrong
spot or are the log messages the problem?

No, the problem is the host kernel logs are filled by the message and
the execution hangs. And the host becomes unresponsiveness, even after
the end of the tests.

Please, try to run kvm-unit-tests (the emulator test) on a KVM-PR host,
and check the kernel logs (dmesg), then try to ssh to the host...

Ok, so the log messages are the problem. Please fix the message output
then - or remove it altogether. Or if you like, create a module
parameter that allows you to emit them.

I personally think the best solution would be to just convert the
message into a trace point.

While at it, please see whether the guest can trigger similar host log
output excess in other code paths.

The problem is not really with the log messages: they are consequence of
the bug I try to fix.

What happens is once kvm_pr decodes an invalid instruction all the valid
following instructions trigger a Program exception to the guest (but are
executed correctly). It has no real consequence on big machine like
POWER8, except that the guest become very slow and the log files of the
host are filled with messages (and qemu uses 100% of the CPU). On a
smaller machine like a  PowerMac G5, the machine becomes simply unusable.


It's probably more related to your verbosity level of kernel messages. 
If you pass loglevel=0 (or quiet) to you kernel cmdline you won't get 
the messages printed to serial which is what's slowing you down.


The other problem sounds pretty severe, but the only thing your patch 
does any different from the current code flow would be the patch below. 
Or did I miss anything?


diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 5cc2e7a..4672bc2 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -302,7 +302,11 @@ int kvmppc_emulate_instruction(struct kvm_run *run, 
struct kvm_vcpu *vcpu)

advance = 0;
printk(KERN_ERR "Couldn't emulate instruction 
0x%08x "
   "(op %d xop %d)\n", inst, get_op(inst), 
get_xop(inst));

+#ifdef CONFIG_PPC_BOOK3S
+   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+#else
kvmppc_core_queue_program(vcpu, 0);
+#endif
}
}


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] kvm-pr: manage illegal instructions

2016-05-12 Thread Laurent Vivier


On 11/05/2016 13:49, Alexander Graf wrote:
> On 05/11/2016 01:14 PM, Laurent Vivier wrote:
>>
>> On 11/05/2016 12:35, Alexander Graf wrote:
>>> On 03/15/2016 09:18 PM, Laurent Vivier wrote:
 While writing some instruction tests for kvm-unit-tests for powerpc,
 I've found that illegal instructions are not managed correctly with
 kvm-pr,
 while it is fine with kvm-hv.

 When an illegal instruction (like ".long 0") is processed by kvm-pr,
 the kernel logs are filled with:

Couldn't emulate instruction 0x (op 0 xop 0)
kvmppc_handle_exit_pr: emulation at 700 failed ()

 While the exception handler receives an interrupt for each instruction
 executed after the illegal instruction.

 Signed-off-by: Laurent Vivier 
 ---
arch/powerpc/kvm/book3s_emulate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/arch/powerpc/kvm/book3s_emulate.c
 b/arch/powerpc/kvm/book3s_emulate.c
 index 2afdb9c..4ee969d 100644
 --- a/arch/powerpc/kvm/book3s_emulate.c
 +++ b/arch/powerpc/kvm/book3s_emulate.c
 @@ -99,7 +99,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run,
 struct kvm_vcpu *vcpu,
  switch (get_op(inst)) {
case 0:
 -emulated = EMULATE_FAIL;
if ((kvmppc_get_msr(vcpu) & MSR_LE) &&
(inst == swab32(inst_sc))) {
/*
 @@ -112,6 +111,9 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run,
 struct kvm_vcpu *vcpu,
kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED);
kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
emulated = EMULATE_DONE;
 +} else {
 +kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
>>> But isn't that exactly what the semantic of EMULATE_FAIL is? Fixing it
>>> up in book3s_emulate.c is definitely the wrong spot.
>>>
>>> So what is the problem you're trying to solve? Is the SRR0 at the wrong
>>> spot or are the log messages the problem?
>> No, the problem is the host kernel logs are filled by the message and
>> the execution hangs. And the host becomes unresponsiveness, even after
>> the end of the tests.
>>
>> Please, try to run kvm-unit-tests (the emulator test) on a KVM-PR host,
>> and check the kernel logs (dmesg), then try to ssh to the host...
> 
> Ok, so the log messages are the problem. Please fix the message output
> then - or remove it altogether. Or if you like, create a module
> parameter that allows you to emit them.
> 
> I personally think the best solution would be to just convert the
> message into a trace point.
> 
> While at it, please see whether the guest can trigger similar host log
> output excess in other code paths.

The problem is not really with the log messages: they are consequence of
the bug I try to fix.

What happens is once kvm_pr decodes an invalid instruction all the valid
following instructions trigger a Program exception to the guest (but are
executed correctly). It has no real consequence on big machine like
POWER8, except that the guest become very slow and the log files of the
host are filled with messages (and qemu uses 100% of the CPU). On a
smaller machine like a  PowerMac G5, the machine becomes simply unusable.

Please try kvm-unit-tests to see what happens:

qemu-system-ppc64 -machine pseries,accel=kvm,kvm-type=PR -bios
powerpc/boot_rom.bin -display none -serial stdio -kernel
powerpc/emulator.elf -smp 1 --append "-v"


Thanks,
Laurent
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[FIX PATCH v1] powerpc,numa: Fix memory_hotplug_max()

2016-05-12 Thread Bharata B Rao
memory_hotplug_max() uses hot_add_drconf_memory_max() to get maxmimum
addressable memory by referring to ibm,dyanamic-memory property. There
are three problems with the current approach:

1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes
  all the LMBs of the guest, but that is not true for PowerKVM which
  populates only DR LMBs (LMBs that can be hotplugged/removed) in that
  property.
2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive
  at the max possible address. Since ibm,dynamic-memory doesn't include
  RMA LMBs, the address thus obtained will be less than the actual max
  address. For example, if max possible memory size is 32G, with lmb-size
  of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA
  which won't be present here).  hot_add_drconf_memory_max() would then
  return the max addressable memory as 127 * 256MB = 31.75GB, the max
  address should have been 32G which is what ibm,lrdr-capacity shows.
3 In PowerKVM, there can be a gap between the end of boot time RAM and
  beginning of hotplug RAM area. So just multiplying lmb-count with
  lmb-size will not provide the correct max possible address for PowerKVM.

This patch fixes 1 by using ibm,lrdr-capacity property to return the max
addressable memory whenever the property is present. Then it fixes 2 & 3
by fetching the address of the last LMB in ibm,dynamic-memory property.

NOTE: There are some unnecessary changes in the patch because of converting
spaces to tabs w/o which checkpatch.pl complains.

Signed-off-by: Bharata B Rao 
---
Changes in v1
-
- Return 0 from hot_add_drconf_memory_max() when
  ibm,dynamic-reconfiguration-memory node isn't available. This restores
  the existing behaviour.
- Rebase against next branch of
  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git

 arch/powerpc/mm/numa.c | 34 +-
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 669a15e..6a26ce8 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1164,17 +1164,33 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
 static u64 hot_add_drconf_memory_max(void)
 {
 struct device_node *memory = NULL;
-unsigned int drconf_cell_cnt = 0;
-u64 lmb_size = 0;
+   struct device_node *dn = NULL;
+   unsigned int drconf_cell_cnt = 0;
+   u64 lmb_size = 0;
const __be32 *dm = NULL;
+   const __be64 *lrdr = NULL;
+   struct of_drconf_cell drmem;
+
+   dn = of_find_node_by_path("/rtas");
+   if (dn) {
+   lrdr = of_get_property(dn, "ibm,lrdr-capacity", NULL);
+   of_node_put(dn);
+   if (lrdr)
+   return be64_to_cpup(lrdr);
+   }
+
+   memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+   if (memory) {
+   drconf_cell_cnt = of_get_drconf_memory(memory, );
+   lmb_size = of_get_lmb_size(memory);
 
-memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
-if (memory) {
-drconf_cell_cnt = of_get_drconf_memory(memory, );
-lmb_size = of_get_lmb_size(memory);
-of_node_put(memory);
-}
-return lmb_size * drconf_cell_cnt;
+   /* Advance to the last cell, each cell has 6 32 bit integers */
+   dm += (drconf_cell_cnt - 1) * 6;
+   read_drconf_cell(, );
+   of_node_put(memory);
+   return drmem.base_addr + lmb_size;
+   }
+   return 0;
 }
 
 /*
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH kernel 1/2] powerpc/powernv: Fix insufficient memory allocation

2016-05-12 Thread Gavin Shan
On Thu, May 12, 2016 at 04:39:57PM +1000, Alexey Kardashevskiy wrote:
>On 05/12/2016 04:09 PM, Gavin Shan wrote:
>>On Thu, May 12, 2016 at 03:47:09PM +1000, Alexey Kardashevskiy wrote:
>>>The pnv_pci_init_ioda_phb() helper allocates a blob to store auxilary
>>>data such PE and M32/M64 segment allocation maps; this single blob has few
>>>partitions, size of each is derived from the PE number -
>>>phb->ioda.total_pe_num.
>>>
>>>It was assumed that the minimum PE number is 8, however it is 4 for NPU
>>>so the pe_alloc part was missing in the allocated blob.
>>>It was invisible till recently as we were not tracking used M64 segments
>>>and NPUs do not use M32 segments so the phb->ioda.m32_segmap
>>>(which was pointing to the same address as phb->ioda.pe_alloc)
>>>has never been written to leaving the pe_alloc memory intact.
>>>
>>>After 401203ac2d "powerpc/powernv: Track M64 segment consumption"
>>>the pe_alloc gets corrupted and PE allocation cannot work.
>>>This fixes the issue by enforcing the minimum PE number to 8.
>>>
>>
>>As I said offline yesterday, the issue exists from day-1 when NPU PHB
>>is supported. I don't think it's related to 401203ac2d. Without the logic
>>tracking M64 segments, the PE# bitmap still can be corrupted when writting
>>to M32 segment map. It's confusing to mention a unrelated commit in the
>>changelog.
>
>
>The bug exists from day 1. The actual corruption started happening from
>401203ac2d, it could be happening before but it was not.
>

well, Are you sure it starts from 401203ac2d? I would believe it starts
from commit 3fa23ff ("powerpc/powernv: Fix initial IO and M32 segmap").
Please fix the commit log.

>>
>>Since the issue exists from day-1 when NPU PHB is supported, is a stable
>>tag needed?
>>
>>>Signed-off-by: Alexey Kardashevskiy 
>>
>>With above parts fixed:
>>
>>Reviewed-by: Gavin Shan 
>>
>>>---
>>>arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
>>>1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>index 6cda2a8..d0d32c2 100644
>>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>@@ -3507,7 +3507,8 @@ static void __init pnv_pci_init_ioda_phb(struct 
>>>device_node *np,
>>> PNV_IODA1_DMA32_SEGSIZE;
>>>
>>> /* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>>>-size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>>>+size = _ALIGN_UP(max_t(unsigned, phb->ioda.total_pe_num, 8) / 8,
>>>+sizeof(unsigned long));
>>> m64map_off = size;
>>> size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]);
>>> m32map_off = size;
>>>--
>>>2.5.0.rc3
>>>
>>
>
>
>-- 
>Alexey
>

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: Fix definition of SIAR register

2016-05-12 Thread Thomas Huth
On 12.05.2016 06:57, Paul Mackerras wrote:
> On Mon, Apr 25, 2016 at 10:15:47AM +0200, Alexander Graf wrote:
>>
 +#define SPRN_SIAR796
>>
>> I'm sure there's a reason (iSeries?) we used the r/o version before. Better 
>> introduce a new constant that gives us rw access and use that in the kvm 
>> entry/exit code.
> 
> I don't think we ever did any performance monitoring on iSeries, and
> in any case, we have removed the iSeries code.  I think we should just
> use the privileged, read/write number (i.e. 796).

Ok, then I'll simply add the fix for SDAR to my patch as well, test it
and submit it again.

 Thomas

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH kernel 1/2] powerpc/powernv: Fix insufficient memory allocation

2016-05-12 Thread Alexey Kardashevskiy

On 05/12/2016 04:09 PM, Gavin Shan wrote:

On Thu, May 12, 2016 at 03:47:09PM +1000, Alexey Kardashevskiy wrote:

The pnv_pci_init_ioda_phb() helper allocates a blob to store auxilary
data such PE and M32/M64 segment allocation maps; this single blob has few
partitions, size of each is derived from the PE number -
phb->ioda.total_pe_num.

It was assumed that the minimum PE number is 8, however it is 4 for NPU
so the pe_alloc part was missing in the allocated blob.
It was invisible till recently as we were not tracking used M64 segments
and NPUs do not use M32 segments so the phb->ioda.m32_segmap
(which was pointing to the same address as phb->ioda.pe_alloc)
has never been written to leaving the pe_alloc memory intact.

After 401203ac2d "powerpc/powernv: Track M64 segment consumption"
the pe_alloc gets corrupted and PE allocation cannot work.
This fixes the issue by enforcing the minimum PE number to 8.



As I said offline yesterday, the issue exists from day-1 when NPU PHB
is supported. I don't think it's related to 401203ac2d. Without the logic
tracking M64 segments, the PE# bitmap still can be corrupted when writting
to M32 segment map. It's confusing to mention a unrelated commit in the
changelog.



The bug exists from day 1. The actual corruption started happening from 
401203ac2d, it could be happening before but it was not.




Since the issue exists from day-1 when NPU PHB is supported, is a stable
tag needed?


Signed-off-by: Alexey Kardashevskiy 


With above parts fixed:

Reviewed-by: Gavin Shan 


---
arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 6cda2a8..d0d32c2 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3507,7 +3507,8 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
PNV_IODA1_DMA32_SEGSIZE;

/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
-   size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
+   size = _ALIGN_UP(max_t(unsigned, phb->ioda.total_pe_num, 8) / 8,
+   sizeof(unsigned long));
m64map_off = size;
size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]);
m32map_off = size;
--
2.5.0.rc3






--
Alexey
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH kernel 2/2] powerpc/powernv/npu: Add PE to PHB's list

2016-05-12 Thread Gavin Shan
On Thu, May 12, 2016 at 03:47:10PM +1000, Alexey Kardashevskiy wrote:
>Before 3e68dc57 "powerpc/powernv: Remove DMA32 PE list", NPU PEs
>were linked to the NPU PHB via phb->ioda.pe_dma_list; after that fix,
>the phb->ioda.pe_list is used.
>
>During the pe_dma_list removal, list_add_tail(>ioda.pe_dma_list)
>was removed, however no list_add() was added so does this patch.
>
>Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: Gavin Shan 

>---
> arch/powerpc/platforms/powernv/pci-ioda.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>b/arch/powerpc/platforms/powernv/pci-ioda.c
>index d0d32c2..4f9e73a 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -1009,6 +1009,9 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct 
>pci_dev *dev)
>   return NULL;
>   }
>
>+  /* Put PE to the list */
>+  list_add_tail(>list, >ioda.pe_list);
>+
>   return pe;
> }
>
>-- 
>2.5.0.rc3
>

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev