Re: [PATCH 04/16] iommu: sva: Add support for private PASIDs

2018-07-17 Thread Jordan Crouse
On Tue, Jul 17, 2018 at 12:21:03PM +0100, Jean-Philippe Brucker wrote:
> Hi Jordan,
> 
> Thanks for the patches, I finally got around testing them with SMMUv3.
> It's an important feature, arguably more than SVA itself. I could pick
> this one as part of the SVA series, what do you think?

I'm good with whatever is the easiest.

> Although I probably would have done the same, I dislike the interface
> because it forces us to duplicate functions and IOMMU ops. The list is
> small but growing:
> 
> iommu_map
> iommu_map_sg
> iommu_unmap
> iommu_unmap_fast
> iommu_iova_to_phys
> iommu_tlb_range_add
> iommu_flush_tlb_all
> 
> Each of these and their associated IOMMU op will have an iommu_sva_X
> counterpart that takes one different argument. Modifying these functions
> to take both a domain and a PASID argument would be more elegant. Or as
> an intermediate solution, perhaps we could only change the IOMMU ops to
> take an additional argument, like you did for map_sg?
> 
> In any case it requires invasive changes in lots of drivers and we can
> always tidy up later, so unless Joerg has a preference I'd keep the
> duplicates for now.

I agree.

> However, having to lookup pasid-to-io_mm on every map/unmap call is
> cumbersome, especially since map/unmap are supposed to be as fast as
> possible. iommu_sva_alloc_pasid should return a structure representing
> the PASID instead of the value alone. The io_mm structure seems like a
> good fit, and the device driver can access io_mm->pasid directly or via
> an io_mm_get_pasid() function.
> 
> The new functions would then be:
> 
> struct io_mm *iommu_sva_alloc_pasid(domain, dev)
> void iommu_sva_free_pasid(domain, io_mm)
> 
> int iommu_sva_map(io_mm, iova, paddr, size, prot)
> size_t iommu_map_sg(io_mm, iova, sg, nents, prot)
> size_t iommu_sva_unmap(io_mm, iova, size)
> size_t iommu_sva_unmap_fast(io_mm, iova, size)
> phys_addr_t iommu_sva_iova_to_phys(io_mm, iova)
> void iommu_sva_flush_tlb_all(io_mm)
> void iommu_sva_tlb_range_add(io_mm, iova, size)

Okay - this sounds reasonable - a simplification like that could
even lead to making all the new functions static inlines which
would cut down on the exported symbols.

> A few more comments inline

All those sound like good ideas to me. I'll take a bit of time to bash on this
and send out an updated revision soonish.

Jordan



-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-17 Thread Sudip Mukherjee
On Tue, Jul 17, 2018 at 04:59:01PM +0100, Sudip Mukherjee wrote:
> On Tue, Jul 17, 2018 at 05:52:59PM +0200, Greg KH wrote:
> > On Tue, Jul 17, 2018 at 10:31:38AM -0400, Alan Stern wrote:
> > > On Tue, 17 Jul 2018, Greg KH wrote:
> > > 
> > > > > From: Sudip Mukherjee 
> > > > > Date: Tue, 10 Jul 2018 09:50:00 +0100
> > > > > Subject: [PATCH] hacky solution to mem-corruption
> > > > > 
> > > > > Signed-off-by: Sudip Mukherjee 
> > > > > ---
> 
> > > 
> > > No, neither of these is right.  It's possible to use 
> > > usb_set_interface() as a kind of "soft" reset.  Even when the new 
> > > altsetting is specified to be the same as the current one, we still 
> > > have to tell the lower-layer drivers and hardware about it.
> > 
> > You are right, it's a hacky soft reset, I was just trying to figure out
> > what the bluetooth driver was trying to do.  I wouldn't expect it to be
> > calling that function a lot, but I guess it does :(
> 
> usb_set_interface() is being called two times from bluetooth event. But
> I am now adding more debugs to see why your patch did not work.

So, a very simple debug to see the sequence of functions being called.
I have attached the patch I used.

In a good case:
[  124.287991] sudip: xhci_urb_dequeue
[  124.287997] sudip: xhci_queue_stop_endpoint cmd=ee032950
[  124.288016] sudip: handle_cmd_completion cmd=ee032950
[  124.288173] sudip: xhci_urb_dequeue
[  124.288176] sudip: xhci_queue_stop_endpoint cmd=ee032950
[  124.288189] sudip: handle_cmd_completion cmd=ee032950
[  124.290647] sudip: usb_hcd_flush_endpoint
[  124.290652] sudip: usb_hcd_flush_endpoint

But in a bad case:
[  186.786900] sudip: xhci_urb_dequeue
[  186.786905] sudip: xhci_queue_stop_endpoint cmd=ebe47cb0
[  186.786923] sudip: handle_cmd_completion cmd=ebe47cb0
[  186.789040] sudip: xhci_urb_dequeue
[  186.789047] sudip: xhci_queue_stop_endpoint cmd=ebe47cb0
[  186.789069] sudip: handle_cmd_completion cmd=ebe47cb0
[  186.790082] sudip: usb_hcd_flush_endpoint
[  186.790094] sudip: xhci_urb_dequeue
[  186.790097] sudip: xhci_queue_stop_endpoint cmd=ebe47290
[  186.790150] sudip: handle_cmd_completion cmd=ebe47290
[  186.790202] sudip: usb_hcd_flush_endpoint

So, when usb_hcd_flush_endpoint() is called by usb_disable_endpoint() it
finds urbs still on the urb_list of the ep. And in the process of unlinking
them, it again sends the command to stop the endpoint, although that endpoint
has already been stopped.
So Greg's patch did not work as the memory got corrupted on the first call
to usb_set_interface(), whereas that patch was preventing the second call
to usb_set_interface().

--
Regards
Sudip
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 467bedeb542a..8d28f120ec0a 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1885,6 +1885,7 @@ void usb_hcd_flush_endpoint(struct usb_device *udev,
might_sleep();
hcd = bus_to_hcd(udev->bus);
 
+   pr_err("sudip: %s\n", __func__);
/* No more submits can occur */
spin_lock_irq(_urb_list_lock);
 rescan:
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 6996235e34a9..4f80791fdfc5 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1450,6 +1450,7 @@ static void handle_cmd_completion(struct xhci_hcd *xhci,
case TRB_STOP_RING:
WARN_ON(slot_id != TRB_TO_SLOT_ID(
le32_to_cpu(cmd_trb->generic.field[3])));
+   pr_err("sudip: %s cmd=%p\n", __func__, cmd);
xhci_handle_cmd_stop_ep(xhci, slot_id, cmd_trb, event);
break;
case TRB_SET_DEQ:
@@ -4009,6 +4010,7 @@ int xhci_queue_stop_endpoint(struct xhci_hcd *xhci, 
struct xhci_command *cmd,
u32 type = TRB_TYPE(TRB_STOP_RING);
u32 trb_suspend = SUSPEND_PORT_FOR_TRB(suspend);
 
+   pr_err("sudip: %s cmd=%p\n", __func__, cmd);
return queue_command(xhci, cmd, 0, 0, 0,
trb_slot_id | trb_ep_index | type | trb_suspend, false);
 }
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index db1de6113db2..3832128107ff 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1516,6 +1516,7 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct 
urb *urb, int status)
ep->stop_cmd_timer.expires = jiffies +
XHCI_STOP_EP_CMD_TIMEOUT * HZ;
add_timer(>stop_cmd_timer);
+   pr_err("sudip: %s\n", __func__);
xhci_queue_stop_endpoint(xhci, command, urb->dev->slot_id,
 ep_index, 0);
xhci_ring_cmd_db(xhci);
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] dma-mapping: Relax warnings for per-device areas

2018-07-17 Thread Fredrik Noring
Hi Christoph, Geoff,

[ CC-ing Geoff to give him an opportunity to chime in about the PS3 part. ]

> > Here are three other regressions related to the coherent mask WARN_ON_ONCE:
> 
> They are a pretty strong indication that yes, you should really set
> the coherent mask if you ever do coherent allocations..

I'm unfortunately unfamiliar with the USB drivers for the PS3. Geoff, what
do you think about setting a coherent mask?

Christoph, what mask value would you suggest for the PS2 driver? Typical
DMA addresses 0-0x20 are mapped to 0x1c00-0x1c20 and is memory
managed exclusively by the IOP. Robin indicated that DMA_BIT_MASK(20) or a
nonzero but useless value such as 1 are possibly candidates. It seems quite
unclear what the coherent mask actually means in this case, doesn't it?

Fredrik
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-17 Thread Sudip Mukherjee
On Tue, Jul 17, 2018 at 05:52:59PM +0200, Greg KH wrote:
> On Tue, Jul 17, 2018 at 10:31:38AM -0400, Alan Stern wrote:
> > On Tue, 17 Jul 2018, Greg KH wrote:
> > 
> > > > From: Sudip Mukherjee 
> > > > Date: Tue, 10 Jul 2018 09:50:00 +0100
> > > > Subject: [PATCH] hacky solution to mem-corruption
> > > > 
> > > > Signed-off-by: Sudip Mukherjee 
> > > > ---

> > 
> > No, neither of these is right.  It's possible to use 
> > usb_set_interface() as a kind of "soft" reset.  Even when the new 
> > altsetting is specified to be the same as the current one, we still 
> > have to tell the lower-layer drivers and hardware about it.
> 
> You are right, it's a hacky soft reset, I was just trying to figure out
> what the bluetooth driver was trying to do.  I wouldn't expect it to be
> calling that function a lot, but I guess it does :(

usb_set_interface() is being called two times from bluetooth event. But
I am now adding more debugs to see why your patch did not work.

--
Regards
Sudip
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-17 Thread Greg KH
On Tue, Jul 17, 2018 at 10:31:38AM -0400, Alan Stern wrote:
> On Tue, 17 Jul 2018, Greg KH wrote:
> 
> > > From: Sudip Mukherjee 
> > > Date: Tue, 10 Jul 2018 09:50:00 +0100
> > > Subject: [PATCH] hacky solution to mem-corruption
> > > 
> > > Signed-off-by: Sudip Mukherjee 
> > > ---
> > >  drivers/usb/core/message.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
> > > index 7cd4ec33dbf4..7fdf7a27611d 100644
> > > --- a/drivers/usb/core/message.c
> > > +++ b/drivers/usb/core/message.c
> > > @@ -1398,7 +1398,8 @@ int usb_set_interface(struct usb_device *dev, int 
> > > interface, int alternate)
> > >   remove_intf_ep_devs(iface);
> > >   usb_remove_sysfs_intf_files(iface);
> > >   }
> > > - usb_disable_interface(dev, iface, true);
> > > + if (!(iface->cur_altsetting && alt))
> > > + usb_disable_interface(dev, iface, true);
> > 
> > 
> > 
> > This feels like a "correct" patch anyway, why would a driver keep
> > calling set_interface to an interface that it was already set to?
> > 
> > But can't we check for this higher up in the function?  This hack will
> > just not disable an interface but it will do all of the other stuff
> > being asked for.  Does the patch below also solve this for you?  It's
> > not a good solution of course, but it might work around the problem a
> > bit better.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> > diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
> > index 1a15392326fc..0f718f1a1ca3 100644
> > --- a/drivers/usb/core/message.c
> > +++ b/drivers/usb/core/message.c
> > @@ -1376,6 +1376,14 @@ int usb_set_interface(struct usb_device *dev, int 
> > interface, int alternate)
> > return -EINVAL;
> > }
> >  
> > +   if (iface->cur_altsetting == alt) {
> > +   /*
> > +* foolish bluetooth stack, don't try to set a setting you are
> > +* already set to...
> > +*/
> > +   return 0;
> > +   }
> > +
> > /* Make sure we have enough bandwidth for this alternate interface.
> >  * Remove the current alt setting and add the new alt setting.
> >  */
> 
> No, neither of these is right.  It's possible to use 
> usb_set_interface() as a kind of "soft" reset.  Even when the new 
> altsetting is specified to be the same as the current one, we still 
> have to tell the lower-layer drivers and hardware about it.

You are right, it's a hacky soft reset, I was just trying to figure out
what the bluetooth driver was trying to do.  I wouldn't expect it to be
calling that function a lot, but I guess it does :(

thanks,

greg k-h
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-17 Thread Sudip Mukherjee
Hi Alan, Greg,

On Tue, Jul 17, 2018 at 03:49:18PM +0100, Sudip Mukherjee wrote:
> On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote:
> > Hi Alan,
> > 
> > On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:
> > > On Tue, 17 Jul 2018, Sudip Mukherjee wrote:
> > > 
> > > > I did some more debugging. Tested with a KASAN enabled kernel and that
> > > > shows the problem. The report is attached.
> > > > 
> > > > To my understanding:
> > > > 
> > > > btusb_work() is calling usb_set_interface() with alternate = 0. which
> > > > again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> > > > xhci_free_endpoint_ring().
> > > 
> > > That doesn't sound like the right thing to do.  The rings shouldn't be 
> > > freed until xhci_endpoint_disable() is called.  
> > > 
> > > On the other hand, there doesn't appear to be any 
> > > xhci_endpoint_disable() routine, although a comment refers to it.  
> > > Maybe this is the real problem?
> > 
> > one of your old mail might help :)
> > 
> > https://www.spinics.net/lists/linux-usb/msg98123.html
> 
> Wrote too soon.
> 
> Is it the one you are looking for -
> usb_disable_endpoint() is in drivers/usb/core/message.c

I think now I understand what the problem is.
usb_set_interface() calls usb_disable_interface() which again calls
usb_disable_endpoint(). This usb_disable_endpoint() gets the pointer
to 'ep', marks it as NULL and sends the pointer to usb_hcd_flush_endpoint().
After flushing the endpoints usb_disable_endpoint() calls
usb_hcd_disable_endpoint() which tries to do:
if (hcd->driver->endpoint_disable)
hcd->driver->endpoint_disable(hcd, ep);
but there is no endpoint_disable() callback in xhci, so the endpoint is
never marked as disabled. So, next time usb_hcd_flush_endpoint() is
called I get this corruption. 
And this is exactly where I used to see the problem happening.

And, my hacky patch worked as I prevented it from calling
usb_disable_interface() in this particular case.

Greg - answering your question here. My hacky patch was based on the
fact that usb_hcd_alloc_bandwidth() is calling hcd->driver->drop_endpoint()
and hcd->driver->add_endpoint() if (cur_alt && new_alt). So, I prevented
usb_disable_interface() to be called for that same condition. And that
worked as the call to usb_hcd_flush_endpoint() was not executed.
I know it is not correct and I might be having memory leaks for this, but
I have the system working till we get the actual fix.

--
Regards
Sudip

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-17 Thread Alan Stern
On Tue, 17 Jul 2018, Sudip Mukherjee wrote:

> On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote:
> > Hi Alan,
> > 
> > On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:
> > > On Tue, 17 Jul 2018, Sudip Mukherjee wrote:
> > > 
> > > > I did some more debugging. Tested with a KASAN enabled kernel and that
> > > > shows the problem. The report is attached.
> > > > 
> > > > To my understanding:
> > > > 
> > > > btusb_work() is calling usb_set_interface() with alternate = 0. which
> > > > again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> > > > xhci_free_endpoint_ring().
> > > 
> > > That doesn't sound like the right thing to do.  The rings shouldn't be 
> > > freed until xhci_endpoint_disable() is called.  
> > > 
> > > On the other hand, there doesn't appear to be any 
> > > xhci_endpoint_disable() routine, although a comment refers to it.  
> > > Maybe this is the real problem?
> > 
> > one of your old mail might help :)
> > 
> > https://www.spinics.net/lists/linux-usb/msg98123.html

That message seems to say the same thing as what I just wrote, more or 
less.

> Wrote too soon.
> 
> Is it the one you are looking for -
> usb_disable_endpoint() is in drivers/usb/core/message.c

No, I'm talking about xhci_endpoint_disable(), which would be called by 
usb_hcd_disable_endpoint() if it existed.  Of course, 
usb_hcd_disable_endpoint() is called by usb_disable_endpoint().

Alan Stern

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous()

2018-07-17 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] mm/cma: remove unsupported gfp_mask parameter from cma_alloc()

2018-07-17 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-mapping: Relax warnings for per-device areas

2018-07-17 Thread Christoph Hellwig
On Sun, Jul 15, 2018 at 02:28:27PM +0200, Fredrik Noring wrote:
> Hi Christoph, Robin,
> 
> On Thu, Jul 05, 2018 at 09:36:13PM +0200, Christoph Hellwig wrote:
> > > - BUG_ON(!ops);
> > > - WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
> > > -
> > >   if (dma_alloc_from_dev_coherent(dev, size, dma_handle, _addr))
> > >   return cpu_addr;
> > >  
> > > + BUG_ON(!ops);
> > > + WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
> > 
> > I think doing dma on a device without ops is completely broken no matter
> > what you think of it, so I very much disagree with that part of the change.
> > 
> > Also while I don't think not having a dma mask is a good idea even for
> > a driver purely using dma coherent pools.  If the pools really are on
> > the device itself I can see why it might not matter, but for the case
> > commonly used on some ARM SOCs where we just reserve memory for certain
> > devices from a system pool it very much does matter.
> > 
> > There really is no good excuse to not set a coherent mask in the drivers.
> 
> Here are three other regressions related to the coherent mask WARN_ON_ONCE:

They are a pretty strong indication that yes, you should really set
the coherent mask if you ever do coherent allocations..
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-17 Thread Sudip Mukherjee
On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote:
> Hi Alan,
> 
> On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:
> > On Tue, 17 Jul 2018, Sudip Mukherjee wrote:
> > 
> > > I did some more debugging. Tested with a KASAN enabled kernel and that
> > > shows the problem. The report is attached.
> > > 
> > > To my understanding:
> > > 
> > > btusb_work() is calling usb_set_interface() with alternate = 0. which
> > > again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> > > xhci_free_endpoint_ring().
> > 
> > That doesn't sound like the right thing to do.  The rings shouldn't be 
> > freed until xhci_endpoint_disable() is called.  
> > 
> > On the other hand, there doesn't appear to be any 
> > xhci_endpoint_disable() routine, although a comment refers to it.  
> > Maybe this is the real problem?
> 
> one of your old mail might help :)
> 
> https://www.spinics.net/lists/linux-usb/msg98123.html

Wrote too soon.

Is it the one you are looking for -
usb_disable_endpoint() is in drivers/usb/core/message.c

--
Regards
Sudip
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-17 Thread Sudip Mukherjee
Hi Alan,

On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:
> On Tue, 17 Jul 2018, Sudip Mukherjee wrote:
> 
> > I did some more debugging. Tested with a KASAN enabled kernel and that
> > shows the problem. The report is attached.
> > 
> > To my understanding:
> > 
> > btusb_work() is calling usb_set_interface() with alternate = 0. which
> > again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> > xhci_free_endpoint_ring().
> 
> That doesn't sound like the right thing to do.  The rings shouldn't be 
> freed until xhci_endpoint_disable() is called.  
> 
> On the other hand, there doesn't appear to be any 
> xhci_endpoint_disable() routine, although a comment refers to it.  
> Maybe this is the real problem?

one of your old mail might help :)

https://www.spinics.net/lists/linux-usb/msg98123.html

--
Regards
Sudip
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-17 Thread Alan Stern
On Tue, 17 Jul 2018, Greg KH wrote:

> > From: Sudip Mukherjee 
> > Date: Tue, 10 Jul 2018 09:50:00 +0100
> > Subject: [PATCH] hacky solution to mem-corruption
> > 
> > Signed-off-by: Sudip Mukherjee 
> > ---
> >  drivers/usb/core/message.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
> > index 7cd4ec33dbf4..7fdf7a27611d 100644
> > --- a/drivers/usb/core/message.c
> > +++ b/drivers/usb/core/message.c
> > @@ -1398,7 +1398,8 @@ int usb_set_interface(struct usb_device *dev, int 
> > interface, int alternate)
> > remove_intf_ep_devs(iface);
> > usb_remove_sysfs_intf_files(iface);
> > }
> > -   usb_disable_interface(dev, iface, true);
> > +   if (!(iface->cur_altsetting && alt))
> > +   usb_disable_interface(dev, iface, true);
> 
> 
> 
> This feels like a "correct" patch anyway, why would a driver keep
> calling set_interface to an interface that it was already set to?
> 
> But can't we check for this higher up in the function?  This hack will
> just not disable an interface but it will do all of the other stuff
> being asked for.  Does the patch below also solve this for you?  It's
> not a good solution of course, but it might work around the problem a
> bit better.
> 
> thanks,
> 
> greg k-h
> 
> diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
> index 1a15392326fc..0f718f1a1ca3 100644
> --- a/drivers/usb/core/message.c
> +++ b/drivers/usb/core/message.c
> @@ -1376,6 +1376,14 @@ int usb_set_interface(struct usb_device *dev, int 
> interface, int alternate)
>   return -EINVAL;
>   }
>  
> + if (iface->cur_altsetting == alt) {
> + /*
> +  * foolish bluetooth stack, don't try to set a setting you are
> +  * already set to...
> +  */
> + return 0;
> + }
> +
>   /* Make sure we have enough bandwidth for this alternate interface.
>* Remove the current alt setting and add the new alt setting.
>*/

No, neither of these is right.  It's possible to use 
usb_set_interface() as a kind of "soft" reset.  Even when the new 
altsetting is specified to be the same as the current one, we still 
have to tell the lower-layer drivers and hardware about it.

Alan Stern

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-17 Thread Alan Stern
On Tue, 17 Jul 2018, Sudip Mukherjee wrote:

> I did some more debugging. Tested with a KASAN enabled kernel and that
> shows the problem. The report is attached.
> 
> To my understanding:
> 
> btusb_work() is calling usb_set_interface() with alternate = 0. which
> again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> xhci_free_endpoint_ring().

That doesn't sound like the right thing to do.  The rings shouldn't be 
freed until xhci_endpoint_disable() is called.  

On the other hand, there doesn't appear to be any 
xhci_endpoint_disable() routine, although a comment refers to it.  
Maybe this is the real problem?

Alan Stern

> But then usb_set_interface() continues and
> calls usb_disable_interface() -> usb_hcd_flush_endpoint()->unlink1()->
> xhci_urb_dequeue() which at the end gives the command to stop endpoint.
> 
> In all the cycles I have tested I see that only in the fail case
> handle_cmd_completion() gets called, but in the cycles where the error
> is not there handle_cmd_completion() is not called with that command.
> 
> I am not sure what is happening, and you are the best person to understand
> what is happening. :)
> 
> But for now (untill you are back from holiday and suggest a proper solution),
> I made a hacky patch (attached) which is working and I donot get any
> corruption after that. Both KASAN and slub debug are also happy.
> 
> So, now waiting for you to analyze what is going on and suggest a proper
> fix.
> 
> Thanks in advance.
> 
> --
> Regards
> Sudip
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-17 Thread Greg KH
On Tue, Jul 17, 2018 at 02:20:00PM +0100, Sudip Mukherjee wrote:
> Hi Greg,
> 
> On Tue, Jul 17, 2018 at 02:04:11PM +0200, Greg KH wrote:
> > On Tue, Jul 17, 2018 at 12:41:04PM +0100, Sudip Mukherjee wrote:
> > > Hi Mathias,
> > > 
> > > On Sat, Jun 30, 2018 at 10:07:04PM +0100, Sudip Mukherjee wrote:
> > > > Hi Mathias,
> > > > 
> > > > On Fri, Jun 29, 2018 at 02:41:13PM +0300, Mathias Nyman wrote:
> > > > > On 27.06.2018 14:59, Sudip Mukherjee wrote:
> > > > > > > > Can you share a bit more details on the platform you are using, 
> > > > > > > > and what types of test you are running.
> > > > > > > 
> > > 
> > > > Then to track what is going on, I added the slub debugging and :(
> > > > I have attached part of dmesg for you to check.
> > > > Will appreciate your help in finding out the problem.
> > > 
> > > I did some more debugging. Tested with a KASAN enabled kernel and that
> > > shows the problem. The report is attached.
> > > 
> > > To my understanding:
> > > 
> > > btusb_work() is calling usb_set_interface() with alternate = 0. which
> > > again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> > > xhci_free_endpoint_ring(). But then usb_set_interface() continues and
> > > calls usb_disable_interface() -> usb_hcd_flush_endpoint()->unlink1()->
> > > xhci_urb_dequeue() which at the end gives the command to stop endpoint.
> > > 
> > > In all the cycles I have tested I see that only in the fail case
> > > handle_cmd_completion() gets called, but in the cycles where the error
> > > is not there handle_cmd_completion() is not called with that command.
> > > 
> > > I am not sure what is happening, and you are the best person to understand
> > > what is happening. :)
> > > 
> > > But for now (untill you are back from holiday and suggest a proper 
> > > solution),
> > > I made a hacky patch (attached) which is working and I donot get any
> > > corruption after that. Both KASAN and slub debug are also happy.
> > > 
> > > So, now waiting for you to analyze what is going on and suggest a proper
> > > fix.
> > > 
> > > Thanks in advance.
> > > 
> > > --
> > > Regards
> > > Sudip
> > 
> > > [  236.814156] 
> > > ==
> > > [  236.814187] BUG: KASAN: use-after-free in 
> > > xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
> > > [  236.814193] Read of size 8 at addr 8800789329c8 by task weston/138
> > > 
> > > [  236.814203] CPU: 0 PID: 138 Comm: weston Tainted: G U  W  O
> > > 4.14.47-20180606+ #7
> > > [  236.814206] Hardware name: xxx, BIOS 2017.01-00087-g43e04de 08/30/2017
> > > [  236.814209] Call Trace:
> > > [  236.814214]  
> > > [  236.814226]  dump_stack+0x46/0x59
> > > [  236.814238]  print_address_description+0x6b/0x23b
> > > [  236.814255]  ? xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
> > > [  236.814262]  kasan_report+0x220/0x246
> > > [  236.814278]  xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
> > > [  236.814294]  trb_in_td+0x3b/0x1cd [xhci_hcd]
> > > [  236.814311]  handle_cmd_completion+0x1181/0x2c9b [xhci_hcd]
> > > [  236.814329]  ? xhci_queue_new_dequeue_state+0x5d9/0x5d9 [xhci_hcd]
> > > [  236.814337]  ? drm_handle_vblank+0x4ec/0x590
> > > [  236.814352]  xhci_irq+0x529/0x3294 [xhci_hcd]
> > > [  236.814362]  ? __accumulate_pelt_segments+0x24/0x33
> > > [  236.814378]  ? finish_td.isra.40+0x223/0x223 [xhci_hcd]
> > > [  236.814384]  ? __accumulate_pelt_segments+0x24/0x33
> > > [  236.814390]  ? __accumulate_pelt_segments+0x24/0x33
> > > [  236.814405]  ? xhci_irq+0x3294/0x3294 [xhci_hcd]
> > > [  236.814412]  __handle_irq_event_percpu+0x149/0x3db
> > > [  236.814421]  handle_irq_event_percpu+0x65/0x109
> > > [  236.814428]  ? __handle_irq_event_percpu+0x3db/0x3db
> > > [  236.814436]  ? ttwu_do_wakeup.isra.18+0x3a2/0x3ce
> > > [  236.814442]  handle_irq_event+0xa8/0x10a
> > > [  236.814449]  handle_edge_irq+0x4b2/0x538
> > > [  236.814458]  handle_irq+0x3e/0x45
> > > [  236.814465]  do_IRQ+0x5c/0x126
> > > [  236.814474]  common_interrupt+0x7a/0x7a
> > > [  236.814478]  
> > > [  236.814483] RIP: 0023:0xf79d3d82
> > > [  236.814486] RSP: 002b:ffc588e8 EFLAGS: 00200282 ORIG_RAX: 
> > > ffdc
> > > [  236.814493] RAX:  RBX: f7bebd5c RCX: 
> > > 
> > > [  236.814496] RDX: 08d4197c RSI:  RDI: 
> > > f746c020
> > > [  236.814499] RBP: ffc588e8 R08:  R09: 
> > > 
> > > [  236.814503] R10:  R11: 00200206 R12: 
> > > 
> > > [  236.814506] R13:  R14:  R15: 
> > > 
> > > 
> > > [  236.814513] Allocated by task 2082:
> > > [  236.814521]  kasan_kmalloc.part.1+0x51/0xc7
> > > [  236.814526]  kmem_cache_alloc_trace+0x178/0x187
> > > [  236.814540]  xhci_segment_alloc.isra.11+0x9d/0x3bf [xhci_hcd]
> > > [  236.814553]  xhci_alloc_segments_for_ring+0x9e/0x176 [xhci_hcd]
> > > [  236.814566]  

Re: usb HC busted?

2018-07-17 Thread Sudip Mukherjee
Hi Greg,

On Tue, Jul 17, 2018 at 02:04:11PM +0200, Greg KH wrote:
> On Tue, Jul 17, 2018 at 12:41:04PM +0100, Sudip Mukherjee wrote:
> > Hi Mathias,
> > 
> > On Sat, Jun 30, 2018 at 10:07:04PM +0100, Sudip Mukherjee wrote:
> > > Hi Mathias,
> > > 
> > > On Fri, Jun 29, 2018 at 02:41:13PM +0300, Mathias Nyman wrote:
> > > > On 27.06.2018 14:59, Sudip Mukherjee wrote:
> > > > > > > Can you share a bit more details on the platform you are using, 
> > > > > > > and what types of test you are running.
> > > > > > 
> > 
> > > Then to track what is going on, I added the slub debugging and :(
> > > I have attached part of dmesg for you to check.
> > > Will appreciate your help in finding out the problem.
> > 
> > I did some more debugging. Tested with a KASAN enabled kernel and that
> > shows the problem. The report is attached.
> > 
> > To my understanding:
> > 
> > btusb_work() is calling usb_set_interface() with alternate = 0. which
> > again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> > xhci_free_endpoint_ring(). But then usb_set_interface() continues and
> > calls usb_disable_interface() -> usb_hcd_flush_endpoint()->unlink1()->
> > xhci_urb_dequeue() which at the end gives the command to stop endpoint.
> > 
> > In all the cycles I have tested I see that only in the fail case
> > handle_cmd_completion() gets called, but in the cycles where the error
> > is not there handle_cmd_completion() is not called with that command.
> > 
> > I am not sure what is happening, and you are the best person to understand
> > what is happening. :)
> > 
> > But for now (untill you are back from holiday and suggest a proper 
> > solution),
> > I made a hacky patch (attached) which is working and I donot get any
> > corruption after that. Both KASAN and slub debug are also happy.
> > 
> > So, now waiting for you to analyze what is going on and suggest a proper
> > fix.
> > 
> > Thanks in advance.
> > 
> > --
> > Regards
> > Sudip
> 
> > [  236.814156] 
> > ==
> > [  236.814187] BUG: KASAN: use-after-free in xhci_trb_virt_to_dma+0x2e/0x74 
> > [xhci_hcd]
> > [  236.814193] Read of size 8 at addr 8800789329c8 by task weston/138
> > 
> > [  236.814203] CPU: 0 PID: 138 Comm: weston Tainted: G U  W  O
> > 4.14.47-20180606+ #7
> > [  236.814206] Hardware name: xxx, BIOS 2017.01-00087-g43e04de 08/30/2017
> > [  236.814209] Call Trace:
> > [  236.814214]  
> > [  236.814226]  dump_stack+0x46/0x59
> > [  236.814238]  print_address_description+0x6b/0x23b
> > [  236.814255]  ? xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
> > [  236.814262]  kasan_report+0x220/0x246
> > [  236.814278]  xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
> > [  236.814294]  trb_in_td+0x3b/0x1cd [xhci_hcd]
> > [  236.814311]  handle_cmd_completion+0x1181/0x2c9b [xhci_hcd]
> > [  236.814329]  ? xhci_queue_new_dequeue_state+0x5d9/0x5d9 [xhci_hcd]
> > [  236.814337]  ? drm_handle_vblank+0x4ec/0x590
> > [  236.814352]  xhci_irq+0x529/0x3294 [xhci_hcd]
> > [  236.814362]  ? __accumulate_pelt_segments+0x24/0x33
> > [  236.814378]  ? finish_td.isra.40+0x223/0x223 [xhci_hcd]
> > [  236.814384]  ? __accumulate_pelt_segments+0x24/0x33
> > [  236.814390]  ? __accumulate_pelt_segments+0x24/0x33
> > [  236.814405]  ? xhci_irq+0x3294/0x3294 [xhci_hcd]
> > [  236.814412]  __handle_irq_event_percpu+0x149/0x3db
> > [  236.814421]  handle_irq_event_percpu+0x65/0x109
> > [  236.814428]  ? __handle_irq_event_percpu+0x3db/0x3db
> > [  236.814436]  ? ttwu_do_wakeup.isra.18+0x3a2/0x3ce
> > [  236.814442]  handle_irq_event+0xa8/0x10a
> > [  236.814449]  handle_edge_irq+0x4b2/0x538
> > [  236.814458]  handle_irq+0x3e/0x45
> > [  236.814465]  do_IRQ+0x5c/0x126
> > [  236.814474]  common_interrupt+0x7a/0x7a
> > [  236.814478]  
> > [  236.814483] RIP: 0023:0xf79d3d82
> > [  236.814486] RSP: 002b:ffc588e8 EFLAGS: 00200282 ORIG_RAX: 
> > ffdc
> > [  236.814493] RAX:  RBX: f7bebd5c RCX: 
> > 
> > [  236.814496] RDX: 08d4197c RSI:  RDI: 
> > f746c020
> > [  236.814499] RBP: ffc588e8 R08:  R09: 
> > 
> > [  236.814503] R10:  R11: 00200206 R12: 
> > 
> > [  236.814506] R13:  R14:  R15: 
> > 
> > 
> > [  236.814513] Allocated by task 2082:
> > [  236.814521]  kasan_kmalloc.part.1+0x51/0xc7
> > [  236.814526]  kmem_cache_alloc_trace+0x178/0x187
> > [  236.814540]  xhci_segment_alloc.isra.11+0x9d/0x3bf [xhci_hcd]
> > [  236.814553]  xhci_alloc_segments_for_ring+0x9e/0x176 [xhci_hcd]
> > [  236.814566]  xhci_ring_alloc.constprop.16+0x197/0x4ba [xhci_hcd]
> > [  236.814579]  xhci_endpoint_init+0x77a/0x9ba [xhci_hcd]
> > [  236.814592]  xhci_add_endpoint+0x3bc/0x43b [xhci_hcd]
> > [  236.814615]  usb_hcd_alloc_bandwidth+0x7ef/0x857 [usbcore]
> > [  236.814637]  usb_set_interface+0x294/0x681 [usbcore]
> 

Re: convert parisc to the generic dma-noncoherent code

2018-07-17 Thread Christoph Hellwig
On Sun, Jul 15, 2018 at 12:29:37PM -0400, John David Anglin wrote:
> On 2018-07-13 4:14 AM, Helge Deller wrote:
>> On 11.07.2018 17:34, Christoph Hellwig wrote:
>>> ping?  Any comments?
>> I applied those 3 patches on top of git head, and booted the
>> 32-bit kernel on a HP 715/64 (PCX-L) and a HP B160L (PCX-L2).
>> On both machines I had problems with those drivers which use
>> DMA (I checked specifically the lasi NIC card, which is the
>> main onboard NIC card in both machines).
>> Getting IP via DHCP was unreliable, pings on the same network
>> to both machines gave lost packets, login via ssh sometimes failed
>> and so on.
>>
>> So, there is definitively some cache-flush missing in this patchset.
> Possibly, the code should use flush_kernel_vmap_range() as it purges the 
> TLB entries used
> for the cache flush.  Some of the routines in pci-dma.c currently use it.

That might be worth a try and is already discussed in my description.

Also it might be good to check that the last patch really is the culprit,
I'm pretty certain it is, but a double check never hurts.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-17 Thread Greg KH
On Tue, Jul 17, 2018 at 12:41:04PM +0100, Sudip Mukherjee wrote:
> Hi Mathias,
> 
> On Sat, Jun 30, 2018 at 10:07:04PM +0100, Sudip Mukherjee wrote:
> > Hi Mathias,
> > 
> > On Fri, Jun 29, 2018 at 02:41:13PM +0300, Mathias Nyman wrote:
> > > On 27.06.2018 14:59, Sudip Mukherjee wrote:
> > > > > > Can you share a bit more details on the platform you are using, and 
> > > > > > what types of test you are running.
> > > > > 
> 
> > Then to track what is going on, I added the slub debugging and :(
> > I have attached part of dmesg for you to check.
> > Will appreciate your help in finding out the problem.
> 
> I did some more debugging. Tested with a KASAN enabled kernel and that
> shows the problem. The report is attached.
> 
> To my understanding:
> 
> btusb_work() is calling usb_set_interface() with alternate = 0. which
> again calls usb_hcd_alloc_bandwidth() and that frees the rings by
> xhci_free_endpoint_ring(). But then usb_set_interface() continues and
> calls usb_disable_interface() -> usb_hcd_flush_endpoint()->unlink1()->
> xhci_urb_dequeue() which at the end gives the command to stop endpoint.
> 
> In all the cycles I have tested I see that only in the fail case
> handle_cmd_completion() gets called, but in the cycles where the error
> is not there handle_cmd_completion() is not called with that command.
> 
> I am not sure what is happening, and you are the best person to understand
> what is happening. :)
> 
> But for now (untill you are back from holiday and suggest a proper solution),
> I made a hacky patch (attached) which is working and I donot get any
> corruption after that. Both KASAN and slub debug are also happy.
> 
> So, now waiting for you to analyze what is going on and suggest a proper
> fix.
> 
> Thanks in advance.
> 
> --
> Regards
> Sudip

> [  236.814156] 
> ==
> [  236.814187] BUG: KASAN: use-after-free in xhci_trb_virt_to_dma+0x2e/0x74 
> [xhci_hcd]
> [  236.814193] Read of size 8 at addr 8800789329c8 by task weston/138
> 
> [  236.814203] CPU: 0 PID: 138 Comm: weston Tainted: G U  W  O
> 4.14.47-20180606+ #7
> [  236.814206] Hardware name: xxx, BIOS 2017.01-00087-g43e04de 08/30/2017
> [  236.814209] Call Trace:
> [  236.814214]  
> [  236.814226]  dump_stack+0x46/0x59
> [  236.814238]  print_address_description+0x6b/0x23b
> [  236.814255]  ? xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
> [  236.814262]  kasan_report+0x220/0x246
> [  236.814278]  xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
> [  236.814294]  trb_in_td+0x3b/0x1cd [xhci_hcd]
> [  236.814311]  handle_cmd_completion+0x1181/0x2c9b [xhci_hcd]
> [  236.814329]  ? xhci_queue_new_dequeue_state+0x5d9/0x5d9 [xhci_hcd]
> [  236.814337]  ? drm_handle_vblank+0x4ec/0x590
> [  236.814352]  xhci_irq+0x529/0x3294 [xhci_hcd]
> [  236.814362]  ? __accumulate_pelt_segments+0x24/0x33
> [  236.814378]  ? finish_td.isra.40+0x223/0x223 [xhci_hcd]
> [  236.814384]  ? __accumulate_pelt_segments+0x24/0x33
> [  236.814390]  ? __accumulate_pelt_segments+0x24/0x33
> [  236.814405]  ? xhci_irq+0x3294/0x3294 [xhci_hcd]
> [  236.814412]  __handle_irq_event_percpu+0x149/0x3db
> [  236.814421]  handle_irq_event_percpu+0x65/0x109
> [  236.814428]  ? __handle_irq_event_percpu+0x3db/0x3db
> [  236.814436]  ? ttwu_do_wakeup.isra.18+0x3a2/0x3ce
> [  236.814442]  handle_irq_event+0xa8/0x10a
> [  236.814449]  handle_edge_irq+0x4b2/0x538
> [  236.814458]  handle_irq+0x3e/0x45
> [  236.814465]  do_IRQ+0x5c/0x126
> [  236.814474]  common_interrupt+0x7a/0x7a
> [  236.814478]  
> [  236.814483] RIP: 0023:0xf79d3d82
> [  236.814486] RSP: 002b:ffc588e8 EFLAGS: 00200282 ORIG_RAX: 
> ffdc
> [  236.814493] RAX:  RBX: f7bebd5c RCX: 
> 
> [  236.814496] RDX: 08d4197c RSI:  RDI: 
> f746c020
> [  236.814499] RBP: ffc588e8 R08:  R09: 
> 
> [  236.814503] R10:  R11: 00200206 R12: 
> 
> [  236.814506] R13:  R14:  R15: 
> 
> 
> [  236.814513] Allocated by task 2082:
> [  236.814521]  kasan_kmalloc.part.1+0x51/0xc7
> [  236.814526]  kmem_cache_alloc_trace+0x178/0x187
> [  236.814540]  xhci_segment_alloc.isra.11+0x9d/0x3bf [xhci_hcd]
> [  236.814553]  xhci_alloc_segments_for_ring+0x9e/0x176 [xhci_hcd]
> [  236.814566]  xhci_ring_alloc.constprop.16+0x197/0x4ba [xhci_hcd]
> [  236.814579]  xhci_endpoint_init+0x77a/0x9ba [xhci_hcd]
> [  236.814592]  xhci_add_endpoint+0x3bc/0x43b [xhci_hcd]
> [  236.814615]  usb_hcd_alloc_bandwidth+0x7ef/0x857 [usbcore]
> [  236.814637]  usb_set_interface+0x294/0x681 [usbcore]
> [  236.814645]  btusb_work+0x2e6/0x981 [btusb]
> [  236.814651]  process_one_work+0x579/0x9e9
> [  236.814656]  worker_thread+0x68f/0x804
> [  236.814662]  kthread+0x31c/0x32b
> [  236.814668]  ret_from_fork+0x35/0x40
> 
> [  236.814672] Freed by task 1533:
> [  236.814678]  

Re: usb HC busted?

2018-07-17 Thread Sudip Mukherjee
Hi Mathias,

On Sat, Jun 30, 2018 at 10:07:04PM +0100, Sudip Mukherjee wrote:
> Hi Mathias,
> 
> On Fri, Jun 29, 2018 at 02:41:13PM +0300, Mathias Nyman wrote:
> > On 27.06.2018 14:59, Sudip Mukherjee wrote:
> > > > > Can you share a bit more details on the platform you are using, and 
> > > > > what types of test you are running.
> > > > 

> Then to track what is going on, I added the slub debugging and :(
> I have attached part of dmesg for you to check.
> Will appreciate your help in finding out the problem.

I did some more debugging. Tested with a KASAN enabled kernel and that
shows the problem. The report is attached.

To my understanding:

btusb_work() is calling usb_set_interface() with alternate = 0. which
again calls usb_hcd_alloc_bandwidth() and that frees the rings by
xhci_free_endpoint_ring(). But then usb_set_interface() continues and
calls usb_disable_interface() -> usb_hcd_flush_endpoint()->unlink1()->
xhci_urb_dequeue() which at the end gives the command to stop endpoint.

In all the cycles I have tested I see that only in the fail case
handle_cmd_completion() gets called, but in the cycles where the error
is not there handle_cmd_completion() is not called with that command.

I am not sure what is happening, and you are the best person to understand
what is happening. :)

But for now (untill you are back from holiday and suggest a proper solution),
I made a hacky patch (attached) which is working and I donot get any
corruption after that. Both KASAN and slub debug are also happy.

So, now waiting for you to analyze what is going on and suggest a proper
fix.

Thanks in advance.

--
Regards
Sudip
[  236.814156] 
==
[  236.814187] BUG: KASAN: use-after-free in xhci_trb_virt_to_dma+0x2e/0x74 
[xhci_hcd]
[  236.814193] Read of size 8 at addr 8800789329c8 by task weston/138

[  236.814203] CPU: 0 PID: 138 Comm: weston Tainted: G U  W  O
4.14.47-20180606+ #7
[  236.814206] Hardware name: xxx, BIOS 2017.01-00087-g43e04de 08/30/2017
[  236.814209] Call Trace:
[  236.814214]  
[  236.814226]  dump_stack+0x46/0x59
[  236.814238]  print_address_description+0x6b/0x23b
[  236.814255]  ? xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
[  236.814262]  kasan_report+0x220/0x246
[  236.814278]  xhci_trb_virt_to_dma+0x2e/0x74 [xhci_hcd]
[  236.814294]  trb_in_td+0x3b/0x1cd [xhci_hcd]
[  236.814311]  handle_cmd_completion+0x1181/0x2c9b [xhci_hcd]
[  236.814329]  ? xhci_queue_new_dequeue_state+0x5d9/0x5d9 [xhci_hcd]
[  236.814337]  ? drm_handle_vblank+0x4ec/0x590
[  236.814352]  xhci_irq+0x529/0x3294 [xhci_hcd]
[  236.814362]  ? __accumulate_pelt_segments+0x24/0x33
[  236.814378]  ? finish_td.isra.40+0x223/0x223 [xhci_hcd]
[  236.814384]  ? __accumulate_pelt_segments+0x24/0x33
[  236.814390]  ? __accumulate_pelt_segments+0x24/0x33
[  236.814405]  ? xhci_irq+0x3294/0x3294 [xhci_hcd]
[  236.814412]  __handle_irq_event_percpu+0x149/0x3db
[  236.814421]  handle_irq_event_percpu+0x65/0x109
[  236.814428]  ? __handle_irq_event_percpu+0x3db/0x3db
[  236.814436]  ? ttwu_do_wakeup.isra.18+0x3a2/0x3ce
[  236.814442]  handle_irq_event+0xa8/0x10a
[  236.814449]  handle_edge_irq+0x4b2/0x538
[  236.814458]  handle_irq+0x3e/0x45
[  236.814465]  do_IRQ+0x5c/0x126
[  236.814474]  common_interrupt+0x7a/0x7a
[  236.814478]  
[  236.814483] RIP: 0023:0xf79d3d82
[  236.814486] RSP: 002b:ffc588e8 EFLAGS: 00200282 ORIG_RAX: 
ffdc
[  236.814493] RAX:  RBX: f7bebd5c RCX: 
[  236.814496] RDX: 08d4197c RSI:  RDI: f746c020
[  236.814499] RBP: ffc588e8 R08:  R09: 
[  236.814503] R10:  R11: 00200206 R12: 
[  236.814506] R13:  R14:  R15: 

[  236.814513] Allocated by task 2082:
[  236.814521]  kasan_kmalloc.part.1+0x51/0xc7
[  236.814526]  kmem_cache_alloc_trace+0x178/0x187
[  236.814540]  xhci_segment_alloc.isra.11+0x9d/0x3bf [xhci_hcd]
[  236.814553]  xhci_alloc_segments_for_ring+0x9e/0x176 [xhci_hcd]
[  236.814566]  xhci_ring_alloc.constprop.16+0x197/0x4ba [xhci_hcd]
[  236.814579]  xhci_endpoint_init+0x77a/0x9ba [xhci_hcd]
[  236.814592]  xhci_add_endpoint+0x3bc/0x43b [xhci_hcd]
[  236.814615]  usb_hcd_alloc_bandwidth+0x7ef/0x857 [usbcore]
[  236.814637]  usb_set_interface+0x294/0x681 [usbcore]
[  236.814645]  btusb_work+0x2e6/0x981 [btusb]
[  236.814651]  process_one_work+0x579/0x9e9
[  236.814656]  worker_thread+0x68f/0x804
[  236.814662]  kthread+0x31c/0x32b
[  236.814668]  ret_from_fork+0x35/0x40

[  236.814672] Freed by task 1533:
[  236.814678]  kasan_slab_free+0xb3/0x15e
[  236.814683]  kfree+0x103/0x1a9
[  236.814696]  xhci_ring_free+0x205/0x286 [xhci_hcd]
[  236.814709]  xhci_free_endpoint_ring+0x4d/0x83 [xhci_hcd]
[  236.814722]  xhci_check_bandwidth+0x57b/0x65a [xhci_hcd]
[  236.814743]  usb_hcd_alloc_bandwidth+0x665/0x857 [usbcore]
[  

Re: [PATCH 04/16] iommu: sva: Add support for private PASIDs

2018-07-17 Thread Jean-Philippe Brucker
Hi Jordan,

Thanks for the patches, I finally got around testing them with SMMUv3.
It's an important feature, arguably more than SVA itself. I could pick
this one as part of the SVA series, what do you think?

Although I probably would have done the same, I dislike the interface
because it forces us to duplicate functions and IOMMU ops. The list is
small but growing:

iommu_map
iommu_map_sg
iommu_unmap
iommu_unmap_fast
iommu_iova_to_phys
iommu_tlb_range_add
iommu_flush_tlb_all

Each of these and their associated IOMMU op will have an iommu_sva_X
counterpart that takes one different argument. Modifying these functions
to take both a domain and a PASID argument would be more elegant. Or as
an intermediate solution, perhaps we could only change the IOMMU ops to
take an additional argument, like you did for map_sg?

In any case it requires invasive changes in lots of drivers and we can
always tidy up later, so unless Joerg has a preference I'd keep the
duplicates for now.

However, having to lookup pasid-to-io_mm on every map/unmap call is
cumbersome, especially since map/unmap are supposed to be as fast as
possible. iommu_sva_alloc_pasid should return a structure representing
the PASID instead of the value alone. The io_mm structure seems like a
good fit, and the device driver can access io_mm->pasid directly or via
an io_mm_get_pasid() function.

The new functions would then be:

struct io_mm *iommu_sva_alloc_pasid(domain, dev)
void iommu_sva_free_pasid(domain, io_mm)

int iommu_sva_map(io_mm, iova, paddr, size, prot)
size_t iommu_map_sg(io_mm, iova, sg, nents, prot)
size_t iommu_sva_unmap(io_mm, iova, size)
size_t iommu_sva_unmap_fast(io_mm, iova, size)
phys_addr_t iommu_sva_iova_to_phys(io_mm, iova)
void iommu_sva_flush_tlb_all(io_mm)
void iommu_sva_tlb_range_add(io_mm, iova, size)

A few more comments inline

On 18/05/18 22:34, Jordan Crouse wrote:
> Some older SMMU implementations that do not have a fully featured
> hardware PASID features have alternate workarounds for using multiple
> pagetables. For example, MSM GPUs have logic to automatically switch the
> user pagetable from hardware by writing the context bank directly.

The comment may be a bit too specific, sva_map/sva_unmap is also useful
for PASID-capable IOMMUs

> Support private PASIDs by creating a new io-pgtable instance map it
> to a PASID and provide the APIs for drivers to populate it manually.
> 
> Signed-off-by: Jordan Crouse 
> ---
[...]
> +int iommu_sva_alloc_pasid(struct iommu_domain *domain, struct device *dev)
> +{
> + int ret, pasid;
> + struct io_mm *io_mm;
> + struct iommu_sva_param *param = dev->iommu_param->sva_param;

We need a NULL check on the param, to ensure that the driver called
sva_device_init first.

> +
> + if (!domain->ops->mm_attach || !domain->ops->mm_detach)
> + return -ENODEV;
> +
> + if (domain->ops->mm_alloc)

I'd rather make mm_alloc and mm_free mandatory, but if we do make them
optional, then we need to check that both mm_alloc and mm_free are
present, or both absent.

> + io_mm = domain->ops->mm_alloc(domain, NULL, 0);
> + else
> + io_mm = kzalloc(sizeof(*io_mm), GFP_KERNEL);
> +
> + if (IS_ERR(io_mm))
> + return PTR_ERR(io_mm);
> + if (!io_mm)
> + return -ENOMEM;
> +
> + io_mm->domain = domain;
> + io_mm->type = IO_TYPE_PRIVATE;

This could be a IOMMU_SVA_FEAT_PRIVATE flag

> +
> + idr_preload(GFP_KERNEL);
> + spin_lock(_sva_lock);
> + pasid = idr_alloc_cyclic(_pasid_idr, io_mm, param->min_pasid,
> + param->max_pasid + 1, GFP_ATOMIC);
> + io_mm->pasid = pasid;
> + spin_unlock(_sva_lock);
> + idr_preload_end();
> +
> + if (pasid < 0) {
> + kfree(io_mm);
> + return pasid;
> + }
> +
> + ret = domain->ops->mm_attach(domain, dev, io_mm, false);

attach_domain should be true, otherwise the SMMUv3 driver won't write
the PASID table. But we should probably go through io_mm_attach here, to
make sure that PASID contexts are added to the mm list and cleaned up by
unbind_dev_all()

> +size_t iommu_sva_unmap(int pasid, unsigned long iova, size_t size)
> +{
> + struct io_mm *io_mm = get_io_mm(pasid);
> +
> + if (!io_mm || io_mm->type != IO_TYPE_PRIVATE)
> + return -ENODEV;
> +
> + return __iommu_unmap(io_mm->domain, , iova, size, false);

sync must be true here, and false in the unmap_fast() variant

> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unmap);
> +
> +void iommu_sva_free_pasid(int pasid, struct device *dev)
> +{
> + struct io_mm *io_mm = get_io_mm(pasid);
> + struct iommu_domain *domain;
> +
> + if (!io_mm || io_mm->type != IO_TYPE_PRIVATE)
> + return;
> +
> + domain = io_mm->domain;
> +
> + domain->ops->mm_detach(domain, dev, io_mm, false);

Here too detach_domain should be true

> @@ -1841,16 +1854,23 @@ int iommu_map(struct iommu_domain *domain, unsigned 
> long iova,
>  
>   /* unroll 

Re: [PATCH v12 3/4] iommu/arm-smmu: Add the device_link between masters and smmu

2018-07-17 Thread Vivek Gautam




On 7/17/2018 1:16 PM, Rafael J. Wysocki wrote:

On Mon, Jul 16, 2018 at 1:46 PM, Vivek Gautam
 wrote:


On 7/16/2018 2:25 PM, Rafael J. Wysocki wrote:

On Thu, Jul 12, 2018 at 2:41 PM, Vivek Gautam
 wrote:

Hi Rafael,


On Wed, Jul 11, 2018 at 4:06 PM, Vivek Gautam
 wrote:

Hi Rafael,



On 7/11/2018 3:23 PM, Rafael J. Wysocki wrote:

On Sunday, July 8, 2018 7:34:12 PM CEST Vivek Gautam wrote:

From: Sricharan R 

Finally add the device link between the master device and
smmu, so that the smmu gets runtime enabled/disabled only when the
master needs it. This is done from add_device callback which gets
called once when the master is added to the smmu.

Signed-off-by: Sricharan R 
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
Cc: Rafael J. Wysocki 
Cc: Lukas Wunner 
---

- Change since v11
  * Replaced DL_FLAG_AUTOREMOVE flag with
DL_FLAG_AUTOREMOVE_SUPPLIER.

drivers/iommu/arm-smmu.c | 12 
1 file changed, 12 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 09265e206e2d..916cde4954d2 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1461,8 +1461,20 @@ static int arm_smmu_add_device(struct device
*dev)
  iommu_device_link(>iommu, dev);
+ if (pm_runtime_enabled(smmu->dev) &&

Why does the creation of the link depend on whether or not runtime PM
is enabled for the MMU device?


The main purpose of this device link is to handle the runtime PM
synchronization
between the supplier (iommu) and consumer (client devices, such as
GPU/display).
Moreover, the runtime pm is conditionally enabled for smmu devices that
support
such [1].

Is there something you would like me to modify in this patch?

Not really, as long as you are sure that it is correct. :-)

You need to remember, however, that if you add system-wide PM
callbacks to the driver, the ordering between them and the client
device callbacks during system-wide suspend matters as well.  Don't
you need the link the ensure the correct system-wide suspend ordering
too?


The fact that currently we handle clocks only through runtime pm callbacks,
would it be better to call runtime pm put/get in system-wide PM callbacks.
This would be same as i mentioned in the other thread.

Well, my point is that there's no reason for system-wide suspend to
depend directly on runtime PM (which can be effectively disabled by
user space as mentioned for multiple times in this thread).

It simply is not efficient to let the clock run while the system as a
whole is sleeping (even if power/control has been set to "on" for this
particular device) and it should not be too hard to prevent that from
happening.  You may need an additional flag in the driver for that or
similar, but it definitely should be doable.


Right, I will modify the things are required. Thanks again for pointing 
this out.


Best regards
Vivek



Now, that's my advice and I'm not the maintainer of that code, so it
is your call (as long as the maintainer agrees with it).

Thanks,
Rafael


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 3/4] iommu/arm-smmu: Add the device_link between masters and smmu

2018-07-17 Thread Rafael J. Wysocki
On Mon, Jul 16, 2018 at 1:46 PM, Vivek Gautam
 wrote:
>
>
> On 7/16/2018 2:25 PM, Rafael J. Wysocki wrote:
>>
>> On Thu, Jul 12, 2018 at 2:41 PM, Vivek Gautam
>>  wrote:
>>>
>>> Hi Rafael,
>>>
>>>
>>> On Wed, Jul 11, 2018 at 4:06 PM, Vivek Gautam
>>>  wrote:

 Hi Rafael,



 On 7/11/2018 3:23 PM, Rafael J. Wysocki wrote:
>
> On Sunday, July 8, 2018 7:34:12 PM CEST Vivek Gautam wrote:
>>
>> From: Sricharan R 
>>
>> Finally add the device link between the master device and
>> smmu, so that the smmu gets runtime enabled/disabled only when the
>> master needs it. This is done from add_device callback which gets
>> called once when the master is added to the smmu.
>>
>> Signed-off-by: Sricharan R 
>> Signed-off-by: Vivek Gautam 
>> Reviewed-by: Tomasz Figa 
>> Cc: Rafael J. Wysocki 
>> Cc: Lukas Wunner 
>> ---
>>
>>- Change since v11
>>  * Replaced DL_FLAG_AUTOREMOVE flag with
>> DL_FLAG_AUTOREMOVE_SUPPLIER.
>>
>>drivers/iommu/arm-smmu.c | 12 
>>1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 09265e206e2d..916cde4954d2 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1461,8 +1461,20 @@ static int arm_smmu_add_device(struct device
>> *dev)
>>  iommu_device_link(>iommu, dev);
>>+ if (pm_runtime_enabled(smmu->dev) &&
>
> Why does the creation of the link depend on whether or not runtime PM
> is enabled for the MMU device?


 The main purpose of this device link is to handle the runtime PM
 synchronization
 between the supplier (iommu) and consumer (client devices, such as
 GPU/display).
 Moreover, the runtime pm is conditionally enabled for smmu devices that
 support
 such [1].
>>>
>>> Is there something you would like me to modify in this patch?
>>
>> Not really, as long as you are sure that it is correct. :-)
>>
>> You need to remember, however, that if you add system-wide PM
>> callbacks to the driver, the ordering between them and the client
>> device callbacks during system-wide suspend matters as well.  Don't
>> you need the link the ensure the correct system-wide suspend ordering
>> too?
>
>
> The fact that currently we handle clocks only through runtime pm callbacks,
> would it be better to call runtime pm put/get in system-wide PM callbacks.
> This would be same as i mentioned in the other thread.

Well, my point is that there's no reason for system-wide suspend to
depend directly on runtime PM (which can be effectively disabled by
user space as mentioned for multiple times in this thread).

It simply is not efficient to let the clock run while the system as a
whole is sleeping (even if power/control has been set to "on" for this
particular device) and it should not be too hard to prevent that from
happening.  You may need an additional flag in the driver for that or
similar, but it definitely should be doable.

Now, that's my advice and I'm not the maintainer of that code, so it
is your call (as long as the maintainer agrees with it).

Thanks,
Rafael
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu