Re: [PATCH 2/8] usb: gadget: add anonymous definition in some struct for trace purpose

2023-09-14 Thread Alan Stern
On Fri, Sep 15, 2023 at 09:02:48AM +0800, Linyu Yuan wrote:
> 
> On 9/14/2023 10:54 PM, Alan Stern wrote:
> > You didn't include the version number in the Subject: line.  Undoubtedly
> > Greg's automatic error checker will warn you about this.  Unless the
> > version number is clearly marked for each patch, it's difficult for his
> > programs to tell which email message contains the most recent version.
> > 
> > On Thu, Sep 14, 2023 at 06:02:56PM +0800, Linyu Yuan wrote:
> > > Some UDC trace event will save usb udc information, but it use one int
> > > size buffer to save one bit information of usb udc, it is wast trace
> > > buffer.
> > > 
> > > Add anonymous union which have one u32 member can be used by trace event
> > > during fast assign stage to save more entries with same trace ring buffer
> > > size.
> > > 
> > > Signed-off-by: Linyu Yuan 
> > > ---
> > And you didn't include the version change information here, below the
> > "---" line.
> > 
> > Apart from that, this is a _lot_ better than before!  I don't know if
> > Greg will think this change is worth merging, but at least now it's
> > possible to read the code and understand what's going on.
> 
> 
> according Steven's comment, maybe will always save data in little endian at
> trace event
> 
> fast assign stage.
> 
> it will add definition of bit field back.

Yes, that would be even better because you wouldn't have to change the 
definition of struct usb_gadget or struct usb_endpoint at all.  The fast 
assign stage can simply do:

__entry->dw1 = (g->sg_supported << 0) |
(g->is_otg << 1) |
...

and then you can easily access the individual bits in __entry.  It 
wouldn't be as fast but it would still save a lot of space.

Alan Stern


Re: [PATCH 2/8] usb: gadget: add anonymous definition in some struct for trace purpose

2023-09-14 Thread Alan Stern
You didn't include the version number in the Subject: line.  Undoubtedly 
Greg's automatic error checker will warn you about this.  Unless the 
version number is clearly marked for each patch, it's difficult for his 
programs to tell which email message contains the most recent version.

On Thu, Sep 14, 2023 at 06:02:56PM +0800, Linyu Yuan wrote:
> Some UDC trace event will save usb udc information, but it use one int
> size buffer to save one bit information of usb udc, it is wast trace
> buffer.
> 
> Add anonymous union which have one u32 member can be used by trace event
> during fast assign stage to save more entries with same trace ring buffer
> size.
> 
> Signed-off-by: Linyu Yuan 
> ---

And you didn't include the version change information here, below the 
"---" line.

Apart from that, this is a _lot_ better than before!  I don't know if 
Greg will think this change is worth merging, but at least now it's 
possible to read the code and understand what's going on.

Alan Stern


Re: [PATCH 10/19] USB: gadget/legacy: remove sb_mutex

2023-09-13 Thread Alan Stern
On Wed, Sep 13, 2023 at 08:10:04AM -0300, Christoph Hellwig wrote:
> Creating new a new super_block vs freeing the old one for single instance
> file systems is serialized by the wait for SB_DEAD.
> 
> Remove the superfluous sb_mutex.
> 
> Signed-off-by: Christoph Hellwig 
> ---

You might mention that this is essentially a reversion of commit 
d18dcfe9860e ("USB: gadgetfs: Fix race between mounting and 
unmounting").

Alan Stern

>  drivers/usb/gadget/legacy/inode.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/usb/gadget/legacy/inode.c 
> b/drivers/usb/gadget/legacy/inode.c
> index ce9e31f3d26bcc..a203266bc0dc82 100644
> --- a/drivers/usb/gadget/legacy/inode.c
> +++ b/drivers/usb/gadget/legacy/inode.c
> @@ -229,7 +229,6 @@ static void put_ep (struct ep_data *data)
>   */
>  
>  static const char *CHIP;
> -static DEFINE_MUTEX(sb_mutex);   /* Serialize superblock 
> operations */
>  
>  /*--*/
>  
> @@ -2012,8 +2011,6 @@ gadgetfs_fill_super (struct super_block *sb, struct 
> fs_context *fc)
>   struct dev_data *dev;
>   int rc;
>  
> - mutex_lock(_mutex);
> -
>   if (the_device) {
>   rc = -ESRCH;
>   goto Done;
> @@ -2069,7 +2066,6 @@ gadgetfs_fill_super (struct super_block *sb, struct 
> fs_context *fc)
>   rc = -ENOMEM;
>  
>   Done:
> - mutex_unlock(_mutex);
>   return rc;
>  }
>  
> @@ -2092,7 +2088,6 @@ static int gadgetfs_init_fs_context(struct fs_context 
> *fc)
>  static void
>  gadgetfs_kill_sb (struct super_block *sb)
>  {
> - mutex_lock(_mutex);
>   kill_litter_super (sb);
>   if (the_device) {
>   put_dev (the_device);
> @@ -2100,7 +2095,6 @@ gadgetfs_kill_sb (struct super_block *sb)
>   }
>   kfree(CHIP);
>   CHIP = NULL;
> - mutex_unlock(_mutex);
>  }
>  
>  /*--*/
> -- 
> 2.39.2
> 


Re: [PATCH] USB: Add reset-resume quirk for WD19's Realtek Hub

2021-04-20 Thread Alan Stern
On Wed, Apr 21, 2021 at 01:46:51AM +0800, chris.c...@canonical.com wrote:
> From: Chris Chiu 
> 
> Realtek Hub (0bda:5487) in Dell Dock WD19 sometimes fails to work
> after the system resumes from suspend with remote wakeup enabled
> device connected:
> [ 1947.640907] hub 5-2.3:1.0: hub_ext_port_status failed (err = -71)
> [ 1947.641208] usb 5-2.3-port5: cannot disable (err = -71)
> [ 1947.641401] hub 5-2.3:1.0: hub_ext_port_status failed (err = -71)
> [ 1947.641450] usb 5-2.3-port4: cannot reset (err = -71)
> 
> Information of this hub:
> T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 10 Spd=480  MxCh= 5
> D:  Ver= 2.10 Cls=09(hub  ) Sub=00 Prot=02 MxPS=64 #Cfgs=  1
> P:  Vendor=0bda ProdID=5487 Rev= 1.47
> S:  Manufacturer=Dell Inc.
> S:  Product=Dell dock
> C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=  0mA
> I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=01 Driver=hub
> E:  Ad=81(I) Atr=03(Int.) MxPS=   1 Ivl=256ms
> I:* If#= 0 Alt= 1 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=02 Driver=hub
> E:  Ad=81(I) Atr=03(Int.) MxPS=   1 Ivl=256ms
> 
> The failure results from the ETIMEDOUT by chance when turning on
> the suspend feature for the specified port of the hub. The port
> seems to be in an unknown state so the hub_activate during resume
> fails the hub_port_status, then the hub will fail to work.
> 
> The quirky hub needs the reset-resume quirk to function correctly.
> 
> Signed-off-by: Chris Chiu 
> ---
>  drivers/usb/core/quirks.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
> index 76ac5d6555ae..4e2483e34250 100644
> --- a/drivers/usb/core/quirks.c
> +++ b/drivers/usb/core/quirks.c
> @@ -406,6 +406,7 @@ static const struct usb_device_id usb_quirk_list[] = {
>  
>   /* Realtek hub in Dell WD19 (Type-C) */
>   { USB_DEVICE(0x0bda, 0x0487), .driver_info = USB_QUIRK_NO_LPM },
> + { USB_DEVICE(0x0bda, 0x5487), .driver_info = USB_QUIRK_RESET_RESUME },
>  
>   /* Generic RTL8153 based ethernet adapters */
>   { USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },

Acked-by: Alan Stern 


Re: [PATCH v3] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-20 Thread Alan Stern
On Tue, Apr 20, 2021 at 03:14:56PM +0800, Chris Chiu wrote:
> On Mon, Apr 19, 2021 at 10:19 PM Alan Stern  wrote:
> >
> > On Mon, Apr 19, 2021 at 01:11:38AM -0400, Chris Chiu wrote:
> > > Sorry that I didn't make myself clear. I found that if I applied 
> > > RESET_RESUME
> > > quirk on the problematic hub, the Set-Port-Feature(suspend) timeout error
> > > disappeared. SInce the timeout is not happening for each suspend by 
> > > default,
> > > I suspect maybe reset-resume take everything back to clean state for the 
> > > hub
> > > and the Set-Port-Feature(suspend) can be taken care of w/o problems.
> >
> > Okay, that's a good solution for system suspend.
> >
> > > I didn't like RESET_RESUME because runtime PM would not work on the 
> > > quirked
> > > device.
> >
> > A more interesting question is whether it will work for devices plugged
> > into the hub.  Even though the hub won't be runtime suspended, the
> > things attached to it might be.
> >
> > >  But if the Set-Port-Feature(suspend) can't be handled and
> > > skipped, I can't
> > > expect the runtime PM to work for all devices connected to the hub either.
> > > Is that right? If what I proposed in the patch can not get better
> > > result than existing
> > > quirk, I think using the RESET_RESUME would be a better option. Any 
> > > suggestions?
> >
> > Try the RESET_RESUME quirk and see how well it works with runtime
> > suspend.
> >
> > Alan Stern
> 
> [  453.064346] usb 3-4: finish reset-resume
> [  453.192387] usb 3-4: reset high-speed USB device number 2 using xhci_hcd
> [  453.339916] usb 3-4: USB quirks for this device: 2

Here 3-4 is problematic RealTek hub, right?

> Seems that even w/ the RESET_RESUME enabled, the connected device still
> can runtime suspend/resume. That's acceptable to me. I'll send the patch
> with the reset-resume quirk later.
> 
> [  626.081068] usb 3-4.3.1: usb auto-suspend, wakeup 0
> [  632.552071] usb 3-4.3.1: usb auto-resume
> [  632.617467] usb 3-4.3.1: Waited 0ms for CONNECT
> [  632.617471] usb 3-4.3.1: finish resume

Then 3-4.3 is another hub plugged into the Realtek hub, and 3-4.3.1 (the 
device being suspended and resumed) is plugged into that other hub.

I'm concerned about devices that are plugged directly into the Realtek 
hub.  For example, did you try allowing the 3-4.3 hub in the experiment 
above to suspend and resume?

Alan Stern


Re: [PATCH] usb: storage: datafab: remove redundant assignment of variable result

2021-04-20 Thread Alan Stern
On Tue, Apr 20, 2021 at 12:38:18PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> The variable result is being assigned with a value that is
> never read, the assignment is redundant and can be removed.
> 
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King 
> ---

Acked-by: Alan Stern 

>  drivers/usb/storage/datafab.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/usb/storage/datafab.c b/drivers/usb/storage/datafab.c
> index 588818483f4b..bcc4a2fad863 100644
> --- a/drivers/usb/storage/datafab.c
> +++ b/drivers/usb/storage/datafab.c
> @@ -294,7 +294,6 @@ static int datafab_write_data(struct us_data *us,
>   if (reply[0] != 0x50 && reply[1] != 0) {
>   usb_stor_dbg(us, "Gah! write return code: %02x %02x\n",
>reply[0], reply[1]);
> - result = USB_STOR_TRANSPORT_ERROR;
>   goto leave;
>   }
>  
> -- 
> 2.30.2
> 


Re: [PATCH v3] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-19 Thread Alan Stern
On Mon, Apr 19, 2021 at 01:11:38AM -0400, Chris Chiu wrote:
> Sorry that I didn't make myself clear. I found that if I applied RESET_RESUME
> quirk on the problematic hub, the Set-Port-Feature(suspend) timeout error
> disappeared. SInce the timeout is not happening for each suspend by default,
> I suspect maybe reset-resume take everything back to clean state for the hub
> and the Set-Port-Feature(suspend) can be taken care of w/o problems.

Okay, that's a good solution for system suspend.

> I didn't like RESET_RESUME because runtime PM would not work on the quirked
> device.

A more interesting question is whether it will work for devices plugged 
into the hub.  Even though the hub won't be runtime suspended, the 
things attached to it might be.

>  But if the Set-Port-Feature(suspend) can't be handled and
> skipped, I can't
> expect the runtime PM to work for all devices connected to the hub either.
> Is that right? If what I proposed in the patch can not get better
> result than existing
> quirk, I think using the RESET_RESUME would be a better option. Any 
> suggestions?

Try the RESET_RESUME quirk and see how well it works with runtime 
suspend.

Alan Stern


Re: [PATCH] usb: gadget: dummy_hcd: fix gpf in gadget_setup

2021-04-17 Thread Alan Stern
On Sat, Apr 17, 2021 at 06:22:09PM +0530, Anirudh Rayabharam wrote:
> Fix a general protection fault reported by syzbot due to a race between
> gadget_setup() and gadget_unbind() in raw_gadget.
> 
> The gadget core is supposed to guarantee that there won't be any more
> callbacks to the gadget driver once the driver's unbind routine is
> called. That guarantee is enforced in usb_gadget_remove_driver as
> follows:
> 
> usb_gadget_disconnect(udc->gadget);
> if (udc->gadget->irq)
> synchronize_irq(udc->gadget->irq);
> udc->driver->unbind(udc->gadget);
> usb_gadget_udc_stop(udc);
> 
> usb_gadget_disconnect turns off the pullup resistor, telling the host
> that the gadget is no longer connected and preventing the transmission
> of any more USB packets. Any packets that have already been received
> are sure to processed by the UDC driver's interrupt handler by the time
> synchronize_irq returns.
> 
> But this doesn't work with dummy_hcd, because dummy_hcd doesn't use
> interrupts; it uses a timer instead.  It does have code to emulate the
> effect of synchronize_irq, but that code doesn't get invoked at the
> right time -- it currently runs in usb_gadget_udc_stop, after the unbind
> callback instead of before.  Indeed, there's no way for
> usb_gadget_remove_driver to invoke this code before the unbind
> callback.
> 
> To fix this, move the synchronize_irq() emulation code to dummy_pullup
> so that it runs before unbind. Also, add a comment explaining why it is
> necessary to have it there.
> 
> Suggested-by: Alan Stern 
> Reported-by: syzbot+eb4674092e6cc8d9e...@syzkaller.appspotmail.com
> Signed-off-by: Anirudh Rayabharam 
> ---
>  drivers/usb/gadget/udc/dummy_hcd.c | 23 +++
>  1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/usb/gadget/udc/dummy_hcd.c 
> b/drivers/usb/gadget/udc/dummy_hcd.c
> index ce24d4f28f2a..d0dae6406612 100644
> --- a/drivers/usb/gadget/udc/dummy_hcd.c
> +++ b/drivers/usb/gadget/udc/dummy_hcd.c
> @@ -903,6 +903,21 @@ static int dummy_pullup(struct usb_gadget *_gadget, int 
> value)
>   spin_lock_irqsave(>lock, flags);
>   dum->pullup = (value != 0);
>   set_link_state(dum_hcd);
> + if (value == 0) {
> + /*
> +  * emulate synchronize_irq(): wait for callbacks to finish
> +  * This seems to be the best to place to emulate the call
> +  * to synchronize_irq(). Doing it in dummy_udc_stop() would
> +  * be too late since it is called after the unbind callback.
> +  * Also, there is no way for core:usb_gadget_remove_driver()
> +  * to invoke this code before the unbind callback.
> +  */

This comment could be edited a little better.  It should start with a 
capital letter, and there should be a period at the end of the first 
line.  There is an extra "to" before "place to emulate".  The last 
sentence isn't really needed.

Also, you could be more specific.  The call to synchronize_irq() which 
we want to emulate is the one in usb_gadget_remove_driver().  And the 
reason we want to do it before the unbind callback is because unbind 
shouldn't be invoked until all the other callbacks are finished.

Once those changes are made, you can add:

Acked-by: Alan Stern 

Alan Stern

> + while (dum->callback_usage > 0) {
> + spin_unlock_irqrestore(>lock, flags);
> + usleep_range(1000, 2000);
> + spin_lock_irqsave(>lock, flags);
> + }
> + }
>   spin_unlock_irqrestore(>lock, flags);
>  
>   usb_hcd_poll_rh_status(dummy_hcd_to_hcd(dum_hcd));
> @@ -1004,14 +1019,6 @@ static int dummy_udc_stop(struct usb_gadget *g)
>   spin_lock_irq(>lock);
>   dum->ints_enabled = 0;
>   stop_activity(dum);
> -
> - /* emulate synchronize_irq(): wait for callbacks to finish */
> - while (dum->callback_usage > 0) {
> - spin_unlock_irq(>lock);
> - usleep_range(1000, 2000);
> - spin_lock_irq(>lock);
> - }
> -
>   dum->driver = NULL;
>   spin_unlock_irq(>lock);
>  
> -- 
> 2.26.2
> 


Re: [RFC PATCH] USB:XHCI:skip hub registration

2021-04-17 Thread Alan Stern
On Sat, Apr 17, 2021 at 02:48:22PM +0800, liulongfang wrote:
> On 2021/4/16 23:20, Alan Stern wrote:
> > On Fri, Apr 16, 2021 at 10:03:21AM +0800, liulongfang wrote:
> >> The current method is an improved method of the above patch.
> >> This patch just make it skip registering USB-3 root hub if that hub has no 
> >> ports,
> > 
> > No, that isn't what this patch does.
> > 
> > If the root hub wasn't registered, hub_probe wouldn't get called.  But 
> > with your patch, the system tries to register the root hub, and it does 
> > call hub_probe, and then that function fails with a warning message.
> > 
> > The way to _really_ akip registering the root hub is to change the 
> > xhci-hcd code.  Make it skip calling usb_add_hcd.
> > 
> 
> If you do not register in the root hub, this will return an error code,

What will return an error code?  Are you talking about xhci_pci_probe()?  
You oight to be able to figure out how to make it work.

> which will make all the XHCI drivers unregister, causing the USB2.0 
> controllers
> on the xhci to be unavailable.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-16 Thread Alan Stern
On Fri, Apr 16, 2021 at 10:35:20PM +0530, Anirudh Rayabharam wrote:
> On Fri, Apr 16, 2021 at 11:27:34AM -0400, Alan Stern wrote:
> > Actually, I wanted to move this emulation code into a new subroutine and 
> > then call that subroutine from _both_ places.  Would you like to write 
> 
> Does it really need to be called from both places?

You know, I was going to say Yes, but now I think you're right; it's not 
needed in dummy_udc_stop.  This is because core.c always calls 
usb_gadget_disconnect before usb_gadget_udc_stop.  And we can rely on 
this behavior; it's obviously necessary to disconnect from the host 
before stopping the UDC driver.

On the other hand, while checking that fact I noticed that 
soft_connect_store in core.c doesn't call synchronize_irq in between the 
other two, the way usb_gadget_remove_driver does.  That seems like a bug 
-- if it's necessary to synchronize with the IRQ handler on one path, it 
should be necessary on the other path as well.  But that's a matter for 
a separate patch.

Alan Stern

> > and submit a patch that does this?
> 
> Sure! I will do that.
> 
> Thanks!
> 
>   - Anirudh.


Re: [PATCH v3] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-16 Thread Alan Stern
On Fri, Apr 16, 2021 at 09:24:30AM +0800, Chris Chiu wrote:
> On Fri, Apr 16, 2021 at 2:46 AM Alan Stern  wrote:
> >
> > On Fri, Apr 16, 2021 at 12:13:43AM +0800, Chris Chiu wrote:
> > > One thing worth mentioning here, I never hit the hub_ext_port_status -71
> > > problem if I resume by waking up from the keyboard connected to the hub.
> >
> > I thought you said earlier that the port got into trouble while it was
> > suspending, not while it was resuming.  You wrote:
> >
> > > [ 2789.679812] usb 3-4-port3: can't suspend, status -110
> >
> > So if the problem occurs during suspend, how can it be possible to avoid
> > the problem by taking some particular action later while resuming?
> >
> 
> The ETIMEDOUT is still there, but the port can resume w/o problems and I
> don't really know what the hub did. I can only reset_resume the hub to bring 
> it
> back if I'm not waking up from the connected keyboard.

What if two devices are plugged into the hub, only one of them is 
runtime suspended, and you need to runtime resume that device?  Then you 
can't do a reset-resume of the hub, because the hub isn't suspended.

> > > But the usbcore kernel log shows me wPortStatus: 0503 wPortChane: 0004
> > > of that port while resuming. In normal cases, they are 0507:.
> >
> > The 0004 bit of wPortChange means that the suspend status has changed;
> > the port is no longer suspended because the device attached to that port
> > (your keyboard) issued a wakeup request.
> >
> > >  I don't know how to SetPortFeature() with setting the status change bit 
> > > only.
> >
> > You can't.  Only the hub itself can set the wPortChange bits.
> >
> > > Or maybe it's just some kind of timing issue of the
> > > idle/suspend/resume signaling.
> >
> > Not timing.  It's because you woke the system up from the attached
> > keyboard.
> >
> > Alan Stern
> 
> Got it. I'm just confused by the USB 2.0 spec 11.24.2.7.2 that
> "Hubs may allow setting of the status change bits with a SetPortFeature()
>  request for diagnostic purposes."

Yeah, I don't think very many hubs actually do that.

> So for this case, I think USB_QUIRK_RESET_RESUME would be a
> better option to at least bring back the port. Any suggestion about what
> kind of test I can do on this hub? Thanks

I'm not sure what you're proposing.

If (as mentioned above) the hub doesn't handle the 
Set-Port-Feature(suspend) request properly, then we should avoid issuing 
that request.  Which means runtime suspend attempts should not be 
allowed, as in your most recent patch.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-16 Thread Alan Stern
On Fri, Apr 16, 2021 at 09:21:12AM +0200, Dmitry Vyukov wrote:
> On Thu, Apr 15, 2021 at 10:59 PM Alan Stern  wrote:
> >
> > On Tue, Apr 13, 2021 at 07:11:11PM +0200, Dmitry Vyukov wrote:
> > > Yes, this is possible:
> > > http://bit.do/syzbot#syzkaller-reproducers
> >
> > That's not really what I had in mind.  I don't want to spend the time
> > and effort installing syskaller on my own system; I want to tell syzbot
> > to run a particular syzkaller program (the one that originally led to
> > this bug report) on a patched kernel.
> >
> > The syzbot instructions say that it can test bugs with reproducers.  The
> > problem here is that there doesn't seem to be any way to tell it to use
> > a particular syzkaller program as a reproducer.
> 
> Hi Alan,
> 
> This makes sense and I've found an existing feature request:
> https://github.com/google/syzkaller/issues/1611
> I've added a reference to this thread there.

Great!  Thank you.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-16 Thread Alan Stern
On Fri, Apr 16, 2021 at 11:10:35AM +0530, Anirudh Rayabharam wrote:
> On Tue, Apr 13, 2021 at 12:13:11PM -0400, Alan Stern wrote:
> > Maybe we can test this reasoning by putting a delay just before the call 
> > to dum->driver->setup.  That runs in the timer handler, so it's not a 
> > good place to delay, but it may be okay just for testing purposes.
> > 
> > Hopefully this patch will make the race a lot more likely to occur.  Is 
> 
> Hi Alan, 
> 
> Indeed, I was able to reproduce this bug easily on my machine with your
> delay patch applied and using this syzkaller program:
> 
> syz_usb_connect$cdc_ncm(0x1, 0x6e, &(0x7f40)={{0x12, 0x1, 0x0, 0x2, 
> 0x0, 0x0, 0x8, 0x525, 0xa4a1, 0x40, 0x1, 0x2, 0x3, 0x1, [{{0x9, 0x2, 0x5c, 
> 0x2, 0x1, 0x0, 0x0, 0x0, {{0x9, 0x4, 0x0, 0x0, 0x1, 0x2, 0xd, 0x0, 0x0, 
> {{0x5}, {0x5}, {0xd}, {0x6}}, {{0x9, 0x5, 0x81, 0x3, 0x200}}]}}, 
> &(0x7f000480)={0x0, 0x0, 0x0, 0x0, 0x3, [{0x0, 0x0}, {0x0, 0x0}, {0x0, 
> 0x0}]})
> 
> I also tested doing the synchronize_irq emulation in dummy_pullup and it
> fixed the issue. The patch is below.

That's great!  Thanks for testing.

> Thanks!
> 
>   - Anirudh.
> 
> diff --git a/drivers/usb/gadget/udc/dummy_hcd.c 
> b/drivers/usb/gadget/udc/dummy_hcd.c
> index ce24d4f28f2a..931d4612d859 100644
> --- a/drivers/usb/gadget/udc/dummy_hcd.c
> +++ b/drivers/usb/gadget/udc/dummy_hcd.c
> @@ -903,6 +903,12 @@ static int dummy_pullup(struct usb_gadget *_gadget, int 
> value)
>   spin_lock_irqsave(>lock, flags);
>   dum->pullup = (value != 0);
>   set_link_state(dum_hcd);
> + /* emulate synchronize_irq(): wait for callbacks to finish */
> + while (dum->callback_usage > 0) {
> + spin_unlock_irqrestore(>lock, flags);
> + usleep_range(1000, 2000);
> + spin_lock_irqsave(>lock, flags);
> + }

We should do this only if value == 0.  No synchronization is needed when 
the pullup is turned on.

Also, there should be a comment explaining that this is necessary 
because there's no other place to emulate the call made to 
synchronize_irq() in core.c:usb_gadget_remove_driver().

>   spin_unlock_irqrestore(>lock, flags);
>  
>   usb_hcd_poll_rh_status(dummy_hcd_to_hcd(dum_hcd));
> @@ -1005,13 +1011,6 @@ static int dummy_udc_stop(struct usb_gadget *g)
>   dum->ints_enabled = 0;
>   stop_activity(dum);
>  
> - /* emulate synchronize_irq(): wait for callbacks to finish */
> - while (dum->callback_usage > 0) {
> - spin_unlock_irq(>lock);
> - usleep_range(1000, 2000);
> - spin_lock_irq(>lock);
> - }
> -
>   dum->driver = NULL;
>   spin_unlock_irq(>lock);

Actually, I wanted to move this emulation code into a new subroutine and 
then call that subroutine from _both_ places.  Would you like to write 
and submit a patch that does this?

Alan Stern


Re: [RFC PATCH] USB:XHCI:skip hub registration

2021-04-16 Thread Alan Stern
On Fri, Apr 16, 2021 at 10:03:21AM +0800, liulongfang wrote:
> On 2021/4/15 22:43, Alan Stern wrote:
> > On Thu, Apr 15, 2021 at 08:22:38PM +0800, Longfang Liu wrote:
> >> When the number of ports on the USB hub is 0, skip the registration
> >> operation of the USB hub.
> >>
> >> The current Kunpeng930's XHCI hardware controller is defective. The number
> >> of ports on its USB3.0 bus controller is 0, and the number of ports on
> >> the USB2.0 bus controller is 1.
> >>
> >> In order to solve this problem that the USB3.0 controller does not have
> >> a port which causes the registration of the hub to fail, this patch passes
> >> the defect information by adding flags in the quirks of xhci and usb_hcd,
> >> and finally skips the registration process of the hub directly according
> >> to the results of these flags when the hub is initialized.
> >>
> >> Signed-off-by: Longfang Liu 
> > 
> > The objections that Greg raised are all good ones.
> > 
> > But even aside from them, this patch doesn't actually do what the 
> > description says.  The patch doesn't remove the call to usb_add_hcd 
> > for the USB-3 bus.  If you simply skipped that call (and the 
> > corresponding call to usb_remove_hcd) when there are no 
> > ports on the root hub, none of the stuff in this patch would be needed.
> > 
> > Alan Stern
> > 
> 
> "[RFC PATCH] USB:XHCI:Adjust the log level of hub"

I don't understand.  What patch is that?  Do you have a URL for it?

> The current method is an improved method of the above patch.
> This patch just make it skip registering USB-3 root hub if that hub has no 
> ports,

No, that isn't what this patch does.

If the root hub wasn't registered, hub_probe wouldn't get called.  But 
with your patch, the system tries to register the root hub, and it does 
call hub_probe, and then that function fails with a warning message.

The way to _really_ akip registering the root hub is to change the 
xhci-hcd code.  Make it skip calling usb_add_hcd.

> after skipping registering, no port will not report error log,the goal of this
> patch is reached without error log output.

Why do you want to get rid of the error log output?  There really _is_ 
an error, because the USB-3 hardware on your controller is defective.  
Since the hardware is buggy, we _should_ print an error message in the 
kernel log.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-15 Thread Alan Stern
On Tue, Apr 13, 2021 at 07:11:11PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 13, 2021 at 6:57 PM Alan Stern  wrote:
> >
> > On Tue, Apr 13, 2021 at 06:47:47PM +0200, Dmitry Vyukov wrote:
> > > On Tue, Apr 13, 2021 at 6:13 PM Alan Stern  
> > > wrote:
> > > > Hopefully this patch will make the race a lot more likely to occur.  Is
> > > > there any way to tell syzkaller to test it, despite the fact that
> > > > syzkaller doesn't think it has a reproducer for this issue?
> > >
> > > If there is no reproducer the only way syzbot can test it is if it's
> > > in linux-next under CONFIG_DEBUG_AID_FOR_SYZBOT:
> > > http://bit.do/syzbot#no-custom-patches
> >
> > There _is_ a theoretical reproducer: the test that provoked syzkaller's
> > original bug report.  But syzkaller doesn't realize that it is (or may
> > be) a reproducer.
> >
> > It ought to be possible to ask syzkaller to run a particular test that
> > it has done before, with a patch applied -- and without having to add
> > anything to linux-next.
> 
> Yes, this is possible:
> http://bit.do/syzbot#syzkaller-reproducers

That's not really what I had in mind.  I don't want to spend the time 
and effort installing syskaller on my own system; I want to tell syzbot 
to run a particular syzkaller program (the one that originally led to 
this bug report) on a patched kernel.

The syzbot instructions say that it can test bugs with reproducers.  The 
problem here is that there doesn't seem to be any way to tell it to use 
a particular syzkaller program as a reproducer.

Alan Stern

> The log of tests executed before the crash is available under the
> "console output" link:
> console output: https://syzkaller.appspot.com/x/log.txt?x=124adbf6d0
> And this log can be replayed using syz-execprog utility.


Re: [PATCH v3] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-15 Thread Alan Stern
On Fri, Apr 16, 2021 at 12:13:43AM +0800, Chris Chiu wrote:
> One thing worth mentioning here, I never hit the hub_ext_port_status -71
> problem if I resume by waking up from the keyboard connected to the hub.

I thought you said earlier that the port got into trouble while it was 
suspending, not while it was resuming.  You wrote:

> [ 2789.679812] usb 3-4-port3: can't suspend, status -110

So if the problem occurs during suspend, how can it be possible to avoid 
the problem by taking some particular action later while resuming?

> But the usbcore kernel log shows me wPortStatus: 0503 wPortChane: 0004
> of that port while resuming. In normal cases, they are 0507:.

The 0004 bit of wPortChange means that the suspend status has changed; 
the port is no longer suspended because the device attached to that port 
(your keyboard) issued a wakeup request.

>  I don't know how to SetPortFeature() with setting the status change bit only.

You can't.  Only the hub itself can set the wPortChange bits.

> Or maybe it's just some kind of timing issue of the
> idle/suspend/resume signaling.

Not timing.  It's because you woke the system up from the attached 
keyboard.

Alan Stern


Re: [RFC PATCH] USB:XHCI:skip hub registration

2021-04-15 Thread Alan Stern
On Thu, Apr 15, 2021 at 08:22:38PM +0800, Longfang Liu wrote:
> When the number of ports on the USB hub is 0, skip the registration
> operation of the USB hub.
> 
> The current Kunpeng930's XHCI hardware controller is defective. The number
> of ports on its USB3.0 bus controller is 0, and the number of ports on
> the USB2.0 bus controller is 1.
> 
> In order to solve this problem that the USB3.0 controller does not have
> a port which causes the registration of the hub to fail, this patch passes
> the defect information by adding flags in the quirks of xhci and usb_hcd,
> and finally skips the registration process of the hub directly according
> to the results of these flags when the hub is initialized.
> 
> Signed-off-by: Longfang Liu 

The objections that Greg raised are all good ones.

But even aside from them, this patch doesn't actually do what the 
description says.  The patch doesn't remove the call to usb_add_hcd 
for the USB-3 bus.  If you simply skipped that call (and the 
corresponding call to usb_remove_hcd) when there are no 
ports on the root hub, none of the stuff in this patch would be needed.

Alan Stern

> ---
>  drivers/usb/core/hub.c  | 6 ++
>  drivers/usb/host/xhci-pci.c | 4 
>  drivers/usb/host/xhci.c | 5 +
>  drivers/usb/host/xhci.h | 1 +
>  include/linux/usb/hcd.h | 1 +
>  5 files changed, 17 insertions(+)
> 
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index b1e14be..2d6869d 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -1769,9 +1769,15 @@ static int hub_probe(struct usb_interface *intf, const 
> struct usb_device_id *id)
>   struct usb_host_interface *desc;
>   struct usb_device *hdev;
>   struct usb_hub *hub;
> + struct usb_hcd *hcd;
>  
>   desc = intf->cur_altsetting;
>   hdev = interface_to_usbdev(intf);
> + hcd = bus_to_hcd(hdev->bus);
> + if (hcd->usb3_no_port) {
> + dev_warn(>dev, "USB hub has no port\n");
> + return -ENODEV;
> + }
>  
>   /*
>* Set default autosuspend delay as 0 to speedup bus suspend,
> diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
> index ef513c2..63b89a4 100644
> --- a/drivers/usb/host/xhci-pci.c
> +++ b/drivers/usb/host/xhci-pci.c
> @@ -281,6 +281,10 @@ static void xhci_pci_quirks(struct device *dev, struct 
> xhci_hcd *xhci)
>   if (xhci->quirks & XHCI_RESET_ON_RESUME)
>   xhci_dbg_trace(xhci, trace_xhci_dbg_quirks,
>   "QUIRK: Resetting on resume");
> +
> + if (pdev->vendor == PCI_VENDOR_ID_HUAWEI &&
> + pdev->device == 0xa23c)
> + xhci->quirks |= XHCI_USB3_NOPORT;
>  }
>  
>  #ifdef CONFIG_ACPI
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index bee5dec..e3e3573 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -5184,6 +5184,11 @@ int xhci_gen_setup(struct usb_hcd *hcd, 
> xhci_get_quirks_t get_quirks)
>   /* xHCI private pointer was set in xhci_pci_probe for the second
>* registered roothub.
>*/
> + if (xhci->quirks & XHCI_USB3_NOPORT) {
> + xhci_info(xhci, "xHCI host has no port\n");
> + hcd->usb3_no_port = 1;
> + }
> +
>   return 0;
>   }
>  
> diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
> index 2c6c4f8..d3c658f 100644
> --- a/drivers/usb/host/xhci.h
> +++ b/drivers/usb/host/xhci.h
> @@ -1874,6 +1874,7 @@ struct xhci_hcd {
>  #define XHCI_RESET_PLL_ON_DISCONNECT BIT_ULL(34)
>  #define XHCI_SNPS_BROKEN_SUSPENDBIT_ULL(35)
>  #define XHCI_RENESAS_FW_QUIRKBIT_ULL(36)
> +#define XHCI_USB3_NOPORT BIT_ULL(37)
>  
>   unsigned intnum_active_eps;
>   unsigned intlimit_active_eps;
> diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h
> index 3dbb42c..7df23a0f 100644
> --- a/include/linux/usb/hcd.h
> +++ b/include/linux/usb/hcd.h
> @@ -172,6 +172,7 @@ struct usb_hcd {
>   unsignedtpl_support:1; /* OTG & EH TPL support */
>   unsignedcant_recv_wakeups:1;
>   /* wakeup requests from downstream aren't received */
> + unsignedusb3_no_port:1; /* xHCI main_hcd has no port */
>  
>   unsigned intirq;/* irq allocated */
>   void __iomem*regs;  /* device memory/io */
> -- 
> 2.8.1
> 


Re: UBSAN: array-index-out-of-bounds in ehci_hub_control

2021-04-15 Thread Alan Stern
On Thu, Apr 15, 2021 at 04:10:45PM +0200, Dmitry Vyukov wrote:
> Hi,
> 
> I've got this report while booting v5.10.13 kernel, but upstream code
> seems to be the same.
> The access to port_status, the code is:
> 
> struct ehci_regs {
> u32 port_status[0]; /* up to N_PORTS */
> u32 reserved3[9];
> https://elixir.bootlin.com/linux/v5.12-rc7/source/include/linux/usb/ehci_def.h#L130
> 
> Question: should it be an empty array "[]" to prevent the undefined behavior?
> Or should it be declared as "[9]" to be more explicit?

Arnd has already looked at this:

https://marc.info/?t=15882824021=1=2

I thought we had arrived at an acceptable (though not great) solution, 
but he never posted any finished patches.  :-(

Alan Stern

> UBSAN: array-index-out-of-bounds in drivers/usb/host/ehci-hub.c:893:16
> index 1 is out of range for type 'u32 [0]'
> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.10.13+ #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> rel-1.13.0-44-g88ab0c15525c-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x183/0x225 lib/dump_stack.c:118
>  ubsan_epilogue lib/ubsan.c:148 [inline]
>  __ubsan_handle_out_of_bounds+0xdb/0x130 lib/ubsan.c:356
>  ehci_hub_control+0x1d27/0x2370 drivers/usb/host/ehci-hub.c:893
>  rh_call_control+0x938/0x11a0 drivers/usb/core/hcd.c:683
>  rh_urb_enqueue drivers/usb/core/hcd.c:842 [inline]
>  usb_hcd_submit_urb+0x2f2/0x5f0 drivers/usb/core/hcd.c:1543
>  usb_start_wait_urb+0x11a/0x550 drivers/usb/core/message.c:58
>  usb_internal_control_msg drivers/usb/core/message.c:102 [inline]
>  usb_control_msg+0x281/0x470 drivers/usb/core/message.c:153
>  hub_power_on+0x1a8/0x3c0 arch/x86/include/asm/bitops.h:219
>  hub_activate+0x330/0x1ba0 drivers/usb/core/hub.c:1076
>  hub_configure+0x19e0/0x2690 drivers/usb/core/hub.c:1680
>  hub_probe+0x82f/0x9b0 drivers/usb/core/hub.c:1882
>  usb_probe_interface+0x67b/0xb70 drivers/usb/core/driver.c:396
>  really_probe+0x58b/0x1580 drivers/base/dd.c:558
>  driver_probe_device+0x15a/0x310 drivers/base/dd.c:740
>  bus_for_each_drv+0x16a/0x1f0 drivers/base/bus.c:431
>  __device_attach+0x2b2/0x4b0 drivers/base/dd.c:914
>  bus_probe_device+0xb8/0x200 drivers/base/bus.c:491
>  device_add+0x8e7/0xcb0 drivers/base/core.c:2954
>  usb_set_configuration+0x1b98/0x2230 drivers/usb/core/message.c:2159
>  usb_generic_driver_probe+0x83/0x140 drivers/usb/core/generic.c:238
>  usb_probe_device+0x13a/0x260 drivers/usb/core/driver.c:293
>  really_probe+0x58b/0x1580 drivers/base/dd.c:558
>  driver_probe_device+0x15a/0x310 drivers/base/dd.c:740
>  bus_for_each_drv+0x16a/0x1f0 drivers/base/bus.c:431
>  __device_attach+0x2b2/0x4b0 drivers/base/dd.c:914
>  bus_probe_device+0xb8/0x200 drivers/base/bus.c:491
>  device_add+0x8e7/0xcb0 drivers/base/core.c:2954
>  usb_new_device+0xa43/0x1120 drivers/usb/core/hub.c:2554
>  register_root_hub+0x214/0x560 drivers/usb/core/hcd.c:1009
>  usb_add_hcd+0x8ee/0x1080 drivers/usb/core/hcd.c:2793
>  usb_hcd_pci_probe+0xa61/0x1280 drivers/usb/core/hcd-pci.c:264
>  local_pci_probe drivers/pci/pci-driver.c:308 [inline]
>  pci_call_probe drivers/pci/pci-driver.c:365 [inline]
>  __pci_device_probe drivers/pci/pci-driver.c:390 [inline]
>  pci_device_probe+0x3ef/0x630 drivers/pci/pci-driver.c:433
>  really_probe+0x544/0x1580 drivers/base/dd.c:554
>  driver_probe_device+0x15a/0x310 drivers/base/dd.c:740
>  device_driver_attach+0x176/0x280 drivers/base/dd.c:1015
>  __driver_attach+0xa7/0x490 drivers/base/dd.c:1092
>  bus_for_each_dev+0x168/0x1d0 drivers/base/bus.c:305
>  bus_add_driver+0x324/0x5e0 drivers/base/bus.c:622
>  driver_register+0x2e9/0x3e0 drivers/base/driver.c:171
>  do_one_initcall+0x1a3/0x410 init/main.c:1217
>  do_initcall_level+0x168/0x218 init/main.c:1290
>  do_initcalls+0x50/0x91 init/main.c:1306
>  kernel_init_freeable+0x33b/0x47f init/main.c:1527
>  kernel_init+0xd/0x290 init/main.c:1415


Re: [PATCH] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-14 Thread Alan Stern
On Wed, Apr 14, 2021 at 01:07:43PM +0800, Chris Chiu wrote:
> Thanks for the instructions. I can hit the same timeout problem with
> runtime PM. The
> fail rate seems the same as normal PM. (around 1/4 ~ 1/7)
> root@:/sys/bus/usb/devices/3-4.3# echo auto > power/control
> root@:/sys/bus/usb/devices/3-4.3# echo on > power/control
> root@:/sys/bus/usb/devices/3-4.3# dmesg -c
> [ 2789.679807] usb 3-4: kworker/7:0 timed out on ep0out len=0/0
> [ 2789.679812] usb 3-4-port3: can't suspend, status -110
> [ 2789.680078] usb 3-4.3: Failed to suspend device, error -110

Since these are random failures, occurring at a low rate, maybe it would 
help simply to retry the transfer that timed out.  Have you tested this?

> > > > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > > > > index 7f71218cc1e5..8478d49bba77 100644
> > > > > --- a/drivers/usb/core/hub.c
> > > > > +++ b/drivers/usb/core/hub.c
> > > > > @@ -3329,8 +3329,11 @@ int usb_port_suspend(struct usb_device *udev, 
> > > > > pm_message_t msg)
> > > > >* descendants is enabled for remote wakeup.
> > > > >*/
> > > > >   else if (PMSG_IS_AUTO(msg) || 
> > > > > usb_wakeup_enabled_descendants(udev) > 0)
> > > > > - status = set_port_feature(hub->hdev, port1,
> > > > > - USB_PORT_FEAT_SUSPEND);
> > > > > + if (udev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND)
> > > >
> > > > You should test hub->hdev->quirks, here, not udev->quirks.  The quirk
> > > > belongs to the Realtek hub, not to the device that's plugged into the
> > > > hub.
> > > >
> > >
> > > Thanks for pointing that out. I'll verify again and propose a V2 after
> > > it's done.
> >
> > Another thing to consider: You shouldn't return 0 from usb_port_suspend
> > if the port wasn't actually suspended.  We don't want to kernel to have
> > a false idea of the hardware's current state.
> >
> So we still need the "really_suspend=false". What if I replace it with
> the following?
> It's a little verbose but expressive enough. Any suggestions?
> 
> +   else if (!(hub->hdev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND) &&
> +   (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 
> 0))
> +   status = set_port_feature(hub->hdev, port1,
> +   USB_PORT_FEAT_SUSPEND);
> else {
> really_suspend = false;
> status = 0;

You should do something more like this:

-   else if (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 0)
-   status = set_port_feature(hub->hdev, port1,
-   USB_PORT_FEAT_SUSPEND);
-   else {
+   else if (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 0) 
{
+       if (hub->hdev->quirks & USB_QUIRK_NO_SUSPEND)
+   status = -EIO;
+   else
+   status = set_port_feature(hub->hdev, port1,
+   USB_PORT_FEAT_SUSPEND);
+   } else {
really_suspend = false;
status = 0;
}

But I would prefer to find a way to make port suspend actually work, 
instead of giving up on it.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-13 Thread Alan Stern
On Tue, Apr 13, 2021 at 06:47:47PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 13, 2021 at 6:13 PM Alan Stern  wrote:
> > Hopefully this patch will make the race a lot more likely to occur.  Is
> > there any way to tell syzkaller to test it, despite the fact that
> > syzkaller doesn't think it has a reproducer for this issue?
> 
> If there is no reproducer the only way syzbot can test it is if it's
> in linux-next under CONFIG_DEBUG_AID_FOR_SYZBOT:
> http://bit.do/syzbot#no-custom-patches

There _is_ a theoretical reproducer: the test that provoked syzkaller's 
original bug report.  But syzkaller doesn't realize that it is (or may 
be) a reproducer.

It ought to be possible to ask syzkaller to run a particular test that 
it has done before, with a patch applied -- and without having to add 
anything to linux-next.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-13 Thread Alan Stern
On Tue, Apr 13, 2021 at 10:12:05AM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 13, 2021 at 10:08 AM syzbot
>  wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:0f4498ce Merge tag 'for-5.12/dm-fixes-2' of git://git.kern..
> > git tree:   upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=124adbf6d0
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=daeff30c2474a60f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=eb4674092e6cc8d9e0bd
> > userspace arch: i386
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+eb4674092e6cc8d9e...@syzkaller.appspotmail.com
> 
> I suspect that the raw gadget_unbind() can be called while the timer
> is still active. gadget_unbind() sets gadget data to NULL.
> But I am not sure which unbind call this is:
> usb_gadget_remove_driver() or right in udc_bind_to_driver() due to a
> start error.

This certainly looks like a race between gadget_unbind and gadget_setup 
in raw_gadget.

In theory, this race shouldn't matter.  The gadget core is supposed to 
guarantee that there won't be any more callbacks to the gadget driver 
once the driver's unbind routine is called.  That guarantee is enforced 
in usb_gadget_remove_driver as follows:

usb_gadget_disconnect(udc->gadget);
if (udc->gadget->irq)
synchronize_irq(udc->gadget->irq);
udc->driver->unbind(udc->gadget);
usb_gadget_udc_stop(udc);

usb_gadget_disconnect turns off the pullup resistor, telling the host 
that the gadget is no longer connected and preventing the transmission 
of any more USB packets.  Any packets that have already been received 
are sure to processed by the UDC driver's interrupt handler by the time 
synchronize_irq returns.

But this doesn't work with dummy_hcd, because dummy_hcd doesn't use 
interrupts; it uses a timer instead.  It does have code to emulate the 
effect of synchronize_irq, but that code doesn't get invoked at the 
right time -- it currently runs in usb_gadget_udc_stop, after the unbind 
callback instead of before.  Indeed, there's no way for 
usb_gadget_remove_driver to invoke this code before the unbind 
callback,.

I thought the synchronize_irq emulation problem had been completely 
solved, but evidently it hasn't.  It looks like the best solution is to 
add a call of the synchronize_irq emulation code in dummy_pullup.

Maybe we can test this reasoning by putting a delay just before the call 
to dum->driver->setup.  That runs in the timer handler, so it's not a 
good place to delay, but it may be okay just for testing purposes.

Hopefully this patch will make the race a lot more likely to occur.  Is 
there any way to tell syzkaller to test it, despite the fact that 
syzkaller doesn't think it has a reproducer for this issue?

Alan Stern


Index: usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
===
--- usb-devel.orig/drivers/usb/gadget/udc/dummy_hcd.c
+++ usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
@@ -1900,6 +1900,7 @@ restart:
if (value > 0) {
++dum->callback_usage;
spin_unlock(>lock);
+   mdelay(5);
value = dum->driver->setup(>gadget,
);
spin_lock(>lock);


Re: [PATCH] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-13 Thread Alan Stern
On Tue, Apr 13, 2021 at 03:52:14PM +0800, Chris Chiu wrote:
> On Mon, Apr 12, 2021 at 11:12 PM Alan Stern  wrote:
> >
> > On Mon, Apr 12, 2021 at 11:00:06PM +0800, chris.c...@canonical.com wrote:
> > > The USB_PORT_FEAT_SUSPEND is not really necessary due to the
> > > "global suspend" in USB 2.0 spec. It's only for many hub devices
> > > which don't relay wakeup requests from the devices connected to
> > > downstream ports. For this realtek hub, there's no problem waking
> > > up the system from connected keyboard.
> >
> > What about runtime suspend?  That _does_ require USB_PORT_FEAT_SUSPEND.
> 
> It's hard to reproduce the same thing with runtime PM. I also don't
> know the aggressive
> way to trigger runtime suspend. So I'm assuming the same thing will happen in
> runtime PM case because they both go the same usb_port_resume path. Could
> you please suggest a better way to verify this for runtime PM?

To put a USB device into runtime suspend, do this:

echo 0 >/sys/bus/usb/devices/.../bConfigurationValue
echo auto >/sys/bus/usb/devices/.../power/control

where ... is the pathname for the device you want to suspend.  (Note 
that this will unbind the device from its driver, so make sure there's 
no possibility of data loss before you do it.)

To resume the device, write "on" to the power/control file.  You can 
verify the runtime-PM status by reading the files in the power/ 
subdirectory.

> > > This commit bypasses the USB_PORT_FEAT_SUSPEND for the quirky hub.
> > >
> > > Signed-off-by: Chris Chiu 
> > > ---
> >
> >
> > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > > index 7f71218cc1e5..8478d49bba77 100644
> > > --- a/drivers/usb/core/hub.c
> > > +++ b/drivers/usb/core/hub.c
> > > @@ -3329,8 +3329,11 @@ int usb_port_suspend(struct usb_device *udev, 
> > > pm_message_t msg)
> > >* descendants is enabled for remote wakeup.
> > >*/
> > >   else if (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) 
> > > > 0)
> > > - status = set_port_feature(hub->hdev, port1,
> > > - USB_PORT_FEAT_SUSPEND);
> > > + if (udev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND)
> >
> > You should test hub->hdev->quirks, here, not udev->quirks.  The quirk
> > belongs to the Realtek hub, not to the device that's plugged into the
> > hub.
> >
> 
> Thanks for pointing that out. I'll verify again and propose a V2 after
> it's done.

Another thing to consider: You shouldn't return 0 from usb_port_suspend 
if the port wasn't actually suspended.  We don't want to kernel to have 
a false idea of the hardware's current state.

Alan Stern


Re: [PATCH] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-12 Thread Alan Stern
On Mon, Apr 12, 2021 at 11:00:06PM +0800, chris.c...@canonical.com wrote:
> From: Chris Chiu 
> 
> Realtek Hub (0bda:5413) in Dell Dock WD19 sometimes fails to work
> after the system resumes from suspend with remote wakeup enabled
> device connected:
> [ 1947.640907] hub 5-2.3:1.0: hub_ext_port_status failed (err = -71)
> [ 1947.641208] usb 5-2.3-port5: cannot disable (err = -71)
> [ 1947.641401] hub 5-2.3:1.0: hub_ext_port_status failed (err = -71)
> [ 1947.641450] usb 5-2.3-port4: cannot reset (err = -71)
> 
> Information of this hub:
> T:  Bus=01 Lev=02 Prnt=02 Port=02 Cnt=01 Dev#=  9 Spd=480  MxCh= 6
> D:  Ver= 2.10 Cls=09(hub  ) Sub=00 Prot=02 MxPS=64 #Cfgs=  1
> P:  Vendor=0bda ProdID=5413 Rev= 1.21
> S:  Manufacturer=Dell Inc.
> S:  Product=Dell dock
> C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=  0mA
> I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=01 Driver=hub
> E:  Ad=81(I) Atr=03(Int.) MxPS=   1 Ivl=256ms
> I:* If#= 0 Alt= 1 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=02 Driver=hub
> E:  Ad=81(I) Atr=03(Int.) MxPS=   1 Ivl=256ms
> 
> The failure results from the ETIMEDOUT by chance when turning on
> the suspend feature of the hub. The usb_resume_device will not be
> invoked since the device state is not set to suspended, then the
> hub fails to activate subsequently.
> 
> The USB_PORT_FEAT_SUSPEND is not really necessary due to the
> "global suspend" in USB 2.0 spec. It's only for many hub devices
> which don't relay wakeup requests from the devices connected to
> downstream ports. For this realtek hub, there's no problem waking
> up the system from connected keyboard.

What about runtime suspend?  That _does_ require USB_PORT_FEAT_SUSPEND.

> This commit bypasses the USB_PORT_FEAT_SUSPEND for the quirky hub.
> 
> Signed-off-by: Chris Chiu 
> ---


> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 7f71218cc1e5..8478d49bba77 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -3329,8 +3329,11 @@ int usb_port_suspend(struct usb_device *udev, 
> pm_message_t msg)
>* descendants is enabled for remote wakeup.
>*/
>   else if (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 0)
> - status = set_port_feature(hub->hdev, port1,
> - USB_PORT_FEAT_SUSPEND);
> + if (udev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND)

You should test hub->hdev->quirks, here, not udev->quirks.  The quirk 
belongs to the Realtek hub, not to the device that's plugged into the 
hub.

Alan Stern


Re: [RFC PATCH] usb: core: reduce power-on-good delay time of root hub

2021-04-09 Thread Alan Stern
On Fri, Apr 09, 2021 at 10:39:07AM +0800, Chunfeng Yun wrote:
> Return the exactly delay time given by root hub descriptor,
> this helps to reduce resume time etc.
> 
> Due to the root hub descriptor is usually provided by the host
> controller driver, if there is compatibility for a root hub,
> we can fix it easily without affect other root hub
> 
> Signed-off-by: Chunfeng Yun 

Acked-by: Alan Stern 

> ---
>  drivers/usb/core/hub.h | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h
> index 73f4482d833a..22ea1f4f2d66 100644
> --- a/drivers/usb/core/hub.h
> +++ b/drivers/usb/core/hub.h
> @@ -148,8 +148,10 @@ static inline unsigned hub_power_on_good_delay(struct 
> usb_hub *hub)
>  {
>   unsigned delay = hub->descriptor->bPwrOn2PwrGood * 2;
>  
> - /* Wait at least 100 msec for power to become stable */
> - return max(delay, 100U);
> + if (!hub->hdev->parent) /* root hub */
> + return delay;
> + else /* Wait at least 100 msec for power to become stable */
> + return max(delay, 100U);
>  }
>  
>  static inline int hub_port_debounce_be_connected(struct usb_hub *hub,
> -- 
> 2.18.0
> 


Re: [PATCH v2] USB:ohci:fix ohci interruption problem

2021-04-09 Thread Alan Stern
On Fri, Apr 09, 2021 at 03:47:02PM +0800, Longfang Liu wrote:
> The operating method of the system entering S4 sleep mode:
> echo reboot > /sys/power/disk
> echo disk > /sys/power/state
> 
> When OHCI enters the S4 sleep state, check the log and find that
> the USB sleep process will call check_root_hub_suspend() and
> ohci_bus_suspend() instead ohci_suspend() and ohci_bus_suspend(),
> which will cause the OHCI interrupt to not be closed.
> 
> At this time, if just one device interrupt is reported. the
> driver will not process and close this device interrupt. It will cause
> the entire system to be stuck during sleep, causing the device to
> fail to respond.
> 
> When the abnormal interruption reaches 100,000 times, the system will
> forcibly close the interruption and make the device unusable.
> 
> Because the root cause of the problem is that ohci_suspend is not
> called to perform normal interrupt shutdown operations when the system
> enters S4 sleep mode.
> 
> Therefore, our solution is to specify freeze interface in this mode to
> perform normal suspend_common() operations, and call ohci_suspend()
> after check_root_hub_suspend() is executed through the suspend_common()
> operation.
> 
> Signed-off-by: Longfang Liu 
> ---
> 
> Changes in V2:
>   - Modify comment and patch version information.

Please, instead of sending the same incorrect patch over and over again, 
spend some time figuring out what is really going wrong.  I have already 
explained why this patch is not the right thing to do.

You have to determine why the poweroff callback in hcd-pci.c (which 
points to hcd_pci_suspend) isn't getting called.  That's the real 
explanation for your problem.

Alan Stern

> Changes in V1:
>   - Call suspend_common by adding the hcd_pci_freeze function turn off
>   the interrupt instead of adding a shutdown operation in ohci_bus_suspend
>   to turn off the interrupt.
> 
>  drivers/usb/core/hcd-pci.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
> index 1547aa6..c5844a3 100644
> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -509,6 +509,11 @@ static int resume_common(struct device *dev, int event)
>  
>  #ifdef   CONFIG_PM_SLEEP
>  
> +static int hcd_pci_freeze(struct device *dev)
> +{
> + return suspend_common(dev, device_may_wakeup(dev));
> +}
> +
>  static int hcd_pci_suspend(struct device *dev)
>  {
>   return suspend_common(dev, device_may_wakeup(dev));
> @@ -605,7 +610,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = {
>   .suspend_noirq  = hcd_pci_suspend_noirq,
>   .resume_noirq   = hcd_pci_resume_noirq,
>   .resume = hcd_pci_resume,
> - .freeze = check_root_hub_suspended,
> + .freeze = hcd_pci_freeze,
>   .freeze_noirq   = check_root_hub_suspended,
>   .thaw_noirq = NULL,
>   .thaw   = NULL,
> -- 
> 2.8.1
> 


Re: [PATCH v4] USB:ehci:fix Kunpeng920 ehci hardware problem

2021-04-09 Thread Alan Stern
On Fri, Apr 09, 2021 at 04:48:01PM +0800, Longfang Liu wrote:
> Kunpeng920's EHCI controller does not have SBRN register.
> Reading the SBRN register when the controller driver is
> initialized will get 0.
> 
> When rebooting the EHCI driver, ehci_shutdown() will be called.
> if the sbrn flag is 0, ehci_shutdown() will return directly.
> The sbrn flag being 0 will cause the EHCI interrupt signal to
> not be turned off after reboot. this interrupt that is not closed
> will cause an exception to the device sharing the interrupt.
> 
> Therefore, the EHCI controller of Kunpeng920 needs to skip
> the read operation of the SBRN register.
> 
> Signed-off-by: Longfang Liu 
> ---

Acked-by: Alan Stern 

> Changes in v4:
>   - Modify the code implementation.
> 
> Changes in v3:
>   - Fix some code style issues.
>   - Update struct name.
> 
> Changes in v2:
>   - Fix some code style issues.
>   - Update function name.
> 
>  drivers/usb/host/ehci-pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/usb/host/ehci-pci.c b/drivers/usb/host/ehci-pci.c
> index 3c3820a..237a346 100644
> --- a/drivers/usb/host/ehci-pci.c
> +++ b/drivers/usb/host/ehci-pci.c
> @@ -291,6 +291,9 @@ static int ehci_pci_setup(struct usb_hcd *hcd)
>   if (pdev->vendor == PCI_VENDOR_ID_STMICRO
>   && pdev->device == PCI_DEVICE_ID_STMICRO_USB_HOST)
>   ;   /* ConneXT has no sbrn register */
> + else if (pdev->vendor == PCI_VENDOR_ID_HUAWEI
> +  && pdev->device == 0xa239)
> + ;   /* HUAWEI Kunpeng920 USB EHCI has no sbrn register */
>   else
>   pci_read_config_byte(pdev, 0x60, >sbrn);
>  
> -- 
> 2.8.1
> 


Re: [PATCH v2 0/2] USB:ehci:fix the no SRBN register problem

2021-04-08 Thread Alan Stern
On Thu, Apr 08, 2021 at 09:49:18PM +0800, Longfang Liu wrote:
> (1) Add a whitelist for EHCI devices without SBRN registers.
> (2) Add Kunpeng920's EHCI device to the whitelist.
> 
> Changes in v2:
>   - Fix some code style issues.
>   - Update function name.
> 
> Longfang Liu (2):
>   USB:ehci:Add a whitelist for EHCI controllers
>   USB:ehci:fix Kunpeng920 ehci hardware problem
> 
>  drivers/usb/host/ehci-pci.c | 30 ++
>  1 file changed, 26 insertions(+), 4 deletions(-)

I don't think we need a whole list, along with an associated lookup 
routine, when there are only two entries.  The total amount of code will 
be smaller if you just add a check for the Kunpeng920 controller to
the existing check for the STMICRO controller.

Alan Stern


Re: [PATCH] USB:ohci:fix ohci interruption problem

2021-04-02 Thread Alan Stern
On Fri, Apr 02, 2021 at 05:27:59PM +0800, Longfang Liu wrote:
> The operating method of the system entering S4 sleep mode:
> echo disk > /sys/power/state

This discussion is still not right.

> When OHCI enters the S4 sleep state,

To start with, you should be talking about hibernation (also known as 
suspend-to-disk), not S4.  When the system enters hibernation -- for 
example, when you write "disk" to /sys/power/state -- the controller may 
go into S4 or it may go into some other power-saving state.

>  the USB sleep process will call
> check_root_hub_suspend() and ohci_bus_suspend() instead of
> ohci_suspend() and ohci_bus_suspend(), this causes the OHCI interrupt
> to not be closed.

This isn't true.  The procedure _does_ call ohci_suspend, through the 
.poweroff callback in hcd-pci.c.  That callback goes to the 
hcd_pci_suspend routine, which calls suspend_common and then 
ohci_suspend.

However, these calls happen after the kernel image has be written to the 
storage area on the disk.  As a result, any log messages produced during 
the calls do not get saved, so they don't get reloaded when the system 
resumes from hibernation, and they aren't present in the log after the 
system wakes up.  That's why they didn't appear in the log you included 
in an earlier email.  The only way to observe them is to use a remote 
console, such as a network console.

In fact, that's pretty much the only way to debug problems that occur 
during a hibernation transition.

> At this time, if just one device interrupt is reported. Since rh_state
> has been changed to OHCI_RH_SUSPENDED after ohci_bus_suspend(), the
> driver will not process and close this device interrupt.

That's not true either.  The ohci_irq routine does indeed process 
interrupts even when rh_state is set to OHCI_RH_SUSPENDED.  How else 
could it handle a device's wakeup request?

> It will cause
> the entire system to be stuck during sleep, causing the device to
> fail to respond.

During hibernation, the system is powered off.  Obviously the kernel is 
not capable of handling interrupts at this time.

Also, why would a device interrupt be reported at this time?  What 
causes the interrupt request?

> When the abnormal interruption reaches 100,000 times, the system will
> forcibly close the interruption and make the device unusable.
> 
> Because the root cause of the problem is that ohci_suspend is not
> called to perform normal interrupt shutdown operations when the system
> enters S4 sleep mode.
> 
> Therefore, our solution is to specify freeze interface in this mode to
> perform normal suspend_common() operations, and call ohci_suspend()
> after check_root_hub_suspend() is executed through the suspend_common()
> operation.

No.  The freeze interface does not need to power-down the controller.  
All it needs to do is make sure that no communication between the 
computer and the attached USB devices takes place, and this is handled 
by ohci_bus_suspend.

Furthermore, it is a mistake for the freeze routine to change anything 
unless the thaw routine reverses the change.  Your patch leaves the thaw 
callback pointer set to NULL.

> After using this solution, it is verified by the stress test of sleep
> wake up in S4 mode for a long time that this problem no longer occurs.

Something else must be happeneing, something you don't understand.

Alan Stern

> Signed-off-by: Longfang Liu 
> ---
>  drivers/usb/core/hcd-pci.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
> index 1547aa6..c5844a3 100644
> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -509,6 +509,11 @@ static int resume_common(struct device *dev, int event)
>  
>  #ifdef   CONFIG_PM_SLEEP
>  
> +static int hcd_pci_freeze(struct device *dev)
> +{
> + return suspend_common(dev, device_may_wakeup(dev));
> +}
> +
>  static int hcd_pci_suspend(struct device *dev)
>  {
>   return suspend_common(dev, device_may_wakeup(dev));
> @@ -605,7 +610,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = {
>   .suspend_noirq  = hcd_pci_suspend_noirq,
>   .resume_noirq   = hcd_pci_resume_noirq,
>   .resume = hcd_pci_resume,
> - .freeze = check_root_hub_suspended,
> + .freeze = hcd_pci_freeze,
>   .freeze_noirq   = check_root_hub_suspended,
>   .thaw_noirq = NULL,
>   .thaw   = NULL,
> -- 
> 2.8.1
> 


Re: [syzbot] WARNING in ieee802154_del_seclevel

2021-04-01 Thread Alan Stern
On Wed, Mar 31, 2021 at 02:03:08PM -0700, syzbot wrote:
> syzbot has bisected this issue to:
> 
> commit 416dacb819f59180e4d86a5550052033ebb6d72c
> Author: Alan Stern 
> Date:   Wed Aug 21 17:27:12 2019 +
> 
> HID: hidraw: Fix invalid read in hidraw_ioctl
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=127430fcd0
> start commit:   6e5a03bc ethernet/netronome/nfp: Fix a use after free in n..
> git tree:   net
> final oops: https://syzkaller.appspot.com/x/report.txt?x=117430fcd0
> console output: https://syzkaller.appspot.com/x/log.txt?x=167430fcd0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=daeff30c2474a60f
> dashboard link: https://syzkaller.appspot.com/bug?extid=fbf4fc11a819824e027b
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=13bfe45ed0
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1188e31ad0
> 
> Reported-by: syzbot+fbf4fc11a819824e0...@syzkaller.appspotmail.com
> Fixes: 416dacb819f5 ("HID: hidraw: Fix invalid read in hidraw_ioctl")
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

It seems likely that the bisection ran off the rails here.  This commit 
could not have caused a problem, although it may have revealed a 
pre-existing problem that previously was hidden.

By the way, what happened to the annotated stack dumps that syzkaller 
used to provide in its bug reports?

Alan Stern


Re: >20 KB URBs + EHCI = bad performance due to stalls

2021-03-31 Thread Alan Stern
On Wed, Mar 31, 2021 at 08:20:56PM +0200, Maciej S. Szmigiero wrote:
> On 29.03.2021 17:22, Alan Stern wrote:
> > On Sat, Mar 27, 2021 at 04:55:20PM +0100, Maciej S. Szmigiero wrote:
> > > Hi,
> > > 
> > > Is there any specific reason that URBs without URB_SHORT_NOT_OK flag that
> > > span multiple EHCI qTDs have Alternate Next qTD pointer set to the dummy
> > > qTD in their every qTD besides the last one (instead of to the first qTD
> > > of the next URB to that endpoint)?
> > 
> > Quick answer: I don't know.  I can't think of any good reason.  This
> > code was all written a long time ago.  Maybe the issue was overlooked
> > or the details were misunderstood.
> 
> I've dug out the original EHCI driver, that landed in 2.5.2:
> https://marc.info/?l=linux-usb-devel=100875066109269=2
> https://marc.info/?l=linux-usb-devel=100982880716373=2
> 
> It already had the following qTD setup code, roughly similar to what
> the current one does:
> > /* previous urb allows short rx? maybe optimize. */
> > if (!(last_qtd->urb->transfer_flags & USB_DISABLE_SPD)
> > && (epnum & 0x10)) {
> > // only the last QTD for now
> > last_qtd->hw_alt_next = hw_next;
> 
> The "for now" language seems to suggest that ultimately other-than-last
> qTDs were supposed to be set not to stall the queue, too.
> Just the code to handle this case was never written.

Probably it just slipped out of the developer's mind.

> It seems to me though, this should be possible with relatively few
> changes to the code:
> qh_append_tds() will need to patch these other-than-last qTDs
> hw_alt_next pointer to point to the (new) dummy qTD (instead of just
> pointing the last submitted qTD hw_next to it and adding the remaining
> qTDs verbatim to the qH qTD list).

Right.

> Then qh_completions() will need few changes:
> *
> >  } else if (IS_SHORT_READ (token)
> >   && !(qtd->hw_alt_next
> >& EHCI_LIST_END(ehci))) {
> This branch will need to be modified not to mark the queue as stopped
> and request its unlinking when such type of short qTD was processed.

This would be a good place to introduce a macro.  For example:

} else if (IS_SHORT_READ(token) && 
EHCI_PTR_IS_SET(qtd->hw_alt_next)) {

or something similar.

> * The ACTIVE bit should be treated as unset on any qTD following the
> one that hits the above condition until a qTD for a different URB is
> encountered.
> Otherwise the unprocessed remaining qTDs from that short URB will be
> considered pending active qTDs and the code will wait forever for their
> processing,

The treatment shouldn't be exactly the same as if ACTIVE is clear.  The 
following qTDs can be removed from the list and deallocated immediately, 
since the hardware won't look at them.  And they shouldn't affect the 
URB's status.

> * The code that patches the previous qTD hw_next pointer when removing a
> qTD that isn't currently at the qH will need changing to also patch
> hw_alt_next pointers of the qTDs belonging to the previous URB in case
> the previous URB was one of these short-read-ok ones.

Yes.  Awkward but necessary.

Although I know nothing at all about the USB API in Windows, I suspect 
that it manages to avoid this awkwardness entirely by not allowing URBs 
in the middle of the queue to be unlinked.  Or perhaps allowing it only 
for endpoint 0.  I've often wished Linux's API had been written that 
way.

> That's was my quick assessment what is required to handle these
> transactions effectively in the EHCI driver.
> 
> I suspect, however, there may be some corner cases involving
> non-ordinary qTD unlinking which might need fixing, too (like caused
> by usb_unlink_urb(), system suspend or HC removal).
> But I am not sure about this since I don't know this code well.

Those shouldn't present any difficulty.  There are inherently easier to 
handle because the QH won't be actively running when they occur.

> > > This causes that endpoint queue to stall in case of a short read that
> > > does not reach the last qTD (I guess this condition persists until an
> > > URB is (re)submitted to that endpoint, but I am not sure here).
> > 
> > It persists until the driver cleans up the queue.
> 
> I guess by "the driver" you mean the host controller driver, not the USB
> device driver.

Yes, I meant ehci-hcd.

> > > Looking at OHCI and UHCI host controller drivers the equivalent
> > > limits seem to be different there (8 KB and 2 KB), while I don't
> > > see any specific limit in the XHCI case.
> > 
> > I'd have to review the details of ohci-hcd and uh

Re: [PATCH v2 5/6] usb: Iterator for ports

2021-03-30 Thread Alan Stern
On Tue, Mar 30, 2021 at 12:07:35PM +0300, Heikki Krogerus wrote:
> On Mon, Mar 29, 2021 at 02:49:46PM -0400, Alan Stern wrote:
> > On Mon, Mar 29, 2021 at 11:44:25AM +0300, Heikki Krogerus wrote:
> > > Introducing usb_for_each_port(). It works the same way as
> > > usb_for_each_dev(), but instead of going through every USB
> > > device in the system, it walks through the USB ports in the
> > > system.
> > > 
> > > Signed-off-by: Heikki Krogerus 
> > > ---
> > >  drivers/usb/core/usb.c | 46 ++
> > >  include/linux/usb.h|  1 +
> > >  2 files changed, 47 insertions(+)
> > > 
> > > diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
> > > index 2ce3667ec6fae..62368c4ed37af 100644
> > > --- a/drivers/usb/core/usb.c
> > > +++ b/drivers/usb/core/usb.c
> > > @@ -398,6 +398,52 @@ int usb_for_each_dev(void *data, int (*fn)(struct 
> > > usb_device *, void *))
> > >  }
> > >  EXPORT_SYMBOL_GPL(usb_for_each_dev);
> > >  
> > > +struct each_hub_arg {
> > > + void *data;
> > > + int (*fn)(struct device *, void *);
> > > +};
> > > +
> > > +static int __each_hub(struct usb_device *hdev, void *data)
> > > +{
> > > + struct each_hub_arg *arg = (struct each_hub_arg *)data;
> > > + struct usb_hub *hub;
> > > + int ret = 0;
> > > + int i;
> > > +
> > > + hub = usb_hub_to_struct_hub(hdev);
> > > + if (!hub)
> > > + return 0;
> > 
> > What happens if the hub is removed exactly now?  Although hdev is 
> > reference-counted (and the loop iterator does take a reference to it), 
> > usb_hub_to_struct_hub doesn't take a reference to hub.  And hub->ports 
> > isn't refcounted at all.
> 
> If the hub is removed right now, and if hub_disconnect() also manages
> to remove the ports before we have time to take the lock below, then
> hdev->maxchild will be 0 by the time we can take the lock. In that
> case nothing happens here.

Okay, good.

> If on the other hand we manage to acquire the usb_port_peer_mutex
> before hub_disconnect(), then hub_disconnect() will simply have to
> wait until we are done, and only after that remove the ports.
> 
> > > + mutex_lock(_port_peer_mutex);
> > > +
> > > + for (i = 0; i < hdev->maxchild; i++) {
> > > + ret = arg->fn(>ports[i]->dev, arg->data);
> > > + if (ret)
> > > + break;
> > > + }
> > > +
> > > + mutex_unlock(_port_peer_mutex);
> > 
> > I have a feeling that it would be better to take and release this mutex 
> > in usb_for_each_port (or its caller), so that it is held over the whole 
> > loop.
> 
> I disagree. The lock is for the ports, not the hubs. We should take
> the lock when we are going through the ports of a hub, but release it
> between the hubs. Otherwise we will be only keeping things on hold for
> a long period of time for no good reason (I for example have to
> evaluate the _PLD of every single port which takes a lot of time). We
> don't need to prevent other things from happening to the hubs at the
> same time.

All right, you convinced me.

Acked-by: Alan Stern 

Alan Stern


Re: [PATCH v2 5/6] usb: Iterator for ports

2021-03-29 Thread Alan Stern
On Mon, Mar 29, 2021 at 11:44:25AM +0300, Heikki Krogerus wrote:
> Introducing usb_for_each_port(). It works the same way as
> usb_for_each_dev(), but instead of going through every USB
> device in the system, it walks through the USB ports in the
> system.
> 
> Signed-off-by: Heikki Krogerus 
> ---
>  drivers/usb/core/usb.c | 46 ++
>  include/linux/usb.h|  1 +
>  2 files changed, 47 insertions(+)
> 
> diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
> index 2ce3667ec6fae..62368c4ed37af 100644
> --- a/drivers/usb/core/usb.c
> +++ b/drivers/usb/core/usb.c
> @@ -398,6 +398,52 @@ int usb_for_each_dev(void *data, int (*fn)(struct 
> usb_device *, void *))
>  }
>  EXPORT_SYMBOL_GPL(usb_for_each_dev);
>  
> +struct each_hub_arg {
> + void *data;
> + int (*fn)(struct device *, void *);
> +};
> +
> +static int __each_hub(struct usb_device *hdev, void *data)
> +{
> + struct each_hub_arg *arg = (struct each_hub_arg *)data;
> + struct usb_hub *hub;
> + int ret = 0;
> + int i;
> +
> + hub = usb_hub_to_struct_hub(hdev);
> + if (!hub)
> + return 0;

What happens if the hub is removed exactly now?  Although hdev is 
reference-counted (and the loop iterator does take a reference to it), 
usb_hub_to_struct_hub doesn't take a reference to hub.  And hub->ports 
isn't refcounted at all.

> +
> + mutex_lock(_port_peer_mutex);
> +
> + for (i = 0; i < hdev->maxchild; i++) {
> + ret = arg->fn(>ports[i]->dev, arg->data);
> + if (ret)
> + break;
> + }
> +
> + mutex_unlock(_port_peer_mutex);

I have a feeling that it would be better to take and release this mutex 
in usb_for_each_port (or its caller), so that it is held over the whole 
loop.

Alan Stern

> +
> + return ret;
> +}
> +
> +/**
> + * usb_for_each_port - interate over all USB ports in the system
> + * @data: data pointer that will be handed to the callback function
> + * @fn: callback function to be called for each USB port
> + *
> + * Iterate over all USB ports and call @fn for each, passing it @data. If it
> + * returns anything other than 0, we break the iteration prematurely and 
> return
> + * that value.
> + */
> +int usb_for_each_port(void *data, int (*fn)(struct device *, void *))
> +{
> + struct each_hub_arg arg = {data, fn};
> +
> + return usb_for_each_dev(, __each_hub);
> +}
> +EXPORT_SYMBOL_GPL(usb_for_each_port);
> +
>  /**
>   * usb_release_dev - free a usb device structure when all users of it are 
> finished.
>   * @dev: device that's been disconnected
> diff --git a/include/linux/usb.h b/include/linux/usb.h
> index ddd2f5b2a2827..e4d2eb703cf89 100644
> --- a/include/linux/usb.h
> +++ b/include/linux/usb.h
> @@ -871,6 +871,7 @@ extern int usb_match_one_id(struct usb_interface 
> *interface,
>   const struct usb_device_id *id);
>  
>  extern int usb_for_each_dev(void *data, int (*fn)(struct usb_device *, void 
> *));
> +int usb_for_each_port(void *data, int (*fn)(struct device *, void *));
>  extern struct usb_interface *usb_find_interface(struct usb_driver *drv,
>   int minor);
>  extern struct usb_interface *usb_ifnum_to_if(const struct usb_device *dev,
> -- 
> 2.30.2
> 


Re: [RFC PATCH] USB:ohci:fix ohci interruption problem

2021-03-29 Thread Alan Stern
On Mon, Mar 29, 2021 at 04:38:10PM +0800, liulongfang wrote:
> On 2021/3/26 23:28, Alan Stern wrote:
> > On Fri, Mar 26, 2021 at 04:54:56PM +0800, Longfang Liu wrote:
> >> When OHCI enters the S4 sleep state, the USB sleep process will call
> >> check_root_hub_suspend() and ohci_bus_suspend() instead of
> >> ohci_suspend() and ohci_bus_suspend(), this causes the OHCI interrupt
> >> to not be closed.
> > 
> > What on earth are you talking about?  This isn't true at all.
> > 
> > Can you provide more information about your system?  Are you using a 
> > PCI-based OHCI controller or a platform device (and if so, which one)?  
> > Can you post system logs to back up your statements?
> > The system is UOS, the kernel version is kernel4.19, and the driver
> used is ohci-pci.c based on PCI.
> 
> By adding the log in ohci_suspend, and then viewing the dmesg after sleep,
> I can confirm that the system does not call ohci_suspend in S4 sleep mode.

Then your job is to figure out why not.  Doesn't entry into S4 sleep
call hcd_pci_suspend() in core/hcd-pci.c, and doesn't that call
suspend_common(), and doesn't that call hcd->driver->pci_suspend(),
and isn't that routine ohci_suspend()?

Alan Stern


Re: >20 KB URBs + EHCI = bad performance due to stalls

2021-03-29 Thread Alan Stern
On Sat, Mar 27, 2021 at 04:55:20PM +0100, Maciej S. Szmigiero wrote:
> Hi,
> 
> Is there any specific reason that URBs without URB_SHORT_NOT_OK flag that
> span multiple EHCI qTDs have Alternate Next qTD pointer set to the dummy
> qTD in their every qTD besides the last one (instead of to the first qTD
> of the next URB to that endpoint)?

Quick answer: I don't know.  I can't think of any good reason.  This
code was all written a long time ago.  Maybe the issue was overlooked
or the details were misunderstood.

> This causes that endpoint queue to stall in case of a short read that
> does not reach the last qTD (I guess this condition persists until an
> URB is (re)submitted to that endpoint, but I am not sure here).

It persists until the driver cleans up the queue.

> One of affected drivers here is drivers/net/usb/r8152.c.
> 
> If I simply reduce its per-URB transfer buffer to 20 KB (the maximum
> that fits in a well-aligned qTD) the RX rate increases from around
> 100 Mbps to 200+ Mbps (on an ICH8M controller):

> The driver default is to use 32 KB buffers (which span two qTDs),
> but the device rarely fully fills the first qTD resulting in
> repetitive stalls and more than halving the performance.
> 
> As far as I can see, the relevant code in
> drivers/usb/host/ehci-q.c::qh_urb_transaction() predates the git era.

Like I said, a long time ago.

> The comment in that function before setting the Alternate Next qTD
> pointer:
> > /*
> >  * short reads advance to a "magic" dummy instead of the next
> >  * qtd ... that forces the queue to stop, for manual cleanup.
> >  * (this will usually be overridden later.)
> >  */
> 
> ...suggests the idea was to override that pointer when
> URB_SHORT_NOT_OK is not set, but this is actually done only for
> the last qTD from the URB (also, that's the only one that ends
> with interrupt flag set).

The hw_alt_next field should be updated for all the qTDs in the URB.
Failure to this was probably an oversight.  Or maybe the omission was
to simplify the procedure for cleaning up the queue after a short
transfer.

> Looking at OHCI and UHCI host controller drivers the equivalent
> limits seem to be different there (8 KB and 2 KB), while I don't
> see any specific limit in the XHCI case.

I'd have to review the details of ohci-hcd and uhci-hcd to make
sure.  In principle, the queue isn't supposed to stop merely because
of a short transfer unless URB_SHORT_NOT_OK is set.  However, the UHCI
hardware in particular may offer no other way to handle a short transfer.

> Because of that variance in the URB buffer limit it seems strange
> to me that this should be managed by a particular USB device driver
> rather than by the host controller driver, because this would mean
> every such driver would need to either use the lowest common
> denominator for the URB buffer size (which is very small) or
> hardcode the limit for every host controller that the device can
> be connected to, which seems a bit inefficient.

I don't understand what you're saying in this paragraph.  What do you
think USB device drivers are supposed to be managing?  The URB buffer
size?  They should set that field without regard to the type of host
controller in use.

In short, the behavior you observed is a bug, resulting in a loss of
throughput (though not in any loss of data).  It needs to be fixed.

If you would like to write and submit a patch, that would be great.
Otherwise, I'll try to find time to work on it.

I would appreciate any effort you could make toward checking the code
in qh_completions(); I suspect that the checks it does involving
EHCI_LIST_END may not be right.  At the very least, they should be
encapsulated in a macro so that they are easier to understand.

Alan Stern


Re: [PATCH v3 0/4] scsi: add runtime PM workaround for SD cardreaders

2021-03-28 Thread Alan Stern
On Sun, Mar 28, 2021 at 12:25:27PM +0200, Martin Kepplinger wrote:
> hi,
> 
> In short: there are SD cardreaders that send MEDIA_CHANGED on
> (runtime) resume. We cannot use runtime PM with these devices as
> I/O always fails. I'd like to discuss a way to fix this
> or at least allow us to work around this problem:

In fact, as far as I know _all_ USB SD card readers send Media Changed 
notifications on resume.  Maybe there are some that don't, but I haven't 
heard of any.

Alan Stern


Re: [RFC PATCH] USB:ohci:fix ohci interruption problem

2021-03-26 Thread Alan Stern
On Fri, Mar 26, 2021 at 04:54:56PM +0800, Longfang Liu wrote:
> When OHCI enters the S4 sleep state, the USB sleep process will call
> check_root_hub_suspend() and ohci_bus_suspend() instead of
> ohci_suspend() and ohci_bus_suspend(), this causes the OHCI interrupt
> to not be closed.

What on earth are you talking about?  This isn't true at all.

Can you provide more information about your system?  Are you using a 
PCI-based OHCI controller or a platform device (and if so, which one)?  
Can you post system logs to back up your statements?

The proper order of calls is ohci_bus_suspend, then 
check_root_hub_suspended, then ohci_suspend.  Often the first one is 
called some time before the other two.

> At this time, if just one device interrupt is reported. Since rh_state
> has been changed to OHCI_RH_SUSPENDED after ohci_bus_suspend(), the
> driver will not process and close this device interrupt. It will cause
> the entire system to be stuck during sleep, causing the device to
> fail to respond.
> 
> When the abnormal interruption reaches 100,000 times, the system will
> forcibly close the interruption and make the device unusable.
> 
> Since the problem is that the interrupt is not closed, we copied the
> interrupt shutdown operation of ohci_suspend() into ohci_bus_suspend()
> during the S4 sleep period. We found that this method can solve this
> problem.
> 
> At present, we hope to be able to call ohci_suspend() directly during
> the sleep process of S4. Do you have any suggestions for this
> modification?
> 
> Signed-off-by: Longfang Liu 
> ---
>  drivers/usb/host/ohci-hub.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/ohci-hub.c b/drivers/usb/host/ohci-hub.c
> index 634f3c7..d468cef 100644
> --- a/drivers/usb/host/ohci-hub.c
> +++ b/drivers/usb/host/ohci-hub.c
> @@ -315,6 +315,14 @@ static int ohci_bus_suspend (struct usb_hcd *hcd)
>   del_timer_sync(>io_watchdog);
>   ohci->prev_frame_no = IO_WATCHDOG_OFF;
>   }
> +
> + spin_lock_irqsave(>lock, flags);
> + ohci_writel(ohci, OHCI_INTR_MIE, >regs->intrdisable);
> + (void)ohci_readl(ohci, >regs->intrdisable);
> +
> + clear_bit(HCD_FLAG_HW_ACCESSIBLE, >flags);
> + spin_unlock_irqrestore(>lock, flags);

This is completely wrong.  The hardware certainly remains accessible 
when the root hub stops running.  The HW_ACCESSIBLE flag should not be 
cleared here.

And if the Master Interrupt Enable bit is cleared, how will the driver 
ever learn if a remote wakeup request (such as a plug or unplug event) 
occurs?

Alan Stern

> +
>   return rc;
>  }
>  
> @@ -326,7 +334,10 @@ static int ohci_bus_resume (struct usb_hcd *hcd)
>   if (time_before (jiffies, ohci->next_statechange))
>   msleep(5);
>  
> - spin_lock_irq (>lock);
> + spin_lock_irq(>lock);
> + set_bit(HCD_FLAG_HW_ACCESSIBLE, >flags);
> + ohci_writel(ohci, OHCI_INTR_MIE, >regs->intrenable);
> + ohci_readl(ohci, >regs->intrenable);
>  
>   if (unlikely(!HCD_HW_ACCESSIBLE(hcd)))
>   rc = -ESHUTDOWN;
> -- 
> 2.8.1
> 


Re: [PATCH 1/6] usb: Iterator for ports

2021-03-25 Thread Alan Stern
On Thu, Mar 25, 2021 at 05:14:42PM +0200, Heikki Krogerus wrote:
> On Thu, Mar 25, 2021 at 10:41:09AM -0400, Alan Stern wrote:
> > On Thu, Mar 25, 2021 at 03:29:21PM +0300, Heikki Krogerus wrote:
> > > Introducing usb_for_each_port(). It works the same way as
> > > usb_for_each_dev(), but instead of going through every USB
> > > device in the system, it walks through the USB ports in the
> > > system.
> > > 
> > > Signed-off-by: Heikki Krogerus 
> > 
> > This has a couple of nasty errors.
> > 
> > > ---
> > >  drivers/usb/core/usb.c | 43 ++
> > >  include/linux/usb.h|  1 +
> > >  2 files changed, 44 insertions(+)
> > > 
> > > diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
> > > index 2ce3667ec6fae..6d49db9a1b208 100644
> > > --- a/drivers/usb/core/usb.c
> > > +++ b/drivers/usb/core/usb.c
> > > @@ -398,6 +398,49 @@ int usb_for_each_dev(void *data, int (*fn)(struct 
> > > usb_device *, void *))
> > >  }
> > >  EXPORT_SYMBOL_GPL(usb_for_each_dev);
> > >  
> > > +struct each_hub_arg {
> > > + void *data;
> > > + int (*fn)(struct device *, void *);
> > > +};
> > > +
> > > +static int __each_hub(struct device *dev, void *data)
> > > +{
> > > + struct each_hub_arg *arg = (struct each_hub_arg *)data;
> > > + struct usb_device *hdev = to_usb_device(dev);
> > 
> > to_usb_device() won't work properly if the struct device isn't embedded 
> > in an actual usb_device structure.  And that will happen, since the USB 
> > bus type holds usb_interface structures as well as usb_devices.
> 
> OK, so I need to filter them out.
> 
> > In fact, you should use usb_for_each_dev here; it already does what you 
> > want.
> 
> Unfortunately I can't use usb_for_each_dev here, because it deals with
> struct usb_device while I need to deal with struct device in the
> callback.

I see; the prototypes of arg->fn are different.  Oh well, it's a shame 
the code can't be reused.  In any case, you should copy what 
usb.c:__each_dev() does.

Alan Stern

> > > + struct usb_hub *hub;
> > > + int ret;
> > > + int i;
> > > +
> > > + hub = usb_hub_to_struct_hub(hdev);
> > > + if (!hub)
> > > + return 0;
> > > +
> > > + for (i = 0; i < hdev->maxchild; i++) {
> > > + ret = arg->fn(>ports[i]->dev, arg->data);
> > > + if (ret)
> > > + return ret;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > 
> > Don't you need some sort of locking or refcounting here?  What would 
> > happen if this hub got removed while the routine was running?
> 
> I'll use a lock then.
> 
> thanks,
> 
> -- 
> heikki


Re: [PATCH 1/6] usb: Iterator for ports

2021-03-25 Thread Alan Stern
On Thu, Mar 25, 2021 at 03:29:21PM +0300, Heikki Krogerus wrote:
> Introducing usb_for_each_port(). It works the same way as
> usb_for_each_dev(), but instead of going through every USB
> device in the system, it walks through the USB ports in the
> system.
> 
> Signed-off-by: Heikki Krogerus 

This has a couple of nasty errors.

> ---
>  drivers/usb/core/usb.c | 43 ++
>  include/linux/usb.h|  1 +
>  2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
> index 2ce3667ec6fae..6d49db9a1b208 100644
> --- a/drivers/usb/core/usb.c
> +++ b/drivers/usb/core/usb.c
> @@ -398,6 +398,49 @@ int usb_for_each_dev(void *data, int (*fn)(struct 
> usb_device *, void *))
>  }
>  EXPORT_SYMBOL_GPL(usb_for_each_dev);
>  
> +struct each_hub_arg {
> + void *data;
> + int (*fn)(struct device *, void *);
> +};
> +
> +static int __each_hub(struct device *dev, void *data)
> +{
> + struct each_hub_arg *arg = (struct each_hub_arg *)data;
> + struct usb_device *hdev = to_usb_device(dev);

to_usb_device() won't work properly if the struct device isn't embedded 
in an actual usb_device structure.  And that will happen, since the USB 
bus type holds usb_interface structures as well as usb_devices.

In fact, you should use usb_for_each_dev here; it already does what you 
want.

> + struct usb_hub *hub;
> + int ret;
> + int i;
> +
> + hub = usb_hub_to_struct_hub(hdev);
> + if (!hub)
> + return 0;
> +
> + for (i = 0; i < hdev->maxchild; i++) {
> + ret = arg->fn(>ports[i]->dev, arg->data);
> + if (ret)
> + return ret;
> + }
> +
> + return 0;
> +}

Don't you need some sort of locking or refcounting here?  What would 
happen if this hub got removed while the routine was running?

Alan Stern


Re: [RFC PATCH] USB:XHCI:Adjust the log level of hub

2021-03-25 Thread Alan Stern
On Thu, Mar 25, 2021 at 02:59:03PM +0100, Greg KH wrote:
> On Thu, Mar 25, 2021 at 09:33:53PM +0800, liulongfang wrote:
> > On 2021/3/25 18:31, Greg KH wrote:
> > > On Thu, Mar 25, 2021 at 06:04:12PM +0800, Longfang Liu wrote:
> > >> When the number of ports of the hub is not between 1 and Maxports,
> > >> it will only exit the registration of the hub on the current controller,
> > >> but it will not affect the function of the controller itself. Its other
> > >> hubs can operate normally, so the log level here can be changed from
> > >> error to information.
> > >>
> > >> Signed-off-by: Longfang Liu 
> > >> ---
> > >>  drivers/usb/core/hub.c | 10 --
> > >>  1 file changed, 4 insertions(+), 6 deletions(-)
> > >>
> > >> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > >> index b1e14be..70294ad 100644
> > >> --- a/drivers/usb/core/hub.c
> > >> +++ b/drivers/usb/core/hub.c
> > >> @@ -1409,13 +1409,11 @@ static int hub_configure(struct usb_hub *hub,
> > >>  maxchild = min_t(unsigned, maxchild, USB_SS_MAXPORTS);
> > >>  
> > >>  if (hub->descriptor->bNbrPorts > maxchild) {
> > >> -message = "hub has too many ports!";
> > >> -ret = -ENODEV;
> > >> -goto fail;
> > >> +dev_info(hub_dev, "hub has too many ports!\n");
> > > 
> > > Is this an error?  If so, report it as such, not as "information".
> > > 
> > >> +return -ENODEV;
> > >>  } else if (hub->descriptor->bNbrPorts == 0) {
> > >> -message = "hub doesn't have any ports!";
> > >> -ret = -ENODEV;
> > >> -goto fail;
> > >> +dev_info(hub_dev, "hub doesn't have any ports!\n");
> > > 
> > > Same here.
> > > 
> > > What problem are you trying to solve here?
> > > 
> > > What hub do you have that has no ports, or too many, that you think
> > > should still be able to work properly?
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > > .
> > On our test platform, the xhci usb3 hub has no port.
> 
> Sounds like a broken device, why not fix that?

If this device is used only for testing and not for production, who 
cares how severe the log message is?

> > when initializing the usb3 hub, an error will be reported
> > because the port is 0, but in fact it will not affect
> > the use of usb2, and the usb2 hub is working normally.
> 
> But you can not have a USB3 hub with no ports, isn't that against
> against the USB spec?  How does this device pass the USB-IF
> certification?
> 
> > thanks, therefore, in order to reduce the severity of the log,
> > we hope to lower the level of this log.
> 
> You did not reduce the severity at all, everyone can still see it.
> 
> Please try fixing your hardware :

Alternatively, you could change the xhci-hcd driver.  Make it skip 
registering the USB-3 root hub if that hub has no ports.

But don't change these log messages.  They describe real errors, so they 
should be actual error messages.

Alan Stern


Re: [PATCH] USB: ehci: drop workaround for forced irq threading

2021-03-22 Thread Alan Stern
On Mon, Mar 22, 2021 at 05:59:17PM +0100, Sebastian Andrzej Siewior wrote:
> On 2021-03-22 12:42:00 [-0400], Alan Stern wrote:
> > What happens on RT systems?  Are they smart enough to avoid the whole 
> > problem by enabling interrupts during _all_ callbacks?
> 
> tl;dr: Yes. 
> 
> The referenced commit (id 81e2073c175b) disables interrupts only on !RT
> configs so for RT everything remains unchanged (the backports are
> already adjusted for the old stable trees to use the proper CONFIG_* for
> enabled RT).
> 
> All hrtimer callbacks run as HRTIMER_MODE_SOFT by default. The
> HRTIMER_MODE_HARD ones (which expire in HARDIRQ context) were audited /
> explicitly enabled.
> The same goes irq_work.
> The printk code is different compared to mainline. A printk() on RT in
> HARDIRQ context is printed once the HARDIRQ context is left. So the
> serial/console/… driver never gets a chance to acquire its lock in
> hardirq context.
> 
> An interrupt handler which is not forced-threaded must be marked as such
> and must not use any spinlock_t based locking. lockdep/might_sleep
> complain here already.

Okay, in that case:

Acked-by: Alan Stern 


Re: [PATCH] USB: ehci: drop workaround for forced irq threading

2021-03-22 Thread Alan Stern
On Mon, Mar 22, 2021 at 12:12:49PM +0100, Johan Hovold wrote:
> Force-threaded interrupt handlers used to run with interrupts enabled,
> something which could lead to deadlocks in case a threaded handler
> shared a lock with code running in hard interrupt context (e.g. timer
> callbacks) and did not explicitly disable interrupts.
> 
> Since commit 81e2073c175b ("genirq: Disable interrupts for force
> threaded handlers") interrupt handlers always run with interrupts
> disabled on non-RT so that drivers no longer need to do handle forced
> threading ("threadirqs").

What happens on RT systems?  Are they smart enough to avoid the whole 
problem by enabling interrupts during _all_ callbacks?

Alan Stern


Re: [PATCH v1 2/2] usb: host: ehci-tegra: Select USB_GADGET Kconfig option

2021-03-20 Thread Alan Stern
On Sat, Mar 20, 2021 at 06:19:15PM +0300, Dmitry Osipenko wrote:
> Select USB_GADGET Kconfig option in order to fix build failure which
> happens because ChipIdea driver has a build dependency on both USB_GADGET
> and USB_EHCI_HCD, while USB_EHCI_TEGRA force-selects the ChipIdea driver
> without taking into account the tristate USB_GADGET dependency. It's not
> possible to do anything about the cyclic dependency of the Kconfig
> options, but USB_EHCI_TEGRA is now a deprecated option that isn't used
> by defconfigs and USB_GADGET is wanted on Tegra by default, hence it's
> okay to have a bit clunky workaround for it.
> 
> Fixes: c3590c7656fb ("usb: host: ehci-tegra: Remove the driver")
> Reported-by: kernel test robot 
> Signed-off-by: Dmitry Osipenko 

Acked-by: Alan Stern 

> ---
>  drivers/usb/host/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
> index b94f2a070c05..df9428f1dc5e 100644
> --- a/drivers/usb/host/Kconfig
> +++ b/drivers/usb/host/Kconfig
> @@ -272,6 +272,7 @@ config USB_EHCI_TEGRA
>   select USB_CHIPIDEA
>   select USB_CHIPIDEA_HOST
>   select USB_CHIPIDEA_TEGRA
> + select USB_GADGET
>   help
> This option is deprecated now and the driver was removed, use
> USB_CHIPIDEA_TEGRA instead.
> -- 
> 2.30.2
> 


Re: [PATCH] usb: dwc3: remove 'pm_runtime_set_active' in resume callback

2021-03-17 Thread Alan Stern
On Wed, Mar 17, 2021 at 05:25:20PM +0900, taehyun cho wrote:
> On Mon, Mar 15, 2021 at 10:13:35AM -0400, Alan Stern wrote:
> > On Mon, Mar 15, 2021 at 04:43:17PM +0900, taehyun cho wrote:
> > > 'pm_runtime_set_active' sets a flag to describe rumtime status.
> > > This flag is automatically set in pm_runtime_get_sync/put_sync API.
> > > 'pm_runtime_set_active' checks the runtime status of parent device.
> > > As a result, the below error message is printed.
> > > dwc3 .dwc3: runtime PM trying to activate child device
> > > .dwc3 but parent (.usb) is not active.
> > 
> > This is very suspicious.  That error message indicates a real error is 
> > present; removing these pm_runtime_set_active calls won't fix the error.
> > 
> > You need to determine why the parent platform device .usb isn't 
> > active when the dwc3 probe and resume routines are called.  It seems 
> > likely that there is a bug in the platform device's driver.
> > 
> > Alan Stern
> >
> 
> Alan,
> 
> Thanks to your comments, I checked our platform device driver and found
> the problem. Our parent platform device didn't set active in resume
> callback. This made a problem.

Ah, good.  Does the platform driver set the active flag in its probe 
routine?

>  Thank you for the help and sorry for
> disturbing you.

No problem at all.

Alan Stern


Re: [PATCH] usb: dwc3: remove 'pm_runtime_set_active' in resume callback

2021-03-15 Thread Alan Stern
On Mon, Mar 15, 2021 at 04:43:17PM +0900, taehyun cho wrote:
> 'pm_runtime_set_active' sets a flag to describe rumtime status.
> This flag is automatically set in pm_runtime_get_sync/put_sync API.
> 'pm_runtime_set_active' checks the runtime status of parent device.
> As a result, the below error message is printed.
> dwc3 .dwc3: runtime PM trying to activate child device
> .dwc3 but parent (.usb) is not active.

This is very suspicious.  That error message indicates a real error is 
present; removing these pm_runtime_set_active calls won't fix the error.

You need to determine why the parent platform device .usb isn't 
active when the dwc3 probe and resume routines are called.  It seems 
likely that there is a bug in the platform device's driver.

Alan Stern

> Signed-off-by: taehyun cho 
> ---
>  drivers/usb/dwc3/core.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 94fdbe502ce9..e7c0e390f885 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -1553,7 +1553,6 @@ static int dwc3_probe(struct platform_device *pdev)
>  
>   spin_lock_init(>lock);
>  
> - pm_runtime_set_active(dev);
>   pm_runtime_use_autosuspend(dev);
>   pm_runtime_set_autosuspend_delay(dev, DWC3_DEFAULT_AUTOSUSPEND_DELAY);
>   pm_runtime_enable(dev);
> @@ -1920,7 +1919,6 @@ static int dwc3_resume(struct device *dev)
>   return ret;
>  
>   pm_runtime_disable(dev);
> - pm_runtime_set_active(dev);
>   pm_runtime_enable(dev);
>  
>   return 0;
> -- 
> 2.26.0


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-12 Thread Alan Stern
On Fri, Mar 12, 2021 at 07:26:31PM +0100, Sedat Dilek wrote:
> On Fri, Mar 12, 2021 at 7:05 PM Alan Stern  wrote:

> > Although it's not conclusive, this log seems to indicate that ata_id
> > is the only program causing resets.  Have you tried preventing the
> > ata_id program from running (for example, by renaming it)?
> >
> 
> This is /lib/udev/ata_id from Debian's udev package.

That does not answer my question.

> > > Your diff now should say; s/SCSI ioctl error/SCSI ioctl info'.
> >
> > No, it shouldn't.  The log message itself is an info, but the event it
> > reports is an error.
> >
> 
> OK.
> Some of these SCSI ioctl errors are not causing a xhci-reset.

Yes, I noticed that.  In fact, the commands that cause a reset are all 
A1 (and not all of them), never 85.

> > > Alan, so "t" flags should be added as a quirks to linux-kernel sources...
> > >
> > > t = NO_ATA_1X  (don't allow ATA(12) and ATA(16) commands, uas only);
> > >
> > > ...for my ASMedia USB-3.0 controller?
> >
> > That's not at all clear.  This is a very common and popular device,
> > and nobody else has reported these problems.  It could be that
> > something is odd about your particular drive or computer, not these
> > drives in general.
> >
> 
> So, the external USB-3.0 HDD is now in "UAS only" mode/status.

Why?  Did you change something?

Alan Stern

> Cannot judge if things got better or not.
> 
> - Sedat -


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-12 Thread Alan Stern
smartd
> [Fri Mar 12 18:26:30 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:39 2021] SCSI ioctl error, cmd A1, prog ata_id
> [Fri Mar 12 18:26:39 2021] SCSI ioctl error, cmd A1, prog ata_id
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd A1, prog ata_id
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:43 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:43 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:49 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:49 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:49 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:49 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:28:09 2021] SCSI ioctl error, cmd 85, prog smartctl
> [Fri Mar 12 18:28:09 2021] SCSI ioctl error, cmd 85, prog smartctl
> [Fri Mar 12 18:28:10 2021] SCSI ioctl error, cmd 85, prog smartctl
> [Fri Mar 12 18:28:11 2021] SCSI ioctl error, cmd 85, prog smartctl
> [Fri Mar 12 18:36:49 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:49 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:49 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:49 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:49 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd

Although it's not conclusive, this log seems to indicate that ata_id
is the only program causing resets.  Have you tried preventing the 
ata_id program from running (for example, by renaming it)?

> Your diff now should say; s/SCSI ioctl error/SCSI ioctl info'.

No, it shouldn't.  The log message itself is an info, but the event it
reports is an error.

> Alan, so "t" flags should be added as a quirks to linux-kernel sources...
> 
> t = NO_ATA_1X  (don't allow ATA(12) and ATA(16) commands, uas only);
> 
> ...for my ASMedia USB-3.0 controller?

That's not at all clear.  This is a very common and popular device,
and nobody else has reported these problems.  It could be that
something is odd about your particular drive or computer, not these
drives in general.

Alan Stern


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-10 Thread Alan Stern
On Tue, Mar 09, 2021 at 08:04:53PM -0800, Asutosh Das (asd) wrote:
> On 3/9/2021 7:14 PM, Alan Stern wrote:
> > On Tue, Mar 09, 2021 at 07:04:34PM -0800, Asutosh Das (asd) wrote:
> > > Hello
> > > I & Can (thanks CanG) debugged this further:
> > > 
> > > Looks like this issue can occur if the sd probe is asynchronous.
> > > 
> > > Essentially, the sd_probe() is done asynchronously and 
> > > driver_probe_device()
> > > invokes pm_runtime_get_suppliers() before invoking sd_probe().
> > > 
> > > But scsi_probe_and_add_lun() runs in a separate context.
> > > So the scsi_autopm_put_device() invoked from scsi_scan_host() context
> > > reduces the link->rpm_active to 1. And sd_probe() invokes
> > > scsi_autopm_put_device() and starts a timer. And then 
> > > driver_probe_device()
> > > invoked from __device_attach_async_helper context reduces the
> > > link->rpm_active to 1 thus enabling the supplier to suspend before the
> > > consumer suspends.
> > 
> > > I don't see a way around this. Please let me know if you
> > > (@Alan/@Bart/@Adrian) have any thoughts on this.
> > 
> > How about changing the SCSI core so that it does a runtime_get before
> > starting an async probe, and the async probe routine does a
> > runtime_put when it is finished?  In other words, don't allow a device
> > to go into runtime suspend while it is waiting to be probed.
> > 
> > I don't think that would be too intrusive.
> > 
> > Alan Stern
> > 
> 
> Hi Alan
> Thanks for the suggestion.
> 
> Am trying to understand:
> 
> Do you mean something like this:
> 
> int scsi_sysfs_add_sdev(struct scsi_device *sdev)
> {
>   
>   scsi_autopm_get_device(sdev);
>   pm_runtime_get_noresume(>sdev_gendev);
>   [...]
>   scsi_autopm_put_device(sdev);
>   [...]
> }
> 
> static int sd_probe(struct device *dev)
> {
>   [...]
>   pm_runtime_put_noidle(dev);
>   scsi_autopm_put_device(sdp);
>   [...]
> }
> 
> This may work (I'm limited by my imagination in scsi layer :) ).

I'm not sure about this.  To be honest, I did not read the entirety of 
your last message; it had way too much detail.  THere's a time and place 
for that, but when you're brainstorming to figure out the underlying 
cause of a problem and come up with a strategy to fix it, you want to 
concentrate on the overall picture, not the details.

As I understand the situation, you've get a SCSI target with multiple 
logical units, let's say A and B, and you need to make sure that A never 
goes into runtime suspend unless B is already suspended.  In other 
words, B always has to suspend before A and resume after A.

To do this, you register a device link with A as the supplier and B as 
the consumer.  Then the PM core takes care of the ordering for you.

But I don't understand when you set up the device link.  If the timing 
is wrong then, thanks to async SCSI probing, you may have a situation 
where A is registered before B and before the link is set up.  Then 
there's temporarily nothing to stop A from suspending before B.

You also need to prevent each device from suspending before it is 
probed.  That's the easy part I was trying to address before (although 
it may not be so easy if the drivers are in loadable modules and not 
present in the kernel).

You need to think through these issues before proposing actual changes.

> But the pm_runtime_put_noidle() would have to be added to all registered
> scsi_driver{}, perhaps? Or may be I can check for sdp->type?

Like this; it's too early to worry about this sort of thing.

Alan Stern


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-09 Thread Alan Stern
On Tue, Mar 09, 2021 at 07:04:34PM -0800, Asutosh Das (asd) wrote:
> Hello
> I & Can (thanks CanG) debugged this further:
> 
> Looks like this issue can occur if the sd probe is asynchronous.
> 
> Essentially, the sd_probe() is done asynchronously and driver_probe_device()
> invokes pm_runtime_get_suppliers() before invoking sd_probe().
> 
> But scsi_probe_and_add_lun() runs in a separate context.
> So the scsi_autopm_put_device() invoked from scsi_scan_host() context
> reduces the link->rpm_active to 1. And sd_probe() invokes
> scsi_autopm_put_device() and starts a timer. And then driver_probe_device()
> invoked from __device_attach_async_helper context reduces the
> link->rpm_active to 1 thus enabling the supplier to suspend before the
> consumer suspends.

> I don't see a way around this. Please let me know if you
> (@Alan/@Bart/@Adrian) have any thoughts on this.

How about changing the SCSI core so that it does a runtime_get before
starting an async probe, and the async probe routine does a
runtime_put when it is finished?  In other words, don't allow a device
to go into runtime suspend while it is waiting to be probed.

I don't think that would be too intrusive.

Alan Stern


Re: [PATCH v2 16/18] usb: common: add function to get interval expressed in us unit

2021-03-08 Thread Alan Stern
On Mon, Mar 08, 2021 at 10:52:05AM +0800, Chunfeng Yun wrote:
> Add a new function to convert bInterval into the time expressed
> in 1us unit.
> 
> Signed-off-by: Chunfeng Yun 
> ---
> v2: move kerneldoc next to function's definition suggested by Alan
> ---
>  drivers/usb/common/common.c | 41 +
>  drivers/usb/core/devices.c  | 21 ---
>  drivers/usb/core/endpoint.c | 35 ---
>  include/linux/usb/ch9.h |  3 +++
>  4 files changed, 52 insertions(+), 48 deletions(-)

Acked-by: Alan Stern 


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-07 Thread Alan Stern
On Sun, Mar 07, 2021 at 05:57:39PM +0100, Sedat Dilek wrote:
> On Sun, Mar 7, 2021 at 4:46 PM Alan Stern  wrote:
> >
> > On Sat, Mar 06, 2021 at 09:49:00PM +0100, Sedat Dilek wrote:
> >
> > > For testing purposes, I stopped these systemd services:
> > >
> > > 1. systemctl stop smartmontools.service
> > >
> > > 2. systemctl stop udisks2.service
> > >
> > > Last seen xhci-reset:
> > >
> > > [Sat Mar  6 21:37:40 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> > >
> > > So, that every 10min xhci-reset was caused by pool-udisksd from 
> > > udisks2.service.
> >
> > You have found the cause of your problem!  Great!
> >
> > And now, obviously order to fix the problem, you'll have to look into
> > the udisks2 service.  Maybe you can configure it so that it won't send
> > the problem-causing commands.
> >
> 
> I tried yesterday to add --debug option to the ExexStart line of
> udisks2.service, but did not see anything helpful.
> 
> There exist more user-space than udisks2 causing these xhci-resets.
> The cmd#s are also clear: A1 and 85 - whatever they mean.

Those are the two prefixes which indicate an ATA command is present.  
You can find them listed as ATA_12 and ATA_16 in 
include/scsi/scsi_proto.h.

> As said with Linux v5.10.y and Linux v5.11 I have not seen this.

Have you tried setting the quirk flag we discussed earlier _and_ turning 
off udisks2?  Maybe also turning off the other services which generate 
these commands?  Perhaps you'll find that when the quirk flag is 
present, some of those programs _don't_ generate any ATA commands.

> What about CCing linux-block and linux-scsi people?

Sure, go ahead if you want to.

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-07 Thread Alan Stern
On Sat, Mar 06, 2021 at 09:49:00PM +0100, Sedat Dilek wrote:

> For testing purposes, I stopped these systemd services:
> 
> 1. systemctl stop smartmontools.service
> 
> 2. systemctl stop udisks2.service
> 
> Last seen xhci-reset:
> 
> [Sat Mar  6 21:37:40 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> 
> So, that every 10min xhci-reset was caused by pool-udisksd from 
> udisks2.service.

You have found the cause of your problem!  Great!

And now, obviously order to fix the problem, you'll have to look into 
the udisks2 service.  Maybe you can configure it so that it won't send 
the problem-causing commands.

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-06 Thread Alan Stern
On Sat, Mar 06, 2021 at 07:42:30AM +0100, Sedat Dilek wrote:
> No, with Debian-Kernel 5.10.19-1 there are no xhci-resets:

Is the kernel the only thing that is different?  The rest of the 
operating system and environment is exactly the same?

> But I see there is already a quirk enabled and matches my ASmedia USB
> 3.0 controller (as I have *no* usb-storage-quirks enabled):
> 
> root# LC_ALL=C dmesg -T | grep -i quirks | egrep '174c|55aa'
> [Sat Mar  6 06:52:41 2021] usb-storage 4-1:1.0: Quirks match for vid
> 174c pid 55aa: 40

Yes, this is because that type of device already has a quirk entry built 
into the kernel.  You can find it by searching for "174c" in the kernel 
source file drivers/usb/storage/unusual_devs.h.

> Thanks Alan for all the hints and tips in the topic "usb-storage and
> quirks" and your patience.

You can try building a 5.11 kernel with the patch below.  I don't know 
whether it will show anything in the dmesg log when one of these resets 
occurs, but it might.

If that doesn't work out, another possibility is to use git bisect to 
find the commit between 5.10 and 5.11 which caused the problem to start.

Alan Stern


--- usb-devel.orig/block/scsi_ioctl.c
+++ usb-devel/block/scsi_ioctl.c
@@ -258,8 +258,11 @@ static int blk_complete_sghdr_rq(struct
hdr->host_status = host_byte(req->result);
hdr->driver_status = driver_byte(req->result);
hdr->info = 0;
-   if (hdr->masked_status || hdr->host_status || hdr->driver_status)
+   if (hdr->masked_status || hdr->host_status || hdr->driver_status) {
hdr->info |= SG_INFO_CHECK;
+   printk(KERN_INFO "SCSI ioctl error, cmd %02X, prog %s\n",
+   req->cmd[0], current->comm);
+   }
hdr->resid = req->resid_len;
hdr->sb_len_wr = 0;
 



Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-06 Thread Alan Stern
On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:

> Now during my testing I see a weird issue sometimes (1 in 7).
> Scenario - bootups
> 
> Issue:
> The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
> when one/more of its consumers are in RPM_ACTIVE state.
> 
> *Log:
> [   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
> [   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
> [   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
> [   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
> [   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
> [   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
> [   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> 
> /** Printing all the consumer nodes of supplier **/
> [   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count @ suspend: 0
> <-- this is the usage_count
> [   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
> [   11.000402][T5] scsi 0:0:0:49456: PM state - 2
> [   11.005453][T5] sd 0:0:0:0: PM state - 2
> [   11.009958][T5] sd 0:0:0:1: PM state - 2
> [   11.014469][T5] sd 0:0:0:2: PM state - 2
> [   11.019072][T5] sd 0:0:0:3: PM state - 2
> [   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
> [   11.353298][T5] sd 0:0:0:5: PM state - 2
> [   11.357726][T5] sd 0:0:0:6: PM state - 2
> [   11.362155][T5] sd 0:0:0:7: PM state - 2
> [   11.366584][T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 8709
> [   11.374366][T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
> (0) has rpm_active flags
> [   11.383376][T5] ufs_device_wlun 0:0:0:49488:
> ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
> [   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache
> 
> And the the suspend of sde is stuck now:
> schedule+0x9c/0xe0
> schedule_timeout+0x40/0x128
> io_schedule_timeout+0x44/0x68
> wait_for_common_io+0x7c/0x100
> wait_for_completion_io+0x14/0x20
> blk_execute_rq+0x90/0xcc
> __scsi_execute+0x104/0x1c4
> sd_sync_cache+0xf8/0x2a0
> sd_suspend_common+0x74/0x11c
> sd_suspend_runtime+0x14/0x20
> scsi_runtime_suspend+0x64/0x94
> __rpm_callback+0x80/0x2a4
> rpm_suspend+0x308/0x614
> pm_runtime_work+0x98/0xa8
> 
> I added 'DL_FLAG_RPM_ACTIVE' while creating links.
>   if (hba->sdev_ufs_device) {
>   link = device_link_add(>sdev_gendev,
>   >sdev_ufs_device->sdev_gendev,
>  DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
> I didn't expect this to resolve the issue anyway and it didn't.
> 
> Another interesting point here is when I resume any of the above suspended
> consumers, it all goes back to normal, which is kind of expected. I tried
> resuming the consumer and the supplier is resumed and the supplier is
> suspended when all the consumers are suspended.
> 
> Any pointers on this issue please?
> 
> @Bart/@Alan - Do you've any pointers please?

It's very noticeable that although you seem to have isolated a bug in
the power management subsystem (supplier goes into runtime suspend
even when one of its consumers is still active), you did not CC the
power management maintainer or mailing list.

I have added the appropriate CC's.

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 08:41:40PM +0100, Sedat Dilek wrote:
> On Fri, Mar 5, 2021 at 8:30 PM Alan Stern  wrote:

> > Okay, that indicates the ATA commands are being sent not by the kernel
> > but by some program.  I'm not sure how you can easily find out which
> > program; probably the best thing to do is turn them off one by one until
> > you find the one responsible.
> >
> 
> I can hardly imagine which user-space tools other than powertop can
> interfere here.
> Any ideas?

No.  Especially since I have no idea what programs are running on your 
computer.

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 08:22:22PM +0100, Sedat Dilek wrote:
> The quirks match:
> 
> [Fri Mar  5 20:06:56 2021] usb-storage 4-1:1.0: USB Mass Storage device 
> detected
> [Fri Mar  5 20:06:56 2021] usb-storage 4-1:1.0: Quirks match for vid
> 174c pid 55aa: 40
> 
> That seems not to be the trick:
> 
> root# LC_ALL=C dmesg -T | grep 'usb 4-1:'
> [Fri Mar  5 20:06:55 2021] usb 4-1: new SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 20:06:55 2021] usb 4-1: New USB device found,
> idVendor=174c, idProduct=55aa, bcdDevice= 1.00
> [Fri Mar  5 20:06:55 2021] usb 4-1: New USB device strings: Mfr=2,
> Product=3, SerialNumber=1
> [Fri Mar  5 20:06:55 2021] usb 4-1: Product: MEDION HDDrive-n-GO
> [Fri Mar  5 20:06:55 2021] usb 4-1: Manufacturer: MEDION
> [Fri Mar  5 20:06:55 2021] usb 4-1: SerialNumber: 3180092C
> [Fri Mar  5 20:06:57 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd

Okay, that indicates the ATA commands are being sent not by the kernel 
but by some program.  I'm not sure how you can easily find out which 
program; probably the best thing to do is turn them off one by one until 
you find the one responsible.

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 08:05:49PM +0100, Sedat Dilek wrote:
> On Fri, Mar 5, 2021 at 5:07 PM Alan Stern  wrote:

> > Don't worry about trying to decode the output.  To me it looks like the
> > drive crashes and needs to be reset at times when the computer sends it
> > an ATA command.  (Not all ATA commands, but some.)  You can prevent this
> > by setting the following module parameter for the usb-storage driver:
> >
> > quirks=174c:55aa:t
> >
> > where the two numbers are the Vendor and Product IDs for the external
> > drive, and the 't' is a quirks flag saying not to use any ATA commands.
> > If this module parameter fixes the problem, we can add a permanent quirk
> > setting to the kernel.
> >
> 
> Thanks Alan.
> 
> I did:
> 
> [ /etc/modules-load.d/usb-storage.conf ]
> 
> # Add quirks for ATA commands for usb-storage devices connected to
> ASMedia M1042 USB-3.0 controller
> options usb-storage quirks=174c:55aa:t
> - EOF -
> 
> It is:
> 
> /lib/modules/5.12.0-rc1-11-amd64-clang13-cfi/kernel/drivers/usb/storage/usb-storage.ko
> 
> But:
> 
> root# lsmod | grep usb | grep storage
> usb_storage90112  2 uas
> scsi_mod  307200  6 sd_mod,usb_storage,uas,libata,sg,sr_mod
> usbcore   385024  14
> usbserial,xhci_hcd,ehci_pci,usbnet,usbhid,usb_storage,usb_wwan,uvcvideo,ehci_hcd,btusb,xhci_pci,cdc_ether,uas,option

I don't understand.  What is the point of this listing?

> I have not rebooted yet.

Depending on how your system is set up, the new usb-storage.conf file 
might need to be copied into the initramfs image.

However, you don't need to reload the driver module or reboot.  To make 
the new quirk take effect, all you have to do is write 174c:55aa:t to
/sys/module/usb_storage/parameters/quirks.

> Interferences with PowerTop?

Maybe.  It's entirely possible that PowerTop or some other program is 
issuing the troublesome ATA commands.

> These xhci-resets happen every 10mins in a sequence of 4.
> 
> I have here a powertop.service (systemd) with passing --auto-tune option.
> That was not a problem with previous Linux-kernels >= v5.12-rc1, so.
> 
> Alan, what do you think?

Try turning the service off and see if that makes any difference.

Alan Stern


Re: [PATCH] tools/memory-model: Fix smp_mb__after_spinlock() spelling

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 10:26:50AM -0800, Paul E. McKenney wrote:
> On Fri, Mar 05, 2021 at 04:41:49PM +0100, Björn Töpel wrote:
> > On 2021-03-05 16:36, Alan Stern wrote:
> > > On Fri, Mar 05, 2021 at 11:28:23AM +0100, Björn Töpel wrote:
> > > > From: Björn Töpel 
> > > > 
> > > > A misspelled invokation of git-grep, revealed that
> > > ---^
> > > 
> > > Smetimes my brain is a little slow...  Do you confirm that this is a
> > > joke?
> > > 
> > 
> > I wish, Alan. I wish.
> > 
> > Looks like I can only spel function names correctly.
> 
> Heh!  I missed that one completely.  Please see below for a wortschmied
> commit.
> 
>   Thanx, Paul
> 
> 
> 
> commit 1c737ce34715db9431f6b034f92dbf09d954126d
> Author: Björn Töpel 
> Date:   Fri Mar 5 11:28:23 2021 +0100
> 
> tools/memory-model: Fix smp_mb__after_spinlock() spelling
> 
> A misspelled git-grep regex revealed that smp_mb__after_spinlock()
> was misspelled in explanation.txt.
> 
> This commit adds the missing "_" to smp_mb__after_spinlock().

Strictly speaking, the commit adds a missing "_" to 
smp_mb_after_spinlock().  If it added anything to 
smp_mb__after_spinlock(), the result would be incorrect.

How about just:

A misspelled git-grep regex revealed that smp_mb__after_spinlock()
was misspelled in explanation.txt.  This commit adds the missing "_".

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 09:12:30AM +0800, Boqun Feng wrote:
> On Thu, Mar 04, 2021 at 11:11:42AM -0500, Alan Stern wrote:

> > Forget about local variables for the time being and just consider
> > 
> > dep ; [Plain] ; rfi
> > 
> > For example:
> > 
> > A: r1 = READ_ONCE(x);
> >y = r1;
> > B: r2 = READ_ONCE(y);
> > 
> > Should B be ordered after A?  I don't see how any CPU could hope to 
> > excute B before A, but maybe I'm missing something.
> > 
> 
> Agreed.
> 
> > There's another twist, connected with the fact that herd7 can't detect 
> > control dependencies caused by unexecuted code.  If we have:
> > 
> > A: r1 = READ_ONCE(x);
> > if (r1)
> > WRITE_ONCE(y, 5);
> > r2 = READ_ONCE(y);
> > B: WRITE_ONCE(z, r2);
> > 
> > then in executions where x == 0, herd7 doesn't see any control 
> > dependency.  But CPUs do see control dependencies whenever there is a 
> > conditional branch, whether the branch is taken or not, and so they will 
> > never reorder B before A.
> > 
> 
> Right, because B in this example is a write, what if B is a read that
> depends on r2, like in my example? Let y be a pointer to a memory
> location, and initialized as a valid value (pointing to a valid memory
> location) you example changed to:
> 
>   A: r1 = READ_ONCE(x);
>   if (r1)
>   WRITE_ONCE(y, 5);
>   C: r2 = READ_ONCE(y);
>   B: r3 = READ_ONCE(*r2);
> 
> , then A don't have the control dependency to B, because A and B is
> read+read. So B can be ordered before A, right?

Yes, I think that's right: Both C and B can be executed before A.

> > One last thing to think about: My original assessment or Björn's problem 
> > wasn't right, because the dep in (dep ; rfi) doesn't include control 
> > dependencies.  Only data and address.  So I believe that the LKMM 
> 
> Ah, right. I was mising that part (ctrl is not in dep). So I guess my
> example is pointless for the question we are discussing here ;-(
> 
> > wouldn't consider A to be ordered before B in this example even if x 
> > was nonzero.
> 
> Yes, and similar to my example (changing B to a read).
> 
> I did try to run my example with herd, and got confused no matter I make
> dep; [Plain]; rfi as to-r (I got the same result telling me a reorder
> can happen). Now the reason is clear, because this is a ctrl; rfi not a
> dep; rfi.
> 
> Thanks so much for walking with me on this ;-)

You're welcome.  At this point, it looks like the only remaining 
question is whether to include (dep ; [Plain] ; rfi) in to-r.  This 
doesn't seem to be an urgent question.

Alan


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 01:09:16PM +0100, Sedat Dilek wrote:
> On Mon, Mar 1, 2021 at 4:53 PM Alan Stern  wrote:
> [ ... ]
> > You can use usbmon on bus 4 to record the USB traffic.  It may indicate
> > why the resets occur.
> >
> 
> Hi Alan,
> 
> I followed the instructions in [1].
> 
> root# modprobe -v usbmon
> 
> root# ls /sys/kernel/debug/usb/usbmon
> 0s  0u  1s  1t  1u  2s  2t  2u  3s  3t  3u  4s  4t  4u
> 
> root# cat /sys/kernel/debug/usb/usbmon/4u > /tmp/usbmon-log_4u.txt
> [ Ctrl+C ]
> 
> I recorded 13:03 - 13:04 (one minute).
> 
> So these xhci-resets should be included:
> 
> [Fri Mar  5 13:03:07 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:07 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:27 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:27 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:27 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:28 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:28 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> 
> The usbmon-log is attached.
> 
> Unsure how to interpret the log - the kernel-doc says `raw data`.
> How can I bring this into a human-readable format?
> Can you give me a hand?

Don't worry about trying to decode the output.  To me it looks like the 
drive crashes and needs to be reset at times when the computer sends it 
an ATA command.  (Not all ATA commands, but some.)  You can prevent this 
by setting the following module parameter for the usb-storage driver:

quirks=174c:55aa:t

where the two numbers are the Vendor and Product IDs for the external 
drive, and the 't' is a quirks flag saying not to use any ATA commands.  
If this module parameter fixes the problem, we can add a permanent quirk 
setting to the kernel.

Alan Stern


Re: [PATCH] tools/memory-model: Fix smp_mb__after_spinlock() spelling

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 11:28:23AM +0100, Björn Töpel wrote:
> From: Björn Töpel 
> 
> A misspelled invokation of git-grep, revealed that
---^

Smetimes my brain is a little slow...  Do you confirm that this is a 
joke?

Alan


Re: [PATCH 16/17] usb: common: add function to get interval expressed in us unit

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 05:02:54PM +0800, Chunfeng Yun wrote:
> Add a new function to convert bInterval into the time expressed
> in 1us unit.
> 
> Signed-off-by: Chunfeng Yun 
> ---

> --- a/drivers/usb/common/common.c
> +++ b/drivers/usb/common/common.c
> @@ -165,6 +165,39 @@ enum usb_dr_mode usb_get_dr_mode(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(usb_get_dr_mode);
>  
> +unsigned int usb_decode_interval(const struct usb_endpoint_descriptor *epd,
> +  enum usb_device_speed speed)
> +{
> + unsigned int interval = 0;
> +
> + switch (usb_endpoint_type(epd)) {
> + case USB_ENDPOINT_XFER_CONTROL:
> + /* uframes per NAK */
> + if (speed == USB_SPEED_HIGH)
> + interval = epd->bInterval;
> + break;
> + case USB_ENDPOINT_XFER_ISOC:
> + interval = 1 << (epd->bInterval - 1);
> + break;
> + case USB_ENDPOINT_XFER_BULK:
> + /* uframes per NAK */
> + if (speed == USB_SPEED_HIGH && usb_endpoint_dir_out(epd))
> + interval = epd->bInterval;
> + break;
> + case USB_ENDPOINT_XFER_INT:
> + if (speed >= USB_SPEED_HIGH)
> + interval = 1 << (epd->bInterval - 1);
> + else
> + interval = epd->bInterval;
> + break;
> + }
> +
> + interval *= (speed >= USB_SPEED_HIGH) ? 125 : 1000;
> +
> + return interval;
> +}
> +EXPORT_SYMBOL_GPL(usb_decode_interval);

> --- a/include/linux/usb/ch9.h
> +++ b/include/linux/usb/ch9.h
> @@ -90,6 +90,17 @@ extern enum usb_ssp_rate usb_get_maximum_ssp_rate(struct 
> device *dev);
>   */
>  extern const char *usb_state_string(enum usb_device_state state);
>  
> +/**
> + * usb_decode_interval - Decode bInterval into the time expressed in 1us unit
> + * @epd: The descriptor of the endpoint
> + * @speed: The speed that the endpoint works as
> + *
> + * Function returns the interval expressed in 1us unit for servicing
> + * endpoint for data transfers.
> + */
> +unsigned int usb_decode_interval(const struct usb_endpoint_descriptor *epd,
> +  enum usb_device_speed speed);

As a general rule, I believe people expect to find the kerneldoc for a 
function next to the function's definition, not next to the declaration 
in a header file.

Alan Stern


Re: XDP socket rings, and LKMM litmus tests

2021-03-04 Thread Alan Stern
On Thu, Mar 04, 2021 at 11:05:15AM -0800, Paul E. McKenney wrote:
> On Thu, Mar 04, 2021 at 10:35:24AM -0500, Alan Stern wrote:
> > On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote:
> > > On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote:
> > > > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> > 
> > > > > > >  And I cannot immediately think of a situation where
> > > > > > > this approach would break that would not result in a data race 
> > > > > > > being
> > > > > > > flagged.  Or is this yet another failure of my imagination?
> > > > > > 
> > > > > > By definition, an access to a local variable cannot participate in 
> > > > > > a 
> > > > > > data race because all such accesses are confined to a single thread.
> > > > > 
> > > > > True, but its value might have come from a load from a shared 
> > > > > variable.
> > > > 
> > > > Then that load could have participated in a data race.  But the store 
> > > > to 
> > > > the local variable cannot.
> > > 
> > > Agreed.  My thought was that if the ordering from the initial (non-local)
> > > load mattered, then that initial load must have participated in a
> > > data race.  Is that true, or am I failing to perceive some corner case?
> > 
> > Ordering can matter even when no data race is involved.  Just think
> > about how much of the memory model is concerned with ordering of
> > marked accesses, which don't participate in data races unless there is
> > a conflicting plain access somewhere.
> 
> Fair point.  Should I have instead said "then that initial load must
> have run concurrently with a store to that same variable"?

I'm losing track of the point you were originally trying to make.

Does ordering matter when there are no conflicting accesses?  Sure.  
Consider this:

A: r1 = READ_ONCE(x);
B: WRITE_ONCE(y, r1);
   smp_wmb();
C: WRITE_ONCE(z, 1);

Even if there are no other accesses to y at all (let alone any 
conflicting ones), the mere existence of B forces A to be ordered before 
C, and this is easily detectable by a litmus test.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-04 Thread Alan Stern
On Thu, Mar 04, 2021 at 02:33:32PM +0800, Boqun Feng wrote:

> Right, I was thinking about something unrelated.. but how about the
> following case:
> 
>   local_v = 
>   r1 = READ_ONCE(*x); // f
> 
>   if (r1 == 1) {
>   local_v =  // e
>   } else {
>   local_v =  // d
>   }
> 
>   p = READ_ONCE(local_v); // g
> 
>   r2 = READ_ONCE(*p);   // h
> 
> if r1 == 1, we definitely think we have:
> 
>   f ->ctrl e ->rfi g ->addr h
> 
> , and if we treat ctrl;rfi as "to-r", then we have "f" happens before
> "h". However compile can optimze the above as:
> 
>   local_v = 
> 
>   r1 = READ_ONCE(*x); // f
> 
>   if (r1 != 1) {
>   local_v =  // d
>   }
> 
>   p = READ_ONCE(local_v); // g
> 
>   r2 = READ_ONCE(*p);   // h
> 
> , and when this gets executed, I don't think we have the guarantee we
> have "f" happens before "h", because CPU can do optimistic read for "g"
> and "h".

In your example, which accesses are supposed to be to actual memory and 
which to registers?  Also, remember that the memory model assumes the 
hardware does not reorder loads if there is an address dependency 
between them.

> Part of this is because when we take plain access into consideration, we
> won't guarantee a read-from or other relations exists if compiler
> optimization happens.
> 
> Maybe I'm missing something subtle, but just try to think through the
> effect of making dep; rfi as "to-r".

Forget about local variables for the time being and just consider

dep ; [Plain] ; rfi

For example:

A: r1 = READ_ONCE(x);
   y = r1;
B: r2 = READ_ONCE(y);

Should B be ordered after A?  I don't see how any CPU could hope to 
excute B before A, but maybe I'm missing something.

There's another twist, connected with the fact that herd7 can't detect 
control dependencies caused by unexecuted code.  If we have:

A: r1 = READ_ONCE(x);
if (r1)
WRITE_ONCE(y, 5);
r2 = READ_ONCE(y);
B: WRITE_ONCE(z, r2);

then in executions where x == 0, herd7 doesn't see any control 
dependency.  But CPUs do see control dependencies whenever there is a 
conditional branch, whether the branch is taken or not, and so they will 
never reorder B before A.

One last thing to think about: My original assessment or Björn's problem 
wasn't right, because the dep in (dep ; rfi) doesn't include control 
dependencies.  Only data and address.  So I believe that the LKMM 
wouldn't consider A to be ordered before B in this example even if x 
was nonzero.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-04 Thread Alan Stern
On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote:
> > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:

> > > > >  And I cannot immediately think of a situation where
> > > > > this approach would break that would not result in a data race being
> > > > > flagged.  Or is this yet another failure of my imagination?
> > > > 
> > > > By definition, an access to a local variable cannot participate in a 
> > > > data race because all such accesses are confined to a single thread.
> > > 
> > > True, but its value might have come from a load from a shared variable.
> > 
> > Then that load could have participated in a data race.  But the store to 
> > the local variable cannot.
> 
> Agreed.  My thought was that if the ordering from the initial (non-local)
> load mattered, then that initial load must have participated in a
> data race.  Is that true, or am I failing to perceive some corner case?

Ordering can matter even when no data race is involved.  Just think
about how much of the memory model is concerned with ordering of
marked accesses, which don't participate in data races unless there is
a conflicting plain access somewhere.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-03 Thread Alan Stern
On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> > On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote:
> > > On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote:
> > 
> > > > Local variables absolutely should be treated just like CPU registers, 
> > > > if 
> > > > possible.  In fact, the compiler has the option of keeping local 
> > > > variables stored in registers.
> > > > 
> > > > (Of course, things may get complicated if anyone writes a litmus test 
> > > > that uses a pointer to a local variable,  Especially if the pointer 
> > > > could hold the address of a local variable in one execution and a 
> > > > shared variable in another!  Or if the pointer is itself a shared 
> > > > variable and is dereferenced in another thread!)
> > > 
> > > Good point!  I did miss this complication.  ;-)
> > 
> > I suspect it wouldn't be so bad if herd7 disallowed taking addresses of 
> > local variables.
> > 
> > > As you say, when its address is taken, the "local" variable needs to be
> > > treated as is it were shared.  There are exceptions where the pointed-to
> > > local is still used only by its process.  Are any of these exceptions
> > > problematic?
> > 
> > Easiest just to rule out the whole can of worms.
> 
> Good point, given that a global can be used instead of a local for
> any case where an address must be taken.

Another thing to consider: Almost all marked accesses involve using the 
address of the storage location (for example, smp_load_acquire's first 
argument must be a pointer).  As far as I can remember at the moment, 
the only ones that don't are READ_ONCE and WRITE_ONCE.  So although we 
might or might not want to allow READ_ONCE or WRITE_ONCE on a local 
variable, we won't have to worry about any of the other kinds of marked 
accesses.

> > > > But even if local variables are treated as non-shared storage 
> > > > locations, 
> > > > we should still handle this correctly.  Part of the problem seems to 
> > > > lie 
> > > > in the definition of the to-r dependency relation; the relevant portion 
> > > > is:
> > > > 
> > > > (dep ; [Marked] ; rfi)
> > > > 
> > > > Here dep is the control dependency from the READ_ONCE to the 
> > > > local-variable store, and the rfi refers to the following load of the 
> > > > local variable.  The problem is that the store to the local variable 
> > > > doesn't go in the Marked class, because it is notated as a plain C 
> > > > assignment.  (And likewise for the following load.)
> > > > 
> > > > Should we change the model to make loads from and stores to local 
> > > > variables always count as Marked?
> > > 
> > > As long as the initial (possibly unmarked) load would be properly
> > > complained about.
> > 
> > Sorry, I don't understand what you mean.
> 
> I was thinking in terms of something like this in one of the processes:
> 
>   p = gp; // Unmarked!
>   r1 = p;
>   q = r1; // Implicitly marked now?
>   if (q)
>   WRITE_ONCE(x, 1); // ctrl dep from gp???

I hope we won't have to worry about this!  :-)  Treating local variable 
accesses as if they are always marked looks wrong.

> > >  And I cannot immediately think of a situation where
> > > this approach would break that would not result in a data race being
> > > flagged.  Or is this yet another failure of my imagination?
> > 
> > By definition, an access to a local variable cannot participate in a 
> > data race because all such accesses are confined to a single thread.
> 
> True, but its value might have come from a load from a shared variable.

Then that load could have participated in a data race.  But the store to 
the local variable cannot.

> > However, there are other aspects to consider, in particular, the 
> > ordering relations on local-variable accesses.  But if, as Luc says, 
> > local variables are treated just like registers then perhaps the issue 
> > doesn't arise.
> 
> Here is hoping!
> 
> > > > What should have happened if the local variable were instead a shared 
> > > > variable which the other thread didn't access at all?  It seems like a 
> > > > weak point of the memory model that it treats these two things 
> > > > differently.
> > > 
> > > But is this really any different than the

Re: XDP socket rings, and LKMM litmus tests

2021-03-03 Thread Alan Stern
On Thu, Mar 04, 2021 at 09:26:31AM +0800, Boqun Feng wrote:
> On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:

> > Which brings us back to the case of the
> > 
> > dep ; rfi
> > 
> > dependency relation, where the accesses in the middle are plain and 
> > non-racy.  Should the LKMM be changed to allow this?
> > 
> 
> For this particular question, do we need to consider code as the follow?
> 
>   r1 = READ_ONCE(x);  // f
>   if (r == 1) {
>   local_v =  // g
>   do_something_a();
>   }
>   else {
>   local_v = 
>   do_something_b();
>   }
> 
>   r2 = READ_ONCE(*local_v); // e
> 
> , do we have the guarantee that the first READ_ONCE() happens before the
> second one? Can compiler optimize the code as:
> 
>   r2 = READ_ONCE(y);
>   r1 = READ_ONCE(x);

Well, it can't do that because the compiler isn't allowed to reorder
volatile accesses (which includes READ_ONCE).  But the compiler could
do:

r1 = READ_ONCE(x);
r2 = READ_ONCE(y);

>   if (r == 1) {
>   do_something_a();
>   }
>   else {
>   do_something_b();
>   }
> 
> ? Although we have:
> 
>   f ->dep g ->rfi ->addr e

This would be an example of a problem Paul has described on several
occasions, where both arms of an "if" statement store the same value
(in this case to local_v).  This problem arises even when local
variables are not involved.  For example:

if (READ_ONCE(x) == 0) {
WRITE_ONCE(y, 1);
do_a();
} else {
WRITE_ONCE(y, 1);
do_b();
}

The compiler can change this to:

r = READ_ONCE(x);
WRITE_ONCE(y, 1);
if (r == 0)
do_a();
else
do_b();

thus allowing the marked accesses to be reordered by the CPU and
breaking the apparent control dependency.

So the answer to your question is: No, we don't have this guarantee,
but the reason is because of doing the same store in both arms, not
because of the use of local variables.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-03 Thread Alan Stern
On Wed, Mar 03, 2021 at 06:37:36PM +0100, maranget wrote:
> 
> 
> > On 3 Mar 2021, at 18:12, Alan Stern  wrote:
> > 
> > On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote:
> >> On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote:
> > 
> >>> This result is wrong, apparently because of a bug in herd7.  There 
> >>> should be control dependencies from each of the two loads in P0 to each 
> >>> of the two stores, but herd7 doesn't detect them.
> >>> 
> >>> Maybe Luc can find some time to check whether this really is a bug and 
> >>> if it is, fix it.
> >> 
> >> I agree that herd7's control dependency tracking could be improved.
> >> 
> >> But sadly, it is currently doing exactly what I asked Luc to make it do,
> >> which is to confine the control dependency to its "if" statement.  But as
> >> usual I wasn't thinking globally enough.  And I am not exactly sure what
> >> to ask for.  Here a store to a local was control-dependency ordered after
> >> a read, and so that should propagate to a read from that local variable.
> >> Maybe treat local variables as if they were registers, so that from
> >> herd7's viewpoint the READ_ONCE()s are able to head control-dependency
> >> chains in multiple "if" statements?
> >> 
> >> Thoughts?
> > 
> > Local variables absolutely should be treated just like CPU registers, if 
> > possible.  In fact, the compiler has the option of keeping local 
> > variables stored in registers.
> > 
> 
> And indeed local variables are treated as registers by herd7.
> 
> 
> > (Of course, things may get complicated if anyone writes a litmus test 
> > that uses a pointer to a local variable,  Especially if the pointer 
> > could hold the address of a local variable in one execution and a 
> > shared variable in another!  Or if the pointer is itself a shared 
> > variable and is dereferenced in another thread!)
> > 
> > But even if local variables are treated as non-shared storage locations, 
> > we should still handle this correctly.  Part of the problem seems to lie 
> > in the definition of the to-r dependency relation; the relevant portion 
> > is:
> 
> In fact, I’d rather change the computation of “dep” here control-dependency 
> “ctrl”. Notice that “ctrl” is computed by herd7 and present in the initial 
> environment of the Cat interpreter.
> 
> I have made a PR to herd7 that performs the change. The commit message states 
> the new definition.

Shouldn't similar reasoning apply to data and address dependencies?

For example, suppose there is a control dependency from a load to a 
register variable, and then a data dependency from the register variable 
to a store.  This should be treated as an overall data dependency from 
the load to the store.

Does your change to herd7 do this?  I couldn't tell from the description 
in the PR.

Also, do you think it's reasonable to add a restriction to herd7 against 
taking the address of a local variable?

> > (dep ; [Marked] ; rfi)
> > 
> > Here dep is the control dependency from the READ_ONCE to the 
> > local-variable store, and the rfi refers to the following load of the 
> > local variable.  The problem is that the store to the local variable 
> > doesn't go in the Marked class, because it is notated as a plain C 
> > assignment.  (And likewise for the following load.)
> > 
> This is a related issue, I am not sure, but perhaps it can be formulated as
> "should rfi and rf on registers behave the  same?”

Aren't they already the same thing?  It's not possible to have an rfe 
from a register, is it?

Alan

> > Should we change the model to make loads from and stores to local 
> > variables always count as Marked?
> > 
> > What should have happened if the local variable were instead a shared 
> > variable which the other thread didn't access at all?  It seems like a 
> > weak point of the memory model that it treats these two things 
> > differently.
> > 
> > Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-03 Thread Alan Stern
On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote:

> > Local variables absolutely should be treated just like CPU registers, if 
> > possible.  In fact, the compiler has the option of keeping local 
> > variables stored in registers.
> > 
> > (Of course, things may get complicated if anyone writes a litmus test 
> > that uses a pointer to a local variable,  Especially if the pointer 
> > could hold the address of a local variable in one execution and a 
> > shared variable in another!  Or if the pointer is itself a shared 
> > variable and is dereferenced in another thread!)
> 
> Good point!  I did miss this complication.  ;-)

I suspect it wouldn't be so bad if herd7 disallowed taking addresses of 
local variables.

> As you say, when its address is taken, the "local" variable needs to be
> treated as is it were shared.  There are exceptions where the pointed-to
> local is still used only by its process.  Are any of these exceptions
> problematic?

Easiest just to rule out the whole can of worms.

> > But even if local variables are treated as non-shared storage locations, 
> > we should still handle this correctly.  Part of the problem seems to lie 
> > in the definition of the to-r dependency relation; the relevant portion 
> > is:
> > 
> > (dep ; [Marked] ; rfi)
> > 
> > Here dep is the control dependency from the READ_ONCE to the 
> > local-variable store, and the rfi refers to the following load of the 
> > local variable.  The problem is that the store to the local variable 
> > doesn't go in the Marked class, because it is notated as a plain C 
> > assignment.  (And likewise for the following load.)
> > 
> > Should we change the model to make loads from and stores to local 
> > variables always count as Marked?
> 
> As long as the initial (possibly unmarked) load would be properly
> complained about.

Sorry, I don't understand what you mean.

>  And I cannot immediately think of a situation where
> this approach would break that would not result in a data race being
> flagged.  Or is this yet another failure of my imagination?

By definition, an access to a local variable cannot participate in a 
data race because all such accesses are confined to a single thread.

However, there are other aspects to consider, in particular, the 
ordering relations on local-variable accesses.  But if, as Luc says, 
local variables are treated just like registers then perhaps the issue 
doesn't arise.

> > What should have happened if the local variable were instead a shared 
> > variable which the other thread didn't access at all?  It seems like a 
> > weak point of the memory model that it treats these two things 
> > differently.
> 
> But is this really any different than the situation where a global
> variable is only accessed by a single thread?

Indeed; it is the _same_ situation.  Which leads to some interesting 
questions, such as: What does READ_ONCE(r) mean when r is a local 
variable?  Should it be allowed at all?  In what way is it different 
from a plain read of r?

One difference is that the LKMM doesn't allow dependencies to originate 
from a plain load.  Of course, when you're dealing with a local 
variable, what matters is not the load from that variable but rather the 
earlier loads which determined the value that had been stored there.  
Which brings us back to the case of the

dep ; rfi

dependency relation, where the accesses in the middle are plain and 
non-racy.  Should the LKMM be changed to allow this?

There are other differences to consider.  For example:

r = READ_ONCE(x);
smp_wmb();
WRITE_ONCE(y, 1);

If the write to r were treated as a marked store, the smp_wmb would 
order it (and consequently the READ_ONCE) before the WRITE_ONCE.  
However we don't want to do this when r is a local variable.  Indeed, a 
plain store wouldn't be ordered this way because the compiler might 
optimize the store away entirely, leaving the smp_wmb nothing to act on.

So overall the situation is rather puzzling.  Treating local variables 
as registers is probably the best answer.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-03 Thread Alan Stern
On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote:
> On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote:

> > This result is wrong, apparently because of a bug in herd7.  There 
> > should be control dependencies from each of the two loads in P0 to each 
> > of the two stores, but herd7 doesn't detect them.
> > 
> > Maybe Luc can find some time to check whether this really is a bug and 
> > if it is, fix it.
> 
> I agree that herd7's control dependency tracking could be improved.
> 
> But sadly, it is currently doing exactly what I asked Luc to make it do,
> which is to confine the control dependency to its "if" statement.  But as
> usual I wasn't thinking globally enough.  And I am not exactly sure what
> to ask for.  Here a store to a local was control-dependency ordered after
> a read, and so that should propagate to a read from that local variable.
> Maybe treat local variables as if they were registers, so that from
> herd7's viewpoint the READ_ONCE()s are able to head control-dependency
> chains in multiple "if" statements?
> 
> Thoughts?

Local variables absolutely should be treated just like CPU registers, if 
possible.  In fact, the compiler has the option of keeping local 
variables stored in registers.

(Of course, things may get complicated if anyone writes a litmus test 
that uses a pointer to a local variable,  Especially if the pointer 
could hold the address of a local variable in one execution and a 
shared variable in another!  Or if the pointer is itself a shared 
variable and is dereferenced in another thread!)

But even if local variables are treated as non-shared storage locations, 
we should still handle this correctly.  Part of the problem seems to lie 
in the definition of the to-r dependency relation; the relevant portion 
is:

(dep ; [Marked] ; rfi)

Here dep is the control dependency from the READ_ONCE to the 
local-variable store, and the rfi refers to the following load of the 
local variable.  The problem is that the store to the local variable 
doesn't go in the Marked class, because it is notated as a plain C 
assignment.  (And likewise for the following load.)

Should we change the model to make loads from and stores to local 
variables always count as Marked?

What should have happened if the local variable were instead a shared 
variable which the other thread didn't access at all?  It seems like a 
weak point of the memory model that it treats these two things 
differently.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-02 Thread Alan Stern
p_mb(), and also
> the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5
> ("documentation: Update circular buffer for
> load-acquire/store-release")) is missing the smp_mb() at the
> producer-side.
> 
> I'm trying to wrap my head around why it's OK to remove the smp_mb()
> in the cases above? I'm worried that the current XDP socket ring
> implementation (which is missing smp_mb()) might be broken.

Because of the control dependencies, the smp_mb isn't needed.  The 
dependencies will order both of the stores after both of the loads.

Alan Stern


Re: [PATCH 2/2] usb-storage: revert from scsi_add_host_with_dma() to scsi_add_host()

2021-03-01 Thread Alan Stern
On Fri, Feb 26, 2021 at 06:53:52AM +0100, Christoph Hellwig wrote:
> On Thu, Feb 25, 2021 at 11:35:57AM -0500, Alan Stern wrote:
> > This thread seems to have fallen through the cracks.  Maybe now would be 
> > a good time to address the problem (since we originally planned to fix 
> > it in 5.11!).
> > 
> > The questions listed below are pretty self-contained, although the rest
> > of the discussion isn't.  But I never received any answers.
> 
> usb-storage must use scsi_add_host_with_dma to use the right device
> for DMA mapping and parameters.  The calls to set the DMA options
> on the device are needed so that IOMMU merging doesn't change the
> imposed requirements.  If these requirements slow you down you need
> to relax them, as apparently the hardware is able to handle bigger
> limits.

Hans, don't you have the right equipment to test this approach?

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-01 Thread Alan Stern
On Mon, Mar 01, 2021 at 09:54:38AM +0100, Sedat Dilek wrote:
> On Wed, Feb 24, 2021 at 6:25 PM Sedat Dilek  wrote:
> >
> > On Wed, Feb 24, 2021 at 2:44 PM Sedat Dilek  wrote:
> > >
> > > Hi Mathias,
> > >
> > > I am here on Linux-v5.11-10201-gc03c21ba6f4e.
> > >
> > > I see a lot xhci-resets in my dmesg-log:
> > >
> > > root# LC_ALL=C dmesg -T | grep 'usb 4-1: reset SuperSpeed Gen 1 USB
> > > device number 2 using xhci_hcd' | wc -l
> > > 75
> > >
> > > This is what I have:
> > >
> > > root# lsusb -s 004:001
> > > Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
> > >
> > > root# lsusb -s 004:002
> > > Bus 004 Device 002: ID 174c:55aa ASMedia Technology Inc. ASM1051E SATA
> > > 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge,
> > > ASM1153E SATA 6Gb/s bridge
> > >
> > > My external USB 3.0 HDD contains the partition with my Debian-system
> > > and is attached to the above xhci bus/device.
> > >
> > > Can you enlighten what this means?
> > > Is this a known issue?
> > > Is there a fix around?
> > >
> > > BTW, in which Git tree is the xhci development happening?
> > > Can you point me to it?
> > >
> > > I am attaching my linux-config and full dmesg-log.
> > >
> > > Also I have attached outputs of:
> > >
> > > $ sudo lsusb -vvv -d 1d6b:0003
> > > $ sudo lsusb -vvv -d 174c:55aa
> > >
> > > If you need further information, please let me know.
> > >
> > > Thanks.
> > >
> >
> > Looks like that xhci-reset happens here every 10min.
> >
> 
> [ To Greg ]
> 
> The problem still remains with Linux v5.12-rc1 (see [1]).
> 
> Yesterday, I ran some disk-health checks with smartctl and gsmartcontrol.
> All good.
> 
> For the first time I used badblocks from e2fsprogs Debian package:
> 
> root# LC_ALL=C badblocks -v -p 1 -s /dev/sdc -o
> badblocks-v-p-1-s_dev-sdc_$(uname -r).txt
> Checking blocks 0 to 976762583
> Checking for bad blocks (read-only test): done
> Pass completed, 0 bad blocks found. (0/0/0 errors)
> 
> Good, there is no file-system corruption or badblocks or even a hardware 
> damage.
> 
> Anyway, feedback is much appreciated.
> 
> Thanks.

You can use usbmon on bus 4 to record the USB traffic.  It may indicate 
why the resets occur.

Alan Stern


Re: [RFC PATCH] USB:XHCI:Modify XHCI driver for USB2.0 controller

2021-02-27 Thread Alan Stern
On Sat, Feb 27, 2021 at 11:38:08AM +0800, liulongfang wrote:
> On 2021/2/27 0:30, Alan Stern wrote:
> > On Fri, Feb 26, 2021 at 04:21:37PM +0800, Longfang Liu wrote:
> >> Our current XHCI hardware controller has been customized to only
> >> support USB 2.0 ports. When using the current xhci driver, an xhci
> >> controller device and an ehci controller device will be created
> >> automatically.
> > 
> > That sentence makes no sense at all.  An EHCI controller device is a 
> > piece of hardware.  How can an xHCI driver, which is a piece of 
> > software, create a piece of hardware?
> > 
> > Alan Stern
> > .
> > 
> The hardware device is a complete USB3.0 controller,
> but I hope to support a USB2.0-only mode through software configuration.

Even if it only supports USB-2.0 connections, an xHCI controller is 
still an xHCI controller.  It doesn't magically transform into an EHCI 
controller.

You are not creating an EHCI controller device.  Rather, you are trying 
to restrict an xHCI controller device to make it handle only USB-2.0 
connections.  If you run lsusb on a system that has an xHCI controller, 
you'll see that the controller is bound to two USB buses: a USB-2 bus 
and a USB-3 bus.  But for both buses, the controller is xHCI -- not 
EHCI.

Your patch description is inaccurate.

Alan Stern


Re: [RFC PATCH] USB:XHCI:Modify XHCI driver for USB2.0 controller

2021-02-26 Thread Alan Stern
On Fri, Feb 26, 2021 at 04:21:37PM +0800, Longfang Liu wrote:
> Our current XHCI hardware controller has been customized to only
> support USB 2.0 ports. When using the current xhci driver, an xhci
> controller device and an ehci controller device will be created
> automatically.

That sentence makes no sense at all.  An EHCI controller device is a 
piece of hardware.  How can an xHCI driver, which is a piece of 
software, create a piece of hardware?

Alan Stern


Re: [PATCH] drivers/hid: fix for the big hid report length

2021-02-26 Thread Alan Stern
On Fri, Feb 26, 2021 at 02:13:36PM +0600, Sabyrzhan Tasbolatov wrote:
> On Thu, 25 Feb 2021 10:59:14 -0500, Alan Stern wrote:
> > Won't this cause silent errors?
> 
> Agree. But there are already such as cases like in:
> 
> // net/bluetooth/hidp/core.c
> static void hidp_process_report(..)
> {
>   ..
>   if (len > HID_MAX_BUFFER_SIZE)
>   len = HID_MAX_BUFFER_SIZE;
>   ..
> 
> // drivers/hid/hid-core.c
> int hid_report_raw_event(..)
> {
>   ..
>   rsize = hid_compute_report_size(report);
> 
>   if (report_enum->numbered && rsize >= HID_MAX_BUFFER_SIZE)
>   rsize = HID_MAX_BUFFER_SIZE - 1;
>   else if (rsize > HID_MAX_BUFFER_SIZE)
>   rsize = HID_MAX_BUFFER_SIZE;
>   ..
> 
> // drivers/staging/greybus/hid.c
> static int gb_hid_start(..)
> {
>   ..
>   if (bufsize > HID_MAX_BUFFER_SIZE)
>   bufsize = HID_MAX_BUFFER_SIZE;
>   ..
> 
> > How about instead just rejecting any device which includes a report 
> > whose length is too big (along with an line in the system log explaining 
> > what's wrong)?
> 
> Not everywhere, but there are already such as logs when > HID_MAX_BUFFER_SIZE
> 
> // drivers/hid/hidraw.c
> static ssize_t hidraw_send_report(..)
> {
>   ..
>   if (count > HID_MAX_BUFFER_SIZE) {
>   hid_warn(dev, "pid %d passed too large report\n",
>task_pid_nr(current));
>   ret = -EINVAL;
>   goto out;
>   }
> 
> 
> I've just noticed that hid_compute_report_size() doing the same thing as
> hid_report_len(). So I will replace it with latter one with length check.
> 
> So in [PATCH v2] I will do following:
> 
>  1. replace hid_compute_report_size() with hid_report_len()
> 
>  2. in hid_report_len() we can hid_warn() if length > HID_MAX_BUFFER_SIZE,
> and return HID_MAX_BUFFER_SIZE. Or we can return 0 in hid_report_len() to let
> functions like hid_hw_raw_request(), hid_hw_output_report() to validate
> invalid report length and return -EINVAL. Though I'll need to add !length
> missing checks in other places.
> 
> Please let me know what it's preferred way in 2nd step.

It's been too long since I worked on this stuff; you should check with 
the maintainers.

Another thing to consider: There probably are devices with multiple 
reports, where one of the reports is too big but people only want to use 
the other, smaller reports.  For situations like that, we don't want to 
reject the entire device.

I don't know if parsing a partiall part is a good thing to do.

Alan Stern


Re: [PATCH 2/2] usb-storage: revert from scsi_add_host_with_dma() to scsi_add_host()

2021-02-25 Thread Alan Stern
This thread seems to have fallen through the cracks.  Maybe now would be 
a good time to address the problem (since we originally planned to fix 
it in 5.11!).

The questions listed below are pretty self-contained, although the rest
of the discussion isn't.  But I never received any answers.

Alan Stern

On Mon, Nov 30, 2020 at 03:36:18PM -0500, Alan Stern wrote:
> [Added linux-scsi to CC: list.  When discussing code in a particular 
> subsystem, it's a good idea to include that subsystem's mailing list in 
> the CC:.]
> 
> On Tue, Dec 01, 2020 at 03:01:56AM +0800, Tom Yan wrote:
> > For the record,
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/scsi/scsi_host.h?h=v5.10-rc6#n753
> > 
> > On Tue, 1 Dec 2020 at 02:57, Tom Yan  wrote:
> > >
> > > This maybe? 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/scsi/scsi_lib.c?h=v5.10-rc6#n1816
> > >
> > > UAS:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/storage/uas.c?h=v5.10-rc6#n918
> > > BOT (AFAICT):
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/scsi/hosts.c?h=v5.10-rc6#n466
> > >
> > > It would explain why the issue is only triggered with UAS drives.
> 
> In brief, a recent change -- calling scsi_add_host_with_dma rather than 
> scsi_add_host -- in the USB uas driver has caused a regression in 
> performance.  (Note that the shost->dma_dev value is set differently as 
> a result of this change.)  Hans has determined that the problem seems 
> to be related to permanent changes in the dma_dev's settings caused by 
> scsi_add_host_with_dma.
> 
> Tom pointed out that __scsi_init_queue contains a couple of questionable 
> assignments:
> 
>   dma_set_seg_boundary(dev, shost->dma_boundary);
> 
> and
> 
>   dma_set_max_seg_size(dev, queue_max_segment_size(q));
> 
> where dev = shost->dma_dev -- in this case, a USB host controller.
> 
> So an important question is why decisions related to a particular SCSI 
> host should affect the DMA settings of a device somewhere else in the 
> heirarchy?  Sure, the properties of the USB controller should constrain 
> the settings available to the SCSI host, but there doesn't seem to be 
> any good reason for restrictions to go in the other direction.
> 
> Doesn't the way we handle DMA permit a child device to impose additional 
> restrictions (such as a smaller max segment size) beyond those of the 
> parent device which actually performs the DMA transfer?
> 
> > > The questions (from me) are:
> > > 1. From the scsi layer POV (as per what __scsi_init_queue() does),
> > > what/which should we use as dma_dev?
> 
> We should be using the USB host controller, because it is the device 
> which actually performs the DMA transfers.
> 
> > > 2. Do we really need to set dma_boundary in the UAS host template (to
> > > PAGE_SIZE - 1)?
> 
> I don't know.  But in theory it should be possible to have settings 
> (like this one) which affect only the transfers carried out by the SCSI 
> host, not the transfers carred out by other drivers which might use the 
> same USB controller.
> 
> > > 3. Kind of the same question as #1: when we clamp hw_max_sectors to
> > > dma max mapping size, should the size actually be "the smaller one
> > > among dev and sysdev"? Or is one of the two sizes *always* the smaller
> > > one?
> 
> I assume you're referring to code in the uas driver.  There the value of 
> dev is meaningless as far as DMA is concerned.  Only sysdev matters.
> 
> Alan Stern
> 
> > > On Tue, 1 Dec 2020 at 02:19, Hans de Goede  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 11/30/20 6:20 PM, Alan Stern wrote:
> > > > > On Mon, Nov 30, 2020 at 02:36:38PM +0100, Hans de Goede wrote:
> > > > >> Hi,
> > > > >>
> > > > >> On 11/30/20 2:30 PM, Greg KH wrote:
> > > > >>> On Mon, Nov 30, 2020 at 02:23:48PM +0100, Hans de Goede wrote:
> > > > >>>> Hi,
> > > > >>>>
> > > > >>>> On 11/30/20 1:58 PM, Tom Yan wrote:
> > > > >>>>> It's merely a moving of comment moving for/and a 
> > > > >>>>> no-behavioral-change
> > > > >>>>> adaptation for the reversion.>
> > > > >>>>
> > > > >>>> IMHO the revert of the troublesome commit and the other/new 
> > > > >>>> chang

Re: [PATCH] drivers/hid: fix for the big hid report length

2021-02-25 Thread Alan Stern
On Thu, Feb 25, 2021 at 08:52:15PM +0600, Sabyrzhan Tasbolatov wrote:
> syzbot found WARNING in hid_alloc_report_buf[1] when the raw buffer for
> report is kmalloc() allocated with length > KMALLOC_MAX_SIZE, causing
> order >= MAX_ORDER condition:
> 
> u8 *hid_alloc_report_buf(struct hid_report *report, gfp_t flags)
> {
>   /*
>* 7 extra bytes are necessary to achieve proper functionality
>* of implement() working on 8 byte chunks
>*/
> 
>   u32 len = hid_report_len(report) + 7;
> 
>   return kmalloc(len, flags);
> 
> The restriction with HID_MAX_BUFFER_SIZE (16kb) is, seems, a valid max
> limit. I've come up with this in all hid_report_len() xrefs.
> 
> The fix inside hid_report_len(), not in *hid_alloc_report_buf() is also
> fixing out-of-bounds here in memcpy():
> 
> statc int hid_submit_ctrl(..)
> {
> ..
>   len = hid_report_len(report);
>   if (dir == USB_DIR_OUT) {
>   ..
>   if (raw_report) {
>   memcpy(usbhid->ctrlbuf, raw_report, len);
> ..
> 
> So I've decided to return HID_MAX_BUFFER_SIZE if it the report length is
> bigger than limit, otherwise the return the report length.
> 
> [1]
> Call Trace:
>  alloc_pages include/linux/gfp.h:547 [inline]
>  kmalloc_order+0x40/0x130 mm/slab_common.c:837
>  kmalloc_order_trace+0x15/0x70 mm/slab_common.c:853
>  kmalloc_large include/linux/slab.h:481 [inline]
>  __kmalloc+0x257/0x330 mm/slub.c:3974
>  kmalloc include/linux/slab.h:557 [inline]
>  hid_alloc_report_buf+0x70/0xa0 drivers/hid/hid-core.c:1648
>  __usbhid_submit_report drivers/hid/usbhid/hid-core.c:590 [inline]
> 
> Reported-by: syzbot+ab02336a647181a88...@syzkaller.appspotmail.com
> Signed-off-by: Sabyrzhan Tasbolatov 
> ---
>  drivers/hid/usbhid/hid-core.c | 2 +-
>  include/linux/hid.h   | 4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
> index 86257ce6d619..4e9077363c96 100644
> --- a/drivers/hid/usbhid/hid-core.c
> +++ b/drivers/hid/usbhid/hid-core.c
> @@ -374,7 +374,7 @@ static int hid_submit_ctrl(struct hid_device *hid)
>   raw_report = usbhid->ctrl[usbhid->ctrltail].raw_report;
>   dir = usbhid->ctrl[usbhid->ctrltail].dir;
>  
> - len = ((report->size - 1) >> 3) + 1 + (report->id > 0);
> + len = hid_report_len(report);
>   if (dir == USB_DIR_OUT) {
>   usbhid->urbctrl->pipe = usb_sndctrlpipe(hid_to_usb_dev(hid), 0);
>   usbhid->urbctrl->transfer_buffer_length = len;
> diff --git a/include/linux/hid.h b/include/linux/hid.h
> index c39d71eb1fd0..509a6ffdca00 100644
> --- a/include/linux/hid.h
> +++ b/include/linux/hid.h
> @@ -1156,7 +1156,9 @@ static inline void hid_hw_wait(struct hid_device *hdev)
>  static inline u32 hid_report_len(struct hid_report *report)
>  {
>   /* equivalent to DIV_ROUND_UP(report->size, 8) + !!(report->id > 0) */
> - return ((report->size - 1) >> 3) + 1 + (report->id > 0);
> + u32 len = ((report->size - 1) >> 3) + 1 + (report->id > 0);
> +
> + return len > HID_MAX_BUFFER_SIZE ? HID_MAX_BUFFER_SIZE : len;

Won't this cause silent errors?

How about instead just rejecting any device which includes a report 
whose length is too big (along with an line in the system log explaining 
what's wrong)?

Alan Stern


Re: [PATCH v2 2/2] usb: host: ehci-platform: add ignore_oc DT support

2021-02-23 Thread Alan Stern
On Tue, Feb 23, 2021 at 05:16:44PM +0100, Álvaro Fernández Rojas wrote:
> Over-current reporting isn't supported on some platforms such as bcm63xx.
> These devices will incorrectly report over-current if this flag isn't properly
> activated.
> 
> Signed-off-by: Álvaro Fernández Rojas 
> ---
>  v2: change flag name and improve documentation as suggested by Alan Stern.
> 
>  drivers/usb/host/ehci-platform.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/usb/host/ehci-platform.c 
> b/drivers/usb/host/ehci-platform.c
> index a48dd3fac153..2c587e31d010 100644
> --- a/drivers/usb/host/ehci-platform.c
> +++ b/drivers/usb/host/ehci-platform.c
> @@ -286,6 +286,9 @@ static int ehci_platform_probe(struct platform_device 
> *dev)
>   if (of_property_read_bool(dev->dev.of_node, "big-endian"))
>   ehci->big_endian_mmio = ehci->big_endian_desc = 1;
>  
> + if (of_property_read_bool(dev->dev.of_node, "spurious-oc"))
> + ehci->ignore_oc = 1;
> +
>   if (of_property_read_bool(dev->dev.of_node,
>     "needs-reset-on-resume"))
>   priv->reset_on_resume = true;

Acked-by: Alan Stern 


Re: [PATCH 1/2] dt-bindings: usb: generic-ehci: document ignore-oc flag

2021-02-23 Thread Alan Stern
On Tue, Feb 23, 2021 at 05:04:57PM +0100, Álvaro Fernández Rojas wrote:
> Hi Alan,
> 
> > El 23 feb 2021, a las 16:54, Alan Stern  
> > escribió:
> > 
> > On Tue, Feb 23, 2021 at 04:50:04PM +0100, Álvaro Fernández Rojas wrote:
> >> Over-current reporting isn't supported on some platforms such as bcm63xx.
> >> These devices will incorrectly report over-current if this flag isn't 
> >> properly
> >> activated.
> >> 
> >> Signed-off-by: Álvaro Fernández Rojas 
> >> ---
> >> Documentation/devicetree/bindings/usb/generic-ehci.yaml | 5 +
> >> 1 file changed, 5 insertions(+)
> >> 
> >> diff --git a/Documentation/devicetree/bindings/usb/generic-ehci.yaml 
> >> b/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> >> index cf83f2d9afac..294bbf02399e 100644
> >> --- a/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> >> +++ b/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> >> @@ -117,6 +117,11 @@ properties:
> >>   Set this flag if EHCI has a Transaction Translator built into
> >>   the root hub.
> >> 
> >> +  ignore-oc:
> >> +$ref: /schemas/types.yaml#/definitions/flag
> >> +description:
> >> +  Set this flag for HCDs without over-current reporting support.
> > 
> > This is not a good description of a device property.  DT entries are 
> > supposed to described the hardware, not talk about how to use it.
> 
> Any suggestions on a proper description?
> 
> > 
> > When you say that the bcm63xx doesn't support over-current reporting, 
> > what exactly do you mean?  Do you mean that sometimes the hardware turns 
> > on the over-current bit when an over-current isn't actually present?  Or 
> > do you mean something else?
> 
> Yes, the hardware turns on the over-current bit with no over-current present.

Okay, in that case the property should be named something like 
"spurious_oc", and the description should say something like:

Set this flag to indicate that the hardware sometimes turns on 
the OC bit when an over-current isn't actually present.

Alan Stern


Re: [PATCH 1/2] dt-bindings: usb: generic-ehci: document ignore-oc flag

2021-02-23 Thread Alan Stern
On Tue, Feb 23, 2021 at 04:50:04PM +0100, Álvaro Fernández Rojas wrote:
> Over-current reporting isn't supported on some platforms such as bcm63xx.
> These devices will incorrectly report over-current if this flag isn't properly
> activated.
> 
> Signed-off-by: Álvaro Fernández Rojas 
> ---
>  Documentation/devicetree/bindings/usb/generic-ehci.yaml | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/usb/generic-ehci.yaml 
> b/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> index cf83f2d9afac..294bbf02399e 100644
> --- a/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> +++ b/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> @@ -117,6 +117,11 @@ properties:
>Set this flag if EHCI has a Transaction Translator built into
>the root hub.
>  
> +  ignore-oc:
> +$ref: /schemas/types.yaml#/definitions/flag
> +description:
> +  Set this flag for HCDs without over-current reporting support.

This is not a good description of a device property.  DT entries are 
supposed to described the hardware, not talk about how to use it.

When you say that the bcm63xx doesn't support over-current reporting, 
what exactly do you mean?  Do you mean that sometimes the hardware turns 
on the over-current bit when an over-current isn't actually present?  Or 
do you mean something else?

Alan Stern

> +
>needs-reset-on-resume:
>  $ref: /schemas/types.yaml#/definitions/flag
>  description:
> -- 
> 2.20.1
> 


Re: usb: dwc3: gadget: Change runtime pm function for DWC3 runtime suspend

2021-02-16 Thread Alan Stern
On Tue, Feb 16, 2021 at 10:30:52AM +0900, Jung Daehwan wrote:
> Hello, Alan
> 
> On Mon, Feb 15, 2021 at 12:41:45PM -0500, Alan Stern wrote:
> > On Mon, Feb 15, 2021 at 11:38:58AM +0900, Daehwan Jung wrote:
> > > It seems pm_runtime_put calls runtime_idle callback not runtime_suspend 
> > > callback.
> > 
> > How is this fact related to your patch?
> 
> I think we should cause dwc3_runtime_suspend at the time.

Why do you think so?

> That's why I use pm_runtime_put_sync_suspend.
> 
> > 
> > > It's better to use pm_runtime_put_sync_suspend to allow DWC3 runtime 
> > > suspend.
> > 
> > Why do you think it is better?  The advantage of pm_runtime_put is that 
> > it allows the suspend to occur at a later time in a workqueue thread, so 
> > the caller doesn't have to wait for the device to go into suspend.
> > 
> 
> We can assume DWC3 was already in suspend state if pm_runtime_get_sync
> returns 0. DWC3 resumes due to pm_rumtime_get_sync but it doesn't
> re-enter runtime_suspend but runtime_idle. pm_runtime_put decreases
> usage_count but doesn't cause runtime_suspend.
> 
> 1. USB disconnected
> 2. UDC unbinded
> 3. DWC3 runtime suspend
> 4. UDC binded unexpectedly
> 5. DWC3 runtime resume (pm_runtime_get_sync)
> 6. DWC3 runtime idle (pm_runtime_put)
>-> DWC3 runtime suspend again (pm_runtime_put_sync_suspend)

That's what happens with your patch.  Now look at what happens without 
the patch:

1. USB disconnected
2. UDC unbound
3. DWC3 suspend request is added to waitqueue
4. UDC bound unexpectedly
5. DWC3 suspend request is removed from waitqueue
6. DWC3 runtime idle
7. DWC3 runtime suspend

The difference is that this way, we avoid doing an unnecessary suspend 
and resume, and we save the time they would have required.

> I've talked with Wesley in other patch.
> 
> usbb: dwc3: gadget: skip pullup and set_speed after suspend
> patchwork.kernel.org/project/linux-usb/patch/163968-102424-1-git-send-email-dh10.j...@samsung.com
> 
> @ Wesley
> 
> I think We should guarantee DWC3 enters suspend again at the time.

Why do you think we should guarantee this?

Alan Stern


Re: usb: dwc3: gadget: Change runtime pm function for DWC3 runtime suspend

2021-02-15 Thread Alan Stern
On Mon, Feb 15, 2021 at 11:38:58AM +0900, Daehwan Jung wrote:
> It seems pm_runtime_put calls runtime_idle callback not runtime_suspend 
> callback.

How is this fact related to your patch?

> It's better to use pm_runtime_put_sync_suspend to allow DWC3 runtime suspend.

Why do you think it is better?  The advantage of pm_runtime_put is that 
it allows the suspend to occur at a later time in a workqueue thread, so 
the caller doesn't have to wait for the device to go into suspend.

Alan Stern

> Signed-off-by: Daehwan Jung 
> ---
>  drivers/usb/dwc3/gadget.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index aebcf8e..4a4b93b 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2229,7 +2229,7 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int 
> is_on)
>*/
>   ret = pm_runtime_get_sync(dwc->dev);
>   if (!ret || ret < 0) {
> - pm_runtime_put(dwc->dev);
> + pm_runtime_put_sync_suspend(dwc->dev);
>   return 0;
>   }
>  
> -- 
> 2.7.4
> 


Re: WARNING in go7007_usb_onboard_write_interrupt/usb_submit_urb

2021-02-07 Thread Alan Stern
Hans:

On Sat, Feb 06, 2021 at 11:40:13PM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:64eaa0fa platform/chrome: cros_ec_typec: Fix call to typec..
> git tree:   
> https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
> console output: https://syzkaller.appspot.com/x/log.txt?x=12d5c090d0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=29ec25b819cb42f3
> dashboard link: https://syzkaller.appspot.com/bug?extid=d4ebc877b1223f20d5a0
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=16b47dd8d0
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=164896c4d0
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d4ebc877b1223f20d...@syzkaller.appspotmail.com
> 
> usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> usb 1-1: SerialNumber: syz
> usb 1-1: config 0 descriptor??
> [ cut here ]
> usb 1-1: BOGUS urb xfer, pipe 2 != type 3
> WARNING: CPU: 0 PID: 2608 at drivers/usb/core/urb.c:493 
> usb_submit_urb+0xd27/0x1540 drivers/usb/core/urb.c:493
> Modules linked in:
> CPU: 0 PID: 2608 Comm: kworker/0:2 Not tainted 5.11.0-rc5-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Workqueue: usb_hub_wq hub_event
> RIP: 0010:usb_submit_urb+0xd27/0x1540 drivers/usb/core/urb.c:493
> Code: 84 d4 02 00 00 e8 e9 e9 ba fd 4c 89 ef e8 f1 3d 1d ff 41 89 d8 44 89 e1 
> 4c 89 f2 48 89 c6 48 c7 c7 00 6a 61 86 e8 31 ee f9 01 <0f> 0b e9 81 f8 ff ff 
> e8 bd e9 ba fd 48 81 c5 30 06 00 00 e9 ad f7
> RSP: 0018:c90006cf6c70 EFLAGS: 00010286
> RAX:  RBX: 0003 RCX: 
> RDX: 88810a895040 RSI: 81299db3 RDI: f52000d9ed80
> RBP: 88810155f8a0 R08: 0001 R09: 
> R10: 8149c4ab R11:  R12: 0002
> R13: 8881121670a0 R14: 8881031c06e0 R15: 888102b46f00
> FS:  () GS:8881f680() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 55bfb7b520d0 CR3: 00010a1f1000 CR4: 001506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  usb_start_wait_urb+0x101/0x4c0 drivers/usb/core/message.c:58
>  usb_internal_control_msg drivers/usb/core/message.c:102 [inline]
>  usb_control_msg+0x31c/0x4a0 drivers/usb/core/message.c:153
>  go7007_usb_onboard_write_interrupt+0x26a/0x340 
> drivers/media/usb/go7007/go7007-usb.c:735

It looks like the go7007 driver isn't very careful about checking that 
the endpoints it uses have the right types.  In particular, this bug was 
caused by not checking that ep2 is a control endpoint (highly unusual to 
have a control endpoint other than 0, but allowed).

Alan Stern


Re: [RESEND PATCH v3 12/12] usb: misc: usbtest: update to use the usb_control_msg_{send|recv}() API

2021-01-27 Thread Alan Stern
;
>   return retval;
> @@ -1845,30 +1842,22 @@ static int ctrl_out(struct usbtest_dev *dev,
>   /* write patterned data */
>   for (j = 0; j < len; j++)
>   buf[j] = (u8)(i + j);
> - retval = usb_control_msg(udev, usb_sndctrlpipe(udev, 0),
> - 0x5b, USB_DIR_OUT|USB_TYPE_VENDOR,
> - 0, 0, buf, len, USB_CTRL_SET_TIMEOUT);
> - if (retval != len) {
> + retval = usb_control_msg_send(udev, 0, 0x5b,
> +   USB_DIR_OUT | USB_TYPE_VENDOR, 0,
> +   0, buf, len, USB_CTRL_SET_TIMEOUT,
> +   GFP_KERNEL);
> + if (retval < 0) {
>   what = "write";
> - if (retval >= 0) {
> - ERROR(dev, "ctrl_out, wlen %d (expected %d)\n",
> - retval, len);
> - retval = -EBADMSG;
> - }
>   break;
>   }

Here buf doesn't need to be copied, and a short write will return an 
error code anyway.

>  
>   /* read it back -- assuming nothing intervened!!  */
> - retval = usb_control_msg(udev, usb_rcvctrlpipe(udev, 0),
> - 0x5c, USB_DIR_IN|USB_TYPE_VENDOR,
> - 0, 0, buf, len, USB_CTRL_GET_TIMEOUT);
> - if (retval != len) {
> + retval = usb_control_msg_recv(udev, 0,
> +   0x5c, USB_DIR_IN|USB_TYPE_VENDOR,
> +   0, 0, buf, len, 
> USB_CTRL_GET_TIMEOUT,
> +   GFP_KERNEL);
> + if (retval < 0) {
>   what = "read";
> - if (retval >= 0) {
> - ERROR(dev, "ctrl_out, rlen %d (expected %d)\n",
> - retval, len);
> - retval = -EBADMSG;
> - }

Similar to one of the cases above.

Alan Stern

>   break;
>   }
>  
> -- 
> 2.25.1
> 


Re: [PATCH] usb: host: ehci-tegra: fix Kconfig depencies

2021-01-25 Thread Alan Stern
On Mon, Jan 25, 2021 at 12:32:30PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> Selecting the chipidea driver from the old Kconfig symbol
> can lead to a missing dependency:

Arnd:

I found this whole patch a little confusing.  For example, in the 
sentence above, what does "the old Kconfig symbol" refer to?

Comparing the various Kconfig files, I see what the problem is.  The 
commit which this one fixes made CONFIG_EHCI_TEGRA select 
CONFIG_USB_CHIPIDEA, but it didn't make EHCI_TEGRA depend on the things 
that USB_CHIPIDEA depends on.  Can you please state this more explicitly 
in the patch description?

> WARNING: unmet direct dependencies detected for USB_CHIPIDEA
>   Depends on [m]: USB_SUPPORT [=y] && (USB_EHCI_HCD [=y] && USB_GADGET [=m] 
> || USB_EHCI_HCD [=y] && !USB_GADGET [=m] || !USB_EHCI_HCD [=y] && USB_GADGET 
> [=m]) && HAS_DMA [=y]
>   Selected by [y]:
>   - USB_EHCI_TEGRA [=y] && USB_SUPPORT [=y] && USB [=y] && USB_EHCI_HCD [=y] 
> && ARCH_TEGRA [=y]
> aarch64-linux-ld: drivers/usb/chipidea/otg.o: in function 
> `ci_handle_vbus_change':
> otg.c:(.text+0x3c8): undefined reference to `usb_gadget_vbus_connect'
> aarch64-linux-ld: otg.c:(.text+0x42c): undefined reference to 
> `usb_gadget_vbus_disconnect'
> aarch64-linux-ld: drivers/usb/chipidea/otg.o: in function `ci_otg_work':
> otg.c:(.text+0x5d4): undefined reference to `usb_gadget_vbus_disconnect'
> ...
> 
> Duplicate the dependency to ensure that this driver can
> only be a loadable module if one of its dependencies is.
> 
> Fixes: c3590c7656fb ("usb: host: ehci-tegra: Remove the driver")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/usb/host/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
> index 160e5d3927e2..66b01b619ecd 100644
> --- a/drivers/usb/host/Kconfig
> +++ b/drivers/usb/host/Kconfig
> @@ -269,6 +269,7 @@ config USB_EHCI_HCD_AT91
>  config USB_EHCI_TEGRA
>   tristate "NVIDIA Tegra HCD support"
>   depends on ARCH_TEGRA
> + depends on ((USB_EHCI_HCD && USB_GADGET) || (USB_EHCI_HCD && 
> !USB_GADGET) || (!USB_EHCI_HCD && USB_GADGET)) && HAS_DMA
>   select USB_CHIPIDEA
>   select USB_CHIPIDEA_HOST
>   select USB_CHIPIDEA_TEGRA

Isn't there at least one other missing dependency?  This entry selects 
USB_CHIPIDEA_TEGRA, which depends on OF.  So shouldn't this entry also 
depend on OF?  Or does the Kconfig system detect that for us?

Also, while I'm no expert on the Kconfig language, it seems that the new 
"depends" line could be a lot easier to understand if it was refactored 
with some comments added.  Yes, I realize you just copied the existing 
dependency from the USB_CHIPIDEA entry -- that one could stand to be 
cleaned up as well.

For instance, how about putting the HAS_DMA part into a separate line, 
since it's unrelated to the other stuff?  And the rest looks like it 
could be changed to:

    depends on USB_EHCI_HCD || USB_GADGET

although that probably isn't quite valid.  Still, can't it be changed to 
something simpler than

(USB_EHCI_HCD && USB_GADGET) || (USB_EHCI_HCD && !USB_GADGET) ||
(!USB_EHCI_HCD && USB_GADGET)

?

Alan Stern


Re: [PATCH v6 1/4] usb: gadget: udc: core: Introduce check_config to verify USB configuration

2021-01-22 Thread Alan Stern
On Thu, Jan 21, 2021 at 08:01:37PM -0800, Wesley Cheng wrote:
> Some UDCs may have constraints on how many high bandwidth endpoints it can
> support in a certain configuration.  This API allows for the composite
> driver to pass down the total number of endpoints to the UDC so it can verify
> it has the required resources to support the configuration.
> 
> Signed-off-by: Wesley Cheng 


> --- a/include/linux/usb/gadget.h
> +++ b/include/linux/usb/gadget.h
> @@ -328,6 +328,7 @@ struct usb_gadget_ops {
>   struct usb_ep *(*match_ep)(struct usb_gadget *,
>   struct usb_endpoint_descriptor *,
>   struct usb_ss_ep_comp_descriptor *);
> + int (*check_config)(struct usb_gadget *gadget, unsigned long 
> ep_map);
>  };
>  
>  /**
> @@ -607,6 +608,7 @@ int usb_gadget_connect(struct usb_gadget *gadget);
>  int usb_gadget_disconnect(struct usb_gadget *gadget);
>  int usb_gadget_deactivate(struct usb_gadget *gadget);
>  int usb_gadget_activate(struct usb_gadget *gadget);
> +int usb_gadget_check_config(struct usb_gadget *gadget, unsigned long ep_map);
>  #else
>  static inline int usb_gadget_frame_number(struct usb_gadget *gadget)
>  { return 0; }

Don't you also need an entry for the case where CONFIG_USB_GADGET isn't 
enabled?

Alan Stern


Re: [PATCH] usb: gadget: f_mass_storage: cahnge wait_event to wait_event_timeout

2021-01-21 Thread Alan Stern
On Thu, Jan 21, 2021 at 03:56:45PM +0900, Oh Eomji wrote:
> Changed to return a timeout error if there is no response for a certain
> period of time in order to solve the problem of waiting for a event
> complete while executing unbind.
> 
> Signed-off-by: Oh Eomji 
> ---
>  drivers/usb/gadget/function/f_mass_storage.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/gadget/function/f_mass_storage.c 
> b/drivers/usb/gadget/function/f_mass_storage.c
> index 950c943..b474840 100644
> --- a/drivers/usb/gadget/function/f_mass_storage.c
> +++ b/drivers/usb/gadget/function/f_mass_storage.c
> @@ -3000,7 +3000,7 @@ static void fsg_unbind(struct usb_configuration *c, 
> struct usb_function *f)
>   if (fsg->common->fsg == fsg) {
>   __raise_exception(fsg->common, FSG_STATE_CONFIG_CHANGE, NULL);
>   /* FIXME: make interruptible or killable somehow? */
> - wait_event(common->fsg_wait, common->fsg != fsg);
> + wait_event_timeout(common->fsg_wait, common->fsg != fsg, HZ / 
> 4);
>   }
>  
>   usb_free_all_descriptors(>function);

No, no, no!

This patch completely misunderstands the purpose of the wait_event call.  
The reason it isn't interruptible is not because that would be difficult 
-- all it takes is adding a timeout argument, as you did here.

The reason is because the driver will get invalid memory accesses if 
fsg_unbind returns too early.  fsg will be deallocated, but 
fsg_set_interface will continue to use it.  This patch completely 
ignores that issue.

Was there any real reason for writing this patch?  Does it solve a 
real-world problem?  Did you encounter a situation where the wait_event 
call would wait for more than 1/4 second?

Alan Stern


Re: [PATCH 1/1] usb: xhci: setup packets don't need DMA mapping

2021-01-14 Thread Alan Stern
On Thu, Jan 14, 2021 at 01:04:02PM +0800, Peter Chen wrote:
> On 21-01-14 11:59:07, Daewoong Kim wrote:
> > DMA mapping of urb->setup_packet is not necessary for xHCI host
> > controllers. The xHCI specification says that Setup Stage TRB includes
> > whole Setup Data; therefore, urb->setup_dma will not be used in the xhci
> > HCD code.
> > 
> 
> How about bypass map/unmap operation for xHCI control transfer directly?
> 
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index 91ab81c3fc79..0a0ab14b7638 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -1374,7 +1374,8 @@ static int xhci_map_urb_for_dma(struct usb_hcd *hcd, 
> struct urb *urb,
>  
>   xhci = hcd_to_xhci(hcd);
>  
> - if (xhci_urb_suitable_for_idt(urb))
> + if (xhci_urb_suitable_for_idt(urb) ||
> + (usb_endpoint_xfer_control(>ep->desc)))
>   return 0;

Would this affect the map/unmap operations for the DATA packets in a 
control transfer, along with the SETUP packet?

Alan Stern


Re: WARNING in corrupted/usb_submit_urb (2)

2021-01-13 Thread Alan Stern
On Tue, Jan 12, 2021 at 09:20:10PM -0800, syzbot wrote:
> syzbot suspects this issue was fixed by commit:
> 
> commit c318840fb2a42ce25febc95c4c19357acf1ae5ca
> Author: Alan Stern 
> Date:   Wed Dec 30 16:20:44 2020 +
> 
> USB: Gadget: dummy-hcd: Fix shift-out-of-bounds bug
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=17a8972f50
> start commit:   cd796ed3 Merge tag 'trace-v5.10-rc7' of git://git.kernel.o..
> git tree:   upstream
> kernel config:  https://syzkaller.appspot.com/x/.config?x=59df2a4dced5f928
> dashboard link: https://syzkaller.appspot.com/bug?extid=4feb9bb7280fb554f021
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1653b8a750
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=170d946b50
> 
> If the result looks correct, please mark the issue as fixed by replying with:
> 
> #syz fix: USB: Gadget: dummy-hcd: Fix shift-out-of-bounds bug
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

I strongly believe that syzbot's conclusion is wrong.  The commit it 
identified has nothing to do with the original problem.

Alan Stern


Re: [PATCH v2] USB:ehci:fix an interrupt calltrace error

2021-01-11 Thread Alan Stern
On Mon, Jan 11, 2021 at 07:29:37PM +0800, Longfang Liu wrote:
> The system that use Synopsys USB host controllers goes to suspend
> when using USB audio player. This causes the USB host controller
> continuous send interrupt signal to system, When the number of
> interrupts exceeds 10, the system will forcibly close the
> interrupts and output a calltrace error.
> 
> When the system goes to suspend, the last interrupt is reported to
> the driver. At this time, the system has set the state to suspend.
> This causes the last interrupt to not be processed by the system and
> not clear the interrupt flag. This uncleared interrupt flag constantly
> triggers new interrupt event. This causing the driver to receive more
> than 100,000 interrupts, which causes the system to forcibly close the
> interrupt report and report the calltrace error.
> 
> so, when the driver goes to sleep and changes the system state to
> suspend, the interrupt flag needs to be cleared.
> 
> Signed-off-by: Longfang Liu 

Acked-by: Alan Stern 

> ---
> 
> Changes in v2:
> - updated cleared registers
> 
>  drivers/usb/host/ehci-hub.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/usb/host/ehci-hub.c b/drivers/usb/host/ehci-hub.c
> index ce0eaf7..a99c1ac 100644
> --- a/drivers/usb/host/ehci-hub.c
> +++ b/drivers/usb/host/ehci-hub.c
> @@ -346,8 +346,12 @@ static int ehci_bus_suspend (struct usb_hcd *hcd)
>  
>   unlink_empty_async_suspended(ehci);
>  
> + /* Some Synopsys controllers mistakenly leave IAA turned on */
> + ehci_writel(ehci, STS_IAA, >regs->status);
> +
>   /* Any IAA cycle that started before the suspend is now invalid */
>   end_iaa_cycle(ehci);
> +
>   ehci_handle_start_intr_unlinks(ehci);
>   ehci_handle_intr_unlinks(ehci);
>   end_free_itds(ehci);
> -- 
> 2.8.1
> 


Re: [PATCH v2 1/2] usb: ohci: Default to per-port over-current protection

2021-01-09 Thread Alan Stern
On Sun, Dec 27, 2020 at 12:22:34PM +0100, Paul Kocialkowski wrote:
> Hi,

Sorry it has taken so long to respond to this.  The holidays intervened, 
but that's no excuse.

> On Fri 11 Sep 20, 09:25, Hamish Martin wrote:
> > Some integrated OHCI controller hubs do not expose all ports of the hub
> > to pins on the SoC. In some cases the unconnected ports generate
> > spurious over-current events. For example the Broadcom 56060/Ranger 2 SoC
> > contains a nominally 3 port hub but only the first port is wired.
> > 
> > Default behaviour for ohci-platform driver is to use global over-current
> > protection mode (AKA "ganged"). This leads to the spurious over-current
> > events affecting all ports in the hub.
> > 
> > We now alter the default to use per-port over-current protection.
> 
> This specific patch lead to breaking OHCI on my mom's laptop (whom was about
> to buy a new one thinking the hardware had failed). I get no OHCI interrupt at
> all and no USB 1 device is ever detected.
> 
> I haven't really found a reasonable explanation about why that is, but here
> are some notes I was able to collect:
> - The issue showed up on 5.8,18 and 5.9.15, which don't include the patch
>   from this series that sets distrust_firmware = false; This results in the 
> NPS
>   bit being set via OHCI_QUIRK_HUB_POWER.
> - Adding val &= ~RH_A_PSM; (as was done before this change) solves the issue
>   which is weird because the bit is supposed to be inactive when NPS is set;
> - Setting ohci_hcd.distrust_firmware=0 in the cmdline results in not setting
>   the NPS bit and also solves the issue;
> - The initial value of the register at function entry is 0x1001104 (PSM bit
>   is set, NPS is unset);
> - The OHCI controller is the following:
> 00:03.0 USB controller: Silicon Integrated Systems [SiS] USB 1.1 Controller 
> (rev 0f) (prog-if 10 [OHCI])
>   Subsystem: ASUSTeK Computer Inc. Device 1aa7

Great reporting -- thanks.

> Does that make any sense to you?
> 
> I really wonder what a proper fix could be and here are some suggestions:
> - Adding a specific quirk to clear the PSM bit for this hardware which seems 
> to
>   consider the bit regardless of NPS;

We don't need a quirk for this.  There shouldn't be anything wrong with 
_always_ clearing PSM whenever NPS is set, since the controller is 
supposed to ignore PSM under that condition.

Would you like to submit a patch for this?

> - Adding the patch that sets distrust_firmware = false to stable branches;

That's certainly reasonable.  Nobody has reported any problems caused by 
that patch, so adding it to the stable branches should be safe enough.

> What do you think?

We could even do both.  That would help if, for example, somebody 
decided to set ohci_hcd.distrust_firmware=true explicitly.

Greg, in the meantime can we have commit c4005a8f65ed ("usb: ohci: Make 
distrust_firmware param default to false") added to all the stable 
kernels which have back-ported versions of commit b77d2a0a223b?

Alan Stern


Re: [PATCH] media: zr364xx: fix memory leaks in probe()

2021-01-06 Thread Alan Stern
On Wed, Jan 06, 2021 at 01:10:05PM +0300, Dan Carpenter wrote:
> Syzbot discovered that the probe error handling doesn't clean up the
> resources allocated in zr364xx_board_init().  There are several
> related bugs in this code so I have re-written the error handling.

Dan:

I recently sent in a patch for a similar problem in the gspca driver
(commit e469d0b09a19 "media: gspca: Fix memory leak in probe").  It
seems there may be similar issues in that driver: one single function
call tries to undo an indeterminate number of initializations.

I don't know enough about these subsystems to evaluate this.  Can you
take a look at it?

Thank,

Alan Stern


Re: Re: UBSAN: shift-out-of-bounds in dummy_hub_control

2020-12-29 Thread Alan Stern
On Tue, Dec 29, 2020 at 08:33:39AM -0800, syzbot wrote:
> > #syz test: upstream e37b12e4
>
> "upstream" does not look like a valid git repo address.

I thought syzbot had been changed to recognize "upstream" as a valid
repo name.

Alan Stern

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
e37b12e4

Index: usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
===
--- usb-devel.orig/drivers/usb/gadget/udc/dummy_hcd.c
+++ usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
@@ -2114,9 +2114,21 @@ static int dummy_hub_control(
dum_hcd->port_status &= ~USB_PORT_STAT_POWER;
set_link_state(dum_hcd);
break;
-   default:
+   case USB_PORT_FEAT_ENABLE:
+   case USB_PORT_FEAT_C_ENABLE:
+   case USB_PORT_FEAT_C_SUSPEND:
+   /* Not allowed for USB-3 */
+   if (hcd->speed == HCD_USB3)
+   goto error;
+   fallthrough;
+   case USB_PORT_FEAT_C_CONNECTION:
+   case USB_PORT_FEAT_C_RESET:
dum_hcd->port_status &= ~(1 << wValue);
set_link_state(dum_hcd);
+   break;
+   default:
+   /* Disallow INDICATOR and C_OVER_CURRENT */
+   goto error;
}
break;
case GetHubDescriptor:
@@ -2277,18 +2289,17 @@ static int dummy_hub_control(
 */
dum_hcd->re_timeout = jiffies + msecs_to_jiffies(50);
fallthrough;
+   case USB_PORT_FEAT_C_CONNECTION:
+   case USB_PORT_FEAT_C_RESET:
+   case USB_PORT_FEAT_C_ENABLE:
+   case USB_PORT_FEAT_C_SUSPEND:
+   /* Not allowed for USB-3, and ignored for USB-2 */
+   if (hcd->speed == HCD_USB3)
+   goto error;
+   break;
default:
-   if (hcd->speed == HCD_USB3) {
-   if ((dum_hcd->port_status &
-USB_SS_PORT_STAT_POWER) != 0) {
-   dum_hcd->port_status |= (1 << wValue);
-   }
-   } else
-   if ((dum_hcd->port_status &
-USB_PORT_STAT_POWER) != 0) {
-   dum_hcd->port_status |= (1 << wValue);
-   }
-   set_link_state(dum_hcd);
+   /* Disallow TEST, INDICATOR, and C_OVER_CURRENT */
+   goto error;
}
break;
case GetPortErrorCount:



Re: UBSAN: shift-out-of-bounds in dummy_hub_control

2020-12-29 Thread Alan Stern
On Fri, Dec 25, 2020 at 12:05:22PM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:e37b12e4 Merge tag 'for-linus-5.11-ofs1' of git://git.kern..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1742993750
> kernel config:  https://syzkaller.appspot.com/x/.config?x=98408202fed1f636
> dashboard link: https://syzkaller.appspot.com/bug?extid=5925509f78293baa7331
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1781fc5b50
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=157cd12350
> 
> The issue was bisected to:
> 
> commit 8442b02bf3c6770e0d7e7ea17be36c30e95987b6
> Author: Andrey Konovalov 
> Date:   Mon Oct 21 14:20:58 2019 +
> 
> USB: dummy-hcd: increase max number of devices to 32
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1631d0db50
> final oops: https://syzkaller.appspot.com/x/report.txt?x=1531d0db50
> console output: https://syzkaller.appspot.com/x/log.txt?x=1131d0db50
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+5925509f78293baa7...@syzkaller.appspotmail.com
> Fixes: 8442b02bf3c6 ("USB: dummy-hcd: increase max number of devices to 32")
> 
> 
> UBSAN: shift-out-of-bounds in drivers/usb/gadget/udc/dummy_hcd.c:2293:33
> shift exponent 257 is too large for 32-bit type 'int'
> CPU: 0 PID: 8526 Comm: syz-executor949 Not tainted 5.10.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:79 [inline]
>  dump_stack+0x107/0x163 lib/dump_stack.c:120
>  ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
>  __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
>  dummy_hub_control.cold+0x1a/0xbc drivers/usb/gadget/udc/dummy_hcd.c:2293
>  rh_call_control drivers/usb/core/hcd.c:683 [inline]
>  rh_urb_enqueue drivers/usb/core/hcd.c:841 [inline]
>  usb_hcd_submit_urb+0xcaa/0x22d0 drivers/usb/core/hcd.c:1544
>  usb_submit_urb+0x6e4/0x1560 drivers/usb/core/urb.c:585
>  usb_start_wait_urb+0x101/0x4c0 drivers/usb/core/message.c:58
>  usb_internal_control_msg drivers/usb/core/message.c:102 [inline]
>  usb_control_msg+0x31c/0x4a0 drivers/usb/core/message.c:153
>  do_proc_control+0x4cb/0x9c0 drivers/usb/core/devio.c:1165
>  proc_control drivers/usb/core/devio.c:1191 [inline]
>  usbdev_do_ioctl drivers/usb/core/devio.c:2535 [inline]
>  usbdev_ioctl+0x12c1/0x3b20 drivers/usb/core/devio.c:2708
>  vfs_ioctl fs/ioctl.c:48 [inline]
>  __do_sys_ioctl fs/ioctl.c:753 [inline]
>  __se_sys_ioctl fs/ioctl.c:739 [inline]
>  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x443f29
> Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 
> 83 fb d7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7ffc10df4328 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX: 004002e0 RCX: 00443f29
> RDX: 2000 RSI: c0185500 RDI: 0003
> RBP: 006ce018 R08:  R09: 004002e0
> R10: 000f R11: 0246 R12: 00401bb0
> R13: 00401c40 R14:  R15: 
> ====

The cause is pretty obvious.  dummy-hcd assumes that requests sent to 
the root hub are always valid, which isn't always true when they come 
from userspace.

Alan Stern

#syz test: upstream e37b12e4

Index: usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
===
--- usb-devel.orig/drivers/usb/gadget/udc/dummy_hcd.c
+++ usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
@@ -2114,9 +2114,21 @@ static int dummy_hub_control(
dum_hcd->port_status &= ~USB_PORT_STAT_POWER;
set_link_state(dum_hcd);
break;
-   default:
+   case USB_PORT_FEAT_ENABLE:
+   case USB_PORT_FEAT_C_ENABLE:
+   case USB_PORT_FEAT_C_SUSPEND:
+   /* Not allowed for USB-3 */
+   if (hcd->speed == HCD_USB3)
+   goto error;
+   fallthrough;
+   case USB_PORT_FEAT_C_CONNECTION:
+   case USB_PORT_FEAT_C_RESET:
dum_hc

Re: [PATCH v2 7/8] usb: host: ehci-tegra: Remove the driver

2020-12-17 Thread Alan Stern
On Thu, Dec 17, 2020 at 12:40:06PM +0300, Dmitry Osipenko wrote:
> The ChipIdea driver now provides USB2 host mode support for NVIDIA Tegra
> SoCs. The ehci-tegra driver is obsolete now, remove it and redirect the
> older Kconfig entry to the CI driver.
> 
> Tested-by: Matt Merhar 
> Tested-by: Nicolas Chauvet 
> Tested-by: Peter Geis 
> Tested-by: Ion Agorria 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/usb/host/Kconfig  |   8 +-
>  drivers/usb/host/Makefile |   1 -
>  drivers/usb/host/ehci-tegra.c | 604 --
>  3 files changed, 6 insertions(+), 607 deletions(-)
>  delete mode 100644 drivers/usb/host/ehci-tegra.c
> 
> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
> index 31e59309da1f..318315602337 100644
> --- a/drivers/usb/host/Kconfig
> +++ b/drivers/usb/host/Kconfig
> @@ -269,9 +269,13 @@ config USB_EHCI_HCD_AT91
>  config USB_EHCI_TEGRA
>   tristate "NVIDIA Tegra HCD support"
>   depends on ARCH_TEGRA
> - select USB_EHCI_ROOT_HUB_TT
> - select USB_TEGRA_PHY
> + select USB_CHIPIDEA
> + select USB_CHIPIDEA_HOST
> + select USB_CHIPIDEA_TEGRA
>   help
> +   This option is deprecated now and the driver was removed, use
> +   USB_CHIPIDEA_TEGRA instead.
> +
> This driver enables support for the internal USB Host Controllers
> found in NVIDIA Tegra SoCs. The controllers are EHCI compliant.

It doesn't really make sense to say "... the driver was removed..."
and then in the next paragraph say "This driver enables...".  You
should change the second paragraph to begin: "Enable support for...".

That's a minor matter, though, and you can easily fix it in the next
patch version.  Everything else is okay.

Acked-by: Alan Stern 

Alan Stern


Re: [PATCH v1 7/8] usb: host: ehci-tegra: Remove the driver

2020-12-16 Thread Alan Stern
On Wed, Dec 16, 2020 at 08:09:51PM +0300, Dmitry Osipenko wrote:
> 16.12.2020 19:45, Alan Stern пишет:
> > On Tue, Dec 15, 2020 at 11:21:12PM +0300, Dmitry Osipenko wrote:
> >> The ChipIdea driver now provides USB2 host mode support for NVIDIA Tegra
> >> SoCs. The ehci-tegra driver is obsolete now, remove it.
> >>
> >> Tested-by: Matt Merhar 
> >> Tested-by: Nicolas Chauvet 
> >> Tested-by: Peter Geis 
> >> Signed-off-by: Dmitry Osipenko 
> >> ---
> >>  drivers/usb/host/Kconfig  |   9 -
> >>  drivers/usb/host/Makefile |   1 -
> >>  drivers/usb/host/ehci-tegra.c | 604 --
> >>  3 files changed, 614 deletions(-)
> >>  delete mode 100644 drivers/usb/host/ehci-tegra.c
> >>
> >> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
> >> index 31e59309da1f..9c9e6ff9c43a 100644
> >> --- a/drivers/usb/host/Kconfig
> >> +++ b/drivers/usb/host/Kconfig
> >> @@ -266,15 +266,6 @@ config USB_EHCI_HCD_AT91
> >>  Enables support for the on-chip EHCI controller on
> >>  Atmel chips.
> >>  
> >> -config USB_EHCI_TEGRA
> >> -  tristate "NVIDIA Tegra HCD support"
> >> -  depends on ARCH_TEGRA
> >> -  select USB_EHCI_ROOT_HUB_TT
> >> -  select USB_TEGRA_PHY
> >> -  help
> >> -This driver enables support for the internal USB Host Controllers
> >> -found in NVIDIA Tegra SoCs. The controllers are EHCI compliant.
> > 
> > For people upgrading from earlier kernel versions, do you think it
> > would help to add a pointer here telling them which Kconfig option
> > they need to enable now in order to get this functionality?
> 
> Could you please clarify what do you mean by the "pointer"?

See the entries for USB_CNS3XXX_EHCI, USB_OCTEON_EHCI, 
USB_OHCI_HCD_OMAP3, and USB_OHCI_SH in that Kconfig file for examples.

Alan Stern


Re: [PATCH v1 7/8] usb: host: ehci-tegra: Remove the driver

2020-12-16 Thread Alan Stern
On Tue, Dec 15, 2020 at 11:21:12PM +0300, Dmitry Osipenko wrote:
> The ChipIdea driver now provides USB2 host mode support for NVIDIA Tegra
> SoCs. The ehci-tegra driver is obsolete now, remove it.
> 
> Tested-by: Matt Merhar 
> Tested-by: Nicolas Chauvet 
> Tested-by: Peter Geis 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/usb/host/Kconfig  |   9 -
>  drivers/usb/host/Makefile |   1 -
>  drivers/usb/host/ehci-tegra.c | 604 --
>  3 files changed, 614 deletions(-)
>  delete mode 100644 drivers/usb/host/ehci-tegra.c
> 
> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
> index 31e59309da1f..9c9e6ff9c43a 100644
> --- a/drivers/usb/host/Kconfig
> +++ b/drivers/usb/host/Kconfig
> @@ -266,15 +266,6 @@ config USB_EHCI_HCD_AT91
> Enables support for the on-chip EHCI controller on
> Atmel chips.
>  
> -config USB_EHCI_TEGRA
> - tristate "NVIDIA Tegra HCD support"
> - depends on ARCH_TEGRA
> - select USB_EHCI_ROOT_HUB_TT
> - select USB_TEGRA_PHY
> - help
> -   This driver enables support for the internal USB Host Controllers
> -   found in NVIDIA Tegra SoCs. The controllers are EHCI compliant.

For people upgrading from earlier kernel versions, do you think it
would help to add a pointer here telling them which Kconfig option
they need to enable now in order to get this functionality?

Alan Stern


[tip: core/rcu] tools: memory-model: Document that the LKMM can easily miss control dependencies

2020-12-13 Thread tip-bot2 for Alan Stern
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 9270e1a744f8ed953009b0e94b26ed0912d9ec1c
Gitweb:
https://git.kernel.org/tip/9270e1a744f8ed953009b0e94b26ed0912d9ec1c
Author:Alan Stern 
AuthorDate:Sat, 03 Oct 2020 21:40:22 -04:00
Committer: Paul E. McKenney 
CommitterDate: Mon, 26 Oct 2020 16:18:53 -07:00

tools: memory-model: Document that the LKMM can easily miss control dependencies

Add a small section to the litmus-tests.txt documentation file for
the Linux Kernel Memory Model explaining that the memory model often
fails to recognize certain control dependencies.

Suggested-by: Akira Yokosawa 
Signed-off-by: Alan Stern 
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Paul E. McKenney 
---
 tools/memory-model/Documentation/litmus-tests.txt | 17 ++-
 1 file changed, 17 insertions(+)

diff --git a/tools/memory-model/Documentation/litmus-tests.txt 
b/tools/memory-model/Documentation/litmus-tests.txt
index 2f840dc..8a9d5d2 100644
--- a/tools/memory-model/Documentation/litmus-tests.txt
+++ b/tools/memory-model/Documentation/litmus-tests.txt
@@ -946,6 +946,23 @@ Limitations of the Linux-kernel memory model (LKMM) 
include:
carrying a dependency, then the compiler can break that dependency
by substituting a constant of that value.
 
+   Conversely, LKMM sometimes doesn't recognize that a particular
+   optimization is not allowed, and as a result, thinks that a
+   dependency is not present (because the optimization would break it).
+   The memory model misses some pretty obvious control dependencies
+   because of this limitation.  A simple example is:
+
+   r1 = READ_ONCE(x);
+   if (r1 == 0)
+   smp_mb();
+   WRITE_ONCE(y, 1);
+
+   There is a control dependency from the READ_ONCE to the WRITE_ONCE,
+   even when r1 is nonzero, but LKMM doesn't realize this and thinks
+   that the write may execute before the read if r1 != 0.  (Yes, that
+   doesn't make sense if you think about it, but the memory model's
+   intelligence is limited.)
+
 2. Multiple access sizes for a single variable are not supported,
and neither are misaligned or partially overlapping accesses.
 


Re: RFC: arch: shall we have generic readl_be()/writel_be()/... or in_be32()/out_be32() ?

2020-12-09 Thread Alan Stern
On Wed, Dec 09, 2020 at 12:08:51PM +0100, Enrico Weigelt, metux IT consult 
wrote:
> Hello folks,
> 
> 
> while trying to make some more drivers compile-test'able, i've
> discovered some arch specific calls in here, eg.:
> 
> 
> In file included from
> /home/nekrad/src/apu2-dev/pkg/kernel.apu2.git/drivers/usb/host/ehci-spear.c:23:
> /home/nekrad/src/apu2-dev/pkg/kernel.apu2.git/drivers/usb/host/ehci.h:
> In function 'ehci_readl':
> /home/nekrad/src/apu2-dev/pkg/kernel.apu2.git/drivers/usb/host/ehci.h:743:3:
> error: implicit declaration of function 'readl_be'; did you mean
> 'readsb'? [-Werror=implicit-function-declaration]
>   743 |   readl_be(regs) :
>   |   ^~~~
>   |   readsb
> /home/nekrad/src/apu2-dev/pkg/kernel.apu2.git/drivers/usb/host/ehci.h:
> In function 'ehci_writel':
> /home/nekrad/src/apu2-dev/pkg/kernel.apu2.git/drivers/usb/host/ehci.h:767:3:
> error: implicit declaration of function 'writel_be'; did you mean
> 'writesb'? [-Werror=implicit-function-declaration]
>   767 |   writel_be(val, regs) :
>   |   ^
>   |   writesb
> In file included from
> /home/nekrad/src/apu2-dev/pkg/kernel.apu2.git/drivers/usb/host/ehci-hcd.c:97:
> /home/nekrad/src/apu2-dev/pkg/kernel.apu2.git/drivers/usb/host/ehci.h:
> In function 'ehci_readl':
> /home/nekrad/src/apu2-dev/pkg/kernel.apu2.git/drivers/usb/host/ehci.h:743:3:
> error: implicit declaration of function 'readl_be'; did you mean
> 'readsb'? [-Werror=implicit-function-declaration]
>   743 |   readl_be(regs) :
>   |   ^~~~
>   |   readsb
> /home/nekrad/src/apu2-dev/pkg/kernel.apu2.git/drivers/usb/host/ehci.h:
> In function 'ehci_writel':
> /home/nekrad/src/apu2-dev/pkg/kernel.apu2.git/drivers/usb/host/ehci.h:767:3:
> error: implicit declaration of function 'writel_be'; did you mean
> 'writesb'? [-Werror=implicit-function-declaration]
>   767 |   writel_be(val, regs) :
>   |   ^
>   |   writesb
> 
> 
> It seems that only few archs (microblaze, ppc, sparc) define them.
> 
> Also drivers/usb/host/ehci.h defines them, but only for one particular
> arch/subarch.
> 
> IIRC, these funcs are for accessing hw registers that are in BEs, so
> BE cpus can do direct access, while LE cpus need to do a conversion.
> 
> OTOH, we also have in_be32() / out_be32. They seem to do quite the same
> thing, referenced much more often, but also just defined on a few archs.
> 
> 
> I believe we should have generic functions, that all archs implement
> (possibly doing automatic conversion, if necessary), which are used
> by everybody else.
> 
> What's your oppionion on that ?

It certainly seems reasonable.  Another possibility, less stringent, is 
to require that definitions exist on all architectures that can have 
big-endian MMIO (or port-based IO).  For example, any architecture 
which might select CONFIG_EHCI_BIG_ENDIAN_MMIO, as used in ehci.h.

Otherwise we're left in the unfortunate position of having to provide 
definitions for these functions, but _only_ on architectures that don't 
already make their own definitions -- basically an impossible task.

Alan Stern


Re: linux-next: build failure after merge of the scsi-mkp tree

2020-12-08 Thread Alan Stern
On Tue, Dec 08, 2020 at 08:38:59PM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> On Tue, 8 Dec 2020 20:28:53 +1100 Stephen Rothwell  
> wrote:
> >
> > Hi all,
> > 
> > After merging the scsi-mkp tree, today's linux-next build (sparc64
> > defconfig) failed like this:
> > 
> > drivers/mtd/nand/raw/intel-nand-controller.c:17:10: fatal error: 
> > linux/mtd/nand_ecc.h: No such file or directory
> >17 | #include 
> >   |  ^~
> 
> Clearly, it did not fail like that :-)
> 
> block/blk-core.c: In function 'blk_queue_enter':
> block/blk-core.c:443:18: error: 'struct request_queue' has no member named 
> 'rpm_status'; did you mean 'stats'?
> if ((pm && q->rpm_status != RPM_SUSPENDED) ||
>   ^~
>   stats
> 
> > Caused by commit
> > 
> >   81a395cdc176 ("scsi: block: Do not accept any requests while suspended")
> > 
> > # CONFIG_PM is not set
> > 
> > I have applied the following patch:
> > 
> > From: Stephen Rothwell 
> > Date: Tue, 8 Dec 2020 20:12:33 +1100
> > Subject: [PATCH] scsi: block: fix for "scsi: block: Do not accept any 
> > requests while suspended"
> > 
> > Fixes: 81a395cdc176 ("scsi: block: Do not accept any requests while 
> > suspended")
> > Signed-off-by: Stephen Rothwell 
> > ---
> >  block/blk-core.c | 18 ++
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index a71a5c9429d6..9c9aec1382be 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -421,6 +421,18 @@ void blk_cleanup_queue(struct request_queue *q)
> >  }
> >  EXPORT_SYMBOL(blk_cleanup_queue);
> >  
> > +#ifdef CONFIG_PM
> > +static bool rq_suspended(struct request_queue *q)
> > +{
> > +   return q->rpm_status == RPM_SUSPENDED;
> > +}
> > +#else
> > +static bool rq_suspended(struct request_queue *q)
> > +{
> > +   return false;
> > +}
> > +#endif
> > +
> >  /**
> >   * blk_queue_enter() - try to increase q->q_usage_counter
> >   * @q: request queue pointer
> > @@ -440,12 +452,10 @@ int blk_queue_enter(struct request_queue *q, 
> > blk_mq_req_flags_t flags)
> >  * responsible for ensuring that that counter is
> >  * globally visible before the queue is unfrozen.
> >  */
> > -   if ((pm && q->rpm_status != RPM_SUSPENDED) ||
> > -   !blk_queue_pm_only(q)) {
> > +   if ((pm && !rq_suspended(q)) || !blk_queue_pm_only(q))
> > success = true;
> > -   } else {
> > +   else
> > percpu_ref_put(>q_usage_counter);
> > -   }
> > }
> > rcu_read_unlock();

Yes, that certainly is the proper fix.  It's all to easy to miss these 
issues that depend on your kernel configuration.

Bart, can you fold it into a new version of the patch?

Alan Stern


  1   2   3   4   5   6   7   8   9   10   >