Re: [PATCH 2/2] USB: hub: change the locking in hub_activate

2016-09-02 Thread Alan Stern
On Thu, 1 Sep 2016, Vaibhav Hiremath wrote:

> >> I can put prints to trace the execution flow, lets see what comes out
> >> from it.
> > How easily can you reproduce the problem?
> 
> Very easily. I would say 7-8 times out of 10.

Then you can add a printk in hub_activate() just before the call to
device_lock().  Maybe print out the values of the count, owner, and
magic fields of hdev->dev.mutex.  Or anything else you can think of
that might be helpful.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] USB: hub: change the locking in hub_activate

2016-09-01 Thread Vaibhav Hiremath



On Thursday 01 September 2016 07:59 PM, Alan Stern wrote:

On Thu, 1 Sep 2016, Viresh Kumar wrote:


On 31-08-16, 12:46, Alan Stern wrote:

On Wed, 31 Aug 2016, Viresh Kumar wrote:


On 05-08-16, 11:51, Alan Stern wrote:

+++ usb-4.x/drivers/usb/core/hub.c
@@ -1052,7 +1052,7 @@ static void hub_activate(struct usb_hub
  
  	/* Continue a partial initialization */

if (type == HUB_INIT2 || type == HUB_INIT3) {
-   device_lock(hub->intfdev);
+   device_lock(>dev);

Hi Alan,

I have received reports of kernel crashes (NULL dereference) due to this patch
in some of the corner cases. Note that we have backported this patch (and few
other) to 3.10 kernel. I have attached my hub.c file as well for reference.

Here is the reported kernel OOPs:

[   19.476228] Unable to handle kernel NULL pointer dereference at virtual 
address 
[   19.476231] pgd = ffc7d000
[   19.476236] [] *pgd=0e90b003, *pmd=0e90c003, 
*pte=00e0f9000407
[   19.476242] Internal error: Oops: 9645 [#1] PREEMPT SMP
[   19.476273] Modules linked in: gb_vibrator(O) gb_usb(O) gb_uart(O) gb_spi(O) 
gb_sdio(O) gb_raw(O) gb_pwm(O) gb_power_supply(O) gb)
[   19.476279] CPU: 0 PID: 344 Comm: kworker/0:3 Tainted: G   O 
3.10.97-g4b7224f-dirty #454
[   19.476290] Workqueue: events hub_init_func2
[   19.476293] task: ffc09b3560c0 ti: ffc09ada8000 task.ti: 
ffc09ada8000
[   19.476300] PC is at __mutex_lock_slowpath+0x138/0x224
[   19.476303] LR is at __mutex_lock_slowpath+0x128/0x224

...

[   19.476582] Call trace:
[   19.476586] [] __mutex_lock_slowpath+0x138/0x224
[   19.476590] [] mutex_lock+0x2c/0x48
[   19.476593] [] hub_activate+0x50/0x4d8
[   19.476596] [] hub_init_func2+0x14/0x1c
[   19.476602] [] process_one_work+0x26c/0x3cc
[   19.476605] [] worker_thread+0x208/0x358
[   19.476610] [] kthread+0xbc/0xc4

...

This happens when the device is infinitely generating connected and then removed
(not manually, but due to some hardware issues).

If I'm reading this right, it means that hub->hdev is NULL in

I am not sure I am in sync here :(


hub_activate().

We would have gotten the crash right from hub_activate() in that case, isn't it?

The fact that the call sequence reached mutex_lock() here, it means that
hub->hdev->dev was valid at least. The mutex dev->mutex is somewhat corrupted or
uninitialized, etc..

Ah, that makes a lot more sense.  Thanks for staightening me out.  And
some pointer embedded in the mutex must have been NULL.


  And that's where it all went wrong. As
__mutex_lock_slowpath() is called, it means that the mutex had a count of 0
instead of 1 during the lock and then we crashed during __mutex_lock_slowpath(),
which can only happen if the mutex is uninitialized in the first place.

The mutex gets initialized as part of device_add() and so things can go wrong if
device_add() was skipped here somehow.

I may be completely wrong, but that's what I read :)

Okay.  But I don't see how device_add() could have been skipped.  The
hub_init_func2() call occurs after the original HUB_INIT hub_activate()
call, which is in hub_configure() and occurs during probing.  The hub
interface doesn't get probed until the hub device is registered (the
hub interface is a child of the hub device).

Another possibility is that the mutex _was_ initialized but got
corrupted somehow.  Of course, that kind of thing is very hard to track
down.


On Thu, 1 Sep 2016, Vaibhav Hiremath wrote:


I have some more update on this,

It seems the culprit was my laptop USB port (I have to say bad port),
which resulted into continuous connect/disconnect event, on both
laptop and phone (Android), which eventually resulted into kernel panic.

Well, the bad port was the trigger.  But even with a bad port, the
kernel should not panic.


Yes, certainly kernel should not panic.

It could be a memory corruption or race somewhere (not sure though),
which is where device is not initialized properly when execution reached
to hub_activate().

I connected phone to another USB port and things started working as
expected.

I can put prints to trace the execution flow, lets see what comes out
from it.

How easily can you reproduce the problem?


Very easily. I would say 7-8 times out of 10.

Thanks,
Vaibhav

Alan Stern



--
Thanks,
Vaibhav

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] USB: hub: change the locking in hub_activate

2016-09-01 Thread Alan Stern
On Thu, 1 Sep 2016, Viresh Kumar wrote:

> On 31-08-16, 12:46, Alan Stern wrote:
> > On Wed, 31 Aug 2016, Viresh Kumar wrote:
> > 
> > > On 05-08-16, 11:51, Alan Stern wrote:
> > > > +++ usb-4.x/drivers/usb/core/hub.c
> > > > @@ -1052,7 +1052,7 @@ static void hub_activate(struct usb_hub
> > > >  
> > > > /* Continue a partial initialization */
> > > > if (type == HUB_INIT2 || type == HUB_INIT3) {
> > > > -   device_lock(hub->intfdev);
> > > > +   device_lock(>dev);
> > > 
> > > Hi Alan,
> > > 
> > > I have received reports of kernel crashes (NULL dereference) due to this 
> > > patch
> > > in some of the corner cases. Note that we have backported this patch (and 
> > > few
> > > other) to 3.10 kernel. I have attached my hub.c file as well for 
> > > reference.
> > > 
> > > Here is the reported kernel OOPs:
> > > 
> > > [   19.476228] Unable to handle kernel NULL pointer dereference at 
> > > virtual address 
> > > [   19.476231] pgd = ffc7d000
> > > [   19.476236] [] *pgd=0e90b003, *pmd=0e90c003, 
> > > *pte=00e0f9000407
> > > [   19.476242] Internal error: Oops: 9645 [#1] PREEMPT SMP
> > > [   19.476273] Modules linked in: gb_vibrator(O) gb_usb(O) gb_uart(O) 
> > > gb_spi(O) gb_sdio(O) gb_raw(O) gb_pwm(O) gb_power_supply(O) gb)
> > > [   19.476279] CPU: 0 PID: 344 Comm: kworker/0:3 Tainted: G   O 
> > > 3.10.97-g4b7224f-dirty #454
> > > [   19.476290] Workqueue: events hub_init_func2
> > > [   19.476293] task: ffc09b3560c0 ti: ffc09ada8000 task.ti: 
> > > ffc09ada8000
> > > [   19.476300] PC is at __mutex_lock_slowpath+0x138/0x224
> > > [   19.476303] LR is at __mutex_lock_slowpath+0x128/0x224
> > ...
> > > [   19.476582] Call trace:
> > > [   19.476586] [] __mutex_lock_slowpath+0x138/0x224
> > > [   19.476590] [] mutex_lock+0x2c/0x48
> > > [   19.476593] [] hub_activate+0x50/0x4d8
> > > [   19.476596] [] hub_init_func2+0x14/0x1c
> > > [   19.476602] [] process_one_work+0x26c/0x3cc
> > > [   19.476605] [] worker_thread+0x208/0x358
> > > [   19.476610] [] kthread+0xbc/0xc4
> > ...
> > > This happens when the device is infinitely generating connected and then 
> > > removed
> > > (not manually, but due to some hardware issues).
> > 
> > If I'm reading this right, it means that hub->hdev is NULL in
> 
> I am not sure I am in sync here :(
> 
> > hub_activate().
> 
> We would have gotten the crash right from hub_activate() in that case, isn't 
> it?
> 
> The fact that the call sequence reached mutex_lock() here, it means that
> hub->hdev->dev was valid at least. The mutex dev->mutex is somewhat corrupted 
> or
> uninitialized, etc..

Ah, that makes a lot more sense.  Thanks for staightening me out.  And 
some pointer embedded in the mutex must have been NULL.

>  And that's where it all went wrong. As
> __mutex_lock_slowpath() is called, it means that the mutex had a count of 0
> instead of 1 during the lock and then we crashed during 
> __mutex_lock_slowpath(),
> which can only happen if the mutex is uninitialized in the first place.
> 
> The mutex gets initialized as part of device_add() and so things can go wrong 
> if
> device_add() was skipped here somehow.
> 
> I may be completely wrong, but that's what I read :)

Okay.  But I don't see how device_add() could have been skipped.  The
hub_init_func2() call occurs after the original HUB_INIT hub_activate()
call, which is in hub_configure() and occurs during probing.  The hub
interface doesn't get probed until the hub device is registered (the
hub interface is a child of the hub device).

Another possibility is that the mutex _was_ initialized but got 
corrupted somehow.  Of course, that kind of thing is very hard to track 
down.


On Thu, 1 Sep 2016, Vaibhav Hiremath wrote:

> I have some more update on this,
>
> It seems the culprit was my laptop USB port (I have to say bad port),
> which resulted into continuous connect/disconnect event, on both
> laptop and phone (Android), which eventually resulted into kernel panic.

Well, the bad port was the trigger.  But even with a bad port, the 
kernel should not panic.

> It could be a memory corruption or race somewhere (not sure though),
> which is where device is not initialized properly when execution reached
> to hub_activate().
> 
> I connected phone to another USB port and things started working as
> expected.
> 
> I can put prints to trace the execution flow, lets see what comes out
> from it.

How easily can you reproduce the problem?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] USB: hub: change the locking in hub_activate

2016-09-01 Thread Vaibhav Hiremath



On Thursday 01 September 2016 10:32 AM, Viresh Kumar wrote:

On 31-08-16, 12:46, Alan Stern wrote:

On Wed, 31 Aug 2016, Viresh Kumar wrote:


On 05-08-16, 11:51, Alan Stern wrote:

+++ usb-4.x/drivers/usb/core/hub.c
@@ -1052,7 +1052,7 @@ static void hub_activate(struct usb_hub
  
  	/* Continue a partial initialization */

if (type == HUB_INIT2 || type == HUB_INIT3) {
-   device_lock(hub->intfdev);
+   device_lock(>dev);

Hi Alan,

I have received reports of kernel crashes (NULL dereference) due to this patch
in some of the corner cases. Note that we have backported this patch (and few
other) to 3.10 kernel. I have attached my hub.c file as well for reference.

Here is the reported kernel OOPs:

[   19.476228] Unable to handle kernel NULL pointer dereference at virtual 
address 
[   19.476231] pgd = ffc7d000
[   19.476236] [] *pgd=0e90b003, *pmd=0e90c003, 
*pte=00e0f9000407
[   19.476242] Internal error: Oops: 9645 [#1] PREEMPT SMP
[   19.476273] Modules linked in: gb_vibrator(O) gb_usb(O) gb_uart(O) gb_spi(O) 
gb_sdio(O) gb_raw(O) gb_pwm(O) gb_power_supply(O) gb)
[   19.476279] CPU: 0 PID: 344 Comm: kworker/0:3 Tainted: G   O 
3.10.97-g4b7224f-dirty #454
[   19.476290] Workqueue: events hub_init_func2
[   19.476293] task: ffc09b3560c0 ti: ffc09ada8000 task.ti: 
ffc09ada8000
[   19.476300] PC is at __mutex_lock_slowpath+0x138/0x224
[   19.476303] LR is at __mutex_lock_slowpath+0x128/0x224

...

[   19.476582] Call trace:
[   19.476586] [] __mutex_lock_slowpath+0x138/0x224
[   19.476590] [] mutex_lock+0x2c/0x48
[   19.476593] [] hub_activate+0x50/0x4d8
[   19.476596] [] hub_init_func2+0x14/0x1c
[   19.476602] [] process_one_work+0x26c/0x3cc
[   19.476605] [] worker_thread+0x208/0x358
[   19.476610] [] kthread+0xbc/0xc4

...

This happens when the device is infinitely generating connected and then removed
(not manually, but due to some hardware issues).

If I'm reading this right, it means that hub->hdev is NULL in

I am not sure I am in sync here :(


hub_activate().

We would have gotten the crash right from hub_activate() in that case, isn't it?

The fact that the call sequence reached mutex_lock() here, it means that
hub->hdev->dev was valid at least. The mutex dev->mutex is somewhat corrupted or
uninitialized, etc.. And that's where it all went wrong. As
__mutex_lock_slowpath() is called, it means that the mutex had a count of 0
instead of 1 during the lock and then we crashed during __mutex_lock_slowpath(),
which can only happen if the mutex is uninitialized in the first place.

The mutex gets initialized as part of device_add() and so things can go wrong if
device_add() was skipped here somehow.

I may be completely wrong, but that's what I read :)


I don't see how that could be true; it is initialized
to a non-NULL value and then never changed until the hub structure is
deallocated.

Is it possible to add some debugging printk's in there to find out
what's happening?

I will ask Vaibhav (cc'd) to do it, not sure if he is around today or not.

Thanks for your quick reply.



I have some more update on this,

It seems the culprit was my laptop USB port (I have to say bad port),
which resulted into continuous connect/disconnect event, on both
laptop and phone (Android), which eventually resulted into kernel panic.

It could be a memory corruption or race somewhere (not sure though),
which is where device is not initialized properly when execution reached
to hub_activate().

I connected phone to another USB port and things started working as
expected.

I can put prints to trace the execution flow, lets see what comes out 
from it.


--
Thanks,
Vaibhav

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] USB: hub: change the locking in hub_activate

2016-08-31 Thread Viresh Kumar
On 31-08-16, 12:46, Alan Stern wrote:
> On Wed, 31 Aug 2016, Viresh Kumar wrote:
> 
> > On 05-08-16, 11:51, Alan Stern wrote:
> > > +++ usb-4.x/drivers/usb/core/hub.c
> > > @@ -1052,7 +1052,7 @@ static void hub_activate(struct usb_hub
> > >  
> > >   /* Continue a partial initialization */
> > >   if (type == HUB_INIT2 || type == HUB_INIT3) {
> > > - device_lock(hub->intfdev);
> > > + device_lock(>dev);
> > 
> > Hi Alan,
> > 
> > I have received reports of kernel crashes (NULL dereference) due to this 
> > patch
> > in some of the corner cases. Note that we have backported this patch (and 
> > few
> > other) to 3.10 kernel. I have attached my hub.c file as well for reference.
> > 
> > Here is the reported kernel OOPs:
> > 
> > [   19.476228] Unable to handle kernel NULL pointer dereference at virtual 
> > address 
> > [   19.476231] pgd = ffc7d000
> > [   19.476236] [] *pgd=0e90b003, *pmd=0e90c003, 
> > *pte=00e0f9000407
> > [   19.476242] Internal error: Oops: 9645 [#1] PREEMPT SMP
> > [   19.476273] Modules linked in: gb_vibrator(O) gb_usb(O) gb_uart(O) 
> > gb_spi(O) gb_sdio(O) gb_raw(O) gb_pwm(O) gb_power_supply(O) gb)
> > [   19.476279] CPU: 0 PID: 344 Comm: kworker/0:3 Tainted: G   O 
> > 3.10.97-g4b7224f-dirty #454
> > [   19.476290] Workqueue: events hub_init_func2
> > [   19.476293] task: ffc09b3560c0 ti: ffc09ada8000 task.ti: 
> > ffc09ada8000
> > [   19.476300] PC is at __mutex_lock_slowpath+0x138/0x224
> > [   19.476303] LR is at __mutex_lock_slowpath+0x128/0x224
> ...
> > [   19.476582] Call trace:
> > [   19.476586] [] __mutex_lock_slowpath+0x138/0x224
> > [   19.476590] [] mutex_lock+0x2c/0x48
> > [   19.476593] [] hub_activate+0x50/0x4d8
> > [   19.476596] [] hub_init_func2+0x14/0x1c
> > [   19.476602] [] process_one_work+0x26c/0x3cc
> > [   19.476605] [] worker_thread+0x208/0x358
> > [   19.476610] [] kthread+0xbc/0xc4
> ...
> > This happens when the device is infinitely generating connected and then 
> > removed
> > (not manually, but due to some hardware issues).
> 
> If I'm reading this right, it means that hub->hdev is NULL in

I am not sure I am in sync here :(

> hub_activate().

We would have gotten the crash right from hub_activate() in that case, isn't it?

The fact that the call sequence reached mutex_lock() here, it means that
hub->hdev->dev was valid at least. The mutex dev->mutex is somewhat corrupted or
uninitialized, etc.. And that's where it all went wrong. As
__mutex_lock_slowpath() is called, it means that the mutex had a count of 0
instead of 1 during the lock and then we crashed during __mutex_lock_slowpath(),
which can only happen if the mutex is uninitialized in the first place.

The mutex gets initialized as part of device_add() and so things can go wrong if
device_add() was skipped here somehow.

I may be completely wrong, but that's what I read :)

> I don't see how that could be true; it is initialized
> to a non-NULL value and then never changed until the hub structure is
> deallocated.
> 
> Is it possible to add some debugging printk's in there to find out 
> what's happening?

I will ask Vaibhav (cc'd) to do it, not sure if he is around today or not.

Thanks for your quick reply.

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] USB: hub: change the locking in hub_activate

2016-08-31 Thread Alan Stern
On Wed, 31 Aug 2016, Viresh Kumar wrote:

> On 05-08-16, 11:51, Alan Stern wrote:
> > +++ usb-4.x/drivers/usb/core/hub.c
> > @@ -1052,7 +1052,7 @@ static void hub_activate(struct usb_hub
> >  
> > /* Continue a partial initialization */
> > if (type == HUB_INIT2 || type == HUB_INIT3) {
> > -   device_lock(hub->intfdev);
> > +   device_lock(>dev);
> 
> Hi Alan,
> 
> I have received reports of kernel crashes (NULL dereference) due to this patch
> in some of the corner cases. Note that we have backported this patch (and few
> other) to 3.10 kernel. I have attached my hub.c file as well for reference.
> 
> Here is the reported kernel OOPs:
> 
> [   19.476228] Unable to handle kernel NULL pointer dereference at virtual 
> address 
> [   19.476231] pgd = ffc7d000
> [   19.476236] [] *pgd=0e90b003, *pmd=0e90c003, 
> *pte=00e0f9000407
> [   19.476242] Internal error: Oops: 9645 [#1] PREEMPT SMP
> [   19.476273] Modules linked in: gb_vibrator(O) gb_usb(O) gb_uart(O) 
> gb_spi(O) gb_sdio(O) gb_raw(O) gb_pwm(O) gb_power_supply(O) gb)
> [   19.476279] CPU: 0 PID: 344 Comm: kworker/0:3 Tainted: G   O 
> 3.10.97-g4b7224f-dirty #454
> [   19.476290] Workqueue: events hub_init_func2
> [   19.476293] task: ffc09b3560c0 ti: ffc09ada8000 task.ti: 
> ffc09ada8000
> [   19.476300] PC is at __mutex_lock_slowpath+0x138/0x224
> [   19.476303] LR is at __mutex_lock_slowpath+0x128/0x224
...
> [   19.476582] Call trace:
> [   19.476586] [] __mutex_lock_slowpath+0x138/0x224
> [   19.476590] [] mutex_lock+0x2c/0x48
> [   19.476593] [] hub_activate+0x50/0x4d8
> [   19.476596] [] hub_init_func2+0x14/0x1c
> [   19.476602] [] process_one_work+0x26c/0x3cc
> [   19.476605] [] worker_thread+0x208/0x358
> [   19.476610] [] kthread+0xbc/0xc4
...
> This happens when the device is infinitely generating connected and then 
> removed
> (not manually, but due to some hardware issues).

If I'm reading this right, it means that hub->hdev is NULL in
hub_activate().  I don't see how that could be true; it is initialized
to a non-NULL value and then never changed until the hub structure is
deallocated.

Is it possible to add some debugging printk's in there to find out 
what's happening?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] USB: hub: change the locking in hub_activate

2016-08-05 Thread Viresh Kumar
On 05-08-16, 11:51, Alan Stern wrote:
> The locking in hub_activate() is not adequate to provide full mutual
> exclusion with hub_quiesce().  The subroutine locks the hub's
> usb_interface, but the callers of hub_quiesce() (such as
> hub_pre_reset() and hub_event()) hold the lock to the hub's
> usb_device.
> 
> This patch changes hub_activate() to make it acquire the same lock as
> those other routines.
> 
> Signed-off-by: Alan Stern 
> CC:  #4.4+
> 
> ---
> 
> 
> [as1808]
> 
> 
>  drivers/usb/core/hub.c |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> Index: usb-4.x/drivers/usb/core/hub.c
> ===
> --- usb-4.x.orig/drivers/usb/core/hub.c
> +++ usb-4.x/drivers/usb/core/hub.c
> @@ -1052,7 +1052,7 @@ static void hub_activate(struct usb_hub
>  
>   /* Continue a partial initialization */
>   if (type == HUB_INIT2 || type == HUB_INIT3) {
> - device_lock(hub->intfdev);
> + device_lock(>dev);
>  
>   /* Was the hub disconnected while we were waiting? */
>   if (hub->disconnected)
> @@ -1259,7 +1259,7 @@ static void hub_activate(struct usb_hub
>   queue_delayed_work(system_power_efficient_wq,
>   >init_work,
>   msecs_to_jiffies(delay));
> - device_unlock(hub->intfdev);
> + device_unlock(>dev);
>   return; /* Continues at init3: below */
>   } else {
>   msleep(delay);
> @@ -1282,7 +1282,7 @@ static void hub_activate(struct usb_hub
>   /* Allow autosuspend if it was suppressed */
>   disconnected:
>   usb_autopm_put_interface_async(to_usb_interface(hub->intfdev));
> - device_unlock(hub->intfdev);
> + device_unlock(>dev);
>   }
>  
>   kref_put(>kref, hub_release);

As I do not have good knowledge of which lock protects what here, I will stay
away from giving a meaningless Reviewed-by :)

But thanks for the patch..

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html