Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-18 Thread Asutosh Das (asd)

On 3/18/2021 12:16 PM, Adrian Hunter wrote:

On 18/03/21 7:58 pm, Asutosh Das (asd) wrote:

On 3/18/2021 10:54 AM, Rafael J. Wysocki wrote:

On Thu, Mar 18, 2021 at 6:33 PM Asutosh Das (asd)
 wrote:


On 3/18/2021 7:00 AM, Rafael J. Wysocki wrote:

On Wed, Mar 17, 2021 at 7:37 AM Adrian Hunter  wrote:


On 16/03/21 10:35 pm, Asutosh Das (asd) wrote:

On 3/16/2021 12:48 AM, Adrian Hunter wrote:

On 16/03/21 12:22 am, Asutosh Das (asd) wrote:

On 3/14/2021 1:11 AM, Adrian Hunter wrote:

On 10/03/21 5:04 am, Asutosh Das (asd) wrote:

On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:

On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:

On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  wrote:


On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  wrote:


On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:


Now during my testing I see a weird issue sometimes (1 in 7).
Scenario - bootups

Issue:
The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
when one/more of its consumers are in RPM_ACTIVE state.

*Log:
[   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
[   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
[   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
[   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
[   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
[   10.980602][    T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache

/** Printing all the consumer nodes of supplier **/
[   10.987327][    T5] ufs_device_wlun 0:0:0:49488: usage-count @ suspend: 0
<-- this is the usage_count
[   10.994440][    T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
[   11.000402][    T5] scsi 0:0:0:49456: PM state - 2
[   11.005453][    T5] sd 0:0:0:0: PM state - 2
[   11.009958][    T5] sd 0:0:0:1: PM state - 2
[   11.014469][    T5] sd 0:0:0:2: PM state - 2
[   11.019072][    T5] sd 0:0:0:3: PM state - 2
[   11.023595][    T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
[   11.353298][    T5] sd 0:0:0:5: PM state - 2
[   11.357726][    T5] sd 0:0:0:6: PM state - 2
[   11.362155][    T5] sd 0:0:0:7: PM state - 2
[   11.366584][    T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 8709
[   11.374366][    T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
(0) has rpm_active flags


Do you mean that rpm_active of the link between the consumer and the
supplier is greater than 0 at this point and the consumer is


I mean is rpm_active of the link greater than 1 (because 1 means "no
active references to the supplier")?

Hi Rafael:
No - it is not greater than 1.

I'm trying to understand what's going on in it; will update when I've something.




RPM_ACTIVE, but the supplier suspends successfully nevertheless?


[   11.383376][    T5] ufs_device_wlun 0:0:0:49488:
ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
[   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache

And the the suspend of sde is stuck now:
schedule+0x9c/0xe0
schedule_timeout+0x40/0x128
io_schedule_timeout+0x44/0x68
wait_for_common_io+0x7c/0x100
wait_for_completion_io+0x14/0x20
blk_execute_rq+0x90/0xcc
__scsi_execute+0x104/0x1c4
sd_sync_cache+0xf8/0x2a0
sd_suspend_common+0x74/0x11c
sd_suspend_runtime+0x14/0x20
scsi_runtime_suspend+0x64/0x94
__rpm_callback+0x80/0x2a4
rpm_suspend+0x308/0x614
pm_runtime_work+0x98/0xa8

I added 'DL_FLAG_RPM_ACTIVE' while creating links.
    if (hba->sdev_ufs_device) {
    link = device_link_add(>sdev_gendev,
    >sdev_ufs_device->sdev_gendev,
   DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
I didn't expect this to resolve the issue anyway and it didn't.

Another interesting point here is when I resume any of the above suspended
consumers, it all goes back to normal, which is kind of expected. I tried
resuming the consumer and the supplier is resumed and the supplier is
suspended when all the consumers are suspended.

Any pointers on this issue please?

@Bart/@Alan - Do you've any pointers please?


It's very noticeable that although you seem to have isolated a bug in
the power management subsystem (supplier goes into runtime suspend
even when one of its consumers is still active), you did not CC the
power management maintainer or mailing list.

I have added the appropriate CC's.


Thanks Alan!





Hello
I & Can (thanks CanG) debugged this further:

Looks like this issue can occur if the sd probe is asynchronous.

Essentially, the sd_probe() is done asynchronously and driver_probe_device() 
invokes pm_runtime_get_suppliers() before invoking sd_probe().

But scsi_probe_and_add_lun() runs in a separate context.
So the scsi_autopm_put_device() invoked from scsi_scan_host() context reduces the 
link->rpm_active to 1. And sd_probe() invokes scsi_autopm_put_device() and starts 
a timer. And then driver_probe_device() invoked from __device_attach_async_helper 
context reduces the 

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-18 Thread Adrian Hunter
On 18/03/21 7:58 pm, Asutosh Das (asd) wrote:
> On 3/18/2021 10:54 AM, Rafael J. Wysocki wrote:
>> On Thu, Mar 18, 2021 at 6:33 PM Asutosh Das (asd)
>>  wrote:
>>>
>>> On 3/18/2021 7:00 AM, Rafael J. Wysocki wrote:
 On Wed, Mar 17, 2021 at 7:37 AM Adrian Hunter  
 wrote:
>
> On 16/03/21 10:35 pm, Asutosh Das (asd) wrote:
>> On 3/16/2021 12:48 AM, Adrian Hunter wrote:
>>> On 16/03/21 12:22 am, Asutosh Das (asd) wrote:
 On 3/14/2021 1:11 AM, Adrian Hunter wrote:
> On 10/03/21 5:04 am, Asutosh Das (asd) wrote:
>> On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:
>>> On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:
 On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki 
  wrote:
>
> On Sat, Mar 6, 2021 at 5:17 PM Alan Stern 
>  wrote:
>>
>> On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) 
>> wrote:
>>
>>> Now during my testing I see a weird issue sometimes (1 in 7).
>>> Scenario - bootups
>>>
>>> Issue:
>>> The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime 
>>> suspend even
>>> when one/more of its consumers are in RPM_ACTIVE state.
>>>
>>> *Log:
>>> [   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI 
>>> cache
>>> [   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI 
>>> cache
>>> [   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI 
>>> cache
>>> [   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI 
>>> cache
>>> [   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI 
>>> cache
>>> [   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI 
>>> cache
>>> [   10.980602][    T5] sd 0:0:0:0: [sda] Synchronizing SCSI 
>>> cache
>>>
>>> /** Printing all the consumer nodes of supplier **/
>>> [   10.987327][    T5] ufs_device_wlun 0:0:0:49488: usage-count 
>>> @ suspend: 0
>>> <-- this is the usage_count
>>> [   10.994440][    T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
>>> [   11.000402][    T5] scsi 0:0:0:49456: PM state - 2
>>> [   11.005453][    T5] sd 0:0:0:0: PM state - 2
>>> [   11.009958][    T5] sd 0:0:0:1: PM state - 2
>>> [   11.014469][    T5] sd 0:0:0:2: PM state - 2
>>> [   11.019072][    T5] sd 0:0:0:3: PM state - 2
>>> [   11.023595][    T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
>>> [   11.353298][    T5] sd 0:0:0:5: PM state - 2
>>> [   11.357726][    T5] sd 0:0:0:6: PM state - 2
>>> [   11.362155][    T5] sd 0:0:0:7: PM state - 2
>>> [   11.366584][    T5] ufshcd-qcom 1d84000.ufshc: 
>>> __ufshcd_wl_suspend - 8709
>>> [   11.374366][    T5] ufs_device_wlun 0:0:0:49488: 
>>> __ufshcd_wl_suspend -
>>> (0) has rpm_active flags
>
> Do you mean that rpm_active of the link between the consumer and 
> the
> supplier is greater than 0 at this point and the consumer is

 I mean is rpm_active of the link greater than 1 (because 1 means 
 "no
 active references to the supplier")?
>>> Hi Rafael:
>>> No - it is not greater than 1.
>>>
>>> I'm trying to understand what's going on in it; will update when 
>>> I've something.
>>>

> RPM_ACTIVE, but the supplier suspends successfully nevertheless?
>
>>> [   11.383376][    T5] ufs_device_wlun 0:0:0:49488:
>>> ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
>>> [   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI 
>>> cache
>>>
>>> And the the suspend of sde is stuck now:
>>> schedule+0x9c/0xe0
>>> schedule_timeout+0x40/0x128
>>> io_schedule_timeout+0x44/0x68
>>> wait_for_common_io+0x7c/0x100
>>> wait_for_completion_io+0x14/0x20
>>> blk_execute_rq+0x90/0xcc
>>> __scsi_execute+0x104/0x1c4
>>> sd_sync_cache+0xf8/0x2a0
>>> sd_suspend_common+0x74/0x11c
>>> sd_suspend_runtime+0x14/0x20
>>> scsi_runtime_suspend+0x64/0x94
>>> __rpm_callback+0x80/0x2a4
>>> rpm_suspend+0x308/0x614
>>> pm_runtime_work+0x98/0xa8
>>>
>>> I added 'DL_FLAG_RPM_ACTIVE' while creating links.
>>>    if (hba->sdev_ufs_device) {
>>>    link = device_link_add(>sdev_gendev,
>>>    

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-18 Thread Asutosh Das (asd)

On 3/18/2021 10:54 AM, Rafael J. Wysocki wrote:

On Thu, Mar 18, 2021 at 6:33 PM Asutosh Das (asd)
 wrote:


On 3/18/2021 7:00 AM, Rafael J. Wysocki wrote:

On Wed, Mar 17, 2021 at 7:37 AM Adrian Hunter  wrote:


On 16/03/21 10:35 pm, Asutosh Das (asd) wrote:

On 3/16/2021 12:48 AM, Adrian Hunter wrote:

On 16/03/21 12:22 am, Asutosh Das (asd) wrote:

On 3/14/2021 1:11 AM, Adrian Hunter wrote:

On 10/03/21 5:04 am, Asutosh Das (asd) wrote:

On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:

On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:

On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  wrote:


On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  wrote:


On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:


Now during my testing I see a weird issue sometimes (1 in 7).
Scenario - bootups

Issue:
The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
when one/more of its consumers are in RPM_ACTIVE state.

*Log:
[   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
[   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
[   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
[   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
[   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
[   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache

/** Printing all the consumer nodes of supplier **/
[   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count @ suspend: 0
<-- this is the usage_count
[   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
[   11.000402][T5] scsi 0:0:0:49456: PM state - 2
[   11.005453][T5] sd 0:0:0:0: PM state - 2
[   11.009958][T5] sd 0:0:0:1: PM state - 2
[   11.014469][T5] sd 0:0:0:2: PM state - 2
[   11.019072][T5] sd 0:0:0:3: PM state - 2
[   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
[   11.353298][T5] sd 0:0:0:5: PM state - 2
[   11.357726][T5] sd 0:0:0:6: PM state - 2
[   11.362155][T5] sd 0:0:0:7: PM state - 2
[   11.366584][T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 8709
[   11.374366][T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
(0) has rpm_active flags


Do you mean that rpm_active of the link between the consumer and the
supplier is greater than 0 at this point and the consumer is


I mean is rpm_active of the link greater than 1 (because 1 means "no
active references to the supplier")?

Hi Rafael:
No - it is not greater than 1.

I'm trying to understand what's going on in it; will update when I've something.




RPM_ACTIVE, but the supplier suspends successfully nevertheless?


[   11.383376][T5] ufs_device_wlun 0:0:0:49488:
ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
[   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache

And the the suspend of sde is stuck now:
schedule+0x9c/0xe0
schedule_timeout+0x40/0x128
io_schedule_timeout+0x44/0x68
wait_for_common_io+0x7c/0x100
wait_for_completion_io+0x14/0x20
blk_execute_rq+0x90/0xcc
__scsi_execute+0x104/0x1c4
sd_sync_cache+0xf8/0x2a0
sd_suspend_common+0x74/0x11c
sd_suspend_runtime+0x14/0x20
scsi_runtime_suspend+0x64/0x94
__rpm_callback+0x80/0x2a4
rpm_suspend+0x308/0x614
pm_runtime_work+0x98/0xa8

I added 'DL_FLAG_RPM_ACTIVE' while creating links.
   if (hba->sdev_ufs_device) {
   link = device_link_add(>sdev_gendev,
   >sdev_ufs_device->sdev_gendev,
  DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
I didn't expect this to resolve the issue anyway and it didn't.

Another interesting point here is when I resume any of the above suspended
consumers, it all goes back to normal, which is kind of expected. I tried
resuming the consumer and the supplier is resumed and the supplier is
suspended when all the consumers are suspended.

Any pointers on this issue please?

@Bart/@Alan - Do you've any pointers please?


It's very noticeable that although you seem to have isolated a bug in
the power management subsystem (supplier goes into runtime suspend
even when one of its consumers is still active), you did not CC the
power management maintainer or mailing list.

I have added the appropriate CC's.


Thanks Alan!





Hello
I & Can (thanks CanG) debugged this further:

Looks like this issue can occur if the sd probe is asynchronous.

Essentially, the sd_probe() is done asynchronously and driver_probe_device() 
invokes pm_runtime_get_suppliers() before invoking sd_probe().

But scsi_probe_and_add_lun() runs in a separate context.
So the scsi_autopm_put_device() invoked from scsi_scan_host() context reduces the 
link->rpm_active to 1. And sd_probe() invokes scsi_autopm_put_device() and starts 
a timer. And then driver_probe_device() invoked from __device_attach_async_helper 
context reduces the link->rpm_active to 1 thus enabling the supplier to suspend 
before the consumer suspends.

So if:
Context 

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-18 Thread Rafael J. Wysocki
On Thu, Mar 18, 2021 at 6:33 PM Asutosh Das (asd)
 wrote:
>
> On 3/18/2021 7:00 AM, Rafael J. Wysocki wrote:
> > On Wed, Mar 17, 2021 at 7:37 AM Adrian Hunter  
> > wrote:
> >>
> >> On 16/03/21 10:35 pm, Asutosh Das (asd) wrote:
> >>> On 3/16/2021 12:48 AM, Adrian Hunter wrote:
>  On 16/03/21 12:22 am, Asutosh Das (asd) wrote:
> > On 3/14/2021 1:11 AM, Adrian Hunter wrote:
> >> On 10/03/21 5:04 am, Asutosh Das (asd) wrote:
> >>> On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:
>  On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:
> > On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki 
> >  wrote:
> >>
> >> On Sat, Mar 6, 2021 at 5:17 PM Alan Stern 
> >>  wrote:
> >>>
> >>> On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:
> >>>
>  Now during my testing I see a weird issue sometimes (1 in 7).
>  Scenario - bootups
> 
>  Issue:
>  The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime 
>  suspend even
>  when one/more of its consumers are in RPM_ACTIVE state.
> 
>  *Log:
>  [   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
>  [   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
>  [   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
>  [   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
>  [   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
>  [   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
>  [   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> 
>  /** Printing all the consumer nodes of supplier **/
>  [   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count 
>  @ suspend: 0
>  <-- this is the usage_count
>  [   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
>  [   11.000402][T5] scsi 0:0:0:49456: PM state - 2
>  [   11.005453][T5] sd 0:0:0:0: PM state - 2
>  [   11.009958][T5] sd 0:0:0:1: PM state - 2
>  [   11.014469][T5] sd 0:0:0:2: PM state - 2
>  [   11.019072][T5] sd 0:0:0:3: PM state - 2
>  [   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
>  [   11.353298][T5] sd 0:0:0:5: PM state - 2
>  [   11.357726][T5] sd 0:0:0:6: PM state - 2
>  [   11.362155][T5] sd 0:0:0:7: PM state - 2
>  [   11.366584][T5] ufshcd-qcom 1d84000.ufshc: 
>  __ufshcd_wl_suspend - 8709
>  [   11.374366][T5] ufs_device_wlun 0:0:0:49488: 
>  __ufshcd_wl_suspend -
>  (0) has rpm_active flags
> >>
> >> Do you mean that rpm_active of the link between the consumer and 
> >> the
> >> supplier is greater than 0 at this point and the consumer is
> >
> > I mean is rpm_active of the link greater than 1 (because 1 means "no
> > active references to the supplier")?
>  Hi Rafael:
>  No - it is not greater than 1.
> 
>  I'm trying to understand what's going on in it; will update when 
>  I've something.
> 
> >
> >> RPM_ACTIVE, but the supplier suspends successfully nevertheless?
> >>
>  [   11.383376][T5] ufs_device_wlun 0:0:0:49488:
>  ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
>  [   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache
> 
>  And the the suspend of sde is stuck now:
>  schedule+0x9c/0xe0
>  schedule_timeout+0x40/0x128
>  io_schedule_timeout+0x44/0x68
>  wait_for_common_io+0x7c/0x100
>  wait_for_completion_io+0x14/0x20
>  blk_execute_rq+0x90/0xcc
>  __scsi_execute+0x104/0x1c4
>  sd_sync_cache+0xf8/0x2a0
>  sd_suspend_common+0x74/0x11c
>  sd_suspend_runtime+0x14/0x20
>  scsi_runtime_suspend+0x64/0x94
>  __rpm_callback+0x80/0x2a4
>  rpm_suspend+0x308/0x614
>  pm_runtime_work+0x98/0xa8
> 
>  I added 'DL_FLAG_RPM_ACTIVE' while creating links.
>    if (hba->sdev_ufs_device) {
>    link = device_link_add(>sdev_gendev,
>    
>  >sdev_ufs_device->sdev_gendev,
>   
>  DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
>  I didn't expect this to resolve the issue anyway and it didn't.
> 
>  Another interesting point here is when I resume any of the above 
> 

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-18 Thread Asutosh Das (asd)

On 3/18/2021 7:00 AM, Rafael J. Wysocki wrote:

On Wed, Mar 17, 2021 at 7:37 AM Adrian Hunter  wrote:


On 16/03/21 10:35 pm, Asutosh Das (asd) wrote:

On 3/16/2021 12:48 AM, Adrian Hunter wrote:

On 16/03/21 12:22 am, Asutosh Das (asd) wrote:

On 3/14/2021 1:11 AM, Adrian Hunter wrote:

On 10/03/21 5:04 am, Asutosh Das (asd) wrote:

On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:

On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:

On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  wrote:


On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  wrote:


On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:


Now during my testing I see a weird issue sometimes (1 in 7).
Scenario - bootups

Issue:
The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
when one/more of its consumers are in RPM_ACTIVE state.

*Log:
[   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
[   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
[   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
[   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
[   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
[   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache

/** Printing all the consumer nodes of supplier **/
[   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count @ suspend: 0
<-- this is the usage_count
[   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
[   11.000402][T5] scsi 0:0:0:49456: PM state - 2
[   11.005453][T5] sd 0:0:0:0: PM state - 2
[   11.009958][T5] sd 0:0:0:1: PM state - 2
[   11.014469][T5] sd 0:0:0:2: PM state - 2
[   11.019072][T5] sd 0:0:0:3: PM state - 2
[   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
[   11.353298][T5] sd 0:0:0:5: PM state - 2
[   11.357726][T5] sd 0:0:0:6: PM state - 2
[   11.362155][T5] sd 0:0:0:7: PM state - 2
[   11.366584][T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 8709
[   11.374366][T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
(0) has rpm_active flags


Do you mean that rpm_active of the link between the consumer and the
supplier is greater than 0 at this point and the consumer is


I mean is rpm_active of the link greater than 1 (because 1 means "no
active references to the supplier")?

Hi Rafael:
No - it is not greater than 1.

I'm trying to understand what's going on in it; will update when I've something.




RPM_ACTIVE, but the supplier suspends successfully nevertheless?


[   11.383376][T5] ufs_device_wlun 0:0:0:49488:
ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
[   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache

And the the suspend of sde is stuck now:
schedule+0x9c/0xe0
schedule_timeout+0x40/0x128
io_schedule_timeout+0x44/0x68
wait_for_common_io+0x7c/0x100
wait_for_completion_io+0x14/0x20
blk_execute_rq+0x90/0xcc
__scsi_execute+0x104/0x1c4
sd_sync_cache+0xf8/0x2a0
sd_suspend_common+0x74/0x11c
sd_suspend_runtime+0x14/0x20
scsi_runtime_suspend+0x64/0x94
__rpm_callback+0x80/0x2a4
rpm_suspend+0x308/0x614
pm_runtime_work+0x98/0xa8

I added 'DL_FLAG_RPM_ACTIVE' while creating links.
  if (hba->sdev_ufs_device) {
  link = device_link_add(>sdev_gendev,
  >sdev_ufs_device->sdev_gendev,
 DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
I didn't expect this to resolve the issue anyway and it didn't.

Another interesting point here is when I resume any of the above suspended
consumers, it all goes back to normal, which is kind of expected. I tried
resuming the consumer and the supplier is resumed and the supplier is
suspended when all the consumers are suspended.

Any pointers on this issue please?

@Bart/@Alan - Do you've any pointers please?


It's very noticeable that although you seem to have isolated a bug in
the power management subsystem (supplier goes into runtime suspend
even when one of its consumers is still active), you did not CC the
power management maintainer or mailing list.

I have added the appropriate CC's.


Thanks Alan!





Hello
I & Can (thanks CanG) debugged this further:

Looks like this issue can occur if the sd probe is asynchronous.

Essentially, the sd_probe() is done asynchronously and driver_probe_device() 
invokes pm_runtime_get_suppliers() before invoking sd_probe().

But scsi_probe_and_add_lun() runs in a separate context.
So the scsi_autopm_put_device() invoked from scsi_scan_host() context reduces the 
link->rpm_active to 1. And sd_probe() invokes scsi_autopm_put_device() and starts 
a timer. And then driver_probe_device() invoked from __device_attach_async_helper 
context reduces the link->rpm_active to 1 thus enabling the supplier to suspend 
before the consumer suspends.

So if:
Context T1:
[1] scsi_probe_and_add_lun()
[2]|- scsi_autopm_put_device() - reduce the link->rpm_active to 1

Context 

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-18 Thread Rafael J. Wysocki
On Wed, Mar 17, 2021 at 7:37 AM Adrian Hunter  wrote:
>
> On 16/03/21 10:35 pm, Asutosh Das (asd) wrote:
> > On 3/16/2021 12:48 AM, Adrian Hunter wrote:
> >> On 16/03/21 12:22 am, Asutosh Das (asd) wrote:
> >>> On 3/14/2021 1:11 AM, Adrian Hunter wrote:
>  On 10/03/21 5:04 am, Asutosh Das (asd) wrote:
> > On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:
> >> On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:
> >>> On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  
> >>> wrote:
> 
>  On Sat, Mar 6, 2021 at 5:17 PM Alan Stern 
>   wrote:
> >
> > On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:
> >
> >> Now during my testing I see a weird issue sometimes (1 in 7).
> >> Scenario - bootups
> >>
> >> Issue:
> >> The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime 
> >> suspend even
> >> when one/more of its consumers are in RPM_ACTIVE state.
> >>
> >> *Log:
> >> [   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
> >> [   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
> >> [   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
> >> [   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
> >> [   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
> >> [   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
> >> [   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> >>
> >> /** Printing all the consumer nodes of supplier **/
> >> [   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count @ 
> >> suspend: 0
> >> <-- this is the usage_count
> >> [   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
> >> [   11.000402][T5] scsi 0:0:0:49456: PM state - 2
> >> [   11.005453][T5] sd 0:0:0:0: PM state - 2
> >> [   11.009958][T5] sd 0:0:0:1: PM state - 2
> >> [   11.014469][T5] sd 0:0:0:2: PM state - 2
> >> [   11.019072][T5] sd 0:0:0:3: PM state - 2
> >> [   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
> >> [   11.353298][T5] sd 0:0:0:5: PM state - 2
> >> [   11.357726][T5] sd 0:0:0:6: PM state - 2
> >> [   11.362155][T5] sd 0:0:0:7: PM state - 2
> >> [   11.366584][T5] ufshcd-qcom 1d84000.ufshc: 
> >> __ufshcd_wl_suspend - 8709
> >> [   11.374366][T5] ufs_device_wlun 0:0:0:49488: 
> >> __ufshcd_wl_suspend -
> >> (0) has rpm_active flags
> 
>  Do you mean that rpm_active of the link between the consumer and the
>  supplier is greater than 0 at this point and the consumer is
> >>>
> >>> I mean is rpm_active of the link greater than 1 (because 1 means "no
> >>> active references to the supplier")?
> >> Hi Rafael:
> >> No - it is not greater than 1.
> >>
> >> I'm trying to understand what's going on in it; will update when I've 
> >> something.
> >>
> >>>
>  RPM_ACTIVE, but the supplier suspends successfully nevertheless?
> 
> >> [   11.383376][T5] ufs_device_wlun 0:0:0:49488:
> >> ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
> >> [   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache
> >>
> >> And the the suspend of sde is stuck now:
> >> schedule+0x9c/0xe0
> >> schedule_timeout+0x40/0x128
> >> io_schedule_timeout+0x44/0x68
> >> wait_for_common_io+0x7c/0x100
> >> wait_for_completion_io+0x14/0x20
> >> blk_execute_rq+0x90/0xcc
> >> __scsi_execute+0x104/0x1c4
> >> sd_sync_cache+0xf8/0x2a0
> >> sd_suspend_common+0x74/0x11c
> >> sd_suspend_runtime+0x14/0x20
> >> scsi_runtime_suspend+0x64/0x94
> >> __rpm_callback+0x80/0x2a4
> >> rpm_suspend+0x308/0x614
> >> pm_runtime_work+0x98/0xa8
> >>
> >> I added 'DL_FLAG_RPM_ACTIVE' while creating links.
> >>  if (hba->sdev_ufs_device) {
> >>  link = device_link_add(>sdev_gendev,
> >>  
> >> >sdev_ufs_device->sdev_gendev,
> >> 
> >> DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
> >> I didn't expect this to resolve the issue anyway and it didn't.
> >>
> >> Another interesting point here is when I resume any of the above 
> >> suspended
> >> consumers, it all goes back to normal, which is kind of expected. 
> >> I tried
> >> resuming the consumer and the supplier is resumed and the supplier 
> >> is
> >> suspended when all the consumers are suspended.
> >>
> >> Any pointers on this issue 

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-17 Thread Adrian Hunter
On 16/03/21 10:35 pm, Asutosh Das (asd) wrote:
> On 3/16/2021 12:48 AM, Adrian Hunter wrote:
>> On 16/03/21 12:22 am, Asutosh Das (asd) wrote:
>>> On 3/14/2021 1:11 AM, Adrian Hunter wrote:
 On 10/03/21 5:04 am, Asutosh Das (asd) wrote:
> On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:
>> On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:
>>> On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  
>>> wrote:

 On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  
 wrote:
>
> On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:
>
>> Now during my testing I see a weird issue sometimes (1 in 7).
>> Scenario - bootups
>>
>> Issue:
>> The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend 
>> even
>> when one/more of its consumers are in RPM_ACTIVE state.
>>
>> *Log:
>> [   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
>> [   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
>> [   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
>> [   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
>> [   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
>> [   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
>> [   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>
>> /** Printing all the consumer nodes of supplier **/
>> [   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count @ 
>> suspend: 0
>> <-- this is the usage_count
>> [   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
>> [   11.000402][T5] scsi 0:0:0:49456: PM state - 2
>> [   11.005453][T5] sd 0:0:0:0: PM state - 2
>> [   11.009958][T5] sd 0:0:0:1: PM state - 2
>> [   11.014469][T5] sd 0:0:0:2: PM state - 2
>> [   11.019072][T5] sd 0:0:0:3: PM state - 2
>> [   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
>> [   11.353298][T5] sd 0:0:0:5: PM state - 2
>> [   11.357726][T5] sd 0:0:0:6: PM state - 2
>> [   11.362155][T5] sd 0:0:0:7: PM state - 2
>> [   11.366584][T5] ufshcd-qcom 1d84000.ufshc: 
>> __ufshcd_wl_suspend - 8709
>> [   11.374366][T5] ufs_device_wlun 0:0:0:49488: 
>> __ufshcd_wl_suspend -
>> (0) has rpm_active flags

 Do you mean that rpm_active of the link between the consumer and the
 supplier is greater than 0 at this point and the consumer is
>>>
>>> I mean is rpm_active of the link greater than 1 (because 1 means "no
>>> active references to the supplier")?
>> Hi Rafael:
>> No - it is not greater than 1.
>>
>> I'm trying to understand what's going on in it; will update when I've 
>> something.
>>
>>>
 RPM_ACTIVE, but the supplier suspends successfully nevertheless?

>> [   11.383376][T5] ufs_device_wlun 0:0:0:49488:
>> ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
>> [   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache
>>
>> And the the suspend of sde is stuck now:
>> schedule+0x9c/0xe0
>> schedule_timeout+0x40/0x128
>> io_schedule_timeout+0x44/0x68
>> wait_for_common_io+0x7c/0x100
>> wait_for_completion_io+0x14/0x20
>> blk_execute_rq+0x90/0xcc
>> __scsi_execute+0x104/0x1c4
>> sd_sync_cache+0xf8/0x2a0
>> sd_suspend_common+0x74/0x11c
>> sd_suspend_runtime+0x14/0x20
>> scsi_runtime_suspend+0x64/0x94
>> __rpm_callback+0x80/0x2a4
>> rpm_suspend+0x308/0x614
>> pm_runtime_work+0x98/0xa8
>>
>> I added 'DL_FLAG_RPM_ACTIVE' while creating links.
>>  if (hba->sdev_ufs_device) {
>>  link = device_link_add(>sdev_gendev,
>>  
>> >sdev_ufs_device->sdev_gendev,
>> 
>> DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
>> I didn't expect this to resolve the issue anyway and it didn't.
>>
>> Another interesting point here is when I resume any of the above 
>> suspended
>> consumers, it all goes back to normal, which is kind of expected. I 
>> tried
>> resuming the consumer and the supplier is resumed and the supplier is
>> suspended when all the consumers are suspended.
>>
>> Any pointers on this issue please?
>>
>> @Bart/@Alan - Do you've any pointers please?
>
> It's very noticeable that although you seem to have isolated a bug in
> the power management subsystem (supplier goes into runtime suspend
> even when one of its 

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-16 Thread Asutosh Das (asd)

On 3/16/2021 12:48 AM, Adrian Hunter wrote:

On 16/03/21 12:22 am, Asutosh Das (asd) wrote:

On 3/14/2021 1:11 AM, Adrian Hunter wrote:

On 10/03/21 5:04 am, Asutosh Das (asd) wrote:

On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:

On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:

On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  wrote:


On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  wrote:


On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:


Now during my testing I see a weird issue sometimes (1 in 7).
Scenario - bootups

Issue:
The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
when one/more of its consumers are in RPM_ACTIVE state.

*Log:
[   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
[   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
[   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
[   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
[   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
[   10.980602][    T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache

/** Printing all the consumer nodes of supplier **/
[   10.987327][    T5] ufs_device_wlun 0:0:0:49488: usage-count @ suspend: 0
<-- this is the usage_count
[   10.994440][    T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
[   11.000402][    T5] scsi 0:0:0:49456: PM state - 2
[   11.005453][    T5] sd 0:0:0:0: PM state - 2
[   11.009958][    T5] sd 0:0:0:1: PM state - 2
[   11.014469][    T5] sd 0:0:0:2: PM state - 2
[   11.019072][    T5] sd 0:0:0:3: PM state - 2
[   11.023595][    T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
[   11.353298][    T5] sd 0:0:0:5: PM state - 2
[   11.357726][    T5] sd 0:0:0:6: PM state - 2
[   11.362155][    T5] sd 0:0:0:7: PM state - 2
[   11.366584][    T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 8709
[   11.374366][    T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
(0) has rpm_active flags


Do you mean that rpm_active of the link between the consumer and the
supplier is greater than 0 at this point and the consumer is


I mean is rpm_active of the link greater than 1 (because 1 means "no
active references to the supplier")?

Hi Rafael:
No - it is not greater than 1.

I'm trying to understand what's going on in it; will update when I've something.




RPM_ACTIVE, but the supplier suspends successfully nevertheless?


[   11.383376][    T5] ufs_device_wlun 0:0:0:49488:
ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
[   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache

And the the suspend of sde is stuck now:
schedule+0x9c/0xe0
schedule_timeout+0x40/0x128
io_schedule_timeout+0x44/0x68
wait_for_common_io+0x7c/0x100
wait_for_completion_io+0x14/0x20
blk_execute_rq+0x90/0xcc
__scsi_execute+0x104/0x1c4
sd_sync_cache+0xf8/0x2a0
sd_suspend_common+0x74/0x11c
sd_suspend_runtime+0x14/0x20
scsi_runtime_suspend+0x64/0x94
__rpm_callback+0x80/0x2a4
rpm_suspend+0x308/0x614
pm_runtime_work+0x98/0xa8

I added 'DL_FLAG_RPM_ACTIVE' while creating links.
     if (hba->sdev_ufs_device) {
     link = device_link_add(>sdev_gendev,
     >sdev_ufs_device->sdev_gendev,
    DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
I didn't expect this to resolve the issue anyway and it didn't.

Another interesting point here is when I resume any of the above suspended
consumers, it all goes back to normal, which is kind of expected. I tried
resuming the consumer and the supplier is resumed and the supplier is
suspended when all the consumers are suspended.

Any pointers on this issue please?

@Bart/@Alan - Do you've any pointers please?


It's very noticeable that although you seem to have isolated a bug in
the power management subsystem (supplier goes into runtime suspend
even when one of its consumers is still active), you did not CC the
power management maintainer or mailing list.

I have added the appropriate CC's.


Thanks Alan!





Hello
I & Can (thanks CanG) debugged this further:

Looks like this issue can occur if the sd probe is asynchronous.

Essentially, the sd_probe() is done asynchronously and driver_probe_device() 
invokes pm_runtime_get_suppliers() before invoking sd_probe().

But scsi_probe_and_add_lun() runs in a separate context.
So the scsi_autopm_put_device() invoked from scsi_scan_host() context reduces the 
link->rpm_active to 1. And sd_probe() invokes scsi_autopm_put_device() and starts 
a timer. And then driver_probe_device() invoked from __device_attach_async_helper 
context reduces the link->rpm_active to 1 thus enabling the supplier to suspend 
before the consumer suspends.

So if:
Context T1:
[1] scsi_probe_and_add_lun()
[2]    |- scsi_autopm_put_device() - reduce the link->rpm_active to 1

Context T2:
__device_attach_async_helper()
  |- driver_probe_device()
  |- sd_probe()
In between [1] and [2] say, driver_probe_device() -> sd_probe() is 

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-16 Thread Adrian Hunter
On 16/03/21 12:22 am, Asutosh Das (asd) wrote:
> On 3/14/2021 1:11 AM, Adrian Hunter wrote:
>> On 10/03/21 5:04 am, Asutosh Das (asd) wrote:
>>> On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:
 On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:
> On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  
> wrote:
>>
>> On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  
>> wrote:
>>>
>>> On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:
>>>
 Now during my testing I see a weird issue sometimes (1 in 7).
 Scenario - bootups

 Issue:
 The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend 
 even
 when one/more of its consumers are in RPM_ACTIVE state.

 *Log:
 [   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
 [   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
 [   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
 [   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
 [   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
 [   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
 [   10.980602][    T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache

 /** Printing all the consumer nodes of supplier **/
 [   10.987327][    T5] ufs_device_wlun 0:0:0:49488: usage-count @ 
 suspend: 0
 <-- this is the usage_count
 [   10.994440][    T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
 [   11.000402][    T5] scsi 0:0:0:49456: PM state - 2
 [   11.005453][    T5] sd 0:0:0:0: PM state - 2
 [   11.009958][    T5] sd 0:0:0:1: PM state - 2
 [   11.014469][    T5] sd 0:0:0:2: PM state - 2
 [   11.019072][    T5] sd 0:0:0:3: PM state - 2
 [   11.023595][    T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
 [   11.353298][    T5] sd 0:0:0:5: PM state - 2
 [   11.357726][    T5] sd 0:0:0:6: PM state - 2
 [   11.362155][    T5] sd 0:0:0:7: PM state - 2
 [   11.366584][    T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend 
 - 8709
 [   11.374366][    T5] ufs_device_wlun 0:0:0:49488: 
 __ufshcd_wl_suspend -
 (0) has rpm_active flags
>>
>> Do you mean that rpm_active of the link between the consumer and the
>> supplier is greater than 0 at this point and the consumer is
>
> I mean is rpm_active of the link greater than 1 (because 1 means "no
> active references to the supplier")?
 Hi Rafael:
 No - it is not greater than 1.

 I'm trying to understand what's going on in it; will update when I've 
 something.

>
>> RPM_ACTIVE, but the supplier suspends successfully nevertheless?
>>
 [   11.383376][    T5] ufs_device_wlun 0:0:0:49488:
 ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
 [   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache

 And the the suspend of sde is stuck now:
 schedule+0x9c/0xe0
 schedule_timeout+0x40/0x128
 io_schedule_timeout+0x44/0x68
 wait_for_common_io+0x7c/0x100
 wait_for_completion_io+0x14/0x20
 blk_execute_rq+0x90/0xcc
 __scsi_execute+0x104/0x1c4
 sd_sync_cache+0xf8/0x2a0
 sd_suspend_common+0x74/0x11c
 sd_suspend_runtime+0x14/0x20
 scsi_runtime_suspend+0x64/0x94
 __rpm_callback+0x80/0x2a4
 rpm_suspend+0x308/0x614
 pm_runtime_work+0x98/0xa8

 I added 'DL_FLAG_RPM_ACTIVE' while creating links.
     if (hba->sdev_ufs_device) {
     link = device_link_add(>sdev_gendev,
     >sdev_ufs_device->sdev_gendev,
    
 DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
 I didn't expect this to resolve the issue anyway and it didn't.

 Another interesting point here is when I resume any of the above 
 suspended
 consumers, it all goes back to normal, which is kind of expected. I 
 tried
 resuming the consumer and the supplier is resumed and the supplier is
 suspended when all the consumers are suspended.

 Any pointers on this issue please?

 @Bart/@Alan - Do you've any pointers please?
>>>
>>> It's very noticeable that although you seem to have isolated a bug in
>>> the power management subsystem (supplier goes into runtime suspend
>>> even when one of its consumers is still active), you did not CC the
>>> power management maintainer or mailing list.
>>>
>>> I have added the appropriate CC's.
>>
>> Thanks Alan!


>>>
>>> Hello
>>> I & Can (thanks CanG) debugged this further:
>>>
>>> Looks like this issue can occur if the sd probe is asynchronous.

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-15 Thread Asutosh Das (asd)

On 3/14/2021 1:11 AM, Adrian Hunter wrote:

On 10/03/21 5:04 am, Asutosh Das (asd) wrote:

On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:

On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:

On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  wrote:


On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  wrote:


On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:


Now during my testing I see a weird issue sometimes (1 in 7).
Scenario - bootups

Issue:
The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
when one/more of its consumers are in RPM_ACTIVE state.

*Log:
[   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
[   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
[   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
[   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
[   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
[   10.980602][    T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache

/** Printing all the consumer nodes of supplier **/
[   10.987327][    T5] ufs_device_wlun 0:0:0:49488: usage-count @ suspend: 0
<-- this is the usage_count
[   10.994440][    T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
[   11.000402][    T5] scsi 0:0:0:49456: PM state - 2
[   11.005453][    T5] sd 0:0:0:0: PM state - 2
[   11.009958][    T5] sd 0:0:0:1: PM state - 2
[   11.014469][    T5] sd 0:0:0:2: PM state - 2
[   11.019072][    T5] sd 0:0:0:3: PM state - 2
[   11.023595][    T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
[   11.353298][    T5] sd 0:0:0:5: PM state - 2
[   11.357726][    T5] sd 0:0:0:6: PM state - 2
[   11.362155][    T5] sd 0:0:0:7: PM state - 2
[   11.366584][    T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 8709
[   11.374366][    T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
(0) has rpm_active flags


Do you mean that rpm_active of the link between the consumer and the
supplier is greater than 0 at this point and the consumer is


I mean is rpm_active of the link greater than 1 (because 1 means "no
active references to the supplier")?

Hi Rafael:
No - it is not greater than 1.

I'm trying to understand what's going on in it; will update when I've something.




RPM_ACTIVE, but the supplier suspends successfully nevertheless?


[   11.383376][    T5] ufs_device_wlun 0:0:0:49488:
ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
[   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache

And the the suspend of sde is stuck now:
schedule+0x9c/0xe0
schedule_timeout+0x40/0x128
io_schedule_timeout+0x44/0x68
wait_for_common_io+0x7c/0x100
wait_for_completion_io+0x14/0x20
blk_execute_rq+0x90/0xcc
__scsi_execute+0x104/0x1c4
sd_sync_cache+0xf8/0x2a0
sd_suspend_common+0x74/0x11c
sd_suspend_runtime+0x14/0x20
scsi_runtime_suspend+0x64/0x94
__rpm_callback+0x80/0x2a4
rpm_suspend+0x308/0x614
pm_runtime_work+0x98/0xa8

I added 'DL_FLAG_RPM_ACTIVE' while creating links.
    if (hba->sdev_ufs_device) {
    link = device_link_add(>sdev_gendev,
    >sdev_ufs_device->sdev_gendev,
   DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
I didn't expect this to resolve the issue anyway and it didn't.

Another interesting point here is when I resume any of the above suspended
consumers, it all goes back to normal, which is kind of expected. I tried
resuming the consumer and the supplier is resumed and the supplier is
suspended when all the consumers are suspended.

Any pointers on this issue please?

@Bart/@Alan - Do you've any pointers please?


It's very noticeable that although you seem to have isolated a bug in
the power management subsystem (supplier goes into runtime suspend
even when one of its consumers is still active), you did not CC the
power management maintainer or mailing list.

I have added the appropriate CC's.


Thanks Alan!





Hello
I & Can (thanks CanG) debugged this further:

Looks like this issue can occur if the sd probe is asynchronous.

Essentially, the sd_probe() is done asynchronously and driver_probe_device() 
invokes pm_runtime_get_suppliers() before invoking sd_probe().

But scsi_probe_and_add_lun() runs in a separate context.
So the scsi_autopm_put_device() invoked from scsi_scan_host() context reduces the 
link->rpm_active to 1. And sd_probe() invokes scsi_autopm_put_device() and starts 
a timer. And then driver_probe_device() invoked from __device_attach_async_helper 
context reduces the link->rpm_active to 1 thus enabling the supplier to suspend 
before the consumer suspends.

So if:
Context T1:
[1] scsi_probe_and_add_lun()
[2]    |- scsi_autopm_put_device() - reduce the link->rpm_active to 1

Context T2:
__device_attach_async_helper()
 |- driver_probe_device()
     |- sd_probe()
In between [1] and [2] say, driver_probe_device() -> sd_probe() is invoked in a 
separate context from __device_attach_async_helper().
The driver_probe_device() -> 

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-14 Thread Adrian Hunter
On 10/03/21 5:04 am, Asutosh Das (asd) wrote:
> On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:
>> On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:
>>> On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  wrote:

 On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  
 wrote:
>
> On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:
>
>> Now during my testing I see a weird issue sometimes (1 in 7).
>> Scenario - bootups
>>
>> Issue:
>> The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
>> when one/more of its consumers are in RPM_ACTIVE state.
>>
>> *Log:
>> [   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
>> [   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
>> [   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
>> [   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
>> [   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
>> [   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
>> [   10.980602][    T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>
>> /** Printing all the consumer nodes of supplier **/
>> [   10.987327][    T5] ufs_device_wlun 0:0:0:49488: usage-count @ 
>> suspend: 0
>> <-- this is the usage_count
>> [   10.994440][    T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
>> [   11.000402][    T5] scsi 0:0:0:49456: PM state - 2
>> [   11.005453][    T5] sd 0:0:0:0: PM state - 2
>> [   11.009958][    T5] sd 0:0:0:1: PM state - 2
>> [   11.014469][    T5] sd 0:0:0:2: PM state - 2
>> [   11.019072][    T5] sd 0:0:0:3: PM state - 2
>> [   11.023595][    T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
>> [   11.353298][    T5] sd 0:0:0:5: PM state - 2
>> [   11.357726][    T5] sd 0:0:0:6: PM state - 2
>> [   11.362155][    T5] sd 0:0:0:7: PM state - 2
>> [   11.366584][    T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 
>> 8709
>> [   11.374366][    T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
>> (0) has rpm_active flags

 Do you mean that rpm_active of the link between the consumer and the
 supplier is greater than 0 at this point and the consumer is
>>>
>>> I mean is rpm_active of the link greater than 1 (because 1 means "no
>>> active references to the supplier")?
>> Hi Rafael:
>> No - it is not greater than 1.
>>
>> I'm trying to understand what's going on in it; will update when I've 
>> something.
>>
>>>
 RPM_ACTIVE, but the supplier suspends successfully nevertheless?

>> [   11.383376][    T5] ufs_device_wlun 0:0:0:49488:
>> ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
>> [   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache
>>
>> And the the suspend of sde is stuck now:
>> schedule+0x9c/0xe0
>> schedule_timeout+0x40/0x128
>> io_schedule_timeout+0x44/0x68
>> wait_for_common_io+0x7c/0x100
>> wait_for_completion_io+0x14/0x20
>> blk_execute_rq+0x90/0xcc
>> __scsi_execute+0x104/0x1c4
>> sd_sync_cache+0xf8/0x2a0
>> sd_suspend_common+0x74/0x11c
>> sd_suspend_runtime+0x14/0x20
>> scsi_runtime_suspend+0x64/0x94
>> __rpm_callback+0x80/0x2a4
>> rpm_suspend+0x308/0x614
>> pm_runtime_work+0x98/0xa8
>>
>> I added 'DL_FLAG_RPM_ACTIVE' while creating links.
>>    if (hba->sdev_ufs_device) {
>>    link = device_link_add(>sdev_gendev,
>>    >sdev_ufs_device->sdev_gendev,
>>   DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
>> I didn't expect this to resolve the issue anyway and it didn't.
>>
>> Another interesting point here is when I resume any of the above 
>> suspended
>> consumers, it all goes back to normal, which is kind of expected. I tried
>> resuming the consumer and the supplier is resumed and the supplier is
>> suspended when all the consumers are suspended.
>>
>> Any pointers on this issue please?
>>
>> @Bart/@Alan - Do you've any pointers please?
>
> It's very noticeable that although you seem to have isolated a bug in
> the power management subsystem (supplier goes into runtime suspend
> even when one of its consumers is still active), you did not CC the
> power management maintainer or mailing list.
>
> I have added the appropriate CC's.

 Thanks Alan!
>>
>>
> 
> Hello
> I & Can (thanks CanG) debugged this further:
> 
> Looks like this issue can occur if the sd probe is asynchronous.
> 
> Essentially, the sd_probe() is done asynchronously and driver_probe_device() 
> invokes pm_runtime_get_suppliers() before invoking sd_probe().
> 
> But scsi_probe_and_add_lun() runs in a separate context.
> So the scsi_autopm_put_device() invoked from scsi_scan_host() context reduces 
> the link->rpm_active to 1. And sd_probe() invokes 

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-10 Thread Asutosh Das (asd)

On 3/10/2021 8:27 AM, Alan Stern wrote:

On Tue, Mar 09, 2021 at 08:04:53PM -0800, Asutosh Das (asd) wrote:

On 3/9/2021 7:14 PM, Alan Stern wrote:

On Tue, Mar 09, 2021 at 07:04:34PM -0800, Asutosh Das (asd) wrote:

Hello
I & Can (thanks CanG) debugged this further:

Looks like this issue can occur if the sd probe is asynchronous.

Essentially, the sd_probe() is done asynchronously and driver_probe_device()
invokes pm_runtime_get_suppliers() before invoking sd_probe().

But scsi_probe_and_add_lun() runs in a separate context.
So the scsi_autopm_put_device() invoked from scsi_scan_host() context
reduces the link->rpm_active to 1. And sd_probe() invokes
scsi_autopm_put_device() and starts a timer. And then driver_probe_device()
invoked from __device_attach_async_helper context reduces the
link->rpm_active to 1 thus enabling the supplier to suspend before the
consumer suspends.



I don't see a way around this. Please let me know if you
(@Alan/@Bart/@Adrian) have any thoughts on this.


How about changing the SCSI core so that it does a runtime_get before
starting an async probe, and the async probe routine does a
runtime_put when it is finished?  In other words, don't allow a device
to go into runtime suspend while it is waiting to be probed.

I don't think that would be too intrusive.

Alan Stern



Hi Alan
Thanks for the suggestion.

Am trying to understand:

Do you mean something like this:

int scsi_sysfs_add_sdev(struct scsi_device *sdev)
{

scsi_autopm_get_device(sdev);
pm_runtime_get_noresume(>sdev_gendev);
[...]
scsi_autopm_put_device(sdev);
[...]
}

static int sd_probe(struct device *dev)
{
[...]
pm_runtime_put_noidle(dev);
scsi_autopm_put_device(sdp);
[...]
}

This may work (I'm limited by my imagination in scsi layer :) ).


I'm not sure about this.  To be honest, I did not read the entirety of
your last message; it had way too much detail.  THere's a time and place
for that, but when you're brainstorming to figure out the underlying
cause of a problem and come up with a strategy to fix it, you want to
concentrate on the overall picture, not the details.

As I understand the situation, you've get a SCSI target with multiple
logical units, let's say A and B, and you need to make sure that A never
goes into runtime suspend unless B is already suspended.  In other
words, B always has to suspend before A and resume after A.

To do this, you register a device link with A as the supplier and B as
the consumer.  Then the PM core takes care of the ordering for you.

But I don't understand when you set up the device link.  If the timing
is wrong then, thanks to async SCSI probing, you may have a situation
where A is registered before B and before the link is set up.  Then
there's temporarily nothing to stop A from suspending before B.

You also need to prevent each device from suspending before it is
probed.  That's the easy part I was trying to address before (although
it may not be so easy if the drivers are in loadable modules and not
present in the kernel).

You need to think through these issues before proposing actual changes.


But the pm_runtime_put_noidle() would have to be added to all registered
scsi_driver{}, perhaps? Or may be I can check for sdp->type?


Like this; it's too early to worry about this sort of thing.

Alan Stern


Hi Alan
Thanks. Understood.

I will check the details and see if I can come up with something.
I'll propose an alternate fix otherwise and drop this change altogether.

Thanks!
-asd

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-10 Thread Alan Stern
On Tue, Mar 09, 2021 at 08:04:53PM -0800, Asutosh Das (asd) wrote:
> On 3/9/2021 7:14 PM, Alan Stern wrote:
> > On Tue, Mar 09, 2021 at 07:04:34PM -0800, Asutosh Das (asd) wrote:
> > > Hello
> > > I & Can (thanks CanG) debugged this further:
> > > 
> > > Looks like this issue can occur if the sd probe is asynchronous.
> > > 
> > > Essentially, the sd_probe() is done asynchronously and 
> > > driver_probe_device()
> > > invokes pm_runtime_get_suppliers() before invoking sd_probe().
> > > 
> > > But scsi_probe_and_add_lun() runs in a separate context.
> > > So the scsi_autopm_put_device() invoked from scsi_scan_host() context
> > > reduces the link->rpm_active to 1. And sd_probe() invokes
> > > scsi_autopm_put_device() and starts a timer. And then 
> > > driver_probe_device()
> > > invoked from __device_attach_async_helper context reduces the
> > > link->rpm_active to 1 thus enabling the supplier to suspend before the
> > > consumer suspends.
> > 
> > > I don't see a way around this. Please let me know if you
> > > (@Alan/@Bart/@Adrian) have any thoughts on this.
> > 
> > How about changing the SCSI core so that it does a runtime_get before
> > starting an async probe, and the async probe routine does a
> > runtime_put when it is finished?  In other words, don't allow a device
> > to go into runtime suspend while it is waiting to be probed.
> > 
> > I don't think that would be too intrusive.
> > 
> > Alan Stern
> > 
> 
> Hi Alan
> Thanks for the suggestion.
> 
> Am trying to understand:
> 
> Do you mean something like this:
> 
> int scsi_sysfs_add_sdev(struct scsi_device *sdev)
> {
>   
>   scsi_autopm_get_device(sdev);
>   pm_runtime_get_noresume(>sdev_gendev);
>   [...]
>   scsi_autopm_put_device(sdev);
>   [...]
> }
> 
> static int sd_probe(struct device *dev)
> {
>   [...]
>   pm_runtime_put_noidle(dev);
>   scsi_autopm_put_device(sdp);
>   [...]
> }
> 
> This may work (I'm limited by my imagination in scsi layer :) ).

I'm not sure about this.  To be honest, I did not read the entirety of 
your last message; it had way too much detail.  THere's a time and place 
for that, but when you're brainstorming to figure out the underlying 
cause of a problem and come up with a strategy to fix it, you want to 
concentrate on the overall picture, not the details.

As I understand the situation, you've get a SCSI target with multiple 
logical units, let's say A and B, and you need to make sure that A never 
goes into runtime suspend unless B is already suspended.  In other 
words, B always has to suspend before A and resume after A.

To do this, you register a device link with A as the supplier and B as 
the consumer.  Then the PM core takes care of the ordering for you.

But I don't understand when you set up the device link.  If the timing 
is wrong then, thanks to async SCSI probing, you may have a situation 
where A is registered before B and before the link is set up.  Then 
there's temporarily nothing to stop A from suspending before B.

You also need to prevent each device from suspending before it is 
probed.  That's the easy part I was trying to address before (although 
it may not be so easy if the drivers are in loadable modules and not 
present in the kernel).

You need to think through these issues before proposing actual changes.

> But the pm_runtime_put_noidle() would have to be added to all registered
> scsi_driver{}, perhaps? Or may be I can check for sdp->type?

Like this; it's too early to worry about this sort of thing.

Alan Stern


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-09 Thread Asutosh Das (asd)

On 3/9/2021 7:14 PM, Alan Stern wrote:

On Tue, Mar 09, 2021 at 07:04:34PM -0800, Asutosh Das (asd) wrote:

Hello
I & Can (thanks CanG) debugged this further:

Looks like this issue can occur if the sd probe is asynchronous.

Essentially, the sd_probe() is done asynchronously and driver_probe_device()
invokes pm_runtime_get_suppliers() before invoking sd_probe().

But scsi_probe_and_add_lun() runs in a separate context.
So the scsi_autopm_put_device() invoked from scsi_scan_host() context
reduces the link->rpm_active to 1. And sd_probe() invokes
scsi_autopm_put_device() and starts a timer. And then driver_probe_device()
invoked from __device_attach_async_helper context reduces the
link->rpm_active to 1 thus enabling the supplier to suspend before the
consumer suspends.



I don't see a way around this. Please let me know if you
(@Alan/@Bart/@Adrian) have any thoughts on this.


How about changing the SCSI core so that it does a runtime_get before
starting an async probe, and the async probe routine does a
runtime_put when it is finished?  In other words, don't allow a device
to go into runtime suspend while it is waiting to be probed.

I don't think that would be too intrusive.

Alan Stern



Hi Alan
Thanks for the suggestion.

Am trying to understand:

Do you mean something like this:

int scsi_sysfs_add_sdev(struct scsi_device *sdev)
{

scsi_autopm_get_device(sdev);
pm_runtime_get_noresume(>sdev_gendev);
[...]
scsi_autopm_put_device(sdev);
[...]
}

static int sd_probe(struct device *dev)
{
[...]
pm_runtime_put_noidle(dev);
scsi_autopm_put_device(sdp);
[...]
}

This may work (I'm limited by my imagination in scsi layer :) ).

But the pm_runtime_put_noidle() would have to be added to all registered 
scsi_driver{}, perhaps? Or may be I can check for sdp->type?


-asd

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-09 Thread Alan Stern
On Tue, Mar 09, 2021 at 07:04:34PM -0800, Asutosh Das (asd) wrote:
> Hello
> I & Can (thanks CanG) debugged this further:
> 
> Looks like this issue can occur if the sd probe is asynchronous.
> 
> Essentially, the sd_probe() is done asynchronously and driver_probe_device()
> invokes pm_runtime_get_suppliers() before invoking sd_probe().
> 
> But scsi_probe_and_add_lun() runs in a separate context.
> So the scsi_autopm_put_device() invoked from scsi_scan_host() context
> reduces the link->rpm_active to 1. And sd_probe() invokes
> scsi_autopm_put_device() and starts a timer. And then driver_probe_device()
> invoked from __device_attach_async_helper context reduces the
> link->rpm_active to 1 thus enabling the supplier to suspend before the
> consumer suspends.

> I don't see a way around this. Please let me know if you
> (@Alan/@Bart/@Adrian) have any thoughts on this.

How about changing the SCSI core so that it does a runtime_get before
starting an async probe, and the async probe routine does a
runtime_put when it is finished?  In other words, don't allow a device
to go into runtime suspend while it is waiting to be probed.

I don't think that would be too intrusive.

Alan Stern


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-09 Thread Asutosh Das (asd)

On 3/9/2021 7:56 AM, Asutosh Das (asd) wrote:

On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:
On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  
wrote:


On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  
wrote:


On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:


Now during my testing I see a weird issue sometimes (1 in 7).
Scenario - bootups

Issue:
The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime 
suspend even

when one/more of its consumers are in RPM_ACTIVE state.

*Log:
[   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
[   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
[   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
[   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
[   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
[   10.980602][    T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache

/** Printing all the consumer nodes of supplier **/
[   10.987327][    T5] ufs_device_wlun 0:0:0:49488: usage-count @ 
suspend: 0

<-- this is the usage_count
[   10.994440][    T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
[   11.000402][    T5] scsi 0:0:0:49456: PM state - 2
[   11.005453][    T5] sd 0:0:0:0: PM state - 2
[   11.009958][    T5] sd 0:0:0:1: PM state - 2
[   11.014469][    T5] sd 0:0:0:2: PM state - 2
[   11.019072][    T5] sd 0:0:0:3: PM state - 2
[   11.023595][    T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
[   11.353298][    T5] sd 0:0:0:5: PM state - 2
[   11.357726][    T5] sd 0:0:0:6: PM state - 2
[   11.362155][    T5] sd 0:0:0:7: PM state - 2
[   11.366584][    T5] ufshcd-qcom 1d84000.ufshc: 
__ufshcd_wl_suspend - 8709
[   11.374366][    T5] ufs_device_wlun 0:0:0:49488: 
__ufshcd_wl_suspend -

(0) has rpm_active flags


Do you mean that rpm_active of the link between the consumer and the
supplier is greater than 0 at this point and the consumer is


I mean is rpm_active of the link greater than 1 (because 1 means "no
active references to the supplier")?

Hi Rafael:
No - it is not greater than 1.

I'm trying to understand what's going on in it; will update when I've 
something.





RPM_ACTIVE, but the supplier suspends successfully nevertheless?


[   11.383376][    T5] ufs_device_wlun 0:0:0:49488:
ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
[   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache

And the the suspend of sde is stuck now:
schedule+0x9c/0xe0
schedule_timeout+0x40/0x128
io_schedule_timeout+0x44/0x68
wait_for_common_io+0x7c/0x100
wait_for_completion_io+0x14/0x20
blk_execute_rq+0x90/0xcc
__scsi_execute+0x104/0x1c4
sd_sync_cache+0xf8/0x2a0
sd_suspend_common+0x74/0x11c
sd_suspend_runtime+0x14/0x20
scsi_runtime_suspend+0x64/0x94
__rpm_callback+0x80/0x2a4
rpm_suspend+0x308/0x614
pm_runtime_work+0x98/0xa8

I added 'DL_FLAG_RPM_ACTIVE' while creating links.
   if (hba->sdev_ufs_device) {
   link = device_link_add(>sdev_gendev,
   >sdev_ufs_device->sdev_gendev,
  
DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);

I didn't expect this to resolve the issue anyway and it didn't.

Another interesting point here is when I resume any of the above 
suspended
consumers, it all goes back to normal, which is kind of expected. I 
tried

resuming the consumer and the supplier is resumed and the supplier is
suspended when all the consumers are suspended.

Any pointers on this issue please?

@Bart/@Alan - Do you've any pointers please?


It's very noticeable that although you seem to have isolated a bug in
the power management subsystem (supplier goes into runtime suspend
even when one of its consumers is still active), you did not CC the
power management maintainer or mailing list.

I have added the appropriate CC's.


Thanks Alan!





Hello
I & Can (thanks CanG) debugged this further:

Looks like this issue can occur if the sd probe is asynchronous.

Essentially, the sd_probe() is done asynchronously and 
driver_probe_device() invokes pm_runtime_get_suppliers() before invoking 
sd_probe().


But scsi_probe_and_add_lun() runs in a separate context.
So the scsi_autopm_put_device() invoked from scsi_scan_host() context 
reduces the link->rpm_active to 1. And sd_probe() invokes 
scsi_autopm_put_device() and starts a timer. And then 
driver_probe_device() invoked from __device_attach_async_helper context 
reduces the link->rpm_active to 1 thus enabling the supplier to suspend 
before the consumer suspends.


So if:
Context T1:
[1] scsi_probe_and_add_lun()
[2] |- scsi_autopm_put_device() - reduce the link->rpm_active to 1

Context T2:
__device_attach_async_helper()
|- driver_probe_device()
|- sd_probe()
In between [1] and [2] say, driver_probe_device() -> sd_probe() is 
invoked in a separate context from __device_attach_async_helper().
The driver_probe_device() -> pm_runtime_get_suppliers() but [2] would 
reduce link->rpm_active 

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-09 Thread Asutosh Das (asd)

On 3/8/2021 9:17 AM, Rafael J. Wysocki wrote:

On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  wrote:


On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  wrote:


On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:


Now during my testing I see a weird issue sometimes (1 in 7).
Scenario - bootups

Issue:
The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
when one/more of its consumers are in RPM_ACTIVE state.

*Log:
[   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
[   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
[   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
[   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
[   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
[   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache

/** Printing all the consumer nodes of supplier **/
[   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count @ suspend: 0
<-- this is the usage_count
[   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
[   11.000402][T5] scsi 0:0:0:49456: PM state - 2
[   11.005453][T5] sd 0:0:0:0: PM state - 2
[   11.009958][T5] sd 0:0:0:1: PM state - 2
[   11.014469][T5] sd 0:0:0:2: PM state - 2
[   11.019072][T5] sd 0:0:0:3: PM state - 2
[   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
[   11.353298][T5] sd 0:0:0:5: PM state - 2
[   11.357726][T5] sd 0:0:0:6: PM state - 2
[   11.362155][T5] sd 0:0:0:7: PM state - 2
[   11.366584][T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 8709
[   11.374366][T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
(0) has rpm_active flags


Do you mean that rpm_active of the link between the consumer and the
supplier is greater than 0 at this point and the consumer is


I mean is rpm_active of the link greater than 1 (because 1 means "no
active references to the supplier")?

Hi Rafael:
No - it is not greater than 1.

I'm trying to understand what's going on in it; will update when I've 
something.





RPM_ACTIVE, but the supplier suspends successfully nevertheless?


[   11.383376][T5] ufs_device_wlun 0:0:0:49488:
ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
[   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache

And the the suspend of sde is stuck now:
schedule+0x9c/0xe0
schedule_timeout+0x40/0x128
io_schedule_timeout+0x44/0x68
wait_for_common_io+0x7c/0x100
wait_for_completion_io+0x14/0x20
blk_execute_rq+0x90/0xcc
__scsi_execute+0x104/0x1c4
sd_sync_cache+0xf8/0x2a0
sd_suspend_common+0x74/0x11c
sd_suspend_runtime+0x14/0x20
scsi_runtime_suspend+0x64/0x94
__rpm_callback+0x80/0x2a4
rpm_suspend+0x308/0x614
pm_runtime_work+0x98/0xa8

I added 'DL_FLAG_RPM_ACTIVE' while creating links.
   if (hba->sdev_ufs_device) {
   link = device_link_add(>sdev_gendev,
   >sdev_ufs_device->sdev_gendev,
  DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
I didn't expect this to resolve the issue anyway and it didn't.

Another interesting point here is when I resume any of the above suspended
consumers, it all goes back to normal, which is kind of expected. I tried
resuming the consumer and the supplier is resumed and the supplier is
suspended when all the consumers are suspended.

Any pointers on this issue please?

@Bart/@Alan - Do you've any pointers please?


It's very noticeable that although you seem to have isolated a bug in
the power management subsystem (supplier goes into runtime suspend
even when one of its consumers is still active), you did not CC the
power management maintainer or mailing list.

I have added the appropriate CC's.


Thanks Alan!



--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-08 Thread Rafael J. Wysocki
On Mon, Mar 8, 2021 at 5:21 PM Rafael J. Wysocki  wrote:
>
> On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  wrote:
> >
> > On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:
> >
> > > Now during my testing I see a weird issue sometimes (1 in 7).
> > > Scenario - bootups
> > >
> > > Issue:
> > > The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
> > > when one/more of its consumers are in RPM_ACTIVE state.
> > >
> > > *Log:
> > > [   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
> > > [   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
> > > [   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
> > > [   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
> > > [   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
> > > [   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
> > > [   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> > >
> > > /** Printing all the consumer nodes of supplier **/
> > > [   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count @ 
> > > suspend: 0
> > > <-- this is the usage_count
> > > [   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
> > > [   11.000402][T5] scsi 0:0:0:49456: PM state - 2
> > > [   11.005453][T5] sd 0:0:0:0: PM state - 2
> > > [   11.009958][T5] sd 0:0:0:1: PM state - 2
> > > [   11.014469][T5] sd 0:0:0:2: PM state - 2
> > > [   11.019072][T5] sd 0:0:0:3: PM state - 2
> > > [   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
> > > [   11.353298][T5] sd 0:0:0:5: PM state - 2
> > > [   11.357726][T5] sd 0:0:0:6: PM state - 2
> > > [   11.362155][T5] sd 0:0:0:7: PM state - 2
> > > [   11.366584][T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 
> > > 8709
> > > [   11.374366][T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
> > > (0) has rpm_active flags
>
> Do you mean that rpm_active of the link between the consumer and the
> supplier is greater than 0 at this point and the consumer is

I mean is rpm_active of the link greater than 1 (because 1 means "no
active references to the supplier")?

> RPM_ACTIVE, but the supplier suspends successfully nevertheless?
>
> > > [   11.383376][T5] ufs_device_wlun 0:0:0:49488:
> > > ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
> > > [   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache
> > >
> > > And the the suspend of sde is stuck now:
> > > schedule+0x9c/0xe0
> > > schedule_timeout+0x40/0x128
> > > io_schedule_timeout+0x44/0x68
> > > wait_for_common_io+0x7c/0x100
> > > wait_for_completion_io+0x14/0x20
> > > blk_execute_rq+0x90/0xcc
> > > __scsi_execute+0x104/0x1c4
> > > sd_sync_cache+0xf8/0x2a0
> > > sd_suspend_common+0x74/0x11c
> > > sd_suspend_runtime+0x14/0x20
> > > scsi_runtime_suspend+0x64/0x94
> > > __rpm_callback+0x80/0x2a4
> > > rpm_suspend+0x308/0x614
> > > pm_runtime_work+0x98/0xa8
> > >
> > > I added 'DL_FLAG_RPM_ACTIVE' while creating links.
> > >   if (hba->sdev_ufs_device) {
> > >   link = device_link_add(>sdev_gendev,
> > >   >sdev_ufs_device->sdev_gendev,
> > >  DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
> > > I didn't expect this to resolve the issue anyway and it didn't.
> > >
> > > Another interesting point here is when I resume any of the above suspended
> > > consumers, it all goes back to normal, which is kind of expected. I tried
> > > resuming the consumer and the supplier is resumed and the supplier is
> > > suspended when all the consumers are suspended.
> > >
> > > Any pointers on this issue please?
> > >
> > > @Bart/@Alan - Do you've any pointers please?
> >
> > It's very noticeable that although you seem to have isolated a bug in
> > the power management subsystem (supplier goes into runtime suspend
> > even when one of its consumers is still active), you did not CC the
> > power management maintainer or mailing list.
> >
> > I have added the appropriate CC's.
>
> Thanks Alan!


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-08 Thread Rafael J. Wysocki
On Sat, Mar 6, 2021 at 5:17 PM Alan Stern  wrote:
>
> On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:
>
> > Now during my testing I see a weird issue sometimes (1 in 7).
> > Scenario - bootups
> >
> > Issue:
> > The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
> > when one/more of its consumers are in RPM_ACTIVE state.
> >
> > *Log:
> > [   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
> > [   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
> > [   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
> > [   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
> > [   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
> > [   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
> > [   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> >
> > /** Printing all the consumer nodes of supplier **/
> > [   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count @ suspend: 0
> > <-- this is the usage_count
> > [   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
> > [   11.000402][T5] scsi 0:0:0:49456: PM state - 2
> > [   11.005453][T5] sd 0:0:0:0: PM state - 2
> > [   11.009958][T5] sd 0:0:0:1: PM state - 2
> > [   11.014469][T5] sd 0:0:0:2: PM state - 2
> > [   11.019072][T5] sd 0:0:0:3: PM state - 2
> > [   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
> > [   11.353298][T5] sd 0:0:0:5: PM state - 2
> > [   11.357726][T5] sd 0:0:0:6: PM state - 2
> > [   11.362155][T5] sd 0:0:0:7: PM state - 2
> > [   11.366584][T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 8709
> > [   11.374366][T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
> > (0) has rpm_active flags

Do you mean that rpm_active of the link between the consumer and the
supplier is greater than 0 at this point and the consumer is
RPM_ACTIVE, but the supplier suspends successfully nevertheless?

> > [   11.383376][T5] ufs_device_wlun 0:0:0:49488:
> > ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
> > [   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache
> >
> > And the the suspend of sde is stuck now:
> > schedule+0x9c/0xe0
> > schedule_timeout+0x40/0x128
> > io_schedule_timeout+0x44/0x68
> > wait_for_common_io+0x7c/0x100
> > wait_for_completion_io+0x14/0x20
> > blk_execute_rq+0x90/0xcc
> > __scsi_execute+0x104/0x1c4
> > sd_sync_cache+0xf8/0x2a0
> > sd_suspend_common+0x74/0x11c
> > sd_suspend_runtime+0x14/0x20
> > scsi_runtime_suspend+0x64/0x94
> > __rpm_callback+0x80/0x2a4
> > rpm_suspend+0x308/0x614
> > pm_runtime_work+0x98/0xa8
> >
> > I added 'DL_FLAG_RPM_ACTIVE' while creating links.
> >   if (hba->sdev_ufs_device) {
> >   link = device_link_add(>sdev_gendev,
> >   >sdev_ufs_device->sdev_gendev,
> >  DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
> > I didn't expect this to resolve the issue anyway and it didn't.
> >
> > Another interesting point here is when I resume any of the above suspended
> > consumers, it all goes back to normal, which is kind of expected. I tried
> > resuming the consumer and the supplier is resumed and the supplier is
> > suspended when all the consumers are suspended.
> >
> > Any pointers on this issue please?
> >
> > @Bart/@Alan - Do you've any pointers please?
>
> It's very noticeable that although you seem to have isolated a bug in
> the power management subsystem (supplier goes into runtime suspend
> even when one of its consumers is still active), you did not CC the
> power management maintainer or mailing list.
>
> I have added the appropriate CC's.

Thanks Alan!


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-06 Thread Alan Stern
On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:

> Now during my testing I see a weird issue sometimes (1 in 7).
> Scenario - bootups
> 
> Issue:
> The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
> when one/more of its consumers are in RPM_ACTIVE state.
> 
> *Log:
> [   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
> [   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
> [   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
> [   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
> [   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
> [   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
> [   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> 
> /** Printing all the consumer nodes of supplier **/
> [   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count @ suspend: 0
> <-- this is the usage_count
> [   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
> [   11.000402][T5] scsi 0:0:0:49456: PM state - 2
> [   11.005453][T5] sd 0:0:0:0: PM state - 2
> [   11.009958][T5] sd 0:0:0:1: PM state - 2
> [   11.014469][T5] sd 0:0:0:2: PM state - 2
> [   11.019072][T5] sd 0:0:0:3: PM state - 2
> [   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
> [   11.353298][T5] sd 0:0:0:5: PM state - 2
> [   11.357726][T5] sd 0:0:0:6: PM state - 2
> [   11.362155][T5] sd 0:0:0:7: PM state - 2
> [   11.366584][T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 8709
> [   11.374366][T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
> (0) has rpm_active flags
> [   11.383376][T5] ufs_device_wlun 0:0:0:49488:
> ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
> [   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache
> 
> And the the suspend of sde is stuck now:
> schedule+0x9c/0xe0
> schedule_timeout+0x40/0x128
> io_schedule_timeout+0x44/0x68
> wait_for_common_io+0x7c/0x100
> wait_for_completion_io+0x14/0x20
> blk_execute_rq+0x90/0xcc
> __scsi_execute+0x104/0x1c4
> sd_sync_cache+0xf8/0x2a0
> sd_suspend_common+0x74/0x11c
> sd_suspend_runtime+0x14/0x20
> scsi_runtime_suspend+0x64/0x94
> __rpm_callback+0x80/0x2a4
> rpm_suspend+0x308/0x614
> pm_runtime_work+0x98/0xa8
> 
> I added 'DL_FLAG_RPM_ACTIVE' while creating links.
>   if (hba->sdev_ufs_device) {
>   link = device_link_add(>sdev_gendev,
>   >sdev_ufs_device->sdev_gendev,
>  DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
> I didn't expect this to resolve the issue anyway and it didn't.
> 
> Another interesting point here is when I resume any of the above suspended
> consumers, it all goes back to normal, which is kind of expected. I tried
> resuming the consumer and the supplier is resumed and the supplier is
> suspended when all the consumers are suspended.
> 
> Any pointers on this issue please?
> 
> @Bart/@Alan - Do you've any pointers please?

It's very noticeable that although you seem to have isolated a bug in
the power management subsystem (supplier goes into runtime suspend
even when one of its consumers is still active), you did not CC the
power management maintainer or mailing list.

I have added the appropriate CC's.

Alan Stern


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-05 Thread Asutosh Das (asd)

On 3/4/2021 7:35 AM, Adrian Hunter wrote:

On 3/03/21 12:52 am, Asutosh Das wrote:

During runtime-suspend of ufs host, the scsi devices are
already suspended and so are the queues associated with them.
But the ufs host sends SSU to wlun during its runtime-suspend.
During the process blk_queue_enter checks if the queue is not in
suspended state. If so, it waits for the queue to resume, and never
comes out of it.
The commit
(d55d15a33: scsi: block: Do not accept any requests while suspended)
adds the check if the queue is in suspended state in blk_queue_enter().

Call trace:
  __switch_to+0x174/0x2c4
  __schedule+0x478/0x764
  schedule+0x9c/0xe0
  blk_queue_enter+0x158/0x228
  blk_mq_alloc_request+0x40/0xa4
  blk_get_request+0x2c/0x70
  __scsi_execute+0x60/0x1c4
  ufshcd_set_dev_pwr_mode+0x124/0x1e4
  ufshcd_suspend+0x208/0x83c
  ufshcd_runtime_suspend+0x40/0x154
  ufshcd_pltfrm_runtime_suspend+0x14/0x20
  pm_generic_runtime_suspend+0x28/0x3c
  __rpm_callback+0x80/0x2a4
  rpm_suspend+0x308/0x614
  rpm_idle+0x158/0x228
  pm_runtime_work+0x84/0xac
  process_one_work+0x1f0/0x470
  worker_thread+0x26c/0x4c8
  kthread+0x13c/0x320
  ret_from_fork+0x10/0x18

Fix this by registering ufs device wlun as a scsi driver and
registering it for block runtime-pm. Also make this as a
supplier for all other luns. That way, this device wlun
suspends after all the consumers and resumes after
hba resumes.

Co-developed-by: Can Guo 
Signed-off-by: Can Guo 
Signed-off-by: Asutosh Das 


It looks good, but still a few further comments below.

Also, do you think ufshcd_err_handling_prepare()/unprepare()
need changes?  And ufshcd_recover_pm_error()?

And maybe ufshcd_auto_hibern8_update() when it is called from ufs_sysfs.c?


Hi Adrian
Sure, I'll fix it in the next version.

Now during my testing I see a weird issue sometimes (1 in 7).
Scenario - bootups

Issue:
The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend 
even when one/more of its consumers are in RPM_ACTIVE state.


*Log:
[   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
[   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
[   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
[   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
[   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
[   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache

/** Printing all the consumer nodes of supplier **/
[   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count @ 
suspend: 0 <-- this is the usage_count

[   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
[   11.000402][T5] scsi 0:0:0:49456: PM state - 2
[   11.005453][T5] sd 0:0:0:0: PM state - 2
[   11.009958][T5] sd 0:0:0:1: PM state - 2
[   11.014469][T5] sd 0:0:0:2: PM state - 2
[   11.019072][T5] sd 0:0:0:3: PM state - 2
[   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
[   11.353298][T5] sd 0:0:0:5: PM state - 2
[   11.357726][T5] sd 0:0:0:6: PM state - 2
[   11.362155][T5] sd 0:0:0:7: PM state - 2
[   11.366584][T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 8709
[   11.374366][T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend 
- (0) has rpm_active flags
[   11.383376][T5] ufs_device_wlun 0:0:0:49488: 
ufshcd_wl_runtime_suspend <-- Supplier suspends fine.

[   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache

And the the suspend of sde is stuck now:
schedule+0x9c/0xe0
schedule_timeout+0x40/0x128
io_schedule_timeout+0x44/0x68
wait_for_common_io+0x7c/0x100
wait_for_completion_io+0x14/0x20
blk_execute_rq+0x90/0xcc
__scsi_execute+0x104/0x1c4
sd_sync_cache+0xf8/0x2a0
sd_suspend_common+0x74/0x11c
sd_suspend_runtime+0x14/0x20
scsi_runtime_suspend+0x64/0x94
__rpm_callback+0x80/0x2a4
rpm_suspend+0x308/0x614
pm_runtime_work+0x98/0xa8

I added 'DL_FLAG_RPM_ACTIVE' while creating links.
  if (hba->sdev_ufs_device) {
  link = device_link_add(>sdev_gendev,
  >sdev_ufs_device->sdev_gendev,
 DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
I didn't expect this to resolve the issue anyway and it didn't.

Another interesting point here is when I resume any of the above 
suspended consumers, it all goes back to normal, which is kind of 
expected. I tried resuming the consumer and the supplier is resumed and 
the supplier is suspended when all the consumers are suspended.


Any pointers on this issue please?

@Bart/@Alan - Do you've any pointers please?

Thanks,
AsutoshD





---
  drivers/scsi/ufs/cdns-pltfrm.c |   2 +
  drivers/scsi/ufs/tc-dwc-g210-pci.c |   2 +
  drivers/scsi/ufs/ufs-exynos.c  |   2 +
  drivers/scsi/ufs/ufs-hisi.c|   2 +
  drivers/scsi/ufs/ufs-mediatek.c|   2 +
  drivers/scsi/ufs/ufs-qcom.c|   2 +
  drivers/scsi/ufs/ufs_bsg.c |   6 +-
  drivers/scsi/ufs/ufshcd-pci.c  |  32 

Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-04 Thread Adrian Hunter
On 3/03/21 12:52 am, Asutosh Das wrote:
> During runtime-suspend of ufs host, the scsi devices are
> already suspended and so are the queues associated with them.
> But the ufs host sends SSU to wlun during its runtime-suspend.
> During the process blk_queue_enter checks if the queue is not in
> suspended state. If so, it waits for the queue to resume, and never
> comes out of it.
> The commit
> (d55d15a33: scsi: block: Do not accept any requests while suspended)
> adds the check if the queue is in suspended state in blk_queue_enter().
> 
> Call trace:
>  __switch_to+0x174/0x2c4
>  __schedule+0x478/0x764
>  schedule+0x9c/0xe0
>  blk_queue_enter+0x158/0x228
>  blk_mq_alloc_request+0x40/0xa4
>  blk_get_request+0x2c/0x70
>  __scsi_execute+0x60/0x1c4
>  ufshcd_set_dev_pwr_mode+0x124/0x1e4
>  ufshcd_suspend+0x208/0x83c
>  ufshcd_runtime_suspend+0x40/0x154
>  ufshcd_pltfrm_runtime_suspend+0x14/0x20
>  pm_generic_runtime_suspend+0x28/0x3c
>  __rpm_callback+0x80/0x2a4
>  rpm_suspend+0x308/0x614
>  rpm_idle+0x158/0x228
>  pm_runtime_work+0x84/0xac
>  process_one_work+0x1f0/0x470
>  worker_thread+0x26c/0x4c8
>  kthread+0x13c/0x320
>  ret_from_fork+0x10/0x18
> 
> Fix this by registering ufs device wlun as a scsi driver and
> registering it for block runtime-pm. Also make this as a
> supplier for all other luns. That way, this device wlun
> suspends after all the consumers and resumes after
> hba resumes.
> 
> Co-developed-by: Can Guo 
> Signed-off-by: Can Guo 
> Signed-off-by: Asutosh Das 

It looks good, but still a few further comments below.

Also, do you think ufshcd_err_handling_prepare()/unprepare()
need changes?  And ufshcd_recover_pm_error()?

And maybe ufshcd_auto_hibern8_update() when it is called from ufs_sysfs.c?

> ---
>  drivers/scsi/ufs/cdns-pltfrm.c |   2 +
>  drivers/scsi/ufs/tc-dwc-g210-pci.c |   2 +
>  drivers/scsi/ufs/ufs-exynos.c  |   2 +
>  drivers/scsi/ufs/ufs-hisi.c|   2 +
>  drivers/scsi/ufs/ufs-mediatek.c|   2 +
>  drivers/scsi/ufs/ufs-qcom.c|   2 +
>  drivers/scsi/ufs/ufs_bsg.c |   6 +-
>  drivers/scsi/ufs/ufshcd-pci.c  |  32 +-
>  drivers/scsi/ufs/ufshcd.c  | 583 
> +++--
>  drivers/scsi/ufs/ufshcd.h  |   7 +
>  include/trace/events/ufs.h |  20 ++
>  11 files changed, 478 insertions(+), 182 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/cdns-pltfrm.c b/drivers/scsi/ufs/cdns-pltfrm.c
> index 149391f..3e70c23 100644
> --- a/drivers/scsi/ufs/cdns-pltfrm.c
> +++ b/drivers/scsi/ufs/cdns-pltfrm.c
> @@ -319,6 +319,8 @@ static const struct dev_pm_ops cdns_ufs_dev_pm_ops = {
>   .runtime_suspend = ufshcd_pltfrm_runtime_suspend,
>   .runtime_resume  = ufshcd_pltfrm_runtime_resume,
>   .runtime_idle= ufshcd_pltfrm_runtime_idle,
> + .prepare = ufshcd_suspend_prepare,
> + .complete   = ufshcd_resume_complete,
>  };
>  
>  static struct platform_driver cdns_ufs_pltfrm_driver = {
> diff --git a/drivers/scsi/ufs/tc-dwc-g210-pci.c 
> b/drivers/scsi/ufs/tc-dwc-g210-pci.c
> index 67a6a61..b01db12 100644
> --- a/drivers/scsi/ufs/tc-dwc-g210-pci.c
> +++ b/drivers/scsi/ufs/tc-dwc-g210-pci.c
> @@ -148,6 +148,8 @@ static const struct dev_pm_ops tc_dwc_g210_pci_pm_ops = {
>   .runtime_suspend = tc_dwc_g210_pci_runtime_suspend,
>   .runtime_resume  = tc_dwc_g210_pci_runtime_resume,
>   .runtime_idle= tc_dwc_g210_pci_runtime_idle,
> + .prepare = ufshcd_suspend_prepare,
> + .complete   = ufshcd_resume_complete,
>  };
>  
>  static const struct pci_device_id tc_dwc_g210_pci_tbl[] = {
> diff --git a/drivers/scsi/ufs/ufs-exynos.c b/drivers/scsi/ufs/ufs-exynos.c
> index 267943a1..45c0b02 100644
> --- a/drivers/scsi/ufs/ufs-exynos.c
> +++ b/drivers/scsi/ufs/ufs-exynos.c
> @@ -1268,6 +1268,8 @@ static const struct dev_pm_ops exynos_ufs_pm_ops = {
>   .runtime_suspend = ufshcd_pltfrm_runtime_suspend,
>   .runtime_resume  = ufshcd_pltfrm_runtime_resume,
>   .runtime_idle= ufshcd_pltfrm_runtime_idle,
> + .prepare = ufshcd_suspend_prepare,
> + .complete   = ufshcd_resume_complete,
>  };
>  
>  static struct platform_driver exynos_ufs_pltform = {
> diff --git a/drivers/scsi/ufs/ufs-hisi.c b/drivers/scsi/ufs/ufs-hisi.c
> index 0aa5813..d463b44 100644
> --- a/drivers/scsi/ufs/ufs-hisi.c
> +++ b/drivers/scsi/ufs/ufs-hisi.c
> @@ -574,6 +574,8 @@ static const struct dev_pm_ops ufs_hisi_pm_ops = {
>   .runtime_suspend = ufshcd_pltfrm_runtime_suspend,
>   .runtime_resume  = ufshcd_pltfrm_runtime_resume,
>   .runtime_idle= ufshcd_pltfrm_runtime_idle,
> + .prepare = ufshcd_suspend_prepare,
> + .complete   = ufshcd_resume_complete,
>  };
>  
>  static struct platform_driver ufs_hisi_pltform = {
> diff --git a/drivers/scsi/ufs/ufs-mediatek.c b/drivers/scsi/ufs/ufs-mediatek.c
> index c55202b..df1eabb 100644
> --- a/drivers/scsi/ufs/ufs-mediatek.c
> +++ b/drivers/scsi/ufs/ufs-mediatek.c
> @@ -1097,6 

[PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-03 Thread Asutosh Das
During runtime-suspend of ufs host, the scsi devices are
already suspended and so are the queues associated with them.
But the ufs host sends SSU to wlun during its runtime-suspend.
During the process blk_queue_enter checks if the queue is not in
suspended state. If so, it waits for the queue to resume, and never
comes out of it.
The commit
(d55d15a33: scsi: block: Do not accept any requests while suspended)
adds the check if the queue is in suspended state in blk_queue_enter().

Call trace:
 __switch_to+0x174/0x2c4
 __schedule+0x478/0x764
 schedule+0x9c/0xe0
 blk_queue_enter+0x158/0x228
 blk_mq_alloc_request+0x40/0xa4
 blk_get_request+0x2c/0x70
 __scsi_execute+0x60/0x1c4
 ufshcd_set_dev_pwr_mode+0x124/0x1e4
 ufshcd_suspend+0x208/0x83c
 ufshcd_runtime_suspend+0x40/0x154
 ufshcd_pltfrm_runtime_suspend+0x14/0x20
 pm_generic_runtime_suspend+0x28/0x3c
 __rpm_callback+0x80/0x2a4
 rpm_suspend+0x308/0x614
 rpm_idle+0x158/0x228
 pm_runtime_work+0x84/0xac
 process_one_work+0x1f0/0x470
 worker_thread+0x26c/0x4c8
 kthread+0x13c/0x320
 ret_from_fork+0x10/0x18

Fix this by registering ufs device wlun as a scsi driver and
registering it for block runtime-pm. Also make this as a
supplier for all other luns. That way, this device wlun
suspends after all the consumers and resumes after
hba resumes.

Co-developed-by: Can Guo 
Signed-off-by: Can Guo 
Signed-off-by: Asutosh Das 
---
 drivers/scsi/ufs/cdns-pltfrm.c |   2 +
 drivers/scsi/ufs/tc-dwc-g210-pci.c |   2 +
 drivers/scsi/ufs/ufs-exynos.c  |   2 +
 drivers/scsi/ufs/ufs-hisi.c|   2 +
 drivers/scsi/ufs/ufs-mediatek.c|   2 +
 drivers/scsi/ufs/ufs-qcom.c|   2 +
 drivers/scsi/ufs/ufs_bsg.c |   6 +-
 drivers/scsi/ufs/ufshcd-pci.c  |  32 +-
 drivers/scsi/ufs/ufshcd.c  | 583 +++--
 drivers/scsi/ufs/ufshcd.h  |   7 +
 include/trace/events/ufs.h |  20 ++
 11 files changed, 478 insertions(+), 182 deletions(-)

diff --git a/drivers/scsi/ufs/cdns-pltfrm.c b/drivers/scsi/ufs/cdns-pltfrm.c
index 149391f..3e70c23 100644
--- a/drivers/scsi/ufs/cdns-pltfrm.c
+++ b/drivers/scsi/ufs/cdns-pltfrm.c
@@ -319,6 +319,8 @@ static const struct dev_pm_ops cdns_ufs_dev_pm_ops = {
.runtime_suspend = ufshcd_pltfrm_runtime_suspend,
.runtime_resume  = ufshcd_pltfrm_runtime_resume,
.runtime_idle= ufshcd_pltfrm_runtime_idle,
+   .prepare = ufshcd_suspend_prepare,
+   .complete   = ufshcd_resume_complete,
 };
 
 static struct platform_driver cdns_ufs_pltfrm_driver = {
diff --git a/drivers/scsi/ufs/tc-dwc-g210-pci.c 
b/drivers/scsi/ufs/tc-dwc-g210-pci.c
index 67a6a61..b01db12 100644
--- a/drivers/scsi/ufs/tc-dwc-g210-pci.c
+++ b/drivers/scsi/ufs/tc-dwc-g210-pci.c
@@ -148,6 +148,8 @@ static const struct dev_pm_ops tc_dwc_g210_pci_pm_ops = {
.runtime_suspend = tc_dwc_g210_pci_runtime_suspend,
.runtime_resume  = tc_dwc_g210_pci_runtime_resume,
.runtime_idle= tc_dwc_g210_pci_runtime_idle,
+   .prepare = ufshcd_suspend_prepare,
+   .complete   = ufshcd_resume_complete,
 };
 
 static const struct pci_device_id tc_dwc_g210_pci_tbl[] = {
diff --git a/drivers/scsi/ufs/ufs-exynos.c b/drivers/scsi/ufs/ufs-exynos.c
index 267943a1..45c0b02 100644
--- a/drivers/scsi/ufs/ufs-exynos.c
+++ b/drivers/scsi/ufs/ufs-exynos.c
@@ -1268,6 +1268,8 @@ static const struct dev_pm_ops exynos_ufs_pm_ops = {
.runtime_suspend = ufshcd_pltfrm_runtime_suspend,
.runtime_resume  = ufshcd_pltfrm_runtime_resume,
.runtime_idle= ufshcd_pltfrm_runtime_idle,
+   .prepare = ufshcd_suspend_prepare,
+   .complete   = ufshcd_resume_complete,
 };
 
 static struct platform_driver exynos_ufs_pltform = {
diff --git a/drivers/scsi/ufs/ufs-hisi.c b/drivers/scsi/ufs/ufs-hisi.c
index 0aa5813..d463b44 100644
--- a/drivers/scsi/ufs/ufs-hisi.c
+++ b/drivers/scsi/ufs/ufs-hisi.c
@@ -574,6 +574,8 @@ static const struct dev_pm_ops ufs_hisi_pm_ops = {
.runtime_suspend = ufshcd_pltfrm_runtime_suspend,
.runtime_resume  = ufshcd_pltfrm_runtime_resume,
.runtime_idle= ufshcd_pltfrm_runtime_idle,
+   .prepare = ufshcd_suspend_prepare,
+   .complete   = ufshcd_resume_complete,
 };
 
 static struct platform_driver ufs_hisi_pltform = {
diff --git a/drivers/scsi/ufs/ufs-mediatek.c b/drivers/scsi/ufs/ufs-mediatek.c
index c55202b..df1eabb 100644
--- a/drivers/scsi/ufs/ufs-mediatek.c
+++ b/drivers/scsi/ufs/ufs-mediatek.c
@@ -1097,6 +1097,8 @@ static const struct dev_pm_ops ufs_mtk_pm_ops = {
.runtime_suspend = ufshcd_pltfrm_runtime_suspend,
.runtime_resume  = ufshcd_pltfrm_runtime_resume,
.runtime_idle= ufshcd_pltfrm_runtime_idle,
+   .prepare = ufshcd_suspend_prepare,
+   .complete   = ufshcd_resume_complete,
 };
 
 static struct platform_driver ufs_mtk_pltform = {
diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c
index