Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-03-19 Thread Tejun Heo
Hello, Jan, Alan.

On Tue, Mar 15, 2016 at 10:25:43AM +0100, Jan Kara wrote:
> > The kernel does suspend device drivers; that is, it invokes their
> > suspend callbacks.  But it doesn't "freeze" them in any sense.  Once a 
> > driver has been suspended, it assumes it won't receive any I/O requests 
> > until it has been resumed.  Therefore the kernel first has to prevent 
> > all the upper layers from generating such requests and/or sending them 
> > to the low-level drivers.
> 
> OK, so Tejun and you should talk together because you both seem to want
> something else... If I understand it right, Tejun wants suspended devices
> to just queue requests that have been submitted after these devices were
> suspended and complete them once they are resumed...

Yeah, I suppose that's why we have the code base we do now.  I don't
think freezing kernel threads is the right mechanism to plug IO
devices during suspend.  It's way too error-prone and causes a
dependency nightmare as it acts essentially as a system-wide lock.

More complex drivers already plug themselves, which is necessary no
matter what, as upper layers or some kthreads aren't the only sources
of commands to devices.  We can plug at block layer for IOs coming
down from higher layers.  We can even provide a mechanism to plug
certain kthreads if necessary but they should be contained in the
driver - e.g. the suspend callback specifically blocking certain
specific kthreads - instead of the vague "the system is generally
stopped now and it seems to work most of the time" that we're doing
now.
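
As a rough illustration of plugging at the point where requests
converge, here is a toy user-space model in Python. It is only a
sketch, not kernel code: the class PluggableQueue and its methods
submit, suspend and resume are hypothetical names made up for this
example and do not correspond to any block-layer or driver API.

```python
from collections import deque


class PluggableQueue:
    """Toy model of a request queue that is plugged while suspended."""

    def __init__(self):
        self.plugged = False
        self.pending = deque()   # requests held back while plugged
        self.completed = []      # requests the "device" has processed

    def submit(self, request):
        # While plugged, hold the request instead of touching the device.
        if self.plugged:
            self.pending.append(request)
        else:
            self.completed.append(request)

    def suspend(self):
        # Hypothetical suspend callback: plug the single point where
        # requests from every source come together.
        self.plugged = True

    def resume(self):
        # Hypothetical resume callback: unplug, then drain in FIFO order.
        self.plugged = False
        while self.pending:
            self.completed.append(self.pending.popleft())


queue = PluggableQueue()
queue.submit("write A")
queue.suspend()
queue.submit("write B")   # e.g. issued by writeback
queue.submit("ioctl C")   # e.g. issued by a source other than upper layers
held = list(queue.pending)
queue.resume()
```

In this model no submitter has to be frozen: anything that arrives
while the queue is plugged is simply held and completed after resume.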

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-03-15 Thread Jan Kara
On Mon 14-03-16 10:37:22, Alan Stern wrote:
> On Mon, 14 Mar 2016, Jan Kara wrote:
> 
> > On Fri 11-03-16 12:56:10, Tejun Heo wrote:
> > > Hello, Jan.
> > > 
> > > On Thu, Mar 03, 2016 at 10:33:10AM +0100, Jan Kara wrote:
> > > > > Ugh... that's nasty.  I wonder whether the right thing to do is making
> > > > > writeback workers non-freezable.  IOs are supposed to be blocked from
> > > > > lower layer anyway.  Jan, what do you think?
> > > > 
> > > > Well no, at least currently IO is not blocked in lower layers AFAIK - for
> > > > that you'd need to freeze block devices & filesystems and there are issues
> > > 
> > > At least libata does and I think SCSI does too, but yeah, there
> > > probably are drivers which depend on block layer blocking IOs, which
> > > btw is a pretty fragile way to go about as upper layers might not be
> > > the only source of activities.
> > > 
> > > > with that (Jiri Kosina was the last one which was trying to make this work
> > > > AFAIR). And I think you need to stop writeback (and generally any IO) to be
> > > > generated so that it doesn't interact in a strange way with device drivers
> > > > being frozen. So IMO until suspend freezes filesystems & devices properly
> > > > you have to freeze writeback workqueue.
> 
> What do you mean by "freezes ... devices"?  Only a piece of code can be 
> frozen -- not a device.

By that I meant block device and filesystem freezing. That way filesystem
is frozen so that it doesn't submit any more IO to the device.

> The kernel does suspend device drivers; that is, it invokes their
> suspend callbacks.  But it doesn't "freeze" them in any sense.  Once a 
> driver has been suspended, it assumes it won't receive any I/O requests 
> until it has been resumed.  Therefore the kernel first has to prevent 
> all the upper layers from generating such requests and/or sending them 
> to the low-level drivers.

OK, so Tejun and you should talk together because you both seem to want
something else... If I understand it right, Tejun wants suspended devices
to just queue requests that have been submitted after these devices were
suspended and complete them once they are resumed...

> > > I still think the right thing to do is plugging that block layer or
> > > low level drivers.  It's like we're trying to plug multiple sources
> > > when we can plug the point where they come together anyway.
> > 
> > I agree that freezing writeback workers is a workaround for real issues at
> > best and ideally we shouldn't have to do that. But at least for now I had
> > the impression that it is needed for suspend to work reasonably reliably.
> 
> The design is not to plug low-level drivers, but instead to prevent
> them from receiving any requests by plugging or freezing high-level
> code.
> 
> It's pretty clear that we don't want to have ongoing I/O during a 
> system suspend, right?  And that means the I/O has to be prevented (or 
> "plugged", if you prefer) somewhere -- either at an upper layer or at a 
> lower layer.  There was a choice to be made, and the decision was to do 
> it at an upper layer.

I agree the IO has to be plugged somewhere. And Tejun seems to want to plug
it at lower layer...

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-03-14 Thread Alan Stern
On Mon, 14 Mar 2016, Jan Kara wrote:

> On Fri 11-03-16 12:56:10, Tejun Heo wrote:
> > Hello, Jan.
> > 
> > On Thu, Mar 03, 2016 at 10:33:10AM +0100, Jan Kara wrote:
> > > > Ugh... that's nasty.  I wonder whether the right thing to do is making
> > > > writeback workers non-freezable.  IOs are supposed to be blocked from
> > > > lower layer anyway.  Jan, what do you think?
> > > 
> > > Well no, at least currently IO is not blocked in lower layers AFAIK - for
> > > that you'd need to freeze block devices & filesystems and there are issues
> > 
> > At least libata does and I think SCSI does too, but yeah, there
> > probably are drivers which depend on block layer blocking IOs, which
> > btw is a pretty fragile way to go about as upper layers might not be
> > the only source of activities.
> > 
> > > with that (Jiri Kosina was the last one which was trying to make this work
> > > AFAIR). And I think you need to stop writeback (and generally any IO) to be
> > > generated so that it doesn't interact in a strange way with device drivers
> > > being frozen. So IMO until suspend freezes filesystems & devices properly
> > > you have to freeze writeback workqueue.

What do you mean by "freezes ... devices"?  Only a piece of code can be 
frozen -- not a device.

The kernel does suspend device drivers; that is, it invokes their
suspend callbacks.  But it doesn't "freeze" them in any sense.  Once a 
driver has been suspended, it assumes it won't receive any I/O requests 
until it has been resumed.  Therefore the kernel first has to prevent 
all the upper layers from generating such requests and/or sending them 
to the low-level drivers.

> > I still think the right thing to do is plugging that block layer or
> > low level drivers.  It's like we're trying to plug multiple sources
> > when we can plug the point where they come together anyway.
> 
> I agree that freezing writeback workers is a workaround for real issues at
> best and ideally we shouldn't have to do that. But at least for now I had
> the impression that it is needed for suspend to work reasonably reliably.

The design is not to plug low-level drivers, but instead to prevent
them from receiving any requests by plugging or freezing high-level
code.

It's pretty clear that we don't want to have ongoing I/O during a 
system suspend, right?  And that means the I/O has to be prevented (or 
"plugged", if you prefer) somewhere -- either at an upper layer or at a 
lower layer.  There was a choice to be made, and the decision was to do 
it at an upper layer.

Alan Stern



Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-03-14 Thread Jan Kara
On Fri 11-03-16 12:56:10, Tejun Heo wrote:
> Hello, Jan.
> 
> On Thu, Mar 03, 2016 at 10:33:10AM +0100, Jan Kara wrote:
> > > Ugh... that's nasty.  I wonder whether the right thing to do is making
> > > writeback workers non-freezable.  IOs are supposed to be blocked from
> > > lower layer anyway.  Jan, what do you think?
> > 
> > Well no, at least currently IO is not blocked in lower layers AFAIK - for
> > that you'd need to freeze block devices & filesystems and there are issues
> 
> At least libata does and I think SCSI does too, but yeah, there
> probably are drivers which depend on block layer blocking IOs, which
> btw is a pretty fragile way to go about as upper layers might not be
> the only source of activities.
> 
> > with that (Jiri Kosina was the last one which was trying to make this work
> > AFAIR). And I think you need to stop writeback (and generally any IO) to be
> > generated so that it doesn't interact in a strange way with device drivers
> > being frozen. So IMO until suspend freezes filesystems & devices properly
> > you have to freeze writeback workqueue.
> 
> I still think the right thing to do is plugging that block layer or
> low level drivers.  It's like we're trying to plug multiple sources
> when we can plug the point where they come together anyway.

I agree that freezing writeback workers is a workaround for real issues at
best and ideally we shouldn't have to do that. But at least for now I had
the impression that it is needed for suspend to work reasonably reliably.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-03-11 Thread Tejun Heo
Hello, Jan.

On Thu, Mar 03, 2016 at 10:33:10AM +0100, Jan Kara wrote:
> > Ugh... that's nasty.  I wonder whether the right thing to do is making
> > writeback workers non-freezable.  IOs are supposed to be blocked from
> > lower layer anyway.  Jan, what do you think?
> 
> Well no, at least currently IO is not blocked in lower layers AFAIK - for
> that you'd need to freeze block devices & filesystems and there are issues

At least libata does and I think SCSI does too, but yeah, there
probably are drivers which depend on block layer blocking IOs, which
btw is a pretty fragile way to go about as upper layers might not be
the only source of activities.

> with that (Jiri Kosina was the last one which was trying to make this work
> AFAIR). And I think you need to stop writeback (and generally any IO) to be
> generated so that it doesn't interact in a strange way with device drivers
> being frozen. So IMO until suspend freezes filesystems & devices properly
> you have to freeze writeback workqueue.

I still think the right thing to do is plugging at the block layer or
low level drivers.  It's like we're trying to plug multiple sources
when we can plug the point where they come together anyway.

Thanks.

-- 
tejun


Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-03-03 Thread Jan Kara
Hello,

On Wed 02-03-16 11:00:58, Tejun Heo wrote:
> On Fri, Feb 26, 2016 at 02:19:20PM +0800, Peter Chen wrote:
> > On Thu, Feb 25, 2016 at 05:01:12PM -0500, Tejun Heo wrote:
> > > Hello, Peter.
> > > 
> > > On Wed, Feb 24, 2016 at 03:24:30PM +0800, Peter Chen wrote:
> > > > > You might want to complain to the block-layer people about this.  I 
> > > > > don't know if anything can be done to fix it.
> > > > > 
> > > > > Or maybe flush_work and flush_delayed_work can be changed to avoid 
> > > > > blocking if the workqueue is frozen.  Tejun?
> > > > > 
> > > > 
> > > > I have a patch to show the root cause of this issue.
> > > > 
> > > > http://www.spinics.net/lists/linux-usb/msg136815.html
> > > 
> > > I don't get it.  Why would it deadlock?  Shouldn't things get rolling
> > > once the workqueues are thawed?
> > 
> > The workqueue writeback can't be thawed due to driver's resume
> > (dpm_complete) is lock nested, and can't be finished.
> 
> Ugh... that's nasty.  I wonder whether the right thing to do is making
> writeback workers non-freezable.  IOs are supposed to be blocked from
> lower layer anyway.  Jan, what do you think?

Well no, at least currently IO is not blocked in lower layers AFAIK - for
that you'd need to freeze block devices & filesystems and there are issues
with that (Jiri Kosina was the last one which was trying to make this work
AFAIR). And I think you need to stop writeback (and generally any IO) to be
generated so that it doesn't interact in a strange way with device drivers
being frozen. So IMO until suspend freezes filesystems & devices properly
you have to freeze writeback workqueue.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-03-02 Thread Tejun Heo
Hello,

(cc'ing Jan)

On Fri, Feb 26, 2016 at 02:19:20PM +0800, Peter Chen wrote:
> On Thu, Feb 25, 2016 at 05:01:12PM -0500, Tejun Heo wrote:
> > Hello, Peter.
> > 
> > On Wed, Feb 24, 2016 at 03:24:30PM +0800, Peter Chen wrote:
> > > > You might want to complain to the block-layer people about this.  I 
> > > > don't know if anything can be done to fix it.
> > > > 
> > > > Or maybe flush_work and flush_delayed_work can be changed to avoid 
> > > > blocking if the workqueue is frozen.  Tejun?
> > > > 
> > > 
> > > I have a patch to show the root cause of this issue.
> > > 
> > > http://www.spinics.net/lists/linux-usb/msg136815.html
> > 
> > I don't get it.  Why would it deadlock?  Shouldn't things get rolling
> > once the workqueues are thawed?
> 
> The workqueue writeback can't be thawed due to driver's resume
> (dpm_complete) is lock nested, and can't be finished.

Ugh... that's nasty.  I wonder whether the right thing to do is making
writeback workers non-freezable.  IOs are supposed to be blocked from
lower layer anyway.  Jan, what do you think?

Thanks.

-- 
tejun


Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-02-25 Thread Peter Chen
On Thu, Feb 25, 2016 at 05:01:12PM -0500, Tejun Heo wrote:
> Hello, Peter.
> 
> On Wed, Feb 24, 2016 at 03:24:30PM +0800, Peter Chen wrote:
> > > You might want to complain to the block-layer people about this.  I 
> > > don't know if anything can be done to fix it.
> > > 
> > > Or maybe flush_work and flush_delayed_work can be changed to avoid 
> > > blocking if the workqueue is frozen.  Tejun?
> > > 
> > 
> > I have a patch to show the root cause of this issue.
> > 
> > http://www.spinics.net/lists/linux-usb/msg136815.html
> 
> I don't get it.  Why would it deadlock?  Shouldn't things get rolling
> once the workqueues are thawed?
> 

Hi Tejun,

The writeback workqueue can't be thawed because the driver resume path
(dpm_complete) is blocked in a nested mutex lock and can't finish.
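
One way to read the two traces is as a wait-for cycle. The Python
sketch below only models that reading; it is not kernel code. The
edges are assumptions taken from the backtraces in this thread (sh
blocked in mutex_lock_nested under dpm_complete, ci_otg_work blocked
in flush_work), plus the assumption that the mutex is held by the
disconnect path and that thawing comes after dpm_complete. The helper
find_cycle is a hypothetical name.

```python
# Each entry reads: "key cannot proceed until value proceeds".
waits_for = {
    "ci_otg_work": "writeback",     # flush_work() waits for writeback work
    "writeback": "thaw",            # freezable, so it runs only after thaw
    "thaw": "dpm_complete",         # assumed: thaw follows device resume
    "dpm_complete": "ci_otg_work",  # assumed: mutex held by disconnect path
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle if one is hit."""
    path = []
    node = start
    while node in graph and node not in path:
        path.append(node)
        node = graph[node]
    if node in path:
        return path[path.index(node):]
    return None


cycle = find_cycle(waits_for, "ci_otg_work")

# If ci_otg were freezable, its worker would not run before thaw, so
# (under the same assumptions) dpm_complete would not wait on it.
freezable_ci_otg = dict(waits_for)
del freezable_ci_otg["dpm_complete"]
no_cycle = find_cycle(freezable_ci_otg, "ci_otg_work")
```

Under these assumptions the first graph contains a cycle through all
four nodes, and removing the dpm_complete edge breaks it.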

-- 

Best Regards,
Peter Chen


Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-02-25 Thread Tejun Heo
Hello, Peter.

On Wed, Feb 24, 2016 at 03:24:30PM +0800, Peter Chen wrote:
> > You might want to complain to the block-layer people about this.  I 
> > don't know if anything can be done to fix it.
> > 
> > Or maybe flush_work and flush_delayed_work can be changed to avoid 
> > blocking if the workqueue is frozen.  Tejun?
> > 
> 
> I have a patch to show the root cause of this issue.
> 
> http://www.spinics.net/lists/linux-usb/msg136815.html

I don't get it.  Why would it deadlock?  Shouldn't things get rolling
once the workqueues are thawed?

Thanks.

-- 
tejun


Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-02-23 Thread Peter Chen
On Tue, Feb 23, 2016 at 10:34:09AM -0500, Alan Stern wrote:
> On Tue, 23 Feb 2016, Peter Chen wrote:
> 
> > Hi Tejun Heo and Florian Mickler,
> > 
> > I have a question that during the system resume process, the freezable
> > workqueue can be thawed if there is a non-freezable workqueue is
> > blocked (At uninterruptible state)?
> > 
> > My case like below, I have a USB OTG (Micro-AB) cable is at USB
> > Micro-B port, and there is a USB driver on it, and un-plug this
> > cable can wake up system from the suspend. There is a non-freezable
> > workqueue ci_otg will be scheduled after disconnecting OTG cable,
> > and in its worker ci_otg_work, it will try to disconnect USB drive,
> > and flush disk information.
> 
> These operations probably are not safe while the system is resuming.  
> It might be best to make them wait until the resume is finished.
> 
> > But flush disk information is done by
> > freezable workqueue writeback, it seems workqueue writeback is
> > never got chance to execute, the workqueue ci_otg is waiting there
> > forever, and the system is deadlock.
> 
> > Both change workqueue ci_otg as freezable or change workqueue writeback
> > as non-freezable can fix this problem.
> 
> It sounds like making ci_otg freezable is the easiest solution.
> 
> > Please ignore it, the system is locked at driver's resume,
> > maybe at scsi or usb driver, so of cos, the freezable processes
> > can't be thawed.
>  
> > > [  555.263177] [] (flush_work) from [] (flush_delayed_work+0x48/0x4c)
> > > [  555.271106]  r8:ed5b5000 r7:c0b38a3c r6:eea439cc r5:eea4372c r4:eea4372c
> > > [  555.277958] [] (flush_delayed_work) from [] (bdi_unregister+0x84/0xec)
> > > [  555.286236]  r4:eea43520 r3:2153
> > > [  555.289885] [] (bdi_unregister) from [] (blk_cleanup_queue+0x180/0x29c)
> > > [  555.298250]  r5:eea43808 r4:eea43400
> 
> You might want to complain to the block-layer people about this.  I 
> don't know if anything can be done to fix it.
> 
> Or maybe flush_work and flush_delayed_work can be changed to avoid 
> blocking if the workqueue is frozen.  Tejun?
> 

I have a patch to show the root cause of this issue.

http://www.spinics.net/lists/linux-usb/msg136815.html

-- 

Best Regards,
Peter Chen


Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-02-23 Thread Alan Stern
On Tue, 23 Feb 2016, Peter Chen wrote:

> Hi Tejun Heo and Florian Mickler,
> 
> I have a question that during the system resume process, the freezable
> workqueue can be thawed if there is a non-freezable workqueue is
> blocked (At uninterruptible state)?
> 
> My case like below, I have a USB OTG (Micro-AB) cable is at USB
> Micro-B port, and there is a USB driver on it, and un-plug this
> cable can wake up system from the suspend. There is a non-freezable
> workqueue ci_otg will be scheduled after disconnecting OTG cable,
> and in its worker ci_otg_work, it will try to disconnect USB drive,
> and flush disk information.

These operations probably are not safe while the system is resuming.  
It might be best to make them wait until the resume is finished.

> But flush disk information is done by
> freezable workqueue writeback, it seems workqueue writeback is
> never got chance to execute, the workqueue ci_otg is waiting there
> forever, and the system is deadlock.

> Both change workqueue ci_otg as freezable or change workqueue writeback
> as non-freezable can fix this problem.

It sounds like making ci_otg freezable is the easiest solution.

> Please ignore it, the system is locked at driver's resume,
> maybe at scsi or usb driver, so of cos, the freezable processes
> can't be thawed.
 
> > [  555.263177] [] (flush_work) from [] (flush_delayed_work+0x48/0x4c)
> > [  555.271106]  r8:ed5b5000 r7:c0b38a3c r6:eea439cc r5:eea4372c r4:eea4372c
> > [  555.277958] [] (flush_delayed_work) from [] (bdi_unregister+0x84/0xec)
> > [  555.286236]  r4:eea43520 r3:2153
> > [  555.289885] [] (bdi_unregister) from [] (blk_cleanup_queue+0x180/0x29c)
> > [  555.298250]  r5:eea43808 r4:eea43400

You might want to complain to the block-layer people about this.  I 
don't know if anything can be done to fix it.

Or maybe flush_work and flush_delayed_work can be changed to avoid 
blocking if the workqueue is frozen.  Tejun?
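
To make the second idea concrete, here is a toy Python model of a
flush that refuses to block on a frozen queue. It is a sketch only:
ToyWorkqueue and try_flush are hypothetical names, not the kernel
workqueue API, and the real flush_work has no such mode as far as this
thread shows.

```python
class ToyWorkqueue:
    """Toy stand-in for a workqueue; not the kernel API."""

    def __init__(self, freezable):
        self.freezable = freezable
        self.frozen = False
        self.pending = []
        self.done = []

    def freeze(self):
        # Only freezable queues stop running work.
        if self.freezable:
            self.frozen = True

    def thaw(self):
        self.frozen = False

    def try_flush(self):
        """Hypothetical non-blocking flush.

        Returns True once all pending work has run, or False right away
        if the queue is frozen, leaving the decision to the caller.
        """
        if self.frozen:
            return False
        while self.pending:
            self.done.append(self.pending.pop(0))
        return True


writeback = ToyWorkqueue(freezable=True)
writeback.pending.append("flush dirty pages")
writeback.freeze()
flushed_while_frozen = writeback.try_flush()
writeback.thaw()
flushed_after_thaw = writeback.try_flush()
```

The open question in such a design is what the caller should do when
the flush is refused; the model leaves that to the caller.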

Alan Stern



Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-02-23 Thread Peter Chen
On Tue, Feb 23, 2016 at 11:20:56AM +0800, Peter Chen wrote:
> Hi Tejun Heo and Florian Mickler,
> 
> I have a question that during the system resume process, the freezable
> workqueue can be thawed if there is a non-freezable workqueue is
> blocked (At uninterruptible state)?
> 
> My case like below, I have a USB OTG (Micro-AB) cable is at USB
> Micro-B port, and there is a USB driver on it, and un-plug this
> cable can wake up system from the suspend. There is a non-freezable
> workqueue ci_otg will be scheduled after disconnecting OTG cable,
> and in its worker ci_otg_work, it will try to disconnect USB drive,
> and flush disk information. But flush disk information is done by
> freezable workqueue writeback, it seems workqueue writeback is
> never got chance to execute, the workqueue ci_otg is waiting there
> forever, and the system is deadlock.
> 

> Both change workqueue ci_otg as freezable or change workqueue writeback
> as non-freezable can fix this problem.
> 
Please ignore it; the system is stuck in a driver's resume, maybe in
the SCSI or USB driver, so of course the freezable processes can't be
thawed.

[  553.429383] sh  D c07de74c 0   694691 0x
[  553.435801] Backtrace:
[  553.438295] [] (__schedule) from [] (schedule+0x48/0xa0)
[  553.445358]  r10:edd3c054 r9:edd3c078 r8:edddbd50 r7:edcbbc00 r6:c1377c34 r5:6153
[  553.453313]  r4:eddda000
[  553.455896] [] (schedule) from [] (schedule_preempt_disabled+0x10/0x14)
[  553.464261]  r4:edd3c058 r3:000a
[  553.467910] [] (schedule_preempt_disabled) from [] (mutex_lock_nested+0x1a0/0x3e8)
[  553.477254] [] (mutex_lock_nested) from [] (dpm_complete+0xc0/0x1b0)
[  553.485358]  r10:00561408 r9:edd3c054 r8:c0b4863c r7:edddbd90 r6:c0b485d8 r5:edd3c020
[  553.493313]  r4:edd3c0d0
[  553.495896] [] (dpm_complete) from [] (dpm_resume_end+0x1c/0x20)
[  553.503652]  r9: r8:c0b1a9d0 r7:c1334ec0 r6:c1334edc r5:0003 r4:0010
[  553.511544] [] (dpm_resume_end) from [] (suspend_devices_and_enter+0x158/0x504)
[  553.520604]  r4: r3:c1334efc
[  553.524250] [] (suspend_devices_and_enter) from [] (pm_suspend+0x234/0x2cc)
[  553.532961]  r10:00561408 r9:ed6b7300 r8:0004 r7:c1334eec r6: r5:c1334ee8
[  553.540914]  r4:0003
[  553.543493] [] (pm_suspend) from [] (state_store+0x6c/0xc0)
[  553.550815]  r6:0003 r5:c09b2ca4 r4:0003 r3:006d
[  553.556599] [] (state_store) from [] (kobj_attr_store+0x1c/0x28)
[  553.564358]  r9:0004 r8:c0010004 r7:edf9480c r6:ed6b7300 r5:edf94800 r4:0004
[  553.572258] [] (kobj_attr_store) from [] (sysfs_kf_write+0x54/0x58)
[  553.580295] [] (sysfs_kf_write) from [] (kernfs_fop_write+0xd8/0x1fc)
[  553.588487]  r6:ed6b7300 r5: r4: r3:c0188580
[  553.594262] [] (kernfs_fop_write) from [] (__vfs_write+0x2c/0xe0)
[  553.602105]  r10: r9:eddda000 r8:c0010004 r7:edddbf80 r6:00561408 r5:edddbf80
[  553.610060]  r4:ed445280
[  553.612641] [] (__vfs_write) from [] (vfs_write+0x98/0x16c)
[  553.619963]  r8:c0010004 r7:edddbf80 r6:00561408 r5:0004 r4:ed445280
[  553.626800] [] (vfs_write) from [] (SyS_write+0x4c/0xa8)
[  553.633861]  r8:c0010004 r7:00561408 r6:0004 r5:ed445280 r4:ed445280
[  553.640705] [] (SyS_write) from [] (ret_fast_syscall+0x0/0x1c)
[  553.648291]  r7:0004 r6:b6f27d60 r5:00561408 r4:0004

> The call stack like below:
> 
> [  546.987379] writeback   S c07de74c 012  2 0x
> [  546.993804] Backtrace:
> [  546.996307] [] (__schedule) from [] (schedule+0x48/0xa0)
> [  547.003370]  r10:ef14bc80 r9:ef14ca00 r8: r7:c0045c90 r6:ef14bc80 r5:ef14bc98
> [  547.011325]  r4:ef164000
> [  547.013907] [] (schedule) from [] (rescuer_thread+0x290/0x308)
> [  547.021490]  r4: r3:0008
> [  547.025136] [] (rescuer_thread) from [] (kthread+0xdc/0xf8)
> [  547.032459]  r10: r9: r8: r7:c0045c90 r6:ef14bc80 r5:ef1526c0
> [  547.040412]  r4:
> [  547.042993] [] (kthread) from [] (ret_from_fork+0x14/0x24)
> [  547.050229]  r7: r6: r5:c004b9d8 r4:ef1526c0
> [  555.178869] kworker/u2:13   D c07de74c 0   826  2 0x
> 
> [  555.185310] Workqueue: ci_otg ci_otg_work
> [  555.189353] Backtrace:
> [  555.191849] [] (__schedule) from [] (schedule+0x48/0xa0)
> [  555.198912]  r10:ee471ba0 r9: r8: r7:0002 r6:ee47 r5:ee471ba4
> [  555.206867]  r4:ee47
> [  555.209453] [] (schedule) from [] (schedule_timeout+0x15c/0x1e0)
> [  555.217212]  r4:7fff r3:edc2b000
> [  555.220862] [] (schedule_timeout) from [] (wait_for_common+0x94/0x144)
> [  555.229140]  r8: r7:0002 r6:ee47 r5:ee471ba4 r4:7fff
> [  555.235980] [] (wait_for_common) from [] (wait_for_completion+0x18/0x1c)
> [  555.244430]  r10:0001 r9:c0b5563c r8:c0042e48 r7:ef086000 r6:eea4372c r5:ef131b00
> [  555.254970] [] (wait_for_completion) from [] 
> 

Freezable workqueue blocks non-freezable workqueue during the system resume process

2016-02-22 Thread Peter Chen
Hi Tejun Heo and Florian Mickler,

I have a question: during the system resume process, can a freezable
workqueue be thawed while a non-freezable workqueue is blocked (in
uninterruptible state)?

My case is as follows. I have a USB OTG (Micro-AB) cable in a USB
Micro-B port, and there is a USB driver on it, and unplugging this
cable can wake the system up from suspend. A non-freezable workqueue,
ci_otg, is scheduled after the OTG cable is disconnected, and its
worker, ci_otg_work, tries to disconnect the USB drive and flush the
disk information. But flushing the disk information is done by the
freezable workqueue writeback. It seems the writeback workqueue never
gets a chance to execute, so the ci_otg workqueue waits there forever
and the system deadlocks.

Changing the ci_otg workqueue to freezable, or changing the writeback
workqueue to non-freezable, fixes this problem.

The call stacks are below:

[  546.987379] writeback   S c07de74c 012  2 0x
[  546.993804] Backtrace:
[  546.996307] [] (__schedule) from [] (schedule+0x48/0xa0)
[  547.003370]  r10:ef14bc80 r9:ef14ca00 r8: r7:c0045c90 r6:ef14bc80 r5:ef14bc98
[  547.011325]  r4:ef164000
[  547.013907] [] (schedule) from [] (rescuer_thread+0x290/0x308)
[  547.021490]  r4: r3:0008
[  547.025136] [] (rescuer_thread) from [] (kthread+0xdc/0xf8)
[  547.032459]  r10: r9: r8: r7:c0045c90 r6:ef14bc80 r5:ef1526c0
[  547.040412]  r4:
[  547.042993] [] (kthread) from [] (ret_from_fork+0x14/0x24)
[  547.050229]  r7: r6: r5:c004b9d8 r4:ef1526c0
[  555.178869] kworker/u2:13   D c07de74c 0   826  2 0x

[  555.185310] Workqueue: ci_otg ci_otg_work
[  555.189353] Backtrace:
[  555.191849] [] (__schedule) from [] (schedule+0x48/0xa0)
[  555.198912]  r10:ee471ba0 r9: r8: r7:0002 r6:ee47 r5:ee471ba4
[  555.206867]  r4:ee47
[  555.209453] [] (schedule) from [] (schedule_timeout+0x15c/0x1e0)
[  555.217212]  r4:7fff r3:edc2b000
[  555.220862] [] (schedule_timeout) from [] (wait_for_common+0x94/0x144)
[  555.229140]  r8: r7:0002 r6:ee47 r5:ee471ba4 r4:7fff
[  555.235980] [] (wait_for_common) from [] (wait_for_completion+0x18/0x1c)
[  555.244430]  r10:0001 r9:c0b5563c r8:c0042e48 r7:ef086000 r6:eea4372c r5:ef131b00
[  555.252383]  r4:
[  555.254970] [] (wait_for_completion) from [] (flush_work+0x19c/0x234)
[  555.263177] [] (flush_work) from [] (flush_delayed_work+0x48/0x4c)
[  555.271106]  r8:ed5b5000 r7:c0b38a3c r6:eea439cc r5:eea4372c r4:eea4372c
[  555.277958] [] (flush_delayed_work) from [] (bdi_unregister+0x84/0xec)
[  555.286236]  r4:eea43520 r3:2153
[  555.289885] [] (bdi_unregister) from [] (blk_cleanup_queue+0x180/0x29c)
[  555.298250]  r5:eea43808 r4:eea43400
[  555.301909] [] (blk_cleanup_queue) from [] (__scsi_remove_device+0x48/0xb8)
[  555.310623]  r7: r6:2153 r5:ededa950 r4:ededa800
[  555.316403] [] (__scsi_remove_device) from [] (scsi_forget_host+0x64/0x68)
[  555.325028]  r5:ededa800 r4:ed5b5000
[  555.328689] [] (scsi_forget_host) from [] (scsi_remove_host+0x78/0x104)
[  555.337054]  r5:ed5b5068 r4:ed5b5000
[  555.340709] [] (scsi_remove_host) from [] (usb_stor_disconnect+0x50/0xb4)
[  555.349247]  r6:ed5b56e4 r5:ed5b5818 r4:ed5b5690 r3:0008
[  555.355025] [] (usb_stor_disconnect) from [] (usb_unbind_interface+0x78/0x25c)
[  555.363997]  r8:c13919b4 r7:edd3c000 r6:edd3c020 r5:ee551c68 r4:ee551c00 r3:c04cdf7c
[  555.371892] [] (usb_unbind_interface) from [] (__device_release_driver+0x8c/0x118)
[  555.381213]  r10:0001 r9:edd90c00 r8:c13919b4 r7:ee551c68 r6:c0b546e0 r5:c0b5563c
[  555.389167]  r4:edd3c020
[  555.391752] [] (__device_release_driver) from [] (device_release_driver+0x28/0x34)
[  555.401071]  r5:edd3c020 r4:edd3c054
[  555.404721] [] (device_release_driver) from [] (bus_remove_device+0xe0/0x110)
[  555.413607]  r5:edd3c020 r4:ef17f04c
[  555.417253] [] (bus_remove_device) from [] (device_del+0x114/0x21c)
[  555.425270]  r6:edd3c028 r5:edd3c020 r4:ee551c00 r3:
[  555.431045] [] (device_del) from [] (usb_disable_device+0xa4/0x1e8)
[  555.439061]  r8:edd3c000 r7:eded8000 r6: r5:0001 r4:ee551c00
[  555.445906] [] (usb_disable_device) from [] (usb_disconnect+0x74/0x224)
[  555.454271]  r9:edd90c00 r8:ee551000 r7:ee551c68 r6:ee551c9c r5:ee551c00 r4:0001
[  555.462156] [] (usb_disconnect) from [] (usb_disconnect+0x1d8/0x224)
[  555.470259]  r10:0001 r9:edd9 r8:ee471e2c r7:ee551468 r6:ee55149c r5:ee551400
[  555.478213]  r4:0001
[  555.480797] [] (usb_disconnect) from [] (usb_remove_hcd+0xa0/0x1ac)
[  555.488813]  r10:0001 r9:ee471eb0 r8: r7:ef3d9500 r6:eded810c r5:eded80b0
[  555.496765]  r4:eded8000
[  555.499351] [] (usb_remove_hcd) from [] (host_stop+0x28/0x64)
[  555.506847]  r6:eeb50010 r5:eded8000 r4:eeb51010
[  555.511563] [] (host_stop) from [] (ci_otg_work+0xc4/0x124)
[