Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
Hello, Jan, Alan.

On Tue, Mar 15, 2016 at 10:25:43AM +0100, Jan Kara wrote:
> > The kernel does suspend device drivers; that is, it invokes their
> > suspend callbacks. But it doesn't "freeze" them in any sense. Once a
> > driver has been suspended, it assumes it won't receive any I/O requests
> > until it has been resumed. Therefore the kernel first has to prevent
> > all the upper layers from generating such requests and/or sending them
> > to the low-level drivers.
>
> OK, so Tejun and you should talk together because you both seem to want
> something else... If I understand it right, Tejun wants suspended devices
> to just queue requests that have been submitted after these devices were
> suspended and complete them once they are resumed...

Yeah, I suppose that's why we have the code base we do now. I don't
think freezing kernel threads is the right mechanism to plug IO devices
during suspend. It's way too error-prone and causes a dependency
nightmare as it acts essentially as a system-wide lock.

More complex drivers already plug themselves which are necessary no
matter what as upper layers or some kthreads aren't the only sources of
commands to devices. We can plug at block layer for IOs coming down
from higher layers. We can even provide a mechanism to plug certain
kthreads if necessary but they should be contained in the driver - e.g.
the suspend callback specifically blocking certain specific kthreads -
instead of the vague "the system is generally stopped now and it seems
to work most of the time" that we're doing now.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
On Mon 14-03-16 10:37:22, Alan Stern wrote:
> On Mon, 14 Mar 2016, Jan Kara wrote:
>
> > On Fri 11-03-16 12:56:10, Tejun Heo wrote:
> > > Hello, Jan.
> > >
> > > On Thu, Mar 03, 2016 at 10:33:10AM +0100, Jan Kara wrote:
> > > > > Ugh... that's nasty. I wonder whether the right thing to do is making
> > > > > writeback workers non-freezable. IOs are supposed to be blocked from
> > > > > lower layer anyway. Jan, what do you think?
> > > >
> > > > Well no, at least currently IO is not blocked in lower layers AFAIK - for
> > > > that you'd need to freeze block devices & filesystems and there are issues
> > >
> > > At least libata does and I think SCSI does too, but yeah, there
> > > probably are drivers which depend on block layer blocking IOs, which
> > > btw is a pretty fragile way to go about as upper layers might not be
> > > the only source of activities.
> > >
> > > > with that (Jiri Kosina was the last one which was trying to make this work
> > > > AFAIR). And I think you need to stop writeback (and generally any IO) to be
> > > > generated so that it doesn't interact in a strange way with device drivers
> > > > being frozen. So IMO until suspend freezes filesystems & devices properly
> > > > you have to freeze writeback workqueue.
>
> What do you mean by "freezes ... devices"? Only a piece of code can be
> frozen -- not a device.

By that I meant block device and filesystem freezing. That way filesystem
is frozen so that it doesn't submit any more IO to the device.

> The kernel does suspend device drivers; that is, it invokes their
> suspend callbacks. But it doesn't "freeze" them in any sense. Once a
> driver has been suspended, it assumes it won't receive any I/O requests
> until it has been resumed. Therefore the kernel first has to prevent
> all the upper layers from generating such requests and/or sending them
> to the low-level drivers.

OK, so Tejun and you should talk together because you both seem to want
something else... If I understand it right, Tejun wants suspended devices
to just queue requests that have been submitted after these devices were
suspended and complete them once they are resumed...

> > > I still think the right thing to do is plugging that block layer or
> > > low level drivers. It's like we're trying to plug multiple sources
> > > when we can plug the point where they come together anyway.
> >
> > I agree that freezing writeback workers is a workaround for real issues at
> > best and ideally we shouldn't have to do that. But at least for now I had
> > the impression that it is needed for suspend to work reasonably reliably.
>
> The design is not to plug low-level drivers, but instead to prevent
> them from receiving any requests by plugging or freezing high-level
> code.
>
> It's pretty clear that we don't want to have ongoing I/O during a
> system suspend, right? And that means the I/O has to be prevented (or
> "plugged", if you prefer) somewhere -- either at an upper layer or at a
> lower layer. There was a choice to be made, and the decision was to do
> it at an upper layer.

I agree the IO has to be plugged somewhere. And Tejun seems to want to plug
it at lower layer...

Honza
--
Jan Kara
SUSE Labs, CR
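[Editorial sketch] The behaviour Jan attributes to Tejun here (a suspended device queues any request submitted after suspend and completes it on resume) can be illustrated with a small userspace toy. This is only a model under stated assumptions: the class PluggableQueue and its submit/suspend/resume methods are hypothetical names invented for this illustration and do not correspond to any block-layer or driver API.

```python
from collections import deque


class PluggableQueue:
    """Toy model of a device request queue that can be plugged.

    Hypothetical illustration only; not a kernel interface.
    """

    def __init__(self):
        self.plugged = False
        self.pending = deque()   # requests held back while plugged
        self.completed = []      # requests handed to the "hardware"

    def submit(self, request):
        # While plugged, hold the request instead of touching the device.
        if self.plugged:
            self.pending.append(request)
        else:
            self.completed.append(request)

    def suspend(self):
        # Hypothetical suspend hook: stop dispatching to the device.
        self.plugged = True

    def resume(self):
        # Hypothetical resume hook: unplug, then drain in arrival order.
        self.plugged = False
        while self.pending:
            self.completed.append(self.pending.popleft())


queue = PluggableQueue()
queue.submit("write A")
queue.suspend()
queue.submit("write B")   # arrives while suspended, so it is only queued
queue.submit("write C")
held_during_suspend = list(queue.pending)
queue.resume()
```

In this model the submitter never needs to be frozen: requests arriving during suspend simply wait at the point where all sources converge, which is the property Tejun argues for in this thread.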
Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
On Mon, 14 Mar 2016, Jan Kara wrote:

> On Fri 11-03-16 12:56:10, Tejun Heo wrote:
> > Hello, Jan.
> >
> > On Thu, Mar 03, 2016 at 10:33:10AM +0100, Jan Kara wrote:
> > > > Ugh... that's nasty. I wonder whether the right thing to do is making
> > > > writeback workers non-freezable. IOs are supposed to be blocked from
> > > > lower layer anyway. Jan, what do you think?
> > >
> > > Well no, at least currently IO is not blocked in lower layers AFAIK - for
> > > that you'd need to freeze block devices & filesystems and there are issues
> >
> > At least libata does and I think SCSI does too, but yeah, there
> > probably are drivers which depend on block layer blocking IOs, which
> > btw is a pretty fragile way to go about as upper layers might not be
> > the only source of activities.
> >
> > > with that (Jiri Kosina was the last one which was trying to make this work
> > > AFAIR). And I think you need to stop writeback (and generally any IO) to be
> > > generated so that it doesn't interact in a strange way with device drivers
> > > being frozen. So IMO until suspend freezes filesystems & devices properly
> > > you have to freeze writeback workqueue.

What do you mean by "freezes ... devices"? Only a piece of code can be
frozen -- not a device.

The kernel does suspend device drivers; that is, it invokes their
suspend callbacks. But it doesn't "freeze" them in any sense. Once a
driver has been suspended, it assumes it won't receive any I/O requests
until it has been resumed. Therefore the kernel first has to prevent
all the upper layers from generating such requests and/or sending them
to the low-level drivers.

> > I still think the right thing to do is plugging that block layer or
> > low level drivers. It's like we're trying to plug multiple sources
> > when we can plug the point where they come together anyway.
>
> I agree that freezing writeback workers is a workaround for real issues at
> best and ideally we shouldn't have to do that. But at least for now I had
> the impression that it is needed for suspend to work reasonably reliably.

The design is not to plug low-level drivers, but instead to prevent
them from receiving any requests by plugging or freezing high-level
code.

It's pretty clear that we don't want to have ongoing I/O during a
system suspend, right? And that means the I/O has to be prevented (or
"plugged", if you prefer) somewhere -- either at an upper layer or at a
lower layer. There was a choice to be made, and the decision was to do
it at an upper layer.

Alan Stern
Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
On Fri 11-03-16 12:56:10, Tejun Heo wrote:
> Hello, Jan.
>
> On Thu, Mar 03, 2016 at 10:33:10AM +0100, Jan Kara wrote:
> > > Ugh... that's nasty. I wonder whether the right thing to do is making
> > > writeback workers non-freezable. IOs are supposed to be blocked from
> > > lower layer anyway. Jan, what do you think?
> >
> > Well no, at least currently IO is not blocked in lower layers AFAIK - for
> > that you'd need to freeze block devices & filesystems and there are issues
>
> At least libata does and I think SCSI does too, but yeah, there
> probably are drivers which depend on block layer blocking IOs, which
> btw is a pretty fragile way to go about as upper layers might not be
> the only source of activities.
>
> > with that (Jiri Kosina was the last one which was trying to make this work
> > AFAIR). And I think you need to stop writeback (and generally any IO) to be
> > generated so that it doesn't interact in a strange way with device drivers
> > being frozen. So IMO until suspend freezes filesystems & devices properly
> > you have to freeze writeback workqueue.
>
> I still think the right thing to do is plugging that block layer or
> low level drivers. It's like we're trying to plug multiple sources
> when we can plug the point where they come together anyway.

I agree that freezing writeback workers is a workaround for real issues at
best and ideally we shouldn't have to do that. But at least for now I had
the impression that it is needed for suspend to work reasonably reliably.

Honza
--
Jan Kara
SUSE Labs, CR
Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
Hello, Jan.

On Thu, Mar 03, 2016 at 10:33:10AM +0100, Jan Kara wrote:
> > Ugh... that's nasty. I wonder whether the right thing to do is making
> > writeback workers non-freezable. IOs are supposed to be blocked from
> > lower layer anyway. Jan, what do you think?
>
> Well no, at least currently IO is not blocked in lower layers AFAIK - for
> that you'd need to freeze block devices & filesystems and there are issues

At least libata does and I think SCSI does too, but yeah, there
probably are drivers which depend on block layer blocking IOs, which
btw is a pretty fragile way to go about as upper layers might not be
the only source of activities.

> with that (Jiri Kosina was the last one which was trying to make this work
> AFAIR). And I think you need to stop writeback (and generally any IO) to be
> generated so that it doesn't interact in a strange way with device drivers
> being frozen. So IMO until suspend freezes filesystems & devices properly
> you have to freeze writeback workqueue.

I still think the right thing to do is plugging that block layer or
low level drivers. It's like we're trying to plug multiple sources
when we can plug the point where they come together anyway.

Thanks.

--
tejun
Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
Hello,

On Wed 02-03-16 11:00:58, Tejun Heo wrote:
> On Fri, Feb 26, 2016 at 02:19:20PM +0800, Peter Chen wrote:
> > On Thu, Feb 25, 2016 at 05:01:12PM -0500, Tejun Heo wrote:
> > > Hello, Peter.
> > >
> > > On Wed, Feb 24, 2016 at 03:24:30PM +0800, Peter Chen wrote:
> > > > > You might want to complain to the block-layer people about this. I
> > > > > don't know if anything can be done to fix it.
> > > > >
> > > > > Or maybe flush_work and flush_delayed_work can be changed to avoid
> > > > > blocking if the workqueue is frozen. Tejun?
> > > > >
> > > >
> > > > I have a patch to show the root cause of this issue.
> > > >
> > > > http://www.spinics.net/lists/linux-usb/msg136815.html
> > >
> > > I don't get it. Why would it deadlock? Shouldn't things get rolling
> > > once the workqueues are thawed?
> >
> > The workqueue writeback can't be thawed due to driver's resume
> > (dpm_complete) is lock nested, and can't be finished.
>
> Ugh... that's nasty. I wonder whether the right thing to do is making
> writeback workers non-freezable. IOs are supposed to be blocked from
> lower layer anyway. Jan, what do you think?

Well no, at least currently IO is not blocked in lower layers AFAIK - for
that you'd need to freeze block devices & filesystems and there are issues
with that (Jiri Kosina was the last one which was trying to make this work
AFAIR). And I think you need to stop writeback (and generally any IO) to be
generated so that it doesn't interact in a strange way with device drivers
being frozen. So IMO until suspend freezes filesystems & devices properly
you have to freeze writeback workqueue.

Honza
--
Jan Kara
SUSE Labs, CR
Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
Hello, (cc'ing Jan)

On Fri, Feb 26, 2016 at 02:19:20PM +0800, Peter Chen wrote:
> On Thu, Feb 25, 2016 at 05:01:12PM -0500, Tejun Heo wrote:
> > Hello, Peter.
> >
> > On Wed, Feb 24, 2016 at 03:24:30PM +0800, Peter Chen wrote:
> > > > You might want to complain to the block-layer people about this. I
> > > > don't know if anything can be done to fix it.
> > > >
> > > > Or maybe flush_work and flush_delayed_work can be changed to avoid
> > > > blocking if the workqueue is frozen. Tejun?
> > > >
> > >
> > > I have a patch to show the root cause of this issue.
> > >
> > > http://www.spinics.net/lists/linux-usb/msg136815.html
> >
> > I don't get it. Why would it deadlock? Shouldn't things get rolling
> > once the workqueues are thawed?
>
> The workqueue writeback can't be thawed due to driver's resume
> (dpm_complete) is lock nested, and can't be finished.

Ugh... that's nasty. I wonder whether the right thing to do is making
writeback workers non-freezable. IOs are supposed to be blocked from
lower layer anyway. Jan, what do you think?

Thanks.

--
tejun
Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
On Thu, Feb 25, 2016 at 05:01:12PM -0500, Tejun Heo wrote:
> Hello, Peter.
>
> On Wed, Feb 24, 2016 at 03:24:30PM +0800, Peter Chen wrote:
> > > You might want to complain to the block-layer people about this. I
> > > don't know if anything can be done to fix it.
> > >
> > > Or maybe flush_work and flush_delayed_work can be changed to avoid
> > > blocking if the workqueue is frozen. Tejun?
> > >
> >
> > I have a patch to show the root cause of this issue.
> >
> > http://www.spinics.net/lists/linux-usb/msg136815.html
>
> I don't get it. Why would it deadlock? Shouldn't things get rolling
> once the workqueues are thawed?
>

Hi Tejun,

The workqueue writeback can't be thawed due to driver's resume
(dpm_complete) is lock nested, and can't be finished.

--
Best Regards,
Peter Chen
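[Editorial sketch] Peter's explanation describes a circular wait, and one way to read it is as a wait-for graph. The edges below are an interpretation of the two backtraces posted in this thread (the "sh" task blocked in dpm_complete under mutex_lock_nested, and the ci_otg worker blocked in flush_work under device_release_driver), not something the posters spell out. In particular, "device lock" being held by the ci_otg worker, and thawing happening only after dpm_complete returns, are assumptions of this sketch. The same pointer value edd3c020 shows up in the register dumps of both traces, which is consistent with that reading but does not prove it. find_cycle is a hypothetical helper written for this illustration.

```python
def find_cycle(waits_for, start):
    """Follow single 'waits for' edges from start; return the cycle or None."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None              # the chain ends, so nothing is stuck
    return path[path.index(node):]


# Assumed wait-for edges, read off the two backtraces in this thread.
waits_for = {
    "ci_otg_work": "writeback work item",           # flush_work() blocks
    "writeback work item": "thaw of freezable workqueues",
    "thaw of freezable workqueues": "dpm_complete",  # thaw follows resume
    "dpm_complete": "device lock",                   # mutex_lock_nested()
    "device lock": "ci_otg_work",                    # assumed lock holder
}

cycle = find_cycle(waits_for, "ci_otg_work")
print(" -> ".join(cycle + [cycle[0]]))

# Alan Stern's suggestion elsewhere in this thread, making ci_otg freezable,
# modelled as: the worker does not start (and so holds no device lock)
# until after the thaw.
freezable_ci_otg = {
    "ci_otg_work": "thaw of freezable workqueues",
    "thaw of freezable workqueues": "dpm_complete",
}
no_cycle = find_cycle(freezable_ci_otg, "ci_otg_work")
```

With the first set of edges every participant waits on the next one and the chain closes on itself; with the second set the chain ends at dpm_complete, so the model no longer deadlocks.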
Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
Hello, Peter.

On Wed, Feb 24, 2016 at 03:24:30PM +0800, Peter Chen wrote:
> > You might want to complain to the block-layer people about this. I
> > don't know if anything can be done to fix it.
> >
> > Or maybe flush_work and flush_delayed_work can be changed to avoid
> > blocking if the workqueue is frozen. Tejun?
> >
>
> I have a patch to show the root cause of this issue.
>
> http://www.spinics.net/lists/linux-usb/msg136815.html

I don't get it. Why would it deadlock? Shouldn't things get rolling
once the workqueues are thawed?

Thanks.

--
tejun
Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
On Tue, Feb 23, 2016 at 10:34:09AM -0500, Alan Stern wrote:
> On Tue, 23 Feb 2016, Peter Chen wrote:
>
> > Hi Tejun Heo and Florian Mickler,
> >
> > I have a question that during the system resume process, the freezable
> > workqueue can be thawed if there is a non-freezable workqueue is
> > blocked (At uninterruptable state)?
> >
> > My case like below, I have a USB OTG (Micro-AB) cable is at USB
> > Micro-B port, and there is a USB driver on it, and un-plug this
> > cable can wake up system from the suspend. There is a non-freezable
> > workqueue ci_otg will be scheduled after disconnecting OTG cable,
> > and in its worker ci_otg_work, it will try to disconnect USB drive,
> > and flush disk information.
>
> These operations probably are not safe while the system is resuming.
> It might be best to make them wait until the resume is finished.
>
> > But flush disk information is done by
> > freezable workqueue writeback, it seeems workqueue writeback is
> > never got chance to execute, the workqueue ci_otg is waiting there
> > forever, and the system is deadlock.
> >
> > Both change workqueue ci_otg as freezable or change workqueue writeback
> > as non-freezable can fix this problem.
>
> It sounds like making ci_otg freezable is the easiest solution.
>
> > Please ignore it, the system is locked at driver's resume,
> > maybe at scsi or usb driver, so of cos, the freezable processes
> > can't be thawed.
>
> > > [ 555.263177] [] (flush_work) from [] (flush_delayed_work+0x48/0x4c)
> > > [ 555.271106] r8:ed5b5000 r7:c0b38a3c r6:eea439cc r5:eea4372c r4:eea4372c
> > > [ 555.277958] [] (flush_delayed_work) from [] (bdi_unregister+0x84/0xec)
> > > [ 555.286236] r4:eea43520 r3:2153
> > > [ 555.289885] [] (bdi_unregister) from [] (blk_cleanup_queue+0x180/0x29c)
> > > [ 555.298250] r5:eea43808 r4:eea43400
>
> You might want to complain to the block-layer people about this. I
> don't know if anything can be done to fix it.
>
> Or maybe flush_work and flush_delayed_work can be changed to avoid
> blocking if the workqueue is frozen. Tejun?
>

I have a patch to show the root cause of this issue.

http://www.spinics.net/lists/linux-usb/msg136815.html

--
Best Regards,
Peter Chen
Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
On Tue, 23 Feb 2016, Peter Chen wrote:

> Hi Tejun Heo and Florian Mickler,
>
> I have a question that during the system resume process, the freezable
> workqueue can be thawed if there is a non-freezable workqueue is
> blocked (At uninterruptable state)?
>
> My case like below, I have a USB OTG (Micro-AB) cable is at USB
> Micro-B port, and there is a USB driver on it, and un-plug this
> cable can wake up system from the suspend. There is a non-freezable
> workqueue ci_otg will be scheduled after disconnecting OTG cable,
> and in its worker ci_otg_work, it will try to disconnect USB drive,
> and flush disk information.

These operations probably are not safe while the system is resuming.
It might be best to make them wait until the resume is finished.

> But flush disk information is done by
> freezable workqueue writeback, it seeems workqueue writeback is
> never got chance to execute, the workqueue ci_otg is waiting there
> forever, and the system is deadlock.
>
> Both change workqueue ci_otg as freezable or change workqueue writeback
> as non-freezable can fix this problem.

It sounds like making ci_otg freezable is the easiest solution.

> Please ignore it, the system is locked at driver's resume,
> maybe at scsi or usb driver, so of cos, the freezable processes
> can't be thawed.

> > [ 555.263177] [] (flush_work) from [] (flush_delayed_work+0x48/0x4c)
> > [ 555.271106] r8:ed5b5000 r7:c0b38a3c r6:eea439cc r5:eea4372c r4:eea4372c
> > [ 555.277958] [] (flush_delayed_work) from [] (bdi_unregister+0x84/0xec)
> > [ 555.286236] r4:eea43520 r3:2153
> > [ 555.289885] [] (bdi_unregister) from [] (blk_cleanup_queue+0x180/0x29c)
> > [ 555.298250] r5:eea43808 r4:eea43400

You might want to complain to the block-layer people about this. I
don't know if anything can be done to fix it.

Or maybe flush_work and flush_delayed_work can be changed to avoid
blocking if the workqueue is frozen. Tejun?

Alan Stern
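[Editorial sketch] Alan's second idea, a flush that avoids blocking when the target workqueue is frozen, can be illustrated with a userspace toy. Everything here is hypothetical: ToyWorkqueue, its freeze/thaw methods and try_flush are names made up for this sketch, and the return value of try_flush is this model's own convention rather than that of the kernel's flush_work(), whose boolean result reports whether it had to wait for the work to finish.

```python
class ToyWorkqueue:
    """Toy single-threaded model of a workqueue; not a kernel interface."""

    def __init__(self, freezable):
        self.freezable = freezable
        self.frozen = False
        self.pending = []

    def queue_work(self, fn):
        self.pending.append(fn)

    def freeze(self):
        # Only freezable workqueues stop running work across suspend.
        if self.freezable:
            self.frozen = True

    def thaw(self):
        self.frozen = False

    def try_flush(self):
        """Run pending work now, or refuse instead of blocking.

        Returns False when the queue is frozen with work outstanding,
        which is the case where a blocking flush could wait forever.
        """
        if self.frozen and self.pending:
            return False
        while self.pending:
            self.pending.pop(0)()
        return True


log = []
writeback = ToyWorkqueue(freezable=True)
writeback.queue_work(lambda: log.append("writeback done"))

writeback.freeze()
flushed_while_frozen = writeback.try_flush()   # caller must retry later

writeback.thaw()
flushed_after_thaw = writeback.try_flush()
```

The design question this leaves open, as in the thread, is what a caller such as bdi_unregister should do when the flush refuses; the sketch only shows that the caller is no longer stuck waiting.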
Re: Freezable workqueue blocks non-freezable workqueue during the system resume process
On Tue, Feb 23, 2016 at 11:20:56AM +0800, Peter Chen wrote:
> Hi Tejun Heo and Florian Mickler,
>
> I have a question that during the system resume process, the freezable
> workqueue can be thawed if there is a non-freezable workqueue is
> blocked (At uninterruptable state)?
>
> My case like below, I have a USB OTG (Micro-AB) cable is at USB
> Micro-B port, and there is a USB driver on it, and un-plug this
> cable can wake up system from the suspend. There is a non-freezable
> workqueue ci_otg will be scheduled after disconnecting OTG cable,
> and in its worker ci_otg_work, it will try to disconnect USB drive,
> and flush disk information. But flush disk information is done by
> freezable workqueue writeback, it seeems workqueue writeback is
> never got chance to execute, the workqueue ci_otg is waiting there
> forever, and the system is deadlock.
>
> Both change workqueue ci_otg as freezable or change workqueue writeback
> as non-freezable can fix this problem.
>

Please ignore it, the system is locked at driver's resume,
maybe at scsi or usb driver, so of cos, the freezable processes
can't be thawed.

[ 553.429383] sh D c07de74c 0 694691 0x
[ 553.435801] Backtrace:
[ 553.438295] [] (__schedule) from [] (schedule+0x48/0xa0)
[ 553.445358] r10:edd3c054 r9:edd3c078 r8:edddbd50 r7:edcbbc00 r6:c1377c34 r5:6153
[ 553.453313] r4:eddda000
[ 553.455896] [] (schedule) from [] (schedule_preempt_disabled+0x10/0x14)
[ 553.464261] r4:edd3c058 r3:000a
[ 553.467910] [] (schedule_preempt_disabled) from [] (mutex_lock_nested+0x1a0/0x3e8)
[ 553.477254] [] (mutex_lock_nested) from [] (dpm_complete+0xc0/0x1b0)
[ 553.485358] r10:00561408 r9:edd3c054 r8:c0b4863c r7:edddbd90 r6:c0b485d8 r5:edd3c020
[ 553.493313] r4:edd3c0d0
[ 553.495896] [] (dpm_complete) from [] (dpm_resume_end+0x1c/0x20)
[ 553.503652] r9: r8:c0b1a9d0 r7:c1334ec0 r6:c1334edc r5:0003 r4:0010
[ 553.511544] [] (dpm_resume_end) from [] (suspend_devices_and_enter+0x158/0x504)
[ 553.520604] r4: r3:c1334efc
[ 553.524250] [] (suspend_devices_and_enter) from [] (pm_suspend+0x234/0x2cc)
[ 553.532961] r10:00561408 r9:ed6b7300 r8:0004 r7:c1334eec r6: r5:c1334ee8
[ 553.540914] r4:0003
[ 553.543493] [] (pm_suspend) from [] (state_store+0x6c/0xc0)
[ 553.550815] r6:0003 r5:c09b2ca4 r4:0003 r3:006d
[ 553.556599] [] (state_store) from [] (kobj_attr_store+0x1c/0x28)
[ 553.564358] r9:0004 r8:c0010004 r7:edf9480c r6:ed6b7300 r5:edf94800 r4:0004
[ 553.572258] [] (kobj_attr_store) from [] (sysfs_kf_write+0x54/0x58)
[ 553.580295] [] (sysfs_kf_write) from [] (kernfs_fop_write+0xd8/0x1fc)
[ 553.588487] r6:ed6b7300 r5: r4: r3:c0188580
[ 553.594262] [] (kernfs_fop_write) from [] (__vfs_write+0x2c/0xe0)
[ 553.602105] r10: r9:eddda000 r8:c0010004 r7:edddbf80 r6:00561408 r5:edddbf80
[ 553.610060] r4:ed445280
[ 553.612641] [] (__vfs_write) from [] (vfs_write+0x98/0x16c)
[ 553.619963] r8:c0010004 r7:edddbf80 r6:00561408 r5:0004 r4:ed445280
[ 553.626800] [] (vfs_write) from [] (SyS_write+0x4c/0xa8)
[ 553.633861] r8:c0010004 r7:00561408 r6:0004 r5:ed445280 r4:ed445280
[ 553.640705] [] (SyS_write) from [] (ret_fast_syscall+0x0/0x1c)
[ 553.648291] r7:0004 r6:b6f27d60 r5:00561408 r4:0004

> The call stack like below:
>
> [ 546.987379] writeback S c07de74c 012 2 0x
> [ 546.993804] Backtrace:
> [ 546.996307] [] (__schedule) from [] (schedule+0x48/0xa0)
> [ 547.003370] r10:ef14bc80 r9:ef14ca00 r8: r7:c0045c90 r6:ef14bc80 r5:ef14bc98
> [ 547.011325] r4:ef164000
> [ 547.013907] [] (schedule) from [] (rescuer_thread+0x290/0x308)
> [ 547.021490] r4: r3:0008
> [ 547.025136] [] (rescuer_thread) from [] (kthread+0xdc/0xf8)
> [ 547.032459] r10: r9: r8: r7:c0045c90 r6:ef14bc80 r5:ef1526c0
> [ 547.040412] r4:
> [ 547.042993] [] (kthread) from [] (ret_from_fork+0x14/0x24)
> [ 547.050229] r7: r6: r5:c004b9d8 r4:ef1526c0
> [ 555.178869] kworker/u2:13 D c07de74c 0 826 2 0x
> [ 555.185310] Workqueue: ci_otg ci_otg_work
> [ 555.189353] Backtrace:
> [ 555.191849] [] (__schedule) from [] (schedule+0x48/0xa0)
> [ 555.198912] r10:ee471ba0 r9: r8: r7:0002 r6:ee47 r5:ee471ba4
> [ 555.206867] r4:ee47
> [ 555.209453] [] (schedule) from [] (schedule_timeout+0x15c/0x1e0)
> [ 555.217212] r4:7fff r3:edc2b000
> [ 555.220862] [] (schedule_timeout) from [] (wait_for_common+0x94/0x144)
> [ 555.229140] r8: r7:0002 r6:ee47 r5:ee471ba4 r4:7fff
> [ 555.235980] [] (wait_for_common) from [] (wait_for_completion+0x18/0x1c)
> [ 555.244430] r10:0001 r9:c0b5563c r8:c0042e48 r7:ef086000 r6:eea4372c r5:ef131b00
> [ 555.252383] r4:
> [ 555.254970] [] (wait_for_completion) from []
>
Freezable workqueue blocks non-freezable workqueue during the system resume process
Hi Tejun Heo and Florian Mickler,

I have a question that during the system resume process, the freezable
workqueue can be thawed if there is a non-freezable workqueue is
blocked (At uninterruptible state)?

My case like below, I have a USB OTG (Micro-AB) cable is at USB
Micro-B port, and there is a USB driver on it, and un-plug this
cable can wake up system from the suspend. There is a non-freezable
workqueue ci_otg will be scheduled after disconnecting OTG cable,
and in its worker ci_otg_work, it will try to disconnect USB drive,
and flush disk information. But flush disk information is done by
freezable workqueue writeback, it seems workqueue writeback is
never got chance to execute, the workqueue ci_otg is waiting there
forever, and the system is deadlock.

Both change workqueue ci_otg as freezable or change workqueue writeback
as non-freezable can fix this problem.

The call stack like below:

[ 546.987379] writeback S c07de74c 012 2 0x
[ 546.993804] Backtrace:
[ 546.996307] [] (__schedule) from [] (schedule+0x48/0xa0)
[ 547.003370] r10:ef14bc80 r9:ef14ca00 r8: r7:c0045c90 r6:ef14bc80 r5:ef14bc98
[ 547.011325] r4:ef164000
[ 547.013907] [] (schedule) from [] (rescuer_thread+0x290/0x308)
[ 547.021490] r4: r3:0008
[ 547.025136] [] (rescuer_thread) from [] (kthread+0xdc/0xf8)
[ 547.032459] r10: r9: r8: r7:c0045c90 r6:ef14bc80 r5:ef1526c0
[ 547.040412] r4:
[ 547.042993] [] (kthread) from [] (ret_from_fork+0x14/0x24)
[ 547.050229] r7: r6: r5:c004b9d8 r4:ef1526c0
[ 555.178869] kworker/u2:13 D c07de74c 0 826 2 0x
[ 555.185310] Workqueue: ci_otg ci_otg_work
[ 555.189353] Backtrace:
[ 555.191849] [] (__schedule) from [] (schedule+0x48/0xa0)
[ 555.198912] r10:ee471ba0 r9: r8: r7:0002 r6:ee47 r5:ee471ba4
[ 555.206867] r4:ee47
[ 555.209453] [] (schedule) from [] (schedule_timeout+0x15c/0x1e0)
[ 555.217212] r4:7fff r3:edc2b000
[ 555.220862] [] (schedule_timeout) from [] (wait_for_common+0x94/0x144)
[ 555.229140] r8: r7:0002 r6:ee47 r5:ee471ba4 r4:7fff
[ 555.235980] [] (wait_for_common) from [] (wait_for_completion+0x18/0x1c)
[ 555.244430] r10:0001 r9:c0b5563c r8:c0042e48 r7:ef086000 r6:eea4372c r5:ef131b00
[ 555.252383] r4:
[ 555.254970] [] (wait_for_completion) from [] (flush_work+0x19c/0x234)
[ 555.263177] [] (flush_work) from [] (flush_delayed_work+0x48/0x4c)
[ 555.271106] r8:ed5b5000 r7:c0b38a3c r6:eea439cc r5:eea4372c r4:eea4372c
[ 555.277958] [] (flush_delayed_work) from [] (bdi_unregister+0x84/0xec)
[ 555.286236] r4:eea43520 r3:2153
[ 555.289885] [] (bdi_unregister) from [] (blk_cleanup_queue+0x180/0x29c)
[ 555.298250] r5:eea43808 r4:eea43400
[ 555.301909] [] (blk_cleanup_queue) from [] (__scsi_remove_device+0x48/0xb8)
[ 555.310623] r7: r6:2153 r5:ededa950 r4:ededa800
[ 555.316403] [] (__scsi_remove_device) from [] (scsi_forget_host+0x64/0x68)
[ 555.325028] r5:ededa800 r4:ed5b5000
[ 555.328689] [] (scsi_forget_host) from [] (scsi_remove_host+0x78/0x104)
[ 555.337054] r5:ed5b5068 r4:ed5b5000
[ 555.340709] [] (scsi_remove_host) from [] (usb_stor_disconnect+0x50/0xb4)
[ 555.349247] r6:ed5b56e4 r5:ed5b5818 r4:ed5b5690 r3:0008
[ 555.355025] [] (usb_stor_disconnect) from [] (usb_unbind_interface+0x78/0x25c)
[ 555.363997] r8:c13919b4 r7:edd3c000 r6:edd3c020 r5:ee551c68 r4:ee551c00 r3:c04cdf7c
[ 555.371892] [] (usb_unbind_interface) from [] (__device_release_driver+0x8c/0x118)
[ 555.381213] r10:0001 r9:edd90c00 r8:c13919b4 r7:ee551c68 r6:c0b546e0 r5:c0b5563c
[ 555.389167] r4:edd3c020
[ 555.391752] [] (__device_release_driver) from [] (device_release_driver+0x28/0x34)
[ 555.401071] r5:edd3c020 r4:edd3c054
[ 555.404721] [] (device_release_driver) from [] (bus_remove_device+0xe0/0x110)
[ 555.413607] r5:edd3c020 r4:ef17f04c
[ 555.417253] [] (bus_remove_device) from [] (device_del+0x114/0x21c)
[ 555.425270] r6:edd3c028 r5:edd3c020 r4:ee551c00 r3:
[ 555.431045] [] (device_del) from [] (usb_disable_device+0xa4/0x1e8)
[ 555.439061] r8:edd3c000 r7:eded8000 r6: r5:0001 r4:ee551c00
[ 555.445906] [] (usb_disable_device) from [] (usb_disconnect+0x74/0x224)
[ 555.454271] r9:edd90c00 r8:ee551000 r7:ee551c68 r6:ee551c9c r5:ee551c00 r4:0001
[ 555.462156] [] (usb_disconnect) from [] (usb_disconnect+0x1d8/0x224)
[ 555.470259] r10:0001 r9:edd9 r8:ee471e2c r7:ee551468 r6:ee55149c r5:ee551400
[ 555.478213] r4:0001
[ 555.480797] [] (usb_disconnect) from [] (usb_remove_hcd+0xa0/0x1ac)
[ 555.488813] r10:0001 r9:ee471eb0 r8: r7:ef3d9500 r6:eded810c r5:eded80b0
[ 555.496765] r4:eded8000
[ 555.499351] [] (usb_remove_hcd) from [] (host_stop+0x28/0x64)
[ 555.506847] r6:eeb50010 r5:eded8000 r4:eeb51010
[ 555.511563] [] (host_stop) from [] (ci_otg_work+0xc4/0x124)
[