Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Fri, 2013-06-07 at 08:08 -0700, Greg KH wrote: > On Fri, Jun 07, 2013 at 12:54:55PM +0800, Yanmin Zhang wrote: > > On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote: > > > On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote: > > > > On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote: > > > > > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: > > > > > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: > > > > > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: > > > > > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: > > > > > > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote: > > > > > > > > > > From: Zhang Yanmin > > > > > > > > > > > > > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. > > > > > > > > > > If using this > > > > > > > > > > function while holding a resource, the IRQ handler may > > > > > > > > > > cause deadlock. > > > > > > > > > > > > > > > > > > > > Here we add a new function irq_in_progress which doesn't > > > > > > > > > > wait for the handlers > > > > > > > > > > to be finished. > > > > > > > > > > > > > > > > > > > > A typical use case at suspend-to-ram: > > > > > > > > > > > > > > > > > > > > device driver's irq handler is complicated and might hold a > > > > > > > > > > mutex at rare cases. > > > > > > > > > > Its suspend function is called and a suspended flag is set. > > > > > > > > > > In case its IRQ handler is running, suspend function calls > > > > > > > > > > irq_in_progress. if > > > > > > > > > > handler is running, abort suspend. > > > > > > > > > > The irq handler checks the suspended flag. If the device is > > > > > > > > > > suspended, irq handler > > > > > > > > > > either ignores the interrupt, or wakes up the whole system, > > > > > > > > > > and the driver's > > > > > > > > > > resume function could deal with the delayed interrupt > > > > > > > > > > handling. > > > > > > > > > > > > > > > > > > This is as wrong as it can be. Fix the driver instead of > > > > > > > > > hacking racy > > > > > > > > > functions into the core code. > > > > > > > > > > > > > > > > > > So your problem looks like this: > > > > > > > > > > > > > > > > > > CPU 0 CPU 1 > > > > > > > > > irq_handler_thread() suspend() > > > > > > > > >. mutex_lock(); > > > > > > > > >mutex_lock();synchronize_irq(); > > > > > > > > > > > > > > > > > > So why needs the mutex to be taken before synchronize_irq()? > > > > > > > > > Why not > > > > > > > > > doing the obvious? > > > > > > > > > > > > > > > > > > suspend() > > > > > > > > > disable_irq(); (Implies synchronize_irq) > > > > > > > > > mutex_lock(); > > > > > > > > > > > > > > > > > > mutex_unlock(); > > > > > > > > > enable_irq(); > > > > > > > > Thanks for the kind comment. > > > > > > > > > > > > > > > > We do consider your solution before and it works well indeed > > > > > > > > with some specific > > > > > > > > simple drivers. For example, some drives use GPIO pin as > > > > > > > > interrupt source. > > > > > > > > > > > > > > > > On one specific platform, disable_irq would really disable the > > > > > > > > irq at RTE entry, > > > > > > > > which means we lose the wakeup capability of this device. > > > > > > > > synchronize_irq can be another solution. But we did hit 'DPM > > > > > > > > device timeout' issue > > > > > > > > reported by dpm_wd_handler at suspend-to-ram. > > > > > > > > > > > > > > > > The driver is complicated. We couldn't change too many > > > > > > > > functions to optimize it. > > > > > > > > In addition, we have to use the driver instead of throwing it > > > > > > > > away. > > > > > > > > > > > > > > What is preventing you from rewriting it to work properly? > > > > > > It's complicated. > > > > > > > > > > That sounds like your issue, not ours, so please don't push the > > > > > problem > > > > > onto someone else. Take ownership of the driver, fix it up, and all > > > > > will be good. > > > > > > > > > > > > > > > > > > With irq_in_progress, we can resolve this issue and it does > > > > > > > > work, although it > > > > > > > > looks like ugly. > > > > > > > > > > > > > > Don't paper over driver bugs in the core kernel, fix the driver. > > > > > > It's hard to say it's a driver bug. Could generic kernel provide > > > > > > some > > > > > > flexibility for such complicated drivers? > > > > > > > > > > Please post the code for the driver, and then we will be glad to > > > > > continue the dicussion. > > > > Greg, > > > > > > > > Thanks for your enthusiasm. It's a private driver for products. > > > > > > What do you mean "private driver"? All drivers need to be merged into > > > the mainline kernel, it saves you time and money, and we will fix your > > > bugs for you. > > > > > > You know that, your bosses know that, your management knows that, so why > > > are you trying to hide things? > > > > > >
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Fri, Jun 07, 2013 at 12:54:55PM +0800, Yanmin Zhang wrote: > On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote: > > On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote: > > > On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote: > > > > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: > > > > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: > > > > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: > > > > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: > > > > > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote: > > > > > > > > > From: Zhang Yanmin > > > > > > > > > > > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If > > > > > > > > > using this > > > > > > > > > function while holding a resource, the IRQ handler may cause > > > > > > > > > deadlock. > > > > > > > > > > > > > > > > > > Here we add a new function irq_in_progress which doesn't wait > > > > > > > > > for the handlers > > > > > > > > > to be finished. > > > > > > > > > > > > > > > > > > A typical use case at suspend-to-ram: > > > > > > > > > > > > > > > > > > device driver's irq handler is complicated and might hold a > > > > > > > > > mutex at rare cases. > > > > > > > > > Its suspend function is called and a suspended flag is set. > > > > > > > > > In case its IRQ handler is running, suspend function calls > > > > > > > > > irq_in_progress. if > > > > > > > > > handler is running, abort suspend. > > > > > > > > > The irq handler checks the suspended flag. If the device is > > > > > > > > > suspended, irq handler > > > > > > > > > either ignores the interrupt, or wakes up the whole system, > > > > > > > > > and the driver's > > > > > > > > > resume function could deal with the delayed interrupt > > > > > > > > > handling. > > > > > > > > > > > > > > > > This is as wrong as it can be. Fix the driver instead of > > > > > > > > hacking racy > > > > > > > > functions into the core code. > > > > > > > > > > > > > > > > So your problem looks like this: > > > > > > > > > > > > > > > > CPU 0 CPU 1 > > > > > > > > irq_handler_thread()suspend() > > > > > > > >.mutex_lock(); > > > > > > > >mutex_lock(); synchronize_irq(); > > > > > > > > > > > > > > > > So why needs the mutex to be taken before synchronize_irq()? > > > > > > > > Why not > > > > > > > > doing the obvious? > > > > > > > > > > > > > > > > suspend() > > > > > > > > disable_irq(); (Implies synchronize_irq) > > > > > > > > mutex_lock(); > > > > > > > > > > > > > > > > mutex_unlock(); > > > > > > > > enable_irq(); > > > > > > > Thanks for the kind comment. > > > > > > > > > > > > > > We do consider your solution before and it works well indeed with > > > > > > > some specific > > > > > > > simple drivers. For example, some drives use GPIO pin as > > > > > > > interrupt source. > > > > > > > > > > > > > > On one specific platform, disable_irq would really disable the > > > > > > > irq at RTE entry, > > > > > > > which means we lose the wakeup capability of this device. > > > > > > > synchronize_irq can be another solution. But we did hit 'DPM > > > > > > > device timeout' issue > > > > > > > reported by dpm_wd_handler at suspend-to-ram. > > > > > > > > > > > > > > The driver is complicated. We couldn't change too many functions > > > > > > > to optimize it. > > > > > > > In addition, we have to use the driver instead of throwing it > > > > > > > away. > > > > > > > > > > > > What is preventing you from rewriting it to work properly? > > > > > It's complicated. > > > > > > > > That sounds like your issue, not ours, so please don't push the problem > > > > onto someone else. Take ownership of the driver, fix it up, and all > > > > will be good. > > > > > > > > > > > > > > > With irq_in_progress, we can resolve this issue and it does work, > > > > > > > although it > > > > > > > looks like ugly. > > > > > > > > > > > > Don't paper over driver bugs in the core kernel, fix the driver. > > > > > It's hard to say it's a driver bug. Could generic kernel provide some > > > > > flexibility for such complicated drivers? > > > > > > > > Please post the code for the driver, and then we will be glad to > > > > continue the dicussion. > > > Greg, > > > > > > Thanks for your enthusiasm. It's a private driver for products. > > > > What do you mean "private driver"? All drivers need to be merged into > > the mainline kernel, it saves you time and money, and we will fix your > > bugs for you. > > > > You know that, your bosses know that, your management knows that, so why > > are you trying to hide things? > > > > totally confused, > They are embedded device drivers, highly depending on specific devices which > runs > its own firmware in devices. Here the kernel drivers run at AP side. That's no different from loads of drivers that we have in the kernel today, no need to keep them from
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Fri, Jun 07, 2013 at 12:54:55PM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: On Thu, 6 Jun 2013, shuox@intel.com wrote: From: Zhang Yanmin yanmin.zh...@intel.com synchronize_irq waits pending IRQ handlers to be finished. If using this function while holding a resource, the IRQ handler may cause deadlock. Here we add a new function irq_in_progress which doesn't wait for the handlers to be finished. A typical use case at suspend-to-ram: device driver's irq handler is complicated and might hold a mutex at rare cases. Its suspend function is called and a suspended flag is set. In case its IRQ handler is running, suspend function calls irq_in_progress. if handler is running, abort suspend. The irq handler checks the suspended flag. If the device is suspended, irq handler either ignores the interrupt, or wakes up the whole system, and the driver's resume function could deal with the delayed interrupt handling. This is as wrong as it can be. Fix the driver instead of hacking racy functions into the core code. So your problem looks like this: CPU 0 CPU 1 irq_handler_thread()suspend() .mutex_lock(m); mutex_lock(m); synchronize_irq(); So why needs the mutex to be taken before synchronize_irq()? Why not doing the obvious? suspend() disable_irq(); (Implies synchronize_irq) mutex_lock(m); mutex_unlock(m); enable_irq(); Thanks for the kind comment. We do consider your solution before and it works well indeed with some specific simple drivers. For example, some drives use GPIO pin as interrupt source. On one specific platform, disable_irq would really disable the irq at RTE entry, which means we lose the wakeup capability of this device. synchronize_irq can be another solution. But we did hit 'DPM device timeout' issue reported by dpm_wd_handler at suspend-to-ram. The driver is complicated. We couldn't change too many functions to optimize it. In addition, we have to use the driver instead of throwing it away. What is preventing you from rewriting it to work properly? It's complicated. That sounds like your issue, not ours, so please don't push the problem onto someone else. Take ownership of the driver, fix it up, and all will be good. With irq_in_progress, we can resolve this issue and it does work, although it looks like ugly. Don't paper over driver bugs in the core kernel, fix the driver. It's hard to say it's a driver bug. Could generic kernel provide some flexibility for such complicated drivers? Please post the code for the driver, and then we will be glad to continue the dicussion. Greg, Thanks for your enthusiasm. It's a private driver for products. What do you mean private driver? All drivers need to be merged into the mainline kernel, it saves you time and money, and we will fix your bugs for you. You know that, your bosses know that, your management knows that, so why are you trying to hide things? totally confused, They are embedded device drivers, highly depending on specific devices which runs its own firmware in devices. Here the kernel drivers run at AP side. That's no different from loads of drivers that we have in the kernel today, no need to keep them from being merged, please submit them. One example is Graphics driver, which is very big and coding is not friendly. Kernel experts can raise tons of questions against the driver, but we have to make the driver work well on real products. So you are saying that kernel experts don't ask questions that actually make drivers work well on real products? If that's how you feel about the community, then why are you asking the community for help in the first place? And do you somehow think that we don't know how to review/write/fix graphics drivers? Who do you think made Linux in the first place? Another reason is drivers need workaround many hardware
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Fri, 2013-06-07 at 08:08 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 12:54:55PM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: On Thu, 6 Jun 2013, shuox@intel.com wrote: From: Zhang Yanmin yanmin.zh...@intel.com synchronize_irq waits pending IRQ handlers to be finished. If using this function while holding a resource, the IRQ handler may cause deadlock. Here we add a new function irq_in_progress which doesn't wait for the handlers to be finished. A typical use case at suspend-to-ram: device driver's irq handler is complicated and might hold a mutex at rare cases. Its suspend function is called and a suspended flag is set. In case its IRQ handler is running, suspend function calls irq_in_progress. if handler is running, abort suspend. The irq handler checks the suspended flag. If the device is suspended, irq handler either ignores the interrupt, or wakes up the whole system, and the driver's resume function could deal with the delayed interrupt handling. This is as wrong as it can be. Fix the driver instead of hacking racy functions into the core code. So your problem looks like this: CPU 0 CPU 1 irq_handler_thread() suspend() . mutex_lock(m); mutex_lock(m);synchronize_irq(); So why needs the mutex to be taken before synchronize_irq()? Why not doing the obvious? suspend() disable_irq(); (Implies synchronize_irq) mutex_lock(m); mutex_unlock(m); enable_irq(); Thanks for the kind comment. We do consider your solution before and it works well indeed with some specific simple drivers. For example, some drives use GPIO pin as interrupt source. On one specific platform, disable_irq would really disable the irq at RTE entry, which means we lose the wakeup capability of this device. synchronize_irq can be another solution. But we did hit 'DPM device timeout' issue reported by dpm_wd_handler at suspend-to-ram. The driver is complicated. We couldn't change too many functions to optimize it. In addition, we have to use the driver instead of throwing it away. What is preventing you from rewriting it to work properly? It's complicated. That sounds like your issue, not ours, so please don't push the problem onto someone else. Take ownership of the driver, fix it up, and all will be good. With irq_in_progress, we can resolve this issue and it does work, although it looks like ugly. Don't paper over driver bugs in the core kernel, fix the driver. It's hard to say it's a driver bug. Could generic kernel provide some flexibility for such complicated drivers? Please post the code for the driver, and then we will be glad to continue the dicussion. Greg, Thanks for your enthusiasm. It's a private driver for products. What do you mean private driver? All drivers need to be merged into the mainline kernel, it saves you time and money, and we will fix your bugs for you. You know that, your bosses know that, your management knows that, so why are you trying to hide things? totally confused, They are embedded device drivers, highly depending on specific devices which runs its own firmware in devices. Here the kernel drivers run at AP side. That's no different from loads of drivers that we have in the kernel today, no need to keep them from being merged, please submit them. It need managers' approval and is beyond my capability. One example is Graphics driver, which is very big and coding is not friendly. Kernel experts can raise tons of questions against the driver, but we have to make the driver work well on real products. So you are saying that kernel experts don't ask questions that actually make drivers work well on real products? Obviously, I didn't say that. Pls. also remove the , as I really think I'm
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote: > On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote: > > On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote: > > > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: > > > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: > > > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: > > > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: > > > > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote: > > > > > > > > From: Zhang Yanmin > > > > > > > > > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If > > > > > > > > using this > > > > > > > > function while holding a resource, the IRQ handler may cause > > > > > > > > deadlock. > > > > > > > > > > > > > > > > Here we add a new function irq_in_progress which doesn't wait > > > > > > > > for the handlers > > > > > > > > to be finished. > > > > > > > > > > > > > > > > A typical use case at suspend-to-ram: > > > > > > > > > > > > > > > > device driver's irq handler is complicated and might hold a > > > > > > > > mutex at rare cases. > > > > > > > > Its suspend function is called and a suspended flag is set. > > > > > > > > In case its IRQ handler is running, suspend function calls > > > > > > > > irq_in_progress. if > > > > > > > > handler is running, abort suspend. > > > > > > > > The irq handler checks the suspended flag. If the device is > > > > > > > > suspended, irq handler > > > > > > > > either ignores the interrupt, or wakes up the whole system, and > > > > > > > > the driver's > > > > > > > > resume function could deal with the delayed interrupt handling. > > > > > > > > > > > > > > This is as wrong as it can be. Fix the driver instead of hacking > > > > > > > racy > > > > > > > functions into the core code. > > > > > > > > > > > > > > So your problem looks like this: > > > > > > > > > > > > > > CPU 0 CPU 1 > > > > > > > irq_handler_thread() suspend() > > > > > > >. mutex_lock(); > > > > > > >mutex_lock();synchronize_irq(); > > > > > > > > > > > > > > So why needs the mutex to be taken before synchronize_irq()? Why > > > > > > > not > > > > > > > doing the obvious? > > > > > > > > > > > > > > suspend() > > > > > > > disable_irq(); (Implies synchronize_irq) > > > > > > > mutex_lock(); > > > > > > > > > > > > > > mutex_unlock(); > > > > > > > enable_irq(); > > > > > > Thanks for the kind comment. > > > > > > > > > > > > We do consider your solution before and it works well indeed with > > > > > > some specific > > > > > > simple drivers. For example, some drives use GPIO pin as interrupt > > > > > > source. > > > > > > > > > > > > On one specific platform, disable_irq would really disable the irq > > > > > > at RTE entry, > > > > > > which means we lose the wakeup capability of this device. > > > > > > synchronize_irq can be another solution. But we did hit 'DPM device > > > > > > timeout' issue > > > > > > reported by dpm_wd_handler at suspend-to-ram. > > > > > > > > > > > > The driver is complicated. We couldn't change too many functions to > > > > > > optimize it. > > > > > > In addition, we have to use the driver instead of throwing it away. > > > > > > > > > > What is preventing you from rewriting it to work properly? > > > > It's complicated. > > > > > > That sounds like your issue, not ours, so please don't push the problem > > > onto someone else. Take ownership of the driver, fix it up, and all > > > will be good. > > > > > > > > > > > > With irq_in_progress, we can resolve this issue and it does work, > > > > > > although it > > > > > > looks like ugly. > > > > > > > > > > Don't paper over driver bugs in the core kernel, fix the driver. > > > > It's hard to say it's a driver bug. Could generic kernel provide some > > > > flexibility for such complicated drivers? > > > > > > Please post the code for the driver, and then we will be glad to > > > continue the dicussion. > > Greg, > > > > Thanks for your enthusiasm. It's a private driver for products. > > What do you mean "private driver"? All drivers need to be merged into > the mainline kernel, it saves you time and money, and we will fix your > bugs for you. > > You know that, your bosses know that, your management knows that, so why > are you trying to hide things? > > totally confused, They are embedded device drivers, highly depending on specific devices which runs its own firmware in devices. Here the kernel drivers run at AP side. One example is Graphics driver, which is very big and coding is not friendly. Kernel experts can raise tons of questions against the driver, but we have to make the driver work well on real products. Another reason is drivers need workaround many hardware issues. That makes it hard to implement kernel drivers in good shape sometimes. We need support all cases. We fixed lots of hard issues on
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote: > On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote: > > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: > > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: > > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: > > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: > > > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote: > > > > > > > From: Zhang Yanmin > > > > > > > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If > > > > > > > using this > > > > > > > function while holding a resource, the IRQ handler may cause > > > > > > > deadlock. > > > > > > > > > > > > > > Here we add a new function irq_in_progress which doesn't wait for > > > > > > > the handlers > > > > > > > to be finished. > > > > > > > > > > > > > > A typical use case at suspend-to-ram: > > > > > > > > > > > > > > device driver's irq handler is complicated and might hold a mutex > > > > > > > at rare cases. > > > > > > > Its suspend function is called and a suspended flag is set. > > > > > > > In case its IRQ handler is running, suspend function calls > > > > > > > irq_in_progress. if > > > > > > > handler is running, abort suspend. > > > > > > > The irq handler checks the suspended flag. If the device is > > > > > > > suspended, irq handler > > > > > > > either ignores the interrupt, or wakes up the whole system, and > > > > > > > the driver's > > > > > > > resume function could deal with the delayed interrupt handling. > > > > > > > > > > > > This is as wrong as it can be. Fix the driver instead of hacking > > > > > > racy > > > > > > functions into the core code. > > > > > > > > > > > > So your problem looks like this: > > > > > > > > > > > > CPU 0 CPU 1 > > > > > > irq_handler_thread()suspend() > > > > > >.mutex_lock(); > > > > > >mutex_lock(); synchronize_irq(); > > > > > > > > > > > > So why needs the mutex to be taken before synchronize_irq()? Why not > > > > > > doing the obvious? > > > > > > > > > > > > suspend() > > > > > > disable_irq(); (Implies synchronize_irq) > > > > > > mutex_lock(); > > > > > > > > > > > > mutex_unlock(); > > > > > > enable_irq(); > > > > > Thanks for the kind comment. > > > > > > > > > > We do consider your solution before and it works well indeed with > > > > > some specific > > > > > simple drivers. For example, some drives use GPIO pin as interrupt > > > > > source. > > > > > > > > > > On one specific platform, disable_irq would really disable the irq at > > > > > RTE entry, > > > > > which means we lose the wakeup capability of this device. > > > > > synchronize_irq can be another solution. But we did hit 'DPM device > > > > > timeout' issue > > > > > reported by dpm_wd_handler at suspend-to-ram. > > > > > > > > > > The driver is complicated. We couldn't change too many functions to > > > > > optimize it. > > > > > In addition, we have to use the driver instead of throwing it away. > > > > > > > > What is preventing you from rewriting it to work properly? > > > It's complicated. > > > > That sounds like your issue, not ours, so please don't push the problem > > onto someone else. Take ownership of the driver, fix it up, and all > > will be good. > > > > > > > > > With irq_in_progress, we can resolve this issue and it does work, > > > > > although it > > > > > looks like ugly. > > > > > > > > Don't paper over driver bugs in the core kernel, fix the driver. > > > It's hard to say it's a driver bug. Could generic kernel provide some > > > flexibility for such complicated drivers? > > > > Please post the code for the driver, and then we will be glad to > > continue the dicussion. > Greg, > > Thanks for your enthusiasm. It's a private driver for products. What do you mean "private driver"? All drivers need to be merged into the mainline kernel, it saves you time and money, and we will fix your bugs for you. You know that, your bosses know that, your management knows that, so why are you trying to hide things? totally confused, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote: > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: > > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote: > > > > > > From: Zhang Yanmin > > > > > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If using > > > > > > this > > > > > > function while holding a resource, the IRQ handler may cause > > > > > > deadlock. > > > > > > > > > > > > Here we add a new function irq_in_progress which doesn't wait for > > > > > > the handlers > > > > > > to be finished. > > > > > > > > > > > > A typical use case at suspend-to-ram: > > > > > > > > > > > > device driver's irq handler is complicated and might hold a mutex > > > > > > at rare cases. > > > > > > Its suspend function is called and a suspended flag is set. > > > > > > In case its IRQ handler is running, suspend function calls > > > > > > irq_in_progress. if > > > > > > handler is running, abort suspend. > > > > > > The irq handler checks the suspended flag. If the device is > > > > > > suspended, irq handler > > > > > > either ignores the interrupt, or wakes up the whole system, and the > > > > > > driver's > > > > > > resume function could deal with the delayed interrupt handling. > > > > > > > > > > This is as wrong as it can be. Fix the driver instead of hacking racy > > > > > functions into the core code. > > > > > > > > > > So your problem looks like this: > > > > > > > > > > CPU 0 CPU 1 > > > > > irq_handler_thread() suspend() > > > > >. mutex_lock(); > > > > >mutex_lock();synchronize_irq(); > > > > > > > > > > So why needs the mutex to be taken before synchronize_irq()? Why not > > > > > doing the obvious? > > > > > > > > > > suspend() > > > > > disable_irq(); (Implies synchronize_irq) > > > > > mutex_lock(); > > > > > > > > > > mutex_unlock(); > > > > > enable_irq(); > > > > Thanks for the kind comment. > > > > > > > > We do consider your solution before and it works well indeed with some > > > > specific > > > > simple drivers. For example, some drives use GPIO pin as interrupt > > > > source. > > > > > > > > On one specific platform, disable_irq would really disable the irq at > > > > RTE entry, > > > > which means we lose the wakeup capability of this device. > > > > synchronize_irq can be another solution. But we did hit 'DPM device > > > > timeout' issue > > > > reported by dpm_wd_handler at suspend-to-ram. > > > > > > > > The driver is complicated. We couldn't change too many functions to > > > > optimize it. > > > > In addition, we have to use the driver instead of throwing it away. > > > > > > What is preventing you from rewriting it to work properly? > > It's complicated. > > That sounds like your issue, not ours, so please don't push the problem > onto someone else. Take ownership of the driver, fix it up, and all > will be good. > > > > > > With irq_in_progress, we can resolve this issue and it does work, > > > > although it > > > > looks like ugly. > > > > > > Don't paper over driver bugs in the core kernel, fix the driver. > > It's hard to say it's a driver bug. Could generic kernel provide some > > flexibility for such complicated drivers? > > Please post the code for the driver, and then we will be glad to > continue the dicussion. Greg, Thanks for your enthusiasm. It's a private driver for products. Let's stop here and I wouldn't push the patch to upstream. Thanks all. Yanmin > > As for "complicated", just make it "uncomplicated", it's just code :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote: > > > > > From: Zhang Yanmin > > > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If using > > > > > this > > > > > function while holding a resource, the IRQ handler may cause deadlock. > > > > > > > > > > Here we add a new function irq_in_progress which doesn't wait for the > > > > > handlers > > > > > to be finished. > > > > > > > > > > A typical use case at suspend-to-ram: > > > > > > > > > > device driver's irq handler is complicated and might hold a mutex at > > > > > rare cases. > > > > > Its suspend function is called and a suspended flag is set. > > > > > In case its IRQ handler is running, suspend function calls > > > > > irq_in_progress. if > > > > > handler is running, abort suspend. > > > > > The irq handler checks the suspended flag. If the device is > > > > > suspended, irq handler > > > > > either ignores the interrupt, or wakes up the whole system, and the > > > > > driver's > > > > > resume function could deal with the delayed interrupt handling. > > > > > > > > This is as wrong as it can be. Fix the driver instead of hacking racy > > > > functions into the core code. > > > > > > > > So your problem looks like this: > > > > > > > > CPU 0 CPU 1 > > > > irq_handler_thread()suspend() > > > >.mutex_lock(); > > > >mutex_lock(); synchronize_irq(); > > > > > > > > So why needs the mutex to be taken before synchronize_irq()? Why not > > > > doing the obvious? > > > > > > > > suspend() > > > > disable_irq(); (Implies synchronize_irq) > > > > mutex_lock(); > > > > > > > > mutex_unlock(); > > > > enable_irq(); > > > Thanks for the kind comment. > > > > > > We do consider your solution before and it works well indeed with some > > > specific > > > simple drivers. For example, some drives use GPIO pin as interrupt source. > > > > > > On one specific platform, disable_irq would really disable the irq at RTE > > > entry, > > > which means we lose the wakeup capability of this device. > > > synchronize_irq can be another solution. But we did hit 'DPM device > > > timeout' issue > > > reported by dpm_wd_handler at suspend-to-ram. > > > > > > The driver is complicated. We couldn't change too many functions to > > > optimize it. > > > In addition, we have to use the driver instead of throwing it away. > > > > What is preventing you from rewriting it to work properly? > It's complicated. That sounds like your issue, not ours, so please don't push the problem onto someone else. Take ownership of the driver, fix it up, and all will be good. > > > With irq_in_progress, we can resolve this issue and it does work, > > > although it > > > looks like ugly. > > > > Don't paper over driver bugs in the core kernel, fix the driver. > It's hard to say it's a driver bug. Could generic kernel provide some > flexibility for such complicated drivers? Please post the code for the driver, and then we will be glad to continue the dicussion. As for "complicated", just make it "uncomplicated", it's just code :) greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: > > > On Thu, 6 Jun 2013, shuox@intel.com wrote: > > > > From: Zhang Yanmin > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If using this > > > > function while holding a resource, the IRQ handler may cause deadlock. > > > > > > > > Here we add a new function irq_in_progress which doesn't wait for the > > > > handlers > > > > to be finished. > > > > > > > > A typical use case at suspend-to-ram: > > > > > > > > device driver's irq handler is complicated and might hold a mutex at > > > > rare cases. > > > > Its suspend function is called and a suspended flag is set. > > > > In case its IRQ handler is running, suspend function calls > > > > irq_in_progress. if > > > > handler is running, abort suspend. > > > > The irq handler checks the suspended flag. If the device is suspended, > > > > irq handler > > > > either ignores the interrupt, or wakes up the whole system, and the > > > > driver's > > > > resume function could deal with the delayed interrupt handling. > > > > > > This is as wrong as it can be. Fix the driver instead of hacking racy > > > functions into the core code. > > > > > > So your problem looks like this: > > > > > > CPU 0 CPU 1 > > > irq_handler_thread() suspend() > > >. mutex_lock(); > > >mutex_lock();synchronize_irq(); > > > > > > So why needs the mutex to be taken before synchronize_irq()? Why not > > > doing the obvious? > > > > > > suspend() > > > disable_irq(); (Implies synchronize_irq) > > > mutex_lock(); > > > > > > mutex_unlock(); > > > enable_irq(); > > Thanks for the kind comment. > > > > We do consider your solution before and it works well indeed with some > > specific > > simple drivers. For example, some drives use GPIO pin as interrupt source. > > > > On one specific platform, disable_irq would really disable the irq at RTE > > entry, > > which means we lose the wakeup capability of this device. > > synchronize_irq can be another solution. But we did hit 'DPM device > > timeout' issue > > reported by dpm_wd_handler at suspend-to-ram. > > > > The driver is complicated. We couldn't change too many functions to > > optimize it. > > In addition, we have to use the driver instead of throwing it away. > > What is preventing you from rewriting it to work properly? It's complicated. > > > With irq_in_progress, we can resolve this issue and it does work, although > > it > > looks like ugly. > > Don't paper over driver bugs in the core kernel, fix the driver. It's hard to say it's a driver bug. Could generic kernel provide some flexibility for such complicated drivers? For example, any driver's suspend can return error and the whole suspend-to-ram aborts. Can we say the driver has a bug? If so, why not to change all suspend/resume callbacks to return VOID? Current kernel already allows drivers to abort suspend-to-ram at some rare corner cases. Of course, if the abort happens frequently, it's a bug and we need fix it in driver. If it happens rarely, can we provide some flexibility for the driver? Thanks, Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: > > On Thu, 6 Jun 2013, shuox@intel.com wrote: > > > From: Zhang Yanmin > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If using this > > > function while holding a resource, the IRQ handler may cause deadlock. > > > > > > Here we add a new function irq_in_progress which doesn't wait for the > > > handlers > > > to be finished. > > > > > > A typical use case at suspend-to-ram: > > > > > > device driver's irq handler is complicated and might hold a mutex at rare > > > cases. > > > Its suspend function is called and a suspended flag is set. > > > In case its IRQ handler is running, suspend function calls > > > irq_in_progress. if > > > handler is running, abort suspend. > > > The irq handler checks the suspended flag. If the device is suspended, > > > irq handler > > > either ignores the interrupt, or wakes up the whole system, and the > > > driver's > > > resume function could deal with the delayed interrupt handling. > > > > This is as wrong as it can be. Fix the driver instead of hacking racy > > functions into the core code. > > > > So your problem looks like this: > > > > CPU 0 CPU 1 > > irq_handler_thread()suspend() > >.mutex_lock(); > >mutex_lock(); synchronize_irq(); > > > > So why needs the mutex to be taken before synchronize_irq()? Why not > > doing the obvious? > > > > suspend() > > disable_irq(); (Implies synchronize_irq) > > mutex_lock(); > > > > mutex_unlock(); > > enable_irq(); > Thanks for the kind comment. > > We do consider your solution before and it works well indeed with some > specific > simple drivers. For example, some drives use GPIO pin as interrupt source. > > On one specific platform, disable_irq would really disable the irq at RTE > entry, > which means we lose the wakeup capability of this device. > synchronize_irq can be another solution. But we did hit 'DPM device timeout' > issue > reported by dpm_wd_handler at suspend-to-ram. > > The driver is complicated. We couldn't change too many functions to optimize > it. > In addition, we have to use the driver instead of throwing it away. What is preventing you from rewriting it to work properly? > With irq_in_progress, we can resolve this issue and it does work, although it > looks like ugly. Don't paper over driver bugs in the core kernel, fix the driver. greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: > On Thu, 6 Jun 2013, shuox@intel.com wrote: > > From: Zhang Yanmin > > > > synchronize_irq waits pending IRQ handlers to be finished. If using this > > function while holding a resource, the IRQ handler may cause deadlock. > > > > Here we add a new function irq_in_progress which doesn't wait for the > > handlers > > to be finished. > > > > A typical use case at suspend-to-ram: > > > > device driver's irq handler is complicated and might hold a mutex at rare > > cases. > > Its suspend function is called and a suspended flag is set. > > In case its IRQ handler is running, suspend function calls irq_in_progress. > > if > > handler is running, abort suspend. > > The irq handler checks the suspended flag. If the device is suspended, irq > > handler > > either ignores the interrupt, or wakes up the whole system, and the driver's > > resume function could deal with the delayed interrupt handling. > > This is as wrong as it can be. Fix the driver instead of hacking racy > functions into the core code. > > So your problem looks like this: > > CPU 0 CPU 1 > irq_handler_thread() suspend() >. mutex_lock(); >mutex_lock();synchronize_irq(); > > So why needs the mutex to be taken before synchronize_irq()? Why not > doing the obvious? > > suspend() > disable_irq(); (Implies synchronize_irq) > mutex_lock(); > > mutex_unlock(); > enable_irq(); Thanks for the kind comment. We do consider your solution before and it works well indeed with some specific simple drivers. For example, some drives use GPIO pin as interrupt source. On one specific platform, disable_irq would really disable the irq at RTE entry, which means we lose the wakeup capability of this device. synchronize_irq can be another solution. But we did hit 'DPM device timeout' issue reported by dpm_wd_handler at suspend-to-ram. The driver is complicated. We couldn't change too many functions to optimize it. In addition, we have to use the driver instead of throwing it away. With irq_in_progress, we can resolve this issue and it does work, although it looks like ugly. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Thu, 6 Jun 2013, shuox@intel.com wrote: > From: Zhang Yanmin > > synchronize_irq waits pending IRQ handlers to be finished. If using this > function while holding a resource, the IRQ handler may cause deadlock. > > Here we add a new function irq_in_progress which doesn't wait for the handlers > to be finished. > > A typical use case at suspend-to-ram: > > device driver's irq handler is complicated and might hold a mutex at rare > cases. > Its suspend function is called and a suspended flag is set. > In case its IRQ handler is running, suspend function calls irq_in_progress. if > handler is running, abort suspend. > The irq handler checks the suspended flag. If the device is suspended, irq > handler > either ignores the interrupt, or wakes up the whole system, and the driver's > resume function could deal with the delayed interrupt handling. This is as wrong as it can be. Fix the driver instead of hacking racy functions into the core code. So your problem looks like this: CPU 0 CPU 1 irq_handler_thread()suspend() .mutex_lock(); mutex_lock(); synchronize_irq(); So why needs the mutex to be taken before synchronize_irq()? Why not doing the obvious? suspend() disable_irq(); (Implies synchronize_irq) mutex_lock(); mutex_unlock(); enable_irq(); Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Thu, 6 Jun 2013, shuox@intel.com wrote: From: Zhang Yanmin yanmin.zh...@intel.com synchronize_irq waits pending IRQ handlers to be finished. If using this function while holding a resource, the IRQ handler may cause deadlock. Here we add a new function irq_in_progress which doesn't wait for the handlers to be finished. A typical use case at suspend-to-ram: device driver's irq handler is complicated and might hold a mutex at rare cases. Its suspend function is called and a suspended flag is set. In case its IRQ handler is running, suspend function calls irq_in_progress. if handler is running, abort suspend. The irq handler checks the suspended flag. If the device is suspended, irq handler either ignores the interrupt, or wakes up the whole system, and the driver's resume function could deal with the delayed interrupt handling. This is as wrong as it can be. Fix the driver instead of hacking racy functions into the core code. So your problem looks like this: CPU 0 CPU 1 irq_handler_thread()suspend() .mutex_lock(m); mutex_lock(m); synchronize_irq(); So why needs the mutex to be taken before synchronize_irq()? Why not doing the obvious? suspend() disable_irq(); (Implies synchronize_irq) mutex_lock(m); mutex_unlock(m); enable_irq(); Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: On Thu, 6 Jun 2013, shuox@intel.com wrote: From: Zhang Yanmin yanmin.zh...@intel.com synchronize_irq waits pending IRQ handlers to be finished. If using this function while holding a resource, the IRQ handler may cause deadlock. Here we add a new function irq_in_progress which doesn't wait for the handlers to be finished. A typical use case at suspend-to-ram: device driver's irq handler is complicated and might hold a mutex at rare cases. Its suspend function is called and a suspended flag is set. In case its IRQ handler is running, suspend function calls irq_in_progress. if handler is running, abort suspend. The irq handler checks the suspended flag. If the device is suspended, irq handler either ignores the interrupt, or wakes up the whole system, and the driver's resume function could deal with the delayed interrupt handling. This is as wrong as it can be. Fix the driver instead of hacking racy functions into the core code. So your problem looks like this: CPU 0 CPU 1 irq_handler_thread() suspend() . mutex_lock(m); mutex_lock(m);synchronize_irq(); So why needs the mutex to be taken before synchronize_irq()? Why not doing the obvious? suspend() disable_irq(); (Implies synchronize_irq) mutex_lock(m); mutex_unlock(m); enable_irq(); Thanks for the kind comment. We do consider your solution before and it works well indeed with some specific simple drivers. For example, some drives use GPIO pin as interrupt source. On one specific platform, disable_irq would really disable the irq at RTE entry, which means we lose the wakeup capability of this device. synchronize_irq can be another solution. But we did hit 'DPM device timeout' issue reported by dpm_wd_handler at suspend-to-ram. The driver is complicated. We couldn't change too many functions to optimize it. In addition, we have to use the driver instead of throwing it away. With irq_in_progress, we can resolve this issue and it does work, although it looks like ugly. Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: On Thu, 6 Jun 2013, shuox@intel.com wrote: From: Zhang Yanmin yanmin.zh...@intel.com synchronize_irq waits pending IRQ handlers to be finished. If using this function while holding a resource, the IRQ handler may cause deadlock. Here we add a new function irq_in_progress which doesn't wait for the handlers to be finished. A typical use case at suspend-to-ram: device driver's irq handler is complicated and might hold a mutex at rare cases. Its suspend function is called and a suspended flag is set. In case its IRQ handler is running, suspend function calls irq_in_progress. if handler is running, abort suspend. The irq handler checks the suspended flag. If the device is suspended, irq handler either ignores the interrupt, or wakes up the whole system, and the driver's resume function could deal with the delayed interrupt handling. This is as wrong as it can be. Fix the driver instead of hacking racy functions into the core code. So your problem looks like this: CPU 0 CPU 1 irq_handler_thread()suspend() .mutex_lock(m); mutex_lock(m); synchronize_irq(); So why needs the mutex to be taken before synchronize_irq()? Why not doing the obvious? suspend() disable_irq(); (Implies synchronize_irq) mutex_lock(m); mutex_unlock(m); enable_irq(); Thanks for the kind comment. We do consider your solution before and it works well indeed with some specific simple drivers. For example, some drives use GPIO pin as interrupt source. On one specific platform, disable_irq would really disable the irq at RTE entry, which means we lose the wakeup capability of this device. synchronize_irq can be another solution. But we did hit 'DPM device timeout' issue reported by dpm_wd_handler at suspend-to-ram. The driver is complicated. We couldn't change too many functions to optimize it. In addition, we have to use the driver instead of throwing it away. What is preventing you from rewriting it to work properly? With irq_in_progress, we can resolve this issue and it does work, although it looks like ugly. Don't paper over driver bugs in the core kernel, fix the driver. greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: On Thu, 6 Jun 2013, shuox@intel.com wrote: From: Zhang Yanmin yanmin.zh...@intel.com synchronize_irq waits pending IRQ handlers to be finished. If using this function while holding a resource, the IRQ handler may cause deadlock. Here we add a new function irq_in_progress which doesn't wait for the handlers to be finished. A typical use case at suspend-to-ram: device driver's irq handler is complicated and might hold a mutex at rare cases. Its suspend function is called and a suspended flag is set. In case its IRQ handler is running, suspend function calls irq_in_progress. if handler is running, abort suspend. The irq handler checks the suspended flag. If the device is suspended, irq handler either ignores the interrupt, or wakes up the whole system, and the driver's resume function could deal with the delayed interrupt handling. This is as wrong as it can be. Fix the driver instead of hacking racy functions into the core code. So your problem looks like this: CPU 0 CPU 1 irq_handler_thread() suspend() . mutex_lock(m); mutex_lock(m);synchronize_irq(); So why needs the mutex to be taken before synchronize_irq()? Why not doing the obvious? suspend() disable_irq(); (Implies synchronize_irq) mutex_lock(m); mutex_unlock(m); enable_irq(); Thanks for the kind comment. We do consider your solution before and it works well indeed with some specific simple drivers. For example, some drives use GPIO pin as interrupt source. On one specific platform, disable_irq would really disable the irq at RTE entry, which means we lose the wakeup capability of this device. synchronize_irq can be another solution. But we did hit 'DPM device timeout' issue reported by dpm_wd_handler at suspend-to-ram. The driver is complicated. We couldn't change too many functions to optimize it. In addition, we have to use the driver instead of throwing it away. What is preventing you from rewriting it to work properly? It's complicated. With irq_in_progress, we can resolve this issue and it does work, although it looks like ugly. Don't paper over driver bugs in the core kernel, fix the driver. It's hard to say it's a driver bug. Could generic kernel provide some flexibility for such complicated drivers? For example, any driver's suspend can return error and the whole suspend-to-ram aborts. Can we say the driver has a bug? If so, why not to change all suspend/resume callbacks to return VOID? Current kernel already allows drivers to abort suspend-to-ram at some rare corner cases. Of course, if the abort happens frequently, it's a bug and we need fix it in driver. If it happens rarely, can we provide some flexibility for the driver? Thanks, Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: On Thu, 6 Jun 2013, shuox@intel.com wrote: From: Zhang Yanmin yanmin.zh...@intel.com synchronize_irq waits pending IRQ handlers to be finished. If using this function while holding a resource, the IRQ handler may cause deadlock. Here we add a new function irq_in_progress which doesn't wait for the handlers to be finished. A typical use case at suspend-to-ram: device driver's irq handler is complicated and might hold a mutex at rare cases. Its suspend function is called and a suspended flag is set. In case its IRQ handler is running, suspend function calls irq_in_progress. if handler is running, abort suspend. The irq handler checks the suspended flag. If the device is suspended, irq handler either ignores the interrupt, or wakes up the whole system, and the driver's resume function could deal with the delayed interrupt handling. This is as wrong as it can be. Fix the driver instead of hacking racy functions into the core code. So your problem looks like this: CPU 0 CPU 1 irq_handler_thread()suspend() .mutex_lock(m); mutex_lock(m); synchronize_irq(); So why needs the mutex to be taken before synchronize_irq()? Why not doing the obvious? suspend() disable_irq(); (Implies synchronize_irq) mutex_lock(m); mutex_unlock(m); enable_irq(); Thanks for the kind comment. We do consider your solution before and it works well indeed with some specific simple drivers. For example, some drives use GPIO pin as interrupt source. On one specific platform, disable_irq would really disable the irq at RTE entry, which means we lose the wakeup capability of this device. synchronize_irq can be another solution. But we did hit 'DPM device timeout' issue reported by dpm_wd_handler at suspend-to-ram. The driver is complicated. We couldn't change too many functions to optimize it. In addition, we have to use the driver instead of throwing it away. What is preventing you from rewriting it to work properly? It's complicated. That sounds like your issue, not ours, so please don't push the problem onto someone else. Take ownership of the driver, fix it up, and all will be good. With irq_in_progress, we can resolve this issue and it does work, although it looks like ugly. Don't paper over driver bugs in the core kernel, fix the driver. It's hard to say it's a driver bug. Could generic kernel provide some flexibility for such complicated drivers? Please post the code for the driver, and then we will be glad to continue the dicussion. As for complicated, just make it uncomplicated, it's just code :) greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: On Thu, 6 Jun 2013, shuox@intel.com wrote: From: Zhang Yanmin yanmin.zh...@intel.com synchronize_irq waits pending IRQ handlers to be finished. If using this function while holding a resource, the IRQ handler may cause deadlock. Here we add a new function irq_in_progress which doesn't wait for the handlers to be finished. A typical use case at suspend-to-ram: device driver's irq handler is complicated and might hold a mutex at rare cases. Its suspend function is called and a suspended flag is set. In case its IRQ handler is running, suspend function calls irq_in_progress. if handler is running, abort suspend. The irq handler checks the suspended flag. If the device is suspended, irq handler either ignores the interrupt, or wakes up the whole system, and the driver's resume function could deal with the delayed interrupt handling. This is as wrong as it can be. Fix the driver instead of hacking racy functions into the core code. So your problem looks like this: CPU 0 CPU 1 irq_handler_thread() suspend() . mutex_lock(m); mutex_lock(m);synchronize_irq(); So why needs the mutex to be taken before synchronize_irq()? Why not doing the obvious? suspend() disable_irq(); (Implies synchronize_irq) mutex_lock(m); mutex_unlock(m); enable_irq(); Thanks for the kind comment. We do consider your solution before and it works well indeed with some specific simple drivers. For example, some drives use GPIO pin as interrupt source. On one specific platform, disable_irq would really disable the irq at RTE entry, which means we lose the wakeup capability of this device. synchronize_irq can be another solution. But we did hit 'DPM device timeout' issue reported by dpm_wd_handler at suspend-to-ram. The driver is complicated. We couldn't change too many functions to optimize it. In addition, we have to use the driver instead of throwing it away. What is preventing you from rewriting it to work properly? It's complicated. That sounds like your issue, not ours, so please don't push the problem onto someone else. Take ownership of the driver, fix it up, and all will be good. With irq_in_progress, we can resolve this issue and it does work, although it looks like ugly. Don't paper over driver bugs in the core kernel, fix the driver. It's hard to say it's a driver bug. Could generic kernel provide some flexibility for such complicated drivers? Please post the code for the driver, and then we will be glad to continue the dicussion. Greg, Thanks for your enthusiasm. It's a private driver for products. Let's stop here and I wouldn't push the patch to upstream. Thanks all. Yanmin As for complicated, just make it uncomplicated, it's just code :) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: On Thu, 6 Jun 2013, shuox@intel.com wrote: From: Zhang Yanmin yanmin.zh...@intel.com synchronize_irq waits pending IRQ handlers to be finished. If using this function while holding a resource, the IRQ handler may cause deadlock. Here we add a new function irq_in_progress which doesn't wait for the handlers to be finished. A typical use case at suspend-to-ram: device driver's irq handler is complicated and might hold a mutex at rare cases. Its suspend function is called and a suspended flag is set. In case its IRQ handler is running, suspend function calls irq_in_progress. if handler is running, abort suspend. The irq handler checks the suspended flag. If the device is suspended, irq handler either ignores the interrupt, or wakes up the whole system, and the driver's resume function could deal with the delayed interrupt handling. This is as wrong as it can be. Fix the driver instead of hacking racy functions into the core code. So your problem looks like this: CPU 0 CPU 1 irq_handler_thread()suspend() .mutex_lock(m); mutex_lock(m); synchronize_irq(); So why needs the mutex to be taken before synchronize_irq()? Why not doing the obvious? suspend() disable_irq(); (Implies synchronize_irq) mutex_lock(m); mutex_unlock(m); enable_irq(); Thanks for the kind comment. We do consider your solution before and it works well indeed with some specific simple drivers. For example, some drives use GPIO pin as interrupt source. On one specific platform, disable_irq would really disable the irq at RTE entry, which means we lose the wakeup capability of this device. synchronize_irq can be another solution. But we did hit 'DPM device timeout' issue reported by dpm_wd_handler at suspend-to-ram. The driver is complicated. We couldn't change too many functions to optimize it. In addition, we have to use the driver instead of throwing it away. What is preventing you from rewriting it to work properly? It's complicated. That sounds like your issue, not ours, so please don't push the problem onto someone else. Take ownership of the driver, fix it up, and all will be good. With irq_in_progress, we can resolve this issue and it does work, although it looks like ugly. Don't paper over driver bugs in the core kernel, fix the driver. It's hard to say it's a driver bug. Could generic kernel provide some flexibility for such complicated drivers? Please post the code for the driver, and then we will be glad to continue the dicussion. Greg, Thanks for your enthusiasm. It's a private driver for products. What do you mean private driver? All drivers need to be merged into the mainline kernel, it saves you time and money, and we will fix your bugs for you. You know that, your bosses know that, your management knows that, so why are you trying to hide things? totally confused, greg k-h -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers
On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote: On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote: On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote: On Thu, 6 Jun 2013, shuox@intel.com wrote: From: Zhang Yanmin yanmin.zh...@intel.com synchronize_irq waits pending IRQ handlers to be finished. If using this function while holding a resource, the IRQ handler may cause deadlock. Here we add a new function irq_in_progress which doesn't wait for the handlers to be finished. A typical use case at suspend-to-ram: device driver's irq handler is complicated and might hold a mutex at rare cases. Its suspend function is called and a suspended flag is set. In case its IRQ handler is running, suspend function calls irq_in_progress. if handler is running, abort suspend. The irq handler checks the suspended flag. If the device is suspended, irq handler either ignores the interrupt, or wakes up the whole system, and the driver's resume function could deal with the delayed interrupt handling. This is as wrong as it can be. Fix the driver instead of hacking racy functions into the core code. So your problem looks like this: CPU 0 CPU 1 irq_handler_thread() suspend() . mutex_lock(m); mutex_lock(m);synchronize_irq(); So why needs the mutex to be taken before synchronize_irq()? Why not doing the obvious? suspend() disable_irq(); (Implies synchronize_irq) mutex_lock(m); mutex_unlock(m); enable_irq(); Thanks for the kind comment. We do consider your solution before and it works well indeed with some specific simple drivers. For example, some drives use GPIO pin as interrupt source. On one specific platform, disable_irq would really disable the irq at RTE entry, which means we lose the wakeup capability of this device. synchronize_irq can be another solution. But we did hit 'DPM device timeout' issue reported by dpm_wd_handler at suspend-to-ram. The driver is complicated. We couldn't change too many functions to optimize it. In addition, we have to use the driver instead of throwing it away. What is preventing you from rewriting it to work properly? It's complicated. That sounds like your issue, not ours, so please don't push the problem onto someone else. Take ownership of the driver, fix it up, and all will be good. With irq_in_progress, we can resolve this issue and it does work, although it looks like ugly. Don't paper over driver bugs in the core kernel, fix the driver. It's hard to say it's a driver bug. Could generic kernel provide some flexibility for such complicated drivers? Please post the code for the driver, and then we will be glad to continue the dicussion. Greg, Thanks for your enthusiasm. It's a private driver for products. What do you mean private driver? All drivers need to be merged into the mainline kernel, it saves you time and money, and we will fix your bugs for you. You know that, your bosses know that, your management knows that, so why are you trying to hide things? totally confused, They are embedded device drivers, highly depending on specific devices which runs its own firmware in devices. Here the kernel drivers run at AP side. One example is Graphics driver, which is very big and coding is not friendly. Kernel experts can raise tons of questions against the driver, but we have to make the driver work well on real products. Another reason is drivers need workaround many hardware issues. That makes it hard to implement kernel drivers in good shape sometimes. We need support all cases. We fixed lots of hard issues on embedded products and think if kernel could be more flexible to support complicated cases. Anyway, thanks. Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/