Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 08:08 -0700, Greg KH wrote:
> On Fri, Jun 07, 2013 at 12:54:55PM +0800, Yanmin Zhang wrote:
> > On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote:
> > > On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
> > > > On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
> > > > > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
> > > > > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> > > > > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > > > > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > > > > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > > > > > > > > From: Zhang Yanmin 
> > > > > > > > > > 
> > > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. 
> > > > > > > > > > If using this
> > > > > > > > > > function while holding a resource, the IRQ handler may 
> > > > > > > > > > cause deadlock.
> > > > > > > > > > 
> > > > > > > > > > Here we add a new function irq_in_progress which doesn't 
> > > > > > > > > > wait for the handlers
> > > > > > > > > > to be finished.
> > > > > > > > > > 
> > > > > > > > > > A typical use case at suspend-to-ram:
> > > > > > > > > > 
> > > > > > > > > > device driver's irq handler is complicated and might hold a 
> > > > > > > > > > mutex at rare cases.
> > > > > > > > > > Its suspend function is called and a suspended flag is set.
> > > > > > > > > > In case its IRQ handler is running, suspend function calls 
> > > > > > > > > > irq_in_progress. if
> > > > > > > > > > handler is running, abort suspend.
> > > > > > > > > > The irq handler checks the suspended flag. If the device is 
> > > > > > > > > > suspended, irq handler
> > > > > > > > > > either ignores the interrupt, or wakes up the whole system, 
> > > > > > > > > > and the driver's
> > > > > > > > > > resume function could deal with the delayed interrupt 
> > > > > > > > > > handling.
> > > > > > > > > 
> > > > > > > > > This is as wrong as it can be. Fix the driver instead of 
> > > > > > > > > hacking racy
> > > > > > > > > functions into the core code.
> > > > > > > > > 
> > > > > > > > > So your problem looks like this:
> > > > > > > > > 
> > > > > > > > > CPU 0 CPU 1
> > > > > > > > > irq_handler_thread()  suspend()
> > > > > > > > >.  mutex_lock();
> > > > > > > > >mutex_lock();synchronize_irq();
> > > > > > > > > 
> > > > > > > > > So why needs the mutex to be taken before synchronize_irq()? 
> > > > > > > > > Why not
> > > > > > > > > doing the obvious?
> > > > > > > > > 
> > > > > > > > > suspend()
> > > > > > > > >   disable_irq(); (Implies synchronize_irq)
> > > > > > > > >   mutex_lock();
> > > > > > > > >   
> > > > > > > > >   mutex_unlock();
> > > > > > > > >   enable_irq();
> > > > > > > > Thanks for the kind comment.
> > > > > > > > 
> > > > > > > > We do consider your solution before and it works well indeed 
> > > > > > > > with some specific
> > > > > > > > simple drivers. For example, some drives use GPIO pin as 
> > > > > > > > interrupt source.
> > > > > > > > 
> > > > > > > > On one specific platform, disable_irq would really disable the 
> > > > > > > > irq at RTE entry,
> > > > > > > > which means we lose the wakeup capability of this device.
> > > > > > > > synchronize_irq can be another solution. But we did hit 'DPM 
> > > > > > > > device timeout' issue
> > > > > > > > reported by dpm_wd_handler at suspend-to-ram.
> > > > > > > > 
> > > > > > > > The driver is complicated. We couldn't change too many 
> > > > > > > > functions to optimize it.
> > > > > > > > In addition, we have to use the driver instead of throwing it 
> > > > > > > > away.
> > > > > > > 
> > > > > > > What is preventing you from rewriting it to work properly?
> > > > > > It's complicated.
> > > > > 
> > > > > That sounds like your issue, not ours, so please don't push the 
> > > > > problem
> > > > > onto someone else.  Take ownership of the driver, fix it up, and all
> > > > > will be good.
> > > > > 
> > > > > 
> > > > > > > > With irq_in_progress, we can resolve this issue and it does 
> > > > > > > > work, although it
> > > > > > > > looks like ugly.
> > > > > > > 
> > > > > > > Don't paper over driver bugs in the core kernel, fix the driver.
> > > > > > It's hard to say it's a driver bug. Could generic kernel provide 
> > > > > > some
> > > > > > flexibility for such complicated drivers?
> > > > > 
> > > > > Please post the code for the driver, and then we will be glad to
> > > > > continue the dicussion.
> > > > Greg,
> > > > 
> > > > Thanks for your enthusiasm. It's a private driver for products.
> > > 
> > > What do you mean "private driver"?  All drivers need to be merged into
> > > the mainline kernel, it saves you time and money, and we will fix your
> > > bugs for you.
> > > 
> > > You know that, your bosses know that, your management knows that, so why
> > > are you trying to hide things?
> > > 
> > > 

Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-07 Thread Greg KH
On Fri, Jun 07, 2013 at 12:54:55PM +0800, Yanmin Zhang wrote:
> On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote:
> > On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
> > > On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
> > > > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
> > > > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> > > > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > > > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > > > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > > > > > > > From: Zhang Yanmin 
> > > > > > > > > 
> > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If 
> > > > > > > > > using this
> > > > > > > > > function while holding a resource, the IRQ handler may cause 
> > > > > > > > > deadlock.
> > > > > > > > > 
> > > > > > > > > Here we add a new function irq_in_progress which doesn't wait 
> > > > > > > > > for the handlers
> > > > > > > > > to be finished.
> > > > > > > > > 
> > > > > > > > > A typical use case at suspend-to-ram:
> > > > > > > > > 
> > > > > > > > > device driver's irq handler is complicated and might hold a 
> > > > > > > > > mutex at rare cases.
> > > > > > > > > Its suspend function is called and a suspended flag is set.
> > > > > > > > > In case its IRQ handler is running, suspend function calls 
> > > > > > > > > irq_in_progress. if
> > > > > > > > > handler is running, abort suspend.
> > > > > > > > > The irq handler checks the suspended flag. If the device is 
> > > > > > > > > suspended, irq handler
> > > > > > > > > either ignores the interrupt, or wakes up the whole system, 
> > > > > > > > > and the driver's
> > > > > > > > > resume function could deal with the delayed interrupt 
> > > > > > > > > handling.
> > > > > > > > 
> > > > > > > > This is as wrong as it can be. Fix the driver instead of 
> > > > > > > > hacking racy
> > > > > > > > functions into the core code.
> > > > > > > > 
> > > > > > > > So your problem looks like this:
> > > > > > > > 
> > > > > > > > CPU 0   CPU 1
> > > > > > > > irq_handler_thread()suspend()
> > > > > > > >.mutex_lock();
> > > > > > > >mutex_lock();  synchronize_irq();
> > > > > > > > 
> > > > > > > > So why needs the mutex to be taken before synchronize_irq()? 
> > > > > > > > Why not
> > > > > > > > doing the obvious?
> > > > > > > > 
> > > > > > > > suspend()
> > > > > > > >   disable_irq(); (Implies synchronize_irq)
> > > > > > > >   mutex_lock();
> > > > > > > >   
> > > > > > > >   mutex_unlock();
> > > > > > > >   enable_irq();
> > > > > > > Thanks for the kind comment.
> > > > > > > 
> > > > > > > We do consider your solution before and it works well indeed with 
> > > > > > > some specific
> > > > > > > simple drivers. For example, some drives use GPIO pin as 
> > > > > > > interrupt source.
> > > > > > > 
> > > > > > > On one specific platform, disable_irq would really disable the 
> > > > > > > irq at RTE entry,
> > > > > > > which means we lose the wakeup capability of this device.
> > > > > > > synchronize_irq can be another solution. But we did hit 'DPM 
> > > > > > > device timeout' issue
> > > > > > > reported by dpm_wd_handler at suspend-to-ram.
> > > > > > > 
> > > > > > > The driver is complicated. We couldn't change too many functions 
> > > > > > > to optimize it.
> > > > > > > In addition, we have to use the driver instead of throwing it 
> > > > > > > away.
> > > > > > 
> > > > > > What is preventing you from rewriting it to work properly?
> > > > > It's complicated.
> > > > 
> > > > That sounds like your issue, not ours, so please don't push the problem
> > > > onto someone else.  Take ownership of the driver, fix it up, and all
> > > > will be good.
> > > > 
> > > > 
> > > > > > > With irq_in_progress, we can resolve this issue and it does work, 
> > > > > > > although it
> > > > > > > looks like ugly.
> > > > > > 
> > > > > > Don't paper over driver bugs in the core kernel, fix the driver.
> > > > > It's hard to say it's a driver bug. Could generic kernel provide some
> > > > > flexibility for such complicated drivers?
> > > > 
> > > > Please post the code for the driver, and then we will be glad to
> > > > continue the dicussion.
> > > Greg,
> > > 
> > > Thanks for your enthusiasm. It's a private driver for products.
> > 
> > What do you mean "private driver"?  All drivers need to be merged into
> > the mainline kernel, it saves you time and money, and we will fix your
> > bugs for you.
> > 
> > You know that, your bosses know that, your management knows that, so why
> > are you trying to hide things?
> > 
> > totally confused,
> They are embedded device drivers, highly depending on specific devices which 
> runs
> its own firmware in devices. Here the kernel drivers run at AP side.

That's no different from loads of drivers that we have in the kernel
today, no need to keep them from 

Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-07 Thread Greg KH
On Fri, Jun 07, 2013 at 12:54:55PM +0800, Yanmin Zhang wrote:
 On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote:
  On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
   On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
 On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
  On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
   On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
On Thu, 6 Jun 2013, shuox@intel.com wrote:
 From: Zhang Yanmin yanmin.zh...@intel.com
 
 synchronize_irq waits pending IRQ handlers to be finished. If 
 using this
 function while holding a resource, the IRQ handler may cause 
 deadlock.
 
 Here we add a new function irq_in_progress which doesn't wait 
 for the handlers
 to be finished.
 
 A typical use case at suspend-to-ram:
 
 device driver's irq handler is complicated and might hold a 
 mutex at rare cases.
 Its suspend function is called and a suspended flag is set.
 In case its IRQ handler is running, suspend function calls 
 irq_in_progress. if
 handler is running, abort suspend.
 The irq handler checks the suspended flag. If the device is 
 suspended, irq handler
 either ignores the interrupt, or wakes up the whole system, 
 and the driver's
 resume function could deal with the delayed interrupt 
 handling.

This is as wrong as it can be. Fix the driver instead of 
hacking racy
functions into the core code.

So your problem looks like this:

CPU 0   CPU 1
irq_handler_thread()suspend()
   .mutex_lock(m);
   mutex_lock(m);  synchronize_irq();

So why needs the mutex to be taken before synchronize_irq()? 
Why not
doing the obvious?

suspend()
  disable_irq(); (Implies synchronize_irq)
  mutex_lock(m);
  
  mutex_unlock(m);
  enable_irq();
   Thanks for the kind comment.
   
   We do consider your solution before and it works well indeed with 
   some specific
   simple drivers. For example, some drives use GPIO pin as 
   interrupt source.
   
   On one specific platform, disable_irq would really disable the 
   irq at RTE entry,
   which means we lose the wakeup capability of this device.
   synchronize_irq can be another solution. But we did hit 'DPM 
   device timeout' issue
   reported by dpm_wd_handler at suspend-to-ram.
   
   The driver is complicated. We couldn't change too many functions 
   to optimize it.
   In addition, we have to use the driver instead of throwing it 
   away.
  
  What is preventing you from rewriting it to work properly?
 It's complicated.

That sounds like your issue, not ours, so please don't push the problem
onto someone else.  Take ownership of the driver, fix it up, and all
will be good.


   With irq_in_progress, we can resolve this issue and it does work, 
   although it
   looks like ugly.
  
  Don't paper over driver bugs in the core kernel, fix the driver.
 It's hard to say it's a driver bug. Could generic kernel provide some
 flexibility for such complicated drivers?

Please post the code for the driver, and then we will be glad to
continue the dicussion.
   Greg,
   
   Thanks for your enthusiasm. It's a private driver for products.
  
  What do you mean private driver?  All drivers need to be merged into
  the mainline kernel, it saves you time and money, and we will fix your
  bugs for you.
  
  You know that, your bosses know that, your management knows that, so why
  are you trying to hide things?
  
  totally confused,
 They are embedded device drivers, highly depending on specific devices which 
 runs
 its own firmware in devices. Here the kernel drivers run at AP side.

That's no different from loads of drivers that we have in the kernel
today, no need to keep them from being merged, please submit them.

 One example is Graphics driver, which is very big and coding is not friendly. 
 Kernel
 experts can raise tons of questions against the driver, but we have to make 
 the driver
 work well on real products.

So you are saying that kernel experts don't ask questions that
actually make drivers work well on real products?  If that's how you
feel about the community, then why are you asking the community for help
in the first place?

And do you somehow think that we don't know how to review/write/fix
graphics drivers?  Who do you think made Linux in the first place?

 Another reason is drivers need workaround many hardware 

Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 08:08 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 12:54:55PM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote:
   On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
   On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
 On Thu, 6 Jun 2013, shuox@intel.com wrote:
  From: Zhang Yanmin yanmin.zh...@intel.com
  
  synchronize_irq waits pending IRQ handlers to be finished. 
  If using this
  function while holding a resource, the IRQ handler may 
  cause deadlock.
  
  Here we add a new function irq_in_progress which doesn't 
  wait for the handlers
  to be finished.
  
  A typical use case at suspend-to-ram:
  
  device driver's irq handler is complicated and might hold a 
  mutex at rare cases.
  Its suspend function is called and a suspended flag is set.
  In case its IRQ handler is running, suspend function calls 
  irq_in_progress. if
  handler is running, abort suspend.
  The irq handler checks the suspended flag. If the device is 
  suspended, irq handler
  either ignores the interrupt, or wakes up the whole system, 
  and the driver's
  resume function could deal with the delayed interrupt 
  handling.
 
 This is as wrong as it can be. Fix the driver instead of 
 hacking racy
 functions into the core code.
 
 So your problem looks like this:
 
 CPU 0 CPU 1
 irq_handler_thread()  suspend()
.  mutex_lock(m);
mutex_lock(m);synchronize_irq();
 
 So why needs the mutex to be taken before synchronize_irq()? 
 Why not
 doing the obvious?
 
 suspend()
   disable_irq(); (Implies synchronize_irq)
   mutex_lock(m);
   
   mutex_unlock(m);
   enable_irq();
Thanks for the kind comment.

We do consider your solution before and it works well indeed 
with some specific
simple drivers. For example, some drives use GPIO pin as 
interrupt source.

On one specific platform, disable_irq would really disable the 
irq at RTE entry,
which means we lose the wakeup capability of this device.
synchronize_irq can be another solution. But we did hit 'DPM 
device timeout' issue
reported by dpm_wd_handler at suspend-to-ram.

The driver is complicated. We couldn't change too many 
functions to optimize it.
In addition, we have to use the driver instead of throwing it 
away.
   
   What is preventing you from rewriting it to work properly?
  It's complicated.
 
 That sounds like your issue, not ours, so please don't push the 
 problem
 onto someone else.  Take ownership of the driver, fix it up, and all
 will be good.
 
 
With irq_in_progress, we can resolve this issue and it does 
work, although it
looks like ugly.
   
   Don't paper over driver bugs in the core kernel, fix the driver.
  It's hard to say it's a driver bug. Could generic kernel provide 
  some
  flexibility for such complicated drivers?
 
 Please post the code for the driver, and then we will be glad to
 continue the dicussion.
Greg,

Thanks for your enthusiasm. It's a private driver for products.
   
   What do you mean private driver?  All drivers need to be merged into
   the mainline kernel, it saves you time and money, and we will fix your
   bugs for you.
   
   You know that, your bosses know that, your management knows that, so why
   are you trying to hide things?
   
   totally confused,
  They are embedded device drivers, highly depending on specific devices 
  which runs
  its own firmware in devices. Here the kernel drivers run at AP side.
 
 That's no different from loads of drivers that we have in the kernel
 today, no need to keep them from being merged, please submit them.
It need managers' approval and is beyond my capability.

 
  One example is Graphics driver, which is very big and coding is not 
  friendly. Kernel
  experts can raise tons of questions against the driver, but we have to make 
  the driver
  work well on real products.
 
 So you are saying that kernel experts don't ask questions that
 actually make drivers work well on real products?
Obviously, I didn't say that. Pls. also remove the , as I really
think I'm 

Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote:
> On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
> > On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
> > > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
> > > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> > > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > > > > > > From: Zhang Yanmin 
> > > > > > > > 
> > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If 
> > > > > > > > using this
> > > > > > > > function while holding a resource, the IRQ handler may cause 
> > > > > > > > deadlock.
> > > > > > > > 
> > > > > > > > Here we add a new function irq_in_progress which doesn't wait 
> > > > > > > > for the handlers
> > > > > > > > to be finished.
> > > > > > > > 
> > > > > > > > A typical use case at suspend-to-ram:
> > > > > > > > 
> > > > > > > > device driver's irq handler is complicated and might hold a 
> > > > > > > > mutex at rare cases.
> > > > > > > > Its suspend function is called and a suspended flag is set.
> > > > > > > > In case its IRQ handler is running, suspend function calls 
> > > > > > > > irq_in_progress. if
> > > > > > > > handler is running, abort suspend.
> > > > > > > > The irq handler checks the suspended flag. If the device is 
> > > > > > > > suspended, irq handler
> > > > > > > > either ignores the interrupt, or wakes up the whole system, and 
> > > > > > > > the driver's
> > > > > > > > resume function could deal with the delayed interrupt handling.
> > > > > > > 
> > > > > > > This is as wrong as it can be. Fix the driver instead of hacking 
> > > > > > > racy
> > > > > > > functions into the core code.
> > > > > > > 
> > > > > > > So your problem looks like this:
> > > > > > > 
> > > > > > > CPU 0 CPU 1
> > > > > > > irq_handler_thread()  suspend()
> > > > > > >.  mutex_lock();
> > > > > > >mutex_lock();synchronize_irq();
> > > > > > > 
> > > > > > > So why needs the mutex to be taken before synchronize_irq()? Why 
> > > > > > > not
> > > > > > > doing the obvious?
> > > > > > > 
> > > > > > > suspend()
> > > > > > >   disable_irq(); (Implies synchronize_irq)
> > > > > > >   mutex_lock();
> > > > > > >   
> > > > > > >   mutex_unlock();
> > > > > > >   enable_irq();
> > > > > > Thanks for the kind comment.
> > > > > > 
> > > > > > We do consider your solution before and it works well indeed with 
> > > > > > some specific
> > > > > > simple drivers. For example, some drives use GPIO pin as interrupt 
> > > > > > source.
> > > > > > 
> > > > > > On one specific platform, disable_irq would really disable the irq 
> > > > > > at RTE entry,
> > > > > > which means we lose the wakeup capability of this device.
> > > > > > synchronize_irq can be another solution. But we did hit 'DPM device 
> > > > > > timeout' issue
> > > > > > reported by dpm_wd_handler at suspend-to-ram.
> > > > > > 
> > > > > > The driver is complicated. We couldn't change too many functions to 
> > > > > > optimize it.
> > > > > > In addition, we have to use the driver instead of throwing it away.
> > > > > 
> > > > > What is preventing you from rewriting it to work properly?
> > > > It's complicated.
> > > 
> > > That sounds like your issue, not ours, so please don't push the problem
> > > onto someone else.  Take ownership of the driver, fix it up, and all
> > > will be good.
> > > 
> > > 
> > > > > > With irq_in_progress, we can resolve this issue and it does work, 
> > > > > > although it
> > > > > > looks like ugly.
> > > > > 
> > > > > Don't paper over driver bugs in the core kernel, fix the driver.
> > > > It's hard to say it's a driver bug. Could generic kernel provide some
> > > > flexibility for such complicated drivers?
> > > 
> > > Please post the code for the driver, and then we will be glad to
> > > continue the dicussion.
> > Greg,
> > 
> > Thanks for your enthusiasm. It's a private driver for products.
> 
> What do you mean "private driver"?  All drivers need to be merged into
> the mainline kernel, it saves you time and money, and we will fix your
> bugs for you.
> 
> You know that, your bosses know that, your management knows that, so why
> are you trying to hide things?
> 
> totally confused,
They are embedded device drivers, highly depending on specific devices which 
runs
its own firmware in devices. Here the kernel drivers run at AP side.

One example is Graphics driver, which is very big and coding is not friendly. 
Kernel
experts can raise tons of questions against the driver, but we have to make the 
driver
work well on real products.

Another reason is drivers need workaround many hardware issues. That makes it
hard to implement kernel drivers in good shape sometimes.

We need support all cases.

We fixed lots of hard issues on 

Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Greg KH
On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
> On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
> > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
> > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > > > > > From: Zhang Yanmin 
> > > > > > > 
> > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If 
> > > > > > > using this
> > > > > > > function while holding a resource, the IRQ handler may cause 
> > > > > > > deadlock.
> > > > > > > 
> > > > > > > Here we add a new function irq_in_progress which doesn't wait for 
> > > > > > > the handlers
> > > > > > > to be finished.
> > > > > > > 
> > > > > > > A typical use case at suspend-to-ram:
> > > > > > > 
> > > > > > > device driver's irq handler is complicated and might hold a mutex 
> > > > > > > at rare cases.
> > > > > > > Its suspend function is called and a suspended flag is set.
> > > > > > > In case its IRQ handler is running, suspend function calls 
> > > > > > > irq_in_progress. if
> > > > > > > handler is running, abort suspend.
> > > > > > > The irq handler checks the suspended flag. If the device is 
> > > > > > > suspended, irq handler
> > > > > > > either ignores the interrupt, or wakes up the whole system, and 
> > > > > > > the driver's
> > > > > > > resume function could deal with the delayed interrupt handling.
> > > > > > 
> > > > > > This is as wrong as it can be. Fix the driver instead of hacking 
> > > > > > racy
> > > > > > functions into the core code.
> > > > > > 
> > > > > > So your problem looks like this:
> > > > > > 
> > > > > > CPU 0   CPU 1
> > > > > > irq_handler_thread()suspend()
> > > > > >.mutex_lock();
> > > > > >mutex_lock();  synchronize_irq();
> > > > > > 
> > > > > > So why needs the mutex to be taken before synchronize_irq()? Why not
> > > > > > doing the obvious?
> > > > > > 
> > > > > > suspend()
> > > > > >   disable_irq(); (Implies synchronize_irq)
> > > > > >   mutex_lock();
> > > > > >   
> > > > > >   mutex_unlock();
> > > > > >   enable_irq();
> > > > > Thanks for the kind comment.
> > > > > 
> > > > > We do consider your solution before and it works well indeed with 
> > > > > some specific
> > > > > simple drivers. For example, some drives use GPIO pin as interrupt 
> > > > > source.
> > > > > 
> > > > > On one specific platform, disable_irq would really disable the irq at 
> > > > > RTE entry,
> > > > > which means we lose the wakeup capability of this device.
> > > > > synchronize_irq can be another solution. But we did hit 'DPM device 
> > > > > timeout' issue
> > > > > reported by dpm_wd_handler at suspend-to-ram.
> > > > > 
> > > > > The driver is complicated. We couldn't change too many functions to 
> > > > > optimize it.
> > > > > In addition, we have to use the driver instead of throwing it away.
> > > > 
> > > > What is preventing you from rewriting it to work properly?
> > > It's complicated.
> > 
> > That sounds like your issue, not ours, so please don't push the problem
> > onto someone else.  Take ownership of the driver, fix it up, and all
> > will be good.
> > 
> > 
> > > > > With irq_in_progress, we can resolve this issue and it does work, 
> > > > > although it
> > > > > looks like ugly.
> > > > 
> > > > Don't paper over driver bugs in the core kernel, fix the driver.
> > > It's hard to say it's a driver bug. Could generic kernel provide some
> > > flexibility for such complicated drivers?
> > 
> > Please post the code for the driver, and then we will be glad to
> > continue the dicussion.
> Greg,
> 
> Thanks for your enthusiasm. It's a private driver for products.

What do you mean "private driver"?  All drivers need to be merged into
the mainline kernel, it saves you time and money, and we will fix your
bugs for you.

You know that, your bosses know that, your management knows that, so why
are you trying to hide things?

totally confused,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
> On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
> > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > > > > From: Zhang Yanmin 
> > > > > > 
> > > > > > synchronize_irq waits pending IRQ handlers to be finished. If using 
> > > > > > this
> > > > > > function while holding a resource, the IRQ handler may cause 
> > > > > > deadlock.
> > > > > > 
> > > > > > Here we add a new function irq_in_progress which doesn't wait for 
> > > > > > the handlers
> > > > > > to be finished.
> > > > > > 
> > > > > > A typical use case at suspend-to-ram:
> > > > > > 
> > > > > > device driver's irq handler is complicated and might hold a mutex 
> > > > > > at rare cases.
> > > > > > Its suspend function is called and a suspended flag is set.
> > > > > > In case its IRQ handler is running, suspend function calls 
> > > > > > irq_in_progress. if
> > > > > > handler is running, abort suspend.
> > > > > > The irq handler checks the suspended flag. If the device is 
> > > > > > suspended, irq handler
> > > > > > either ignores the interrupt, or wakes up the whole system, and the 
> > > > > > driver's
> > > > > > resume function could deal with the delayed interrupt handling.
> > > > > 
> > > > > This is as wrong as it can be. Fix the driver instead of hacking racy
> > > > > functions into the core code.
> > > > > 
> > > > > So your problem looks like this:
> > > > > 
> > > > > CPU 0 CPU 1
> > > > > irq_handler_thread()  suspend()
> > > > >.  mutex_lock();
> > > > >mutex_lock();synchronize_irq();
> > > > > 
> > > > > So why needs the mutex to be taken before synchronize_irq()? Why not
> > > > > doing the obvious?
> > > > > 
> > > > > suspend()
> > > > >   disable_irq(); (Implies synchronize_irq)
> > > > >   mutex_lock();
> > > > >   
> > > > >   mutex_unlock();
> > > > >   enable_irq();
> > > > Thanks for the kind comment.
> > > > 
> > > > We do consider your solution before and it works well indeed with some 
> > > > specific
> > > > simple drivers. For example, some drives use GPIO pin as interrupt 
> > > > source.
> > > > 
> > > > On one specific platform, disable_irq would really disable the irq at 
> > > > RTE entry,
> > > > which means we lose the wakeup capability of this device.
> > > > synchronize_irq can be another solution. But we did hit 'DPM device 
> > > > timeout' issue
> > > > reported by dpm_wd_handler at suspend-to-ram.
> > > > 
> > > > The driver is complicated. We couldn't change too many functions to 
> > > > optimize it.
> > > > In addition, we have to use the driver instead of throwing it away.
> > > 
> > > What is preventing you from rewriting it to work properly?
> > It's complicated.
> 
> That sounds like your issue, not ours, so please don't push the problem
> onto someone else.  Take ownership of the driver, fix it up, and all
> will be good.
> 
> 
> > > > With irq_in_progress, we can resolve this issue and it does work, 
> > > > although it
> > > > looks like ugly.
> > > 
> > > Don't paper over driver bugs in the core kernel, fix the driver.
> > It's hard to say it's a driver bug. Could generic kernel provide some
> > flexibility for such complicated drivers?
> 
> Please post the code for the driver, and then we will be glad to
> continue the dicussion.
Greg,

Thanks for your enthusiasm. It's a private driver for products.

Let's stop here and I wouldn't push the patch to upstream.

Thanks all.
Yanmin

> 
> As for "complicated", just make it "uncomplicated", it's just code :)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Greg KH
On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
> On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > > > From: Zhang Yanmin 
> > > > > 
> > > > > synchronize_irq waits pending IRQ handlers to be finished. If using 
> > > > > this
> > > > > function while holding a resource, the IRQ handler may cause deadlock.
> > > > > 
> > > > > Here we add a new function irq_in_progress which doesn't wait for the 
> > > > > handlers
> > > > > to be finished.
> > > > > 
> > > > > A typical use case at suspend-to-ram:
> > > > > 
> > > > > device driver's irq handler is complicated and might hold a mutex at 
> > > > > rare cases.
> > > > > Its suspend function is called and a suspended flag is set.
> > > > > In case its IRQ handler is running, suspend function calls 
> > > > > irq_in_progress. if
> > > > > handler is running, abort suspend.
> > > > > The irq handler checks the suspended flag. If the device is 
> > > > > suspended, irq handler
> > > > > either ignores the interrupt, or wakes up the whole system, and the 
> > > > > driver's
> > > > > resume function could deal with the delayed interrupt handling.
> > > > 
> > > > This is as wrong as it can be. Fix the driver instead of hacking racy
> > > > functions into the core code.
> > > > 
> > > > So your problem looks like this:
> > > > 
> > > > CPU 0   CPU 1
> > > > irq_handler_thread()suspend()
> > > >.mutex_lock();
> > > >mutex_lock();  synchronize_irq();
> > > > 
> > > > So why needs the mutex to be taken before synchronize_irq()? Why not
> > > > doing the obvious?
> > > > 
> > > > suspend()
> > > >   disable_irq(); (Implies synchronize_irq)
> > > >   mutex_lock();
> > > >   
> > > >   mutex_unlock();
> > > >   enable_irq();
> > > Thanks for the kind comment.
> > > 
> > > We do consider your solution before and it works well indeed with some 
> > > specific
> > > simple drivers. For example, some drives use GPIO pin as interrupt source.
> > > 
> > > On one specific platform, disable_irq would really disable the irq at RTE 
> > > entry,
> > > which means we lose the wakeup capability of this device.
> > > synchronize_irq can be another solution. But we did hit 'DPM device 
> > > timeout' issue
> > > reported by dpm_wd_handler at suspend-to-ram.
> > > 
> > > The driver is complicated. We couldn't change too many functions to 
> > > optimize it.
> > > In addition, we have to use the driver instead of throwing it away.
> > 
> > What is preventing you from rewriting it to work properly?
> It's complicated.

That sounds like your issue, not ours, so please don't push the problem
onto someone else.  Take ownership of the driver, fix it up, and all
will be good.


> > > With irq_in_progress, we can resolve this issue and it does work, 
> > > although it
> > > looks like ugly.
> > 
> > Don't paper over driver bugs in the core kernel, fix the driver.
> It's hard to say it's a driver bug. Could generic kernel provide some
> flexibility for such complicated drivers?

Please post the code for the driver, and then we will be glad to
continue the dicussion.

As for "complicated", just make it "uncomplicated", it's just code :)

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > > From: Zhang Yanmin 
> > > > 
> > > > synchronize_irq waits pending IRQ handlers to be finished. If using this
> > > > function while holding a resource, the IRQ handler may cause deadlock.
> > > > 
> > > > Here we add a new function irq_in_progress which doesn't wait for the 
> > > > handlers
> > > > to be finished.
> > > > 
> > > > A typical use case at suspend-to-ram:
> > > > 
> > > > device driver's irq handler is complicated and might hold a mutex at 
> > > > rare cases.
> > > > Its suspend function is called and a suspended flag is set.
> > > > In case its IRQ handler is running, suspend function calls 
> > > > irq_in_progress. if
> > > > handler is running, abort suspend.
> > > > The irq handler checks the suspended flag. If the device is suspended, 
> > > > irq handler
> > > > either ignores the interrupt, or wakes up the whole system, and the 
> > > > driver's
> > > > resume function could deal with the delayed interrupt handling.
> > > 
> > > This is as wrong as it can be. Fix the driver instead of hacking racy
> > > functions into the core code.
> > > 
> > > So your problem looks like this:
> > > 
> > > CPU 0 CPU 1
> > > irq_handler_thread()  suspend()
> > >.  mutex_lock();
> > >mutex_lock();synchronize_irq();
> > > 
> > > So why needs the mutex to be taken before synchronize_irq()? Why not
> > > doing the obvious?
> > > 
> > > suspend()
> > >   disable_irq(); (Implies synchronize_irq)
> > >   mutex_lock();
> > >   
> > >   mutex_unlock();
> > >   enable_irq();
> > Thanks for the kind comment.
> > 
> > We do consider your solution before and it works well indeed with some 
> > specific
> > simple drivers. For example, some drives use GPIO pin as interrupt source.
> > 
> > On one specific platform, disable_irq would really disable the irq at RTE 
> > entry,
> > which means we lose the wakeup capability of this device.
> > synchronize_irq can be another solution. But we did hit 'DPM device 
> > timeout' issue
> > reported by dpm_wd_handler at suspend-to-ram.
> > 
> > The driver is complicated. We couldn't change too many functions to 
> > optimize it.
> > In addition, we have to use the driver instead of throwing it away.
> 
> What is preventing you from rewriting it to work properly?
It's complicated.

> 
> > With irq_in_progress, we can resolve this issue and it does work, although 
> > it
> > looks like ugly.
> 
> Don't paper over driver bugs in the core kernel, fix the driver.
It's hard to say it's a driver bug. Could generic kernel provide some 
flexibility
for such complicated drivers?

For example, any driver's suspend can return error and the whole suspend-to-ram
aborts. Can we say the driver has a bug? If so, why not to change all 
suspend/resume
callbacks to return VOID?
Current kernel already allows drivers to abort suspend-to-ram at some rare 
corner cases.
Of course, if the abort happens frequently, it's a bug and we need fix it in 
driver.
If it happens rarely, can we provide some flexibility for the driver?

Thanks,
Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Greg KH
On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > From: Zhang Yanmin 
> > > 
> > > synchronize_irq waits pending IRQ handlers to be finished. If using this
> > > function while holding a resource, the IRQ handler may cause deadlock.
> > > 
> > > Here we add a new function irq_in_progress which doesn't wait for the 
> > > handlers
> > > to be finished.
> > > 
> > > A typical use case at suspend-to-ram:
> > > 
> > > device driver's irq handler is complicated and might hold a mutex at rare 
> > > cases.
> > > Its suspend function is called and a suspended flag is set.
> > > In case its IRQ handler is running, suspend function calls 
> > > irq_in_progress. if
> > > handler is running, abort suspend.
> > > The irq handler checks the suspended flag. If the device is suspended, 
> > > irq handler
> > > either ignores the interrupt, or wakes up the whole system, and the 
> > > driver's
> > > resume function could deal with the delayed interrupt handling.
> > 
> > This is as wrong as it can be. Fix the driver instead of hacking racy
> > functions into the core code.
> > 
> > So your problem looks like this:
> > 
> > CPU 0   CPU 1
> > irq_handler_thread()suspend()
> >.mutex_lock();
> >mutex_lock();  synchronize_irq();
> > 
> > So why needs the mutex to be taken before synchronize_irq()? Why not
> > doing the obvious?
> > 
> > suspend()
> >   disable_irq(); (Implies synchronize_irq)
> >   mutex_lock();
> >   
> >   mutex_unlock();
> >   enable_irq();
> Thanks for the kind comment.
> 
> We do consider your solution before and it works well indeed with some 
> specific
> simple drivers. For example, some drives use GPIO pin as interrupt source.
> 
> On one specific platform, disable_irq would really disable the irq at RTE 
> entry,
> which means we lose the wakeup capability of this device.
> synchronize_irq can be another solution. But we did hit 'DPM device timeout' 
> issue
> reported by dpm_wd_handler at suspend-to-ram.
> 
> The driver is complicated. We couldn't change too many functions to optimize 
> it.
> In addition, we have to use the driver instead of throwing it away.

What is preventing you from rewriting it to work properly?

> With irq_in_progress, we can resolve this issue and it does work, although it
> looks like ugly.

Don't paper over driver bugs in the core kernel, fix the driver.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > From: Zhang Yanmin 
> > 
> > synchronize_irq waits pending IRQ handlers to be finished. If using this
> > function while holding a resource, the IRQ handler may cause deadlock.
> > 
> > Here we add a new function irq_in_progress which doesn't wait for the 
> > handlers
> > to be finished.
> > 
> > A typical use case at suspend-to-ram:
> > 
> > device driver's irq handler is complicated and might hold a mutex at rare 
> > cases.
> > Its suspend function is called and a suspended flag is set.
> > In case its IRQ handler is running, suspend function calls irq_in_progress. 
> > if
> > handler is running, abort suspend.
> > The irq handler checks the suspended flag. If the device is suspended, irq 
> > handler
> > either ignores the interrupt, or wakes up the whole system, and the driver's
> > resume function could deal with the delayed interrupt handling.
> 
> This is as wrong as it can be. Fix the driver instead of hacking racy
> functions into the core code.
> 
> So your problem looks like this:
> 
> CPU 0 CPU 1
> irq_handler_thread()  suspend()
>.  mutex_lock();
>mutex_lock();synchronize_irq();
> 
> So why needs the mutex to be taken before synchronize_irq()? Why not
> doing the obvious?
> 
> suspend()
>   disable_irq(); (Implies synchronize_irq)
>   mutex_lock();
>   
>   mutex_unlock();
>   enable_irq();
Thanks for the kind comment.

We do consider your solution before and it works well indeed with some specific
simple drivers. For example, some drives use GPIO pin as interrupt source.

On one specific platform, disable_irq would really disable the irq at RTE entry,
which means we lose the wakeup capability of this device.
synchronize_irq can be another solution. But we did hit 'DPM device timeout' 
issue
reported by dpm_wd_handler at suspend-to-ram.

The driver is complicated. We couldn't change too many functions to optimize it.
In addition, we have to use the driver instead of throwing it away.

With irq_in_progress, we can resolve this issue and it does work, although it
looks like ugly.

Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Thomas Gleixner
On Thu, 6 Jun 2013, shuox@intel.com wrote:
> From: Zhang Yanmin 
> 
> synchronize_irq waits pending IRQ handlers to be finished. If using this
> function while holding a resource, the IRQ handler may cause deadlock.
> 
> Here we add a new function irq_in_progress which doesn't wait for the handlers
> to be finished.
> 
> A typical use case at suspend-to-ram:
> 
> device driver's irq handler is complicated and might hold a mutex at rare 
> cases.
> Its suspend function is called and a suspended flag is set.
> In case its IRQ handler is running, suspend function calls irq_in_progress. if
> handler is running, abort suspend.
> The irq handler checks the suspended flag. If the device is suspended, irq 
> handler
> either ignores the interrupt, or wakes up the whole system, and the driver's
> resume function could deal with the delayed interrupt handling.

This is as wrong as it can be. Fix the driver instead of hacking racy
functions into the core code.

So your problem looks like this:

CPU 0   CPU 1
irq_handler_thread()suspend()
   .mutex_lock();
   mutex_lock();  synchronize_irq();

So why needs the mutex to be taken before synchronize_irq()? Why not
doing the obvious?

suspend()
  disable_irq(); (Implies synchronize_irq)
  mutex_lock();
  
  mutex_unlock();
  enable_irq();

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Thomas Gleixner
On Thu, 6 Jun 2013, shuox@intel.com wrote:
 From: Zhang Yanmin yanmin.zh...@intel.com
 
 synchronize_irq waits pending IRQ handlers to be finished. If using this
 function while holding a resource, the IRQ handler may cause deadlock.
 
 Here we add a new function irq_in_progress which doesn't wait for the handlers
 to be finished.
 
 A typical use case at suspend-to-ram:
 
 device driver's irq handler is complicated and might hold a mutex at rare 
 cases.
 Its suspend function is called and a suspended flag is set.
 In case its IRQ handler is running, suspend function calls irq_in_progress. if
 handler is running, abort suspend.
 The irq handler checks the suspended flag. If the device is suspended, irq 
 handler
 either ignores the interrupt, or wakes up the whole system, and the driver's
 resume function could deal with the delayed interrupt handling.

This is as wrong as it can be. Fix the driver instead of hacking racy
functions into the core code.

So your problem looks like this:

CPU 0   CPU 1
irq_handler_thread()suspend()
   .mutex_lock(m);
   mutex_lock(m);  synchronize_irq();

So why needs the mutex to be taken before synchronize_irq()? Why not
doing the obvious?

suspend()
  disable_irq(); (Implies synchronize_irq)
  mutex_lock(m);
  
  mutex_unlock(m);
  enable_irq();

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
 On Thu, 6 Jun 2013, shuox@intel.com wrote:
  From: Zhang Yanmin yanmin.zh...@intel.com
  
  synchronize_irq waits pending IRQ handlers to be finished. If using this
  function while holding a resource, the IRQ handler may cause deadlock.
  
  Here we add a new function irq_in_progress which doesn't wait for the 
  handlers
  to be finished.
  
  A typical use case at suspend-to-ram:
  
  device driver's irq handler is complicated and might hold a mutex at rare 
  cases.
  Its suspend function is called and a suspended flag is set.
  In case its IRQ handler is running, suspend function calls irq_in_progress. 
  if
  handler is running, abort suspend.
  The irq handler checks the suspended flag. If the device is suspended, irq 
  handler
  either ignores the interrupt, or wakes up the whole system, and the driver's
  resume function could deal with the delayed interrupt handling.
 
 This is as wrong as it can be. Fix the driver instead of hacking racy
 functions into the core code.
 
 So your problem looks like this:
 
 CPU 0 CPU 1
 irq_handler_thread()  suspend()
.  mutex_lock(m);
mutex_lock(m);synchronize_irq();
 
 So why needs the mutex to be taken before synchronize_irq()? Why not
 doing the obvious?
 
 suspend()
   disable_irq(); (Implies synchronize_irq)
   mutex_lock(m);
   
   mutex_unlock(m);
   enable_irq();
Thanks for the kind comment.

We do consider your solution before and it works well indeed with some specific
simple drivers. For example, some drives use GPIO pin as interrupt source.

On one specific platform, disable_irq would really disable the irq at RTE entry,
which means we lose the wakeup capability of this device.
synchronize_irq can be another solution. But we did hit 'DPM device timeout' 
issue
reported by dpm_wd_handler at suspend-to-ram.

The driver is complicated. We couldn't change too many functions to optimize it.
In addition, we have to use the driver instead of throwing it away.

With irq_in_progress, we can resolve this issue and it does work, although it
looks like ugly.

Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Greg KH
On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
 On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
  On Thu, 6 Jun 2013, shuox@intel.com wrote:
   From: Zhang Yanmin yanmin.zh...@intel.com
   
   synchronize_irq waits pending IRQ handlers to be finished. If using this
   function while holding a resource, the IRQ handler may cause deadlock.
   
   Here we add a new function irq_in_progress which doesn't wait for the 
   handlers
   to be finished.
   
   A typical use case at suspend-to-ram:
   
   device driver's irq handler is complicated and might hold a mutex at rare 
   cases.
   Its suspend function is called and a suspended flag is set.
   In case its IRQ handler is running, suspend function calls 
   irq_in_progress. if
   handler is running, abort suspend.
   The irq handler checks the suspended flag. If the device is suspended, 
   irq handler
   either ignores the interrupt, or wakes up the whole system, and the 
   driver's
   resume function could deal with the delayed interrupt handling.
  
  This is as wrong as it can be. Fix the driver instead of hacking racy
  functions into the core code.
  
  So your problem looks like this:
  
  CPU 0   CPU 1
  irq_handler_thread()suspend()
 .mutex_lock(m);
 mutex_lock(m);  synchronize_irq();
  
  So why needs the mutex to be taken before synchronize_irq()? Why not
  doing the obvious?
  
  suspend()
disable_irq(); (Implies synchronize_irq)
mutex_lock(m);

mutex_unlock(m);
enable_irq();
 Thanks for the kind comment.
 
 We do consider your solution before and it works well indeed with some 
 specific
 simple drivers. For example, some drives use GPIO pin as interrupt source.
 
 On one specific platform, disable_irq would really disable the irq at RTE 
 entry,
 which means we lose the wakeup capability of this device.
 synchronize_irq can be another solution. But we did hit 'DPM device timeout' 
 issue
 reported by dpm_wd_handler at suspend-to-ram.
 
 The driver is complicated. We couldn't change too many functions to optimize 
 it.
 In addition, we have to use the driver instead of throwing it away.

What is preventing you from rewriting it to work properly?

 With irq_in_progress, we can resolve this issue and it does work, although it
 looks like ugly.

Don't paper over driver bugs in the core kernel, fix the driver.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
   On Thu, 6 Jun 2013, shuox@intel.com wrote:
From: Zhang Yanmin yanmin.zh...@intel.com

synchronize_irq waits pending IRQ handlers to be finished. If using this
function while holding a resource, the IRQ handler may cause deadlock.

Here we add a new function irq_in_progress which doesn't wait for the 
handlers
to be finished.

A typical use case at suspend-to-ram:

device driver's irq handler is complicated and might hold a mutex at 
rare cases.
Its suspend function is called and a suspended flag is set.
In case its IRQ handler is running, suspend function calls 
irq_in_progress. if
handler is running, abort suspend.
The irq handler checks the suspended flag. If the device is suspended, 
irq handler
either ignores the interrupt, or wakes up the whole system, and the 
driver's
resume function could deal with the delayed interrupt handling.
   
   This is as wrong as it can be. Fix the driver instead of hacking racy
   functions into the core code.
   
   So your problem looks like this:
   
   CPU 0 CPU 1
   irq_handler_thread()  suspend()
  .  mutex_lock(m);
  mutex_lock(m);synchronize_irq();
   
   So why needs the mutex to be taken before synchronize_irq()? Why not
   doing the obvious?
   
   suspend()
 disable_irq(); (Implies synchronize_irq)
 mutex_lock(m);
 
 mutex_unlock(m);
 enable_irq();
  Thanks for the kind comment.
  
  We do consider your solution before and it works well indeed with some 
  specific
  simple drivers. For example, some drives use GPIO pin as interrupt source.
  
  On one specific platform, disable_irq would really disable the irq at RTE 
  entry,
  which means we lose the wakeup capability of this device.
  synchronize_irq can be another solution. But we did hit 'DPM device 
  timeout' issue
  reported by dpm_wd_handler at suspend-to-ram.
  
  The driver is complicated. We couldn't change too many functions to 
  optimize it.
  In addition, we have to use the driver instead of throwing it away.
 
 What is preventing you from rewriting it to work properly?
It's complicated.

 
  With irq_in_progress, we can resolve this issue and it does work, although 
  it
  looks like ugly.
 
 Don't paper over driver bugs in the core kernel, fix the driver.
It's hard to say it's a driver bug. Could generic kernel provide some 
flexibility
for such complicated drivers?

For example, any driver's suspend can return error and the whole suspend-to-ram
aborts. Can we say the driver has a bug? If so, why not to change all 
suspend/resume
callbacks to return VOID?
Current kernel already allows drivers to abort suspend-to-ram at some rare 
corner cases.
Of course, if the abort happens frequently, it's a bug and we need fix it in 
driver.
If it happens rarely, can we provide some flexibility for the driver?

Thanks,
Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Greg KH
On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
 On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
  On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
   On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
On Thu, 6 Jun 2013, shuox@intel.com wrote:
 From: Zhang Yanmin yanmin.zh...@intel.com
 
 synchronize_irq waits pending IRQ handlers to be finished. If using 
 this
 function while holding a resource, the IRQ handler may cause deadlock.
 
 Here we add a new function irq_in_progress which doesn't wait for the 
 handlers
 to be finished.
 
 A typical use case at suspend-to-ram:
 
 device driver's irq handler is complicated and might hold a mutex at 
 rare cases.
 Its suspend function is called and a suspended flag is set.
 In case its IRQ handler is running, suspend function calls 
 irq_in_progress. if
 handler is running, abort suspend.
 The irq handler checks the suspended flag. If the device is 
 suspended, irq handler
 either ignores the interrupt, or wakes up the whole system, and the 
 driver's
 resume function could deal with the delayed interrupt handling.

This is as wrong as it can be. Fix the driver instead of hacking racy
functions into the core code.

So your problem looks like this:

CPU 0   CPU 1
irq_handler_thread()suspend()
   .mutex_lock(m);
   mutex_lock(m);  synchronize_irq();

So why needs the mutex to be taken before synchronize_irq()? Why not
doing the obvious?

suspend()
  disable_irq(); (Implies synchronize_irq)
  mutex_lock(m);
  
  mutex_unlock(m);
  enable_irq();
   Thanks for the kind comment.
   
   We do consider your solution before and it works well indeed with some 
   specific
   simple drivers. For example, some drives use GPIO pin as interrupt source.
   
   On one specific platform, disable_irq would really disable the irq at RTE 
   entry,
   which means we lose the wakeup capability of this device.
   synchronize_irq can be another solution. But we did hit 'DPM device 
   timeout' issue
   reported by dpm_wd_handler at suspend-to-ram.
   
   The driver is complicated. We couldn't change too many functions to 
   optimize it.
   In addition, we have to use the driver instead of throwing it away.
  
  What is preventing you from rewriting it to work properly?
 It's complicated.

That sounds like your issue, not ours, so please don't push the problem
onto someone else.  Take ownership of the driver, fix it up, and all
will be good.


   With irq_in_progress, we can resolve this issue and it does work, 
   although it
   looks like ugly.
  
  Don't paper over driver bugs in the core kernel, fix the driver.
 It's hard to say it's a driver bug. Could generic kernel provide some
 flexibility for such complicated drivers?

Please post the code for the driver, and then we will be glad to
continue the dicussion.

As for complicated, just make it uncomplicated, it's just code :)

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
   On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
 On Thu, 6 Jun 2013, shuox@intel.com wrote:
  From: Zhang Yanmin yanmin.zh...@intel.com
  
  synchronize_irq waits pending IRQ handlers to be finished. If using 
  this
  function while holding a resource, the IRQ handler may cause 
  deadlock.
  
  Here we add a new function irq_in_progress which doesn't wait for 
  the handlers
  to be finished.
  
  A typical use case at suspend-to-ram:
  
  device driver's irq handler is complicated and might hold a mutex 
  at rare cases.
  Its suspend function is called and a suspended flag is set.
  In case its IRQ handler is running, suspend function calls 
  irq_in_progress. if
  handler is running, abort suspend.
  The irq handler checks the suspended flag. If the device is 
  suspended, irq handler
  either ignores the interrupt, or wakes up the whole system, and the 
  driver's
  resume function could deal with the delayed interrupt handling.
 
 This is as wrong as it can be. Fix the driver instead of hacking racy
 functions into the core code.
 
 So your problem looks like this:
 
 CPU 0 CPU 1
 irq_handler_thread()  suspend()
.  mutex_lock(m);
mutex_lock(m);synchronize_irq();
 
 So why needs the mutex to be taken before synchronize_irq()? Why not
 doing the obvious?
 
 suspend()
   disable_irq(); (Implies synchronize_irq)
   mutex_lock(m);
   
   mutex_unlock(m);
   enable_irq();
Thanks for the kind comment.

We do consider your solution before and it works well indeed with some 
specific
simple drivers. For example, some drives use GPIO pin as interrupt 
source.

On one specific platform, disable_irq would really disable the irq at 
RTE entry,
which means we lose the wakeup capability of this device.
synchronize_irq can be another solution. But we did hit 'DPM device 
timeout' issue
reported by dpm_wd_handler at suspend-to-ram.

The driver is complicated. We couldn't change too many functions to 
optimize it.
In addition, we have to use the driver instead of throwing it away.
   
   What is preventing you from rewriting it to work properly?
  It's complicated.
 
 That sounds like your issue, not ours, so please don't push the problem
 onto someone else.  Take ownership of the driver, fix it up, and all
 will be good.
 
 
With irq_in_progress, we can resolve this issue and it does work, 
although it
looks like ugly.
   
   Don't paper over driver bugs in the core kernel, fix the driver.
  It's hard to say it's a driver bug. Could generic kernel provide some
  flexibility for such complicated drivers?
 
 Please post the code for the driver, and then we will be glad to
 continue the dicussion.
Greg,

Thanks for your enthusiasm. It's a private driver for products.

Let's stop here and I wouldn't push the patch to upstream.

Thanks all.
Yanmin

 
 As for complicated, just make it uncomplicated, it's just code :)


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Greg KH
On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
 On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
  On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
   On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
 On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
  On Thu, 6 Jun 2013, shuox@intel.com wrote:
   From: Zhang Yanmin yanmin.zh...@intel.com
   
   synchronize_irq waits pending IRQ handlers to be finished. If 
   using this
   function while holding a resource, the IRQ handler may cause 
   deadlock.
   
   Here we add a new function irq_in_progress which doesn't wait for 
   the handlers
   to be finished.
   
   A typical use case at suspend-to-ram:
   
   device driver's irq handler is complicated and might hold a mutex 
   at rare cases.
   Its suspend function is called and a suspended flag is set.
   In case its IRQ handler is running, suspend function calls 
   irq_in_progress. if
   handler is running, abort suspend.
   The irq handler checks the suspended flag. If the device is 
   suspended, irq handler
   either ignores the interrupt, or wakes up the whole system, and 
   the driver's
   resume function could deal with the delayed interrupt handling.
  
  This is as wrong as it can be. Fix the driver instead of hacking 
  racy
  functions into the core code.
  
  So your problem looks like this:
  
  CPU 0   CPU 1
  irq_handler_thread()suspend()
 .mutex_lock(m);
 mutex_lock(m);  synchronize_irq();
  
  So why needs the mutex to be taken before synchronize_irq()? Why not
  doing the obvious?
  
  suspend()
disable_irq(); (Implies synchronize_irq)
mutex_lock(m);

mutex_unlock(m);
enable_irq();
 Thanks for the kind comment.
 
 We do consider your solution before and it works well indeed with 
 some specific
 simple drivers. For example, some drives use GPIO pin as interrupt 
 source.
 
 On one specific platform, disable_irq would really disable the irq at 
 RTE entry,
 which means we lose the wakeup capability of this device.
 synchronize_irq can be another solution. But we did hit 'DPM device 
 timeout' issue
 reported by dpm_wd_handler at suspend-to-ram.
 
 The driver is complicated. We couldn't change too many functions to 
 optimize it.
 In addition, we have to use the driver instead of throwing it away.

What is preventing you from rewriting it to work properly?
   It's complicated.
  
  That sounds like your issue, not ours, so please don't push the problem
  onto someone else.  Take ownership of the driver, fix it up, and all
  will be good.
  
  
 With irq_in_progress, we can resolve this issue and it does work, 
 although it
 looks like ugly.

Don't paper over driver bugs in the core kernel, fix the driver.
   It's hard to say it's a driver bug. Could generic kernel provide some
   flexibility for such complicated drivers?
  
  Please post the code for the driver, and then we will be glad to
  continue the dicussion.
 Greg,
 
 Thanks for your enthusiasm. It's a private driver for products.

What do you mean private driver?  All drivers need to be merged into
the mainline kernel, it saves you time and money, and we will fix your
bugs for you.

You know that, your bosses know that, your management knows that, so why
are you trying to hide things?

totally confused,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
   On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
   On Thu, 6 Jun 2013, shuox@intel.com wrote:
From: Zhang Yanmin yanmin.zh...@intel.com

synchronize_irq waits pending IRQ handlers to be finished. If 
using this
function while holding a resource, the IRQ handler may cause 
deadlock.

Here we add a new function irq_in_progress which doesn't wait 
for the handlers
to be finished.

A typical use case at suspend-to-ram:

device driver's irq handler is complicated and might hold a 
mutex at rare cases.
Its suspend function is called and a suspended flag is set.
In case its IRQ handler is running, suspend function calls 
irq_in_progress. if
handler is running, abort suspend.
The irq handler checks the suspended flag. If the device is 
suspended, irq handler
either ignores the interrupt, or wakes up the whole system, and 
the driver's
resume function could deal with the delayed interrupt handling.
   
   This is as wrong as it can be. Fix the driver instead of hacking 
   racy
   functions into the core code.
   
   So your problem looks like this:
   
   CPU 0 CPU 1
   irq_handler_thread()  suspend()
  .  mutex_lock(m);
  mutex_lock(m);synchronize_irq();
   
   So why needs the mutex to be taken before synchronize_irq()? Why 
   not
   doing the obvious?
   
   suspend()
 disable_irq(); (Implies synchronize_irq)
 mutex_lock(m);
 
 mutex_unlock(m);
 enable_irq();
  Thanks for the kind comment.
  
  We do consider your solution before and it works well indeed with 
  some specific
  simple drivers. For example, some drives use GPIO pin as interrupt 
  source.
  
  On one specific platform, disable_irq would really disable the irq 
  at RTE entry,
  which means we lose the wakeup capability of this device.
  synchronize_irq can be another solution. But we did hit 'DPM device 
  timeout' issue
  reported by dpm_wd_handler at suspend-to-ram.
  
  The driver is complicated. We couldn't change too many functions to 
  optimize it.
  In addition, we have to use the driver instead of throwing it away.
 
 What is preventing you from rewriting it to work properly?
It's complicated.
   
   That sounds like your issue, not ours, so please don't push the problem
   onto someone else.  Take ownership of the driver, fix it up, and all
   will be good.
   
   
  With irq_in_progress, we can resolve this issue and it does work, 
  although it
  looks like ugly.
 
 Don't paper over driver bugs in the core kernel, fix the driver.
It's hard to say it's a driver bug. Could generic kernel provide some
flexibility for such complicated drivers?
   
   Please post the code for the driver, and then we will be glad to
   continue the dicussion.
  Greg,
  
  Thanks for your enthusiasm. It's a private driver for products.
 
 What do you mean private driver?  All drivers need to be merged into
 the mainline kernel, it saves you time and money, and we will fix your
 bugs for you.
 
 You know that, your bosses know that, your management knows that, so why
 are you trying to hide things?
 
 totally confused,
They are embedded device drivers, highly depending on specific devices which 
runs
its own firmware in devices. Here the kernel drivers run at AP side.

One example is Graphics driver, which is very big and coding is not friendly. 
Kernel
experts can raise tons of questions against the driver, but we have to make the 
driver
work well on real products.

Another reason is drivers need workaround many hardware issues. That makes it
hard to implement kernel drivers in good shape sometimes.

We need support all cases.

We fixed lots of hard issues on embedded products and think if kernel could be 
more
flexible to support complicated cases.

Anyway, thanks.

Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/