Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Ming Lei
On Wed, Nov 28, 2012 at 6:06 PM, Rafael J. Wysocki  wrote:
>
> Well, it may be unfrequent, but does it mean it has to do things that may
> be avoided (ie. walking the children of every node in the path in some cases)?

I agree so without introducing extra cost, :-)

> I don't really think that the counters would cost us that much anyway.

On ARM v7, sizeof(struct device) becomes 376 from 368 after introducing
'unsigned intnoio_cnt;' to 'struct dev_pm_info', and total memory
increases about 3752bytes in a small configuration(about 494 device instance).
The actual memory increase should be more than the data because 'struct device'
is generally embedded into other concrete device structure.

>> Also looks the current implementation of pm_runtime_set_memalloc_noio()
>> is simple and clean enough with the flag, IMO.
>
> I know you always know better. :-)

We still need to consider cost and the function calling frequency, :-)

>
>> > I would use the flag only to store the information that
>> > pm_runtime_set_memalloc_noio(dev, true) has been run for this device 
>> > directly
>> > and I'd use a counter for everything else.
>> >
>> > That is, have power.memalloc_count that would be incremented when (1)
>> > pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) 
>> > when
>> > power.memalloc_count for one of its children changes from 0 to 1 (and
>> > analogously for decrementation).  Then, check the counter in 
>> > rpm_callback().
>>
>> Sorry, could you explain in a bit detail why we need the counter? Looks only
>> checking the flag in rpm_callback() is enough, doesn't it?
>
> Why would I want to use power.memalloc_count in addition to the
> power.memalloc_noio flag?
>
> Consider this:
>
> pm_runtime_set_memalloc_noio(dev):
> return if power.memalloc_noio is set
> set power.memalloc_noio
>   loop:
> increment power.memalloc_count
> if power.memalloc_count is 1 now switch to parent and go to loop

I am wondering if the above should be changed to below because the child
count of memalloc_noio device need to be recorded.

pm_runtime_set_memalloc_noio(dev):
 return if power.memalloc_noio is set
 set power.memalloc_noio
loop:
 increment power.memalloc_count
 switch to parent and go to loop

So pm_runtime_set_memalloc_noio(dev) will become worse than
the improved pm_runtime_set_memalloc_noio(dev, true), which
can return immediately if one dev or parent's flag is true.

> pm_runtime_clear_memalloc_noio(dev):
> return if power.memalloc_noio is unset
> unset power.memalloc_noio
>   loop:
> decrement power.memalloc_count
> if power.memalloc_count is 0 now switch to parent and go to loop

The above will perform well than pm_runtime_set_memalloc_noio(dev, false),
because the above avoids to walk children of device.

So one becomes worse and another becomes better, :-)

Also the children count of one device is generally very small, less than
10 for most devices, see the data obtained in one common x86 pc(thinkpad
t410) from below link:

http://kernel.ubuntu.com/~ming/up/t410-dev-child-cnt.log

- about 8 devices whose child count is more than 10, top three are 18, 17 ,12,
and all the three are root devices.

- about 117 devices whose child count is between 1 and 9

- other 501 devices whose child count is zero

>From above data, walking device children should have not much effect on
performance of pm_runtime_set_memalloc_noio(), which is also called in
very infrequent path.

> Looks kind of simpler, doesn't it?

Looks simpler, but more code lines than single
pm_runtime_set_memalloc_noio(), :-)

>
> And why rpm_callback() should check power.memalloc_count instead of the count?
> Because power.memalloc_noio will only be set for devices that
> pm_runtime_set_memalloc_noio(dev) was called for directly (not necessarily for
> the parents).
>
> And that works even if someone calls any of them twice in a row for the same
> device (presumably by mistake) and doesn't have to make any assumptions
> about devices it is called for.

IMO, we can ignore the mistake usage because the function is called only
in network/block core code currently, not by individual driver.

>
>> > Besides, don't you need to check children for the arg device itself?
>>
>> It isn't needed since the children of network/block device can't be
>> involved of the deadlock in runtime PM path.
>>
>> Also, the function is only called by network device or block device
>> subsystem, both the two kind of device are class device and should
>> have no children.
>
> OK, so not walking the arg device's children is an optimization related to
> some assumptions regarding who's supposed to use this routine.  That should
> be clearly documented.

I think the patch already documents it in the comment of
pm_runtime_set_memalloc_noio().

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a 

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Rafael J. Wysocki
On Wednesday, November 28, 2012 05:47:18 PM Ming Lei wrote:
> On Wed, Nov 28, 2012 at 5:29 PM, Rafael J. Wysocki  wrote:
> >
> > But it doesn't have to walk the children.  Moreover, with counters it only
> 
> Yeah, I got it, it is the advantage of counter, but with extra 'int'
> field introduced
> in 'struct device'.
> 
> > needs to walk the whole path if all devices in it need to be updated.  For
> > example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
> > whose parent's counter is greater than zero already, you don't need to
> > walk the path above the parent.
> 
> We still can do it with the flag only, pm_runtime_set_memalloc_noio(dev, true)
> can return immediately if one parent or the 'dev' flag is true.
> 
> But considered that the pm_runtime_set_memalloc_noio(dev, false) is only
> called in a very infrequent path(network/block device->remove()), looks the
> introduced cost isn't worthy of the obtained advantage.
> 
> So could you accept not introducing counter? and I will update with the
> above improvement you suggested.

Well, please see my other message I sent a while ago. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Rafael J. Wysocki
On Wednesday, November 28, 2012 11:57:19 AM Ming Lei wrote:
> On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki  wrote:
> > On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
> >> The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
> >> to help PM core to teach mm not allocating memory with GFP_KERNEL
> >> flag for avoiding probable deadlock.
> >>
> >> As explained in the comment, any GFP_KERNEL allocation inside
> >> runtime_resume() or runtime_suspend() on any one of device in
> >> the path from one block or network device to the root device
> >> in the device tree may cause deadlock, the introduced
> >> pm_runtime_set_memalloc_noio() sets or clears the flag on
> >> device in the path recursively.
> >>
> >> Cc: Alan Stern 
> >> Cc: "Rafael J. Wysocki" 
> >> Signed-off-by: Ming Lei 
> >> ---
> >> v5:
> >>   - fix code style error
> >>   - add comment on clear the device memalloc_noio flag
> >> v4:
> >>   - rename memalloc_noio_resume as memalloc_noio
> >>   - remove pm_runtime_get_memalloc_noio()
> >>   - add comments on pm_runtime_set_memalloc_noio
> >> v3:
> >>   - introduce pm_runtime_get_memalloc_noio()
> >>   - hold one global lock on pm_runtime_set_memalloc_noio
> >>   - hold device power lock when accessing memalloc_noio_resume
> >> flag suggested by Alan Stern
> >>   - implement pm_runtime_set_memalloc_noio without recursion
> >> suggested by Alan Stern
> >> v2:
> >>   - introduce pm_runtime_set_memalloc_noio()
> >> ---
> >>  drivers/base/power/runtime.c |   60 
> >> ++
> >>  include/linux/pm.h   |1 +
> >>  include/linux/pm_runtime.h   |3 +++
> >>  3 files changed, 64 insertions(+)
> >>
> >> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> >> index 3148b10..3e198a0 100644
> >> --- a/drivers/base/power/runtime.c
> >> +++ b/drivers/base/power/runtime.c
> >> @@ -124,6 +124,66 @@ unsigned long 
> >> pm_runtime_autosuspend_expiration(struct device *dev)
> >>  }
> >>  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
> >>
> >> +static int dev_memalloc_noio(struct device *dev, void *data)
> >> +{
> >> + return dev->power.memalloc_noio;
> >> +}
> >> +
> >> +/*
> >> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
> >> + * @dev: Device to handle.
> >> + * @enable: True for setting the flag and False for clearing the flag.
> >> + *
> >> + * Set the flag for all devices in the path from the device to the
> >> + * root device in the device tree if @enable is true, otherwise clear
> >> + * the flag for devices in the path whose siblings don't set the flag.
> >> + *
> >
> > Please use counters instead of walking the whole path every time.  Ie. in
> > addition to the flag add a counter to store the number of the device's
> > children having that flag set.
> 
> Thanks for your review.
> 
> IMO, pm_runtime_set_memalloc_noio() is only called in
> probe() and release() of block device and network device, which is
> in a very infrequent path, so I am wondering if it is worthy of introducing
> another counter for all devices.

Well, it may be unfrequent, but does it mean it has to do things that may
be avoided (ie. walking the children of every node in the path in some cases)?

I don't really think that the counters would cost us that much anyway.

> Also looks the current implementation of pm_runtime_set_memalloc_noio()
> is simple and clean enough with the flag, IMO.

I know you always know better. :-)

> > I would use the flag only to store the information that
> > pm_runtime_set_memalloc_noio(dev, true) has been run for this device 
> > directly
> > and I'd use a counter for everything else.
> >
> > That is, have power.memalloc_count that would be incremented when (1)
> > pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) 
> > when
> > power.memalloc_count for one of its children changes from 0 to 1 (and
> > analogously for decrementation).  Then, check the counter in rpm_callback().
> 
> Sorry, could you explain in a bit detail why we need the counter? Looks only
> checking the flag in rpm_callback() is enough, doesn't it?

Why would I want to use power.memalloc_count in addition to the
power.memalloc_noio flag?

Consider this:

pm_runtime_set_memalloc_noio(dev):
return if power.memalloc_noio is set
set power.memalloc_noio
  loop:
increment power.memalloc_count
if power.memalloc_count is 1 now switch to parent and go to loop

pm_runtime_clear_memalloc_noio(dev):
return if power.memalloc_noio is unset
unset power.memalloc_noio
  loop:
decrement power.memalloc_count
if power.memalloc_count is 0 now switch to parent and go to loop

Looks kind of simpler, doesn't it?

And why rpm_callback() should check power.memalloc_count instead of the count?
Because power.memalloc_noio will only be set for devices that
pm_runtime_set_memalloc_noio(dev) 

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Ming Lei
On Wed, Nov 28, 2012 at 5:29 PM, Rafael J. Wysocki  wrote:
>
> But it doesn't have to walk the children.  Moreover, with counters it only

Yeah, I got it, it is the advantage of counter, but with extra 'int'
field introduced
in 'struct device'.

> needs to walk the whole path if all devices in it need to be updated.  For
> example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
> whose parent's counter is greater than zero already, you don't need to
> walk the path above the parent.

We still can do it with the flag only, pm_runtime_set_memalloc_noio(dev, true)
can return immediately if one parent or the 'dev' flag is true.

But considered that the pm_runtime_set_memalloc_noio(dev, false) is only
called in a very infrequent path(network/block device->remove()), looks the
introduced cost isn't worthy of the obtained advantage.

So could you accept not introducing counter? and I will update with the
above improvement you suggested.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Rafael J. Wysocki
On Wednesday, November 28, 2012 12:34:36 PM Ming Lei wrote:
> On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki  wrote:
> >
> > Please use counters instead of walking the whole path every time.  Ie. in
> > addition to the flag add a counter to store the number of the device's
> > children having that flag set.
> 
> Even though counter is added, walking the whole path can't be avoided too,
> and may be a explicit walking or recursion, because 
> pm_runtime_set_memalloc_noio
> is required to set or clear the flag(or increase/decrease the counter) of
> devices in the whole path.

But it doesn't have to walk the children.  Moreover, with counters it only
needs to walk the whole path if all devices in it need to be updated.  For
example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
whose parent's counter is greater than zero already, you don't need to
walk the path above the parent.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Rafael J. Wysocki
On Wednesday, November 28, 2012 12:34:36 PM Ming Lei wrote:
 On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki r...@sisk.pl wrote:
 
  Please use counters instead of walking the whole path every time.  Ie. in
  addition to the flag add a counter to store the number of the device's
  children having that flag set.
 
 Even though counter is added, walking the whole path can't be avoided too,
 and may be a explicit walking or recursion, because 
 pm_runtime_set_memalloc_noio
 is required to set or clear the flag(or increase/decrease the counter) of
 devices in the whole path.

But it doesn't have to walk the children.  Moreover, with counters it only
needs to walk the whole path if all devices in it need to be updated.  For
example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
whose parent's counter is greater than zero already, you don't need to
walk the path above the parent.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Ming Lei
On Wed, Nov 28, 2012 at 5:29 PM, Rafael J. Wysocki r...@sisk.pl wrote:

 But it doesn't have to walk the children.  Moreover, with counters it only

Yeah, I got it, it is the advantage of counter, but with extra 'int'
field introduced
in 'struct device'.

 needs to walk the whole path if all devices in it need to be updated.  For
 example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
 whose parent's counter is greater than zero already, you don't need to
 walk the path above the parent.

We still can do it with the flag only, pm_runtime_set_memalloc_noio(dev, true)
can return immediately if one parent or the 'dev' flag is true.

But considered that the pm_runtime_set_memalloc_noio(dev, false) is only
called in a very infrequent path(network/block device-remove()), looks the
introduced cost isn't worthy of the obtained advantage.

So could you accept not introducing counter? and I will update with the
above improvement you suggested.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Rafael J. Wysocki
On Wednesday, November 28, 2012 11:57:19 AM Ming Lei wrote:
 On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki r...@sisk.pl wrote:
  On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
  The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
  to help PM core to teach mm not allocating memory with GFP_KERNEL
  flag for avoiding probable deadlock.
 
  As explained in the comment, any GFP_KERNEL allocation inside
  runtime_resume() or runtime_suspend() on any one of device in
  the path from one block or network device to the root device
  in the device tree may cause deadlock, the introduced
  pm_runtime_set_memalloc_noio() sets or clears the flag on
  device in the path recursively.
 
  Cc: Alan Stern st...@rowland.harvard.edu
  Cc: Rafael J. Wysocki r...@sisk.pl
  Signed-off-by: Ming Lei ming@canonical.com
  ---
  v5:
- fix code style error
- add comment on clear the device memalloc_noio flag
  v4:
- rename memalloc_noio_resume as memalloc_noio
- remove pm_runtime_get_memalloc_noio()
- add comments on pm_runtime_set_memalloc_noio
  v3:
- introduce pm_runtime_get_memalloc_noio()
- hold one global lock on pm_runtime_set_memalloc_noio
- hold device power lock when accessing memalloc_noio_resume
  flag suggested by Alan Stern
- implement pm_runtime_set_memalloc_noio without recursion
  suggested by Alan Stern
  v2:
- introduce pm_runtime_set_memalloc_noio()
  ---
   drivers/base/power/runtime.c |   60 
  ++
   include/linux/pm.h   |1 +
   include/linux/pm_runtime.h   |3 +++
   3 files changed, 64 insertions(+)
 
  diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
  index 3148b10..3e198a0 100644
  --- a/drivers/base/power/runtime.c
  +++ b/drivers/base/power/runtime.c
  @@ -124,6 +124,66 @@ unsigned long 
  pm_runtime_autosuspend_expiration(struct device *dev)
   }
   EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
 
  +static int dev_memalloc_noio(struct device *dev, void *data)
  +{
  + return dev-power.memalloc_noio;
  +}
  +
  +/*
  + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
  + * @dev: Device to handle.
  + * @enable: True for setting the flag and False for clearing the flag.
  + *
  + * Set the flag for all devices in the path from the device to the
  + * root device in the device tree if @enable is true, otherwise clear
  + * the flag for devices in the path whose siblings don't set the flag.
  + *
 
  Please use counters instead of walking the whole path every time.  Ie. in
  addition to the flag add a counter to store the number of the device's
  children having that flag set.
 
 Thanks for your review.
 
 IMO, pm_runtime_set_memalloc_noio() is only called in
 probe() and release() of block device and network device, which is
 in a very infrequent path, so I am wondering if it is worthy of introducing
 another counter for all devices.

Well, it may be unfrequent, but does it mean it has to do things that may
be avoided (ie. walking the children of every node in the path in some cases)?

I don't really think that the counters would cost us that much anyway.

 Also looks the current implementation of pm_runtime_set_memalloc_noio()
 is simple and clean enough with the flag, IMO.

I know you always know better. :-)

  I would use the flag only to store the information that
  pm_runtime_set_memalloc_noio(dev, true) has been run for this device 
  directly
  and I'd use a counter for everything else.
 
  That is, have power.memalloc_count that would be incremented when (1)
  pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) 
  when
  power.memalloc_count for one of its children changes from 0 to 1 (and
  analogously for decrementation).  Then, check the counter in rpm_callback().
 
 Sorry, could you explain in a bit detail why we need the counter? Looks only
 checking the flag in rpm_callback() is enough, doesn't it?

Why would I want to use power.memalloc_count in addition to the
power.memalloc_noio flag?

Consider this:

pm_runtime_set_memalloc_noio(dev):
return if power.memalloc_noio is set
set power.memalloc_noio
  loop:
increment power.memalloc_count
if power.memalloc_count is 1 now switch to parent and go to loop

pm_runtime_clear_memalloc_noio(dev):
return if power.memalloc_noio is unset
unset power.memalloc_noio
  loop:
decrement power.memalloc_count
if power.memalloc_count is 0 now switch to parent and go to loop

Looks kind of simpler, doesn't it?

And why rpm_callback() should check power.memalloc_count instead of the count?
Because power.memalloc_noio will only be set for devices that
pm_runtime_set_memalloc_noio(dev) was called for directly (not necessarily for
the parents).

And that works even if someone calls any of them twice in a row for the same
device (presumably 

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Rafael J. Wysocki
On Wednesday, November 28, 2012 05:47:18 PM Ming Lei wrote:
 On Wed, Nov 28, 2012 at 5:29 PM, Rafael J. Wysocki r...@sisk.pl wrote:
 
  But it doesn't have to walk the children.  Moreover, with counters it only
 
 Yeah, I got it, it is the advantage of counter, but with extra 'int'
 field introduced
 in 'struct device'.
 
  needs to walk the whole path if all devices in it need to be updated.  For
  example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
  whose parent's counter is greater than zero already, you don't need to
  walk the path above the parent.
 
 We still can do it with the flag only, pm_runtime_set_memalloc_noio(dev, true)
 can return immediately if one parent or the 'dev' flag is true.
 
 But considered that the pm_runtime_set_memalloc_noio(dev, false) is only
 called in a very infrequent path(network/block device-remove()), looks the
 introduced cost isn't worthy of the obtained advantage.
 
 So could you accept not introducing counter? and I will update with the
 above improvement you suggested.

Well, please see my other message I sent a while ago. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-28 Thread Ming Lei
On Wed, Nov 28, 2012 at 6:06 PM, Rafael J. Wysocki r...@sisk.pl wrote:

 Well, it may be unfrequent, but does it mean it has to do things that may
 be avoided (ie. walking the children of every node in the path in some cases)?

I agree so without introducing extra cost, :-)

 I don't really think that the counters would cost us that much anyway.

On ARM v7, sizeof(struct device) becomes 376 from 368 after introducing
'unsigned intnoio_cnt;' to 'struct dev_pm_info', and total memory
increases about 3752bytes in a small configuration(about 494 device instance).
The actual memory increase should be more than the data because 'struct device'
is generally embedded into other concrete device structure.

 Also looks the current implementation of pm_runtime_set_memalloc_noio()
 is simple and clean enough with the flag, IMO.

 I know you always know better. :-)

We still need to consider cost and the function calling frequency, :-)


  I would use the flag only to store the information that
  pm_runtime_set_memalloc_noio(dev, true) has been run for this device 
  directly
  and I'd use a counter for everything else.
 
  That is, have power.memalloc_count that would be incremented when (1)
  pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) 
  when
  power.memalloc_count for one of its children changes from 0 to 1 (and
  analogously for decrementation).  Then, check the counter in 
  rpm_callback().

 Sorry, could you explain in a bit detail why we need the counter? Looks only
 checking the flag in rpm_callback() is enough, doesn't it?

 Why would I want to use power.memalloc_count in addition to the
 power.memalloc_noio flag?

 Consider this:

 pm_runtime_set_memalloc_noio(dev):
 return if power.memalloc_noio is set
 set power.memalloc_noio
   loop:
 increment power.memalloc_count
 if power.memalloc_count is 1 now switch to parent and go to loop

I am wondering if the above should be changed to below because the child
count of memalloc_noio device need to be recorded.

pm_runtime_set_memalloc_noio(dev):
 return if power.memalloc_noio is set
 set power.memalloc_noio
loop:
 increment power.memalloc_count
 switch to parent and go to loop

So pm_runtime_set_memalloc_noio(dev) will become worse than
the improved pm_runtime_set_memalloc_noio(dev, true), which
can return immediately if one dev or parent's flag is true.

 pm_runtime_clear_memalloc_noio(dev):
 return if power.memalloc_noio is unset
 unset power.memalloc_noio
   loop:
 decrement power.memalloc_count
 if power.memalloc_count is 0 now switch to parent and go to loop

The above will perform well than pm_runtime_set_memalloc_noio(dev, false),
because the above avoids to walk children of device.

So one becomes worse and another becomes better, :-)

Also the children count of one device is generally very small, less than
10 for most devices, see the data obtained in one common x86 pc(thinkpad
t410) from below link:

http://kernel.ubuntu.com/~ming/up/t410-dev-child-cnt.log

- about 8 devices whose child count is more than 10, top three are 18, 17 ,12,
and all the three are root devices.

- about 117 devices whose child count is between 1 and 9

- other 501 devices whose child count is zero

From above data, walking device children should have not much effect on
performance of pm_runtime_set_memalloc_noio(), which is also called in
very infrequent path.

 Looks kind of simpler, doesn't it?

Looks simpler, but more code lines than single
pm_runtime_set_memalloc_noio(), :-)


 And why rpm_callback() should check power.memalloc_count instead of the count?
 Because power.memalloc_noio will only be set for devices that
 pm_runtime_set_memalloc_noio(dev) was called for directly (not necessarily for
 the parents).

 And that works even if someone calls any of them twice in a row for the same
 device (presumably by mistake) and doesn't have to make any assumptions
 about devices it is called for.

IMO, we can ignore the mistake usage because the function is called only
in network/block core code currently, not by individual driver.


  Besides, don't you need to check children for the arg device itself?

 It isn't needed since the children of network/block device can't be
 involved of the deadlock in runtime PM path.

 Also, the function is only called by network device or block device
 subsystem, both the two kind of device are class device and should
 have no children.

 OK, so not walking the arg device's children is an optimization related to
 some assumptions regarding who's supposed to use this routine.  That should
 be clearly documented.

I think the patch already documents it in the comment of
pm_runtime_set_memalloc_noio().

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-27 Thread Ming Lei
On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki  wrote:
>
> Please use counters instead of walking the whole path every time.  Ie. in
> addition to the flag add a counter to store the number of the device's
> children having that flag set.

Even though counter is added, walking the whole path can't be avoided too,
and may be a explicit walking or recursion, because pm_runtime_set_memalloc_noio
is required to set or clear the flag(or increase/decrease the counter) of
devices in the whole path.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-27 Thread Ming Lei
On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki  wrote:
> On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
>> The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
>> to help PM core to teach mm not allocating memory with GFP_KERNEL
>> flag for avoiding probable deadlock.
>>
>> As explained in the comment, any GFP_KERNEL allocation inside
>> runtime_resume() or runtime_suspend() on any one of device in
>> the path from one block or network device to the root device
>> in the device tree may cause deadlock, the introduced
>> pm_runtime_set_memalloc_noio() sets or clears the flag on
>> device in the path recursively.
>>
>> Cc: Alan Stern 
>> Cc: "Rafael J. Wysocki" 
>> Signed-off-by: Ming Lei 
>> ---
>> v5:
>>   - fix code style error
>>   - add comment on clear the device memalloc_noio flag
>> v4:
>>   - rename memalloc_noio_resume as memalloc_noio
>>   - remove pm_runtime_get_memalloc_noio()
>>   - add comments on pm_runtime_set_memalloc_noio
>> v3:
>>   - introduce pm_runtime_get_memalloc_noio()
>>   - hold one global lock on pm_runtime_set_memalloc_noio
>>   - hold device power lock when accessing memalloc_noio_resume
>> flag suggested by Alan Stern
>>   - implement pm_runtime_set_memalloc_noio without recursion
>> suggested by Alan Stern
>> v2:
>>   - introduce pm_runtime_set_memalloc_noio()
>> ---
>>  drivers/base/power/runtime.c |   60 
>> ++
>>  include/linux/pm.h   |1 +
>>  include/linux/pm_runtime.h   |3 +++
>>  3 files changed, 64 insertions(+)
>>
>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
>> index 3148b10..3e198a0 100644
>> --- a/drivers/base/power/runtime.c
>> +++ b/drivers/base/power/runtime.c
>> @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct 
>> device *dev)
>>  }
>>  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
>>
>> +static int dev_memalloc_noio(struct device *dev, void *data)
>> +{
>> + return dev->power.memalloc_noio;
>> +}
>> +
>> +/*
>> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
>> + * @dev: Device to handle.
>> + * @enable: True for setting the flag and False for clearing the flag.
>> + *
>> + * Set the flag for all devices in the path from the device to the
>> + * root device in the device tree if @enable is true, otherwise clear
>> + * the flag for devices in the path whose siblings don't set the flag.
>> + *
>
> Please use counters instead of walking the whole path every time.  Ie. in
> addition to the flag add a counter to store the number of the device's
> children having that flag set.

Thanks for your review.

IMO, pm_runtime_set_memalloc_noio() is only called in
probe() and release() of block device and network device, which is
in a very infrequent path, so I am wondering if it is worthy of introducing
another counter for all devices.

Also looks the current implementation of pm_runtime_set_memalloc_noio()
is simple and clean enough with the flag, IMO.

> I would use the flag only to store the information that
> pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
> and I'd use a counter for everything else.
>
> That is, have power.memalloc_count that would be incremented when (1)
> pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
> power.memalloc_count for one of its children changes from 0 to 1 (and
> analogously for decrementation).  Then, check the counter in rpm_callback().

Sorry, could you explain in a bit detail why we need the counter? Looks only
checking the flag in rpm_callback() is enough, doesn't it?

>
> Besides, don't you need to check children for the arg device itself?

It isn't needed since the children of network/block device can't be
involved of the deadlock in runtime PM path.

Also, the function is only called by network device or block device
subsystem, both the two kind of device are class device and should
have no children.

>
>> + * The function should only be called by block device, or network
>> + * device driver for solving the deadlock problem during runtime
>> + * resume/suspend:
>> + *
>> + * If memory allocation with GFP_KERNEL is called inside runtime
>> + * resume/suspend callback of any one of its ancestors(or the
>> + * block device itself), the deadlock may be triggered inside the
>> + * memory allocation since it might not complete until the block
>> + * device becomes active and the involed page I/O finishes. The
>> + * situation is pointed out first by Alan Stern. Network device
>> + * are involved in iSCSI kind of situation.
>> + *
>> + * The lock of dev_hotplug_mutex is held in the function for handling
>> + * hotplug race because pm_runtime_set_memalloc_noio() may be called
>> + * in async probe().
>> + *
>> + * The function should be called between device_add() and device_del()
>> + * on the affected 

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-27 Thread Rafael J. Wysocki
On Tuesday, November 27, 2012 10:19:29 PM Rafael J. Wysocki wrote:
> On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
> > The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
> > to help PM core to teach mm not allocating memory with GFP_KERNEL
> > flag for avoiding probable deadlock.
> > 
> > As explained in the comment, any GFP_KERNEL allocation inside
> > runtime_resume() or runtime_suspend() on any one of device in
> > the path from one block or network device to the root device
> > in the device tree may cause deadlock, the introduced
> > pm_runtime_set_memalloc_noio() sets or clears the flag on
> > device in the path recursively.
> > 
> > Cc: Alan Stern 
> > Cc: "Rafael J. Wysocki" 
> > Signed-off-by: Ming Lei 
> > ---
> > v5:
> > - fix code style error
> > - add comment on clear the device memalloc_noio flag
> > v4:
> > - rename memalloc_noio_resume as memalloc_noio
> > - remove pm_runtime_get_memalloc_noio()
> > - add comments on pm_runtime_set_memalloc_noio
> > v3:
> > - introduce pm_runtime_get_memalloc_noio()
> > - hold one global lock on pm_runtime_set_memalloc_noio
> > - hold device power lock when accessing memalloc_noio_resume
> >   flag suggested by Alan Stern
> > - implement pm_runtime_set_memalloc_noio without recursion
> >   suggested by Alan Stern
> > v2:
> > - introduce pm_runtime_set_memalloc_noio()
> > ---
> >  drivers/base/power/runtime.c |   60 
> > ++
> >  include/linux/pm.h   |1 +
> >  include/linux/pm_runtime.h   |3 +++
> >  3 files changed, 64 insertions(+)
> > 
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 3148b10..3e198a0 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct 
> > device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
> >  
> > +static int dev_memalloc_noio(struct device *dev, void *data)
> > +{
> > +   return dev->power.memalloc_noio;
> > +}
> > +
> > +/*
> > + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
> > + * @dev: Device to handle.
> > + * @enable: True for setting the flag and False for clearing the flag.
> > + *
> > + * Set the flag for all devices in the path from the device to the
> > + * root device in the device tree if @enable is true, otherwise clear
> > + * the flag for devices in the path whose siblings don't set the flag.
> > + *
> 
> Please use counters instead of walking the whole path every time.  Ie. in
> addition to the flag add a counter to store the number of the device's
> children having that flag set.

I would use the flag only to store the information that
pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
and I'd use a counter for everything else.

That is, have power.memalloc_count that would be incremented when (1)
pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
power.memalloc_count for one of its children changes from 0 to 1 (and
analogously for decrementation).  Then, check the counter in rpm_callback().

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-27 Thread Rafael J. Wysocki
On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
> The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
> to help PM core to teach mm not allocating memory with GFP_KERNEL
> flag for avoiding probable deadlock.
> 
> As explained in the comment, any GFP_KERNEL allocation inside
> runtime_resume() or runtime_suspend() on any one of device in
> the path from one block or network device to the root device
> in the device tree may cause deadlock, the introduced
> pm_runtime_set_memalloc_noio() sets or clears the flag on
> device in the path recursively.
> 
> Cc: Alan Stern 
> Cc: "Rafael J. Wysocki" 
> Signed-off-by: Ming Lei 
> ---
> v5:
>   - fix code style error
>   - add comment on clear the device memalloc_noio flag
> v4:
>   - rename memalloc_noio_resume as memalloc_noio
>   - remove pm_runtime_get_memalloc_noio()
>   - add comments on pm_runtime_set_memalloc_noio
> v3:
>   - introduce pm_runtime_get_memalloc_noio()
>   - hold one global lock on pm_runtime_set_memalloc_noio
>   - hold device power lock when accessing memalloc_noio_resume
> flag suggested by Alan Stern
>   - implement pm_runtime_set_memalloc_noio without recursion
> suggested by Alan Stern
> v2:
>   - introduce pm_runtime_set_memalloc_noio()
> ---
>  drivers/base/power/runtime.c |   60 
> ++
>  include/linux/pm.h   |1 +
>  include/linux/pm_runtime.h   |3 +++
>  3 files changed, 64 insertions(+)
> 
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 3148b10..3e198a0 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct 
> device *dev)
>  }
>  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
>  
> +static int dev_memalloc_noio(struct device *dev, void *data)
> +{
> + return dev->power.memalloc_noio;
> +}
> +
> +/*
> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
> + * @dev: Device to handle.
> + * @enable: True for setting the flag and False for clearing the flag.
> + *
> + * Set the flag for all devices in the path from the device to the
> + * root device in the device tree if @enable is true, otherwise clear
> + * the flag for devices in the path whose siblings don't set the flag.
> + *

Please use counters instead of walking the whole path every time.  Ie. in
addition to the flag add a counter to store the number of the device's
children having that flag set.

Besides, don't you need to check children for the arg device itself?

> + * The function should only be called by block device, or network
> + * device driver for solving the deadlock problem during runtime
> + * resume/suspend:
> + *
> + * If memory allocation with GFP_KERNEL is called inside runtime
> + * resume/suspend callback of any one of its ancestors(or the
> + * block device itself), the deadlock may be triggered inside the
> + * memory allocation since it might not complete until the block
> + * device becomes active and the involed page I/O finishes. The
> + * situation is pointed out first by Alan Stern. Network device
> + * are involved in iSCSI kind of situation.
> + *
> + * The lock of dev_hotplug_mutex is held in the function for handling
> + * hotplug race because pm_runtime_set_memalloc_noio() may be called
> + * in async probe().
> + *
> + * The function should be called between device_add() and device_del()
> + * on the affected device(block/network device).
> + */
> +void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
> +{
> + static DEFINE_MUTEX(dev_hotplug_mutex);

What's the mutex for?

> +
> + mutex_lock(_hotplug_mutex);
> + for (;;) {
> + /* hold power lock since bitfield is not SMP-safe. */
> + spin_lock_irq(>power.lock);
> + dev->power.memalloc_noio = enable;
> + spin_unlock_irq(>power.lock);
> +
> + dev = dev->parent;
> +
> + /*
> +  * clear flag of the parent device only if all the
> +  * children don't set the flag because ancestor's
> +  * flag was set by any one of the descendants.
> +  */
> + if (!dev || (!enable &&
> +  device_for_each_child(dev, NULL,
> +dev_memalloc_noio)))
> + break;
> + }
> + mutex_unlock(_hotplug_mutex);
> +}
> +EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);
> +
>  /**
>   * rpm_check_suspend_allowed - Test whether a device may be suspended.
>   * @dev: Device to test.
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 03d7bb1..1a8a69d 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -538,6 +538,7 @@ struct dev_pm_info {
>   unsigned intirq_safe:1;
>   unsigned int

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-27 Thread Rafael J. Wysocki
On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
 The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
 to help PM core to teach mm not allocating memory with GFP_KERNEL
 flag for avoiding probable deadlock.
 
 As explained in the comment, any GFP_KERNEL allocation inside
 runtime_resume() or runtime_suspend() on any one of device in
 the path from one block or network device to the root device
 in the device tree may cause deadlock, the introduced
 pm_runtime_set_memalloc_noio() sets or clears the flag on
 device in the path recursively.
 
 Cc: Alan Stern st...@rowland.harvard.edu
 Cc: Rafael J. Wysocki r...@sisk.pl
 Signed-off-by: Ming Lei ming@canonical.com
 ---
 v5:
   - fix code style error
   - add comment on clear the device memalloc_noio flag
 v4:
   - rename memalloc_noio_resume as memalloc_noio
   - remove pm_runtime_get_memalloc_noio()
   - add comments on pm_runtime_set_memalloc_noio
 v3:
   - introduce pm_runtime_get_memalloc_noio()
   - hold one global lock on pm_runtime_set_memalloc_noio
   - hold device power lock when accessing memalloc_noio_resume
 flag suggested by Alan Stern
   - implement pm_runtime_set_memalloc_noio without recursion
 suggested by Alan Stern
 v2:
   - introduce pm_runtime_set_memalloc_noio()
 ---
  drivers/base/power/runtime.c |   60 
 ++
  include/linux/pm.h   |1 +
  include/linux/pm_runtime.h   |3 +++
  3 files changed, 64 insertions(+)
 
 diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
 index 3148b10..3e198a0 100644
 --- a/drivers/base/power/runtime.c
 +++ b/drivers/base/power/runtime.c
 @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct 
 device *dev)
  }
  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
  
 +static int dev_memalloc_noio(struct device *dev, void *data)
 +{
 + return dev-power.memalloc_noio;
 +}
 +
 +/*
 + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
 + * @dev: Device to handle.
 + * @enable: True for setting the flag and False for clearing the flag.
 + *
 + * Set the flag for all devices in the path from the device to the
 + * root device in the device tree if @enable is true, otherwise clear
 + * the flag for devices in the path whose siblings don't set the flag.
 + *

Please use counters instead of walking the whole path every time.  Ie. in
addition to the flag add a counter to store the number of the device's
children having that flag set.

Besides, don't you need to check children for the arg device itself?

 + * The function should only be called by block device, or network
 + * device driver for solving the deadlock problem during runtime
 + * resume/suspend:
 + *
 + * If memory allocation with GFP_KERNEL is called inside runtime
 + * resume/suspend callback of any one of its ancestors(or the
 + * block device itself), the deadlock may be triggered inside the
 + * memory allocation since it might not complete until the block
 + * device becomes active and the involed page I/O finishes. The
 + * situation is pointed out first by Alan Stern. Network device
 + * are involved in iSCSI kind of situation.
 + *
 + * The lock of dev_hotplug_mutex is held in the function for handling
 + * hotplug race because pm_runtime_set_memalloc_noio() may be called
 + * in async probe().
 + *
 + * The function should be called between device_add() and device_del()
 + * on the affected device(block/network device).
 + */
 +void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
 +{
 + static DEFINE_MUTEX(dev_hotplug_mutex);

What's the mutex for?

 +
 + mutex_lock(dev_hotplug_mutex);
 + for (;;) {
 + /* hold power lock since bitfield is not SMP-safe. */
 + spin_lock_irq(dev-power.lock);
 + dev-power.memalloc_noio = enable;
 + spin_unlock_irq(dev-power.lock);
 +
 + dev = dev-parent;
 +
 + /*
 +  * clear flag of the parent device only if all the
 +  * children don't set the flag because ancestor's
 +  * flag was set by any one of the descendants.
 +  */
 + if (!dev || (!enable 
 +  device_for_each_child(dev, NULL,
 +dev_memalloc_noio)))
 + break;
 + }
 + mutex_unlock(dev_hotplug_mutex);
 +}
 +EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);
 +
  /**
   * rpm_check_suspend_allowed - Test whether a device may be suspended.
   * @dev: Device to test.
 diff --git a/include/linux/pm.h b/include/linux/pm.h
 index 03d7bb1..1a8a69d 100644
 --- a/include/linux/pm.h
 +++ b/include/linux/pm.h
 @@ -538,6 +538,7 @@ struct dev_pm_info {
   unsigned intirq_safe:1;
   unsigned intuse_autosuspend:1;
   unsigned int

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-27 Thread Rafael J. Wysocki
On Tuesday, November 27, 2012 10:19:29 PM Rafael J. Wysocki wrote:
 On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
  The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
  to help PM core to teach mm not allocating memory with GFP_KERNEL
  flag for avoiding probable deadlock.
  
  As explained in the comment, any GFP_KERNEL allocation inside
  runtime_resume() or runtime_suspend() on any one of device in
  the path from one block or network device to the root device
  in the device tree may cause deadlock, the introduced
  pm_runtime_set_memalloc_noio() sets or clears the flag on
  device in the path recursively.
  
  Cc: Alan Stern st...@rowland.harvard.edu
  Cc: Rafael J. Wysocki r...@sisk.pl
  Signed-off-by: Ming Lei ming@canonical.com
  ---
  v5:
  - fix code style error
  - add comment on clear the device memalloc_noio flag
  v4:
  - rename memalloc_noio_resume as memalloc_noio
  - remove pm_runtime_get_memalloc_noio()
  - add comments on pm_runtime_set_memalloc_noio
  v3:
  - introduce pm_runtime_get_memalloc_noio()
  - hold one global lock on pm_runtime_set_memalloc_noio
  - hold device power lock when accessing memalloc_noio_resume
flag suggested by Alan Stern
  - implement pm_runtime_set_memalloc_noio without recursion
suggested by Alan Stern
  v2:
  - introduce pm_runtime_set_memalloc_noio()
  ---
   drivers/base/power/runtime.c |   60 
  ++
   include/linux/pm.h   |1 +
   include/linux/pm_runtime.h   |3 +++
   3 files changed, 64 insertions(+)
  
  diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
  index 3148b10..3e198a0 100644
  --- a/drivers/base/power/runtime.c
  +++ b/drivers/base/power/runtime.c
  @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct 
  device *dev)
   }
   EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
   
  +static int dev_memalloc_noio(struct device *dev, void *data)
  +{
  +   return dev-power.memalloc_noio;
  +}
  +
  +/*
  + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
  + * @dev: Device to handle.
  + * @enable: True for setting the flag and False for clearing the flag.
  + *
  + * Set the flag for all devices in the path from the device to the
  + * root device in the device tree if @enable is true, otherwise clear
  + * the flag for devices in the path whose siblings don't set the flag.
  + *
 
 Please use counters instead of walking the whole path every time.  Ie. in
 addition to the flag add a counter to store the number of the device's
 children having that flag set.

I would use the flag only to store the information that
pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
and I'd use a counter for everything else.

That is, have power.memalloc_count that would be incremented when (1)
pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
power.memalloc_count for one of its children changes from 0 to 1 (and
analogously for decrementation).  Then, check the counter in rpm_callback().

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-27 Thread Ming Lei
On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki r...@sisk.pl wrote:
 On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
 The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
 to help PM core to teach mm not allocating memory with GFP_KERNEL
 flag for avoiding probable deadlock.

 As explained in the comment, any GFP_KERNEL allocation inside
 runtime_resume() or runtime_suspend() on any one of device in
 the path from one block or network device to the root device
 in the device tree may cause deadlock, the introduced
 pm_runtime_set_memalloc_noio() sets or clears the flag on
 device in the path recursively.

 Cc: Alan Stern st...@rowland.harvard.edu
 Cc: Rafael J. Wysocki r...@sisk.pl
 Signed-off-by: Ming Lei ming@canonical.com
 ---
 v5:
   - fix code style error
   - add comment on clear the device memalloc_noio flag
 v4:
   - rename memalloc_noio_resume as memalloc_noio
   - remove pm_runtime_get_memalloc_noio()
   - add comments on pm_runtime_set_memalloc_noio
 v3:
   - introduce pm_runtime_get_memalloc_noio()
   - hold one global lock on pm_runtime_set_memalloc_noio
   - hold device power lock when accessing memalloc_noio_resume
 flag suggested by Alan Stern
   - implement pm_runtime_set_memalloc_noio without recursion
 suggested by Alan Stern
 v2:
   - introduce pm_runtime_set_memalloc_noio()
 ---
  drivers/base/power/runtime.c |   60 
 ++
  include/linux/pm.h   |1 +
  include/linux/pm_runtime.h   |3 +++
  3 files changed, 64 insertions(+)

 diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
 index 3148b10..3e198a0 100644
 --- a/drivers/base/power/runtime.c
 +++ b/drivers/base/power/runtime.c
 @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct 
 device *dev)
  }
  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);

 +static int dev_memalloc_noio(struct device *dev, void *data)
 +{
 + return dev-power.memalloc_noio;
 +}
 +
 +/*
 + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
 + * @dev: Device to handle.
 + * @enable: True for setting the flag and False for clearing the flag.
 + *
 + * Set the flag for all devices in the path from the device to the
 + * root device in the device tree if @enable is true, otherwise clear
 + * the flag for devices in the path whose siblings don't set the flag.
 + *

 Please use counters instead of walking the whole path every time.  Ie. in
 addition to the flag add a counter to store the number of the device's
 children having that flag set.

Thanks for your review.

IMO, pm_runtime_set_memalloc_noio() is only called in
probe() and release() of block device and network device, which is
in a very infrequent path, so I am wondering if it is worthy of introducing
another counter for all devices.

Also looks the current implementation of pm_runtime_set_memalloc_noio()
is simple and clean enough with the flag, IMO.

 I would use the flag only to store the information that
 pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
 and I'd use a counter for everything else.

 That is, have power.memalloc_count that would be incremented when (1)
 pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
 power.memalloc_count for one of its children changes from 0 to 1 (and
 analogously for decrementation).  Then, check the counter in rpm_callback().

Sorry, could you explain in a bit detail why we need the counter? Looks only
checking the flag in rpm_callback() is enough, doesn't it?


 Besides, don't you need to check children for the arg device itself?

It isn't needed since the children of network/block device can't be
involved of the deadlock in runtime PM path.

Also, the function is only called by network device or block device
subsystem, both the two kind of device are class device and should
have no children.


 + * The function should only be called by block device, or network
 + * device driver for solving the deadlock problem during runtime
 + * resume/suspend:
 + *
 + * If memory allocation with GFP_KERNEL is called inside runtime
 + * resume/suspend callback of any one of its ancestors(or the
 + * block device itself), the deadlock may be triggered inside the
 + * memory allocation since it might not complete until the block
 + * device becomes active and the involed page I/O finishes. The
 + * situation is pointed out first by Alan Stern. Network device
 + * are involved in iSCSI kind of situation.
 + *
 + * The lock of dev_hotplug_mutex is held in the function for handling
 + * hotplug race because pm_runtime_set_memalloc_noio() may be called
 + * in async probe().
 + *
 + * The function should be called between device_add() and device_del()
 + * on the affected device(block/network device).
 + */
 +void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
 +{
 + 

Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-27 Thread Ming Lei
On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki r...@sisk.pl wrote:

 Please use counters instead of walking the whole path every time.  Ie. in
 addition to the flag add a counter to store the number of the device's
 children having that flag set.

Even though counter is added, walking the whole path can't be avoided too,
and may be a explicit walking or recursion, because pm_runtime_set_memalloc_noio
is required to set or clear the flag(or increase/decrease the counter) of
devices in the whole path.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-24 Thread Ming Lei
The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
to help PM core to teach mm not allocating memory with GFP_KERNEL
flag for avoiding probable deadlock.

As explained in the comment, any GFP_KERNEL allocation inside
runtime_resume() or runtime_suspend() on any one of device in
the path from one block or network device to the root device
in the device tree may cause deadlock, the introduced
pm_runtime_set_memalloc_noio() sets or clears the flag on
device in the path recursively.

Cc: Alan Stern 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Ming Lei 
---
v5:
- fix code style error
- add comment on clear the device memalloc_noio flag
v4:
- rename memalloc_noio_resume as memalloc_noio
- remove pm_runtime_get_memalloc_noio()
- add comments on pm_runtime_set_memalloc_noio
v3:
- introduce pm_runtime_get_memalloc_noio()
- hold one global lock on pm_runtime_set_memalloc_noio
- hold device power lock when accessing memalloc_noio_resume
  flag suggested by Alan Stern
- implement pm_runtime_set_memalloc_noio without recursion
  suggested by Alan Stern
v2:
- introduce pm_runtime_set_memalloc_noio()
---
 drivers/base/power/runtime.c |   60 ++
 include/linux/pm.h   |1 +
 include/linux/pm_runtime.h   |3 +++
 3 files changed, 64 insertions(+)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 3148b10..3e198a0 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct 
device *dev)
 }
 EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
 
+static int dev_memalloc_noio(struct device *dev, void *data)
+{
+   return dev->power.memalloc_noio;
+}
+
+/*
+ * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
+ * @dev: Device to handle.
+ * @enable: True for setting the flag and False for clearing the flag.
+ *
+ * Set the flag for all devices in the path from the device to the
+ * root device in the device tree if @enable is true, otherwise clear
+ * the flag for devices in the path whose siblings don't set the flag.
+ *
+ * The function should only be called by block device, or network
+ * device driver for solving the deadlock problem during runtime
+ * resume/suspend:
+ *
+ * If memory allocation with GFP_KERNEL is called inside runtime
+ * resume/suspend callback of any one of its ancestors(or the
+ * block device itself), the deadlock may be triggered inside the
+ * memory allocation since it might not complete until the block
+ * device becomes active and the involed page I/O finishes. The
+ * situation is pointed out first by Alan Stern. Network device
+ * are involved in iSCSI kind of situation.
+ *
+ * The lock of dev_hotplug_mutex is held in the function for handling
+ * hotplug race because pm_runtime_set_memalloc_noio() may be called
+ * in async probe().
+ *
+ * The function should be called between device_add() and device_del()
+ * on the affected device(block/network device).
+ */
+void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
+{
+   static DEFINE_MUTEX(dev_hotplug_mutex);
+
+   mutex_lock(_hotplug_mutex);
+   for (;;) {
+   /* hold power lock since bitfield is not SMP-safe. */
+   spin_lock_irq(>power.lock);
+   dev->power.memalloc_noio = enable;
+   spin_unlock_irq(>power.lock);
+
+   dev = dev->parent;
+
+   /*
+* clear flag of the parent device only if all the
+* children don't set the flag because ancestor's
+* flag was set by any one of the descendants.
+*/
+   if (!dev || (!enable &&
+device_for_each_child(dev, NULL,
+  dev_memalloc_noio)))
+   break;
+   }
+   mutex_unlock(_hotplug_mutex);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);
+
 /**
  * rpm_check_suspend_allowed - Test whether a device may be suspended.
  * @dev: Device to test.
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 03d7bb1..1a8a69d 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -538,6 +538,7 @@ struct dev_pm_info {
unsigned intirq_safe:1;
unsigned intuse_autosuspend:1;
unsigned inttimer_autosuspends:1;
+   unsigned intmemalloc_noio:1;
enum rpm_requestrequest;
enum rpm_status runtime_status;
int runtime_error;
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index f271860..775e063 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -47,6 +47,7 @@ extern void pm_runtime_set_autosuspend_delay(struct device 
*dev, int 

[PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-24 Thread Ming Lei
The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
to help PM core to teach mm not allocating memory with GFP_KERNEL
flag for avoiding probable deadlock.

As explained in the comment, any GFP_KERNEL allocation inside
runtime_resume() or runtime_suspend() on any one of device in
the path from one block or network device to the root device
in the device tree may cause deadlock, the introduced
pm_runtime_set_memalloc_noio() sets or clears the flag on
device in the path recursively.

Cc: Alan Stern st...@rowland.harvard.edu
Cc: Rafael J. Wysocki r...@sisk.pl
Signed-off-by: Ming Lei ming@canonical.com
---
v5:
- fix code style error
- add comment on clear the device memalloc_noio flag
v4:
- rename memalloc_noio_resume as memalloc_noio
- remove pm_runtime_get_memalloc_noio()
- add comments on pm_runtime_set_memalloc_noio
v3:
- introduce pm_runtime_get_memalloc_noio()
- hold one global lock on pm_runtime_set_memalloc_noio
- hold device power lock when accessing memalloc_noio_resume
  flag suggested by Alan Stern
- implement pm_runtime_set_memalloc_noio without recursion
  suggested by Alan Stern
v2:
- introduce pm_runtime_set_memalloc_noio()
---
 drivers/base/power/runtime.c |   60 ++
 include/linux/pm.h   |1 +
 include/linux/pm_runtime.h   |3 +++
 3 files changed, 64 insertions(+)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 3148b10..3e198a0 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct 
device *dev)
 }
 EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
 
+static int dev_memalloc_noio(struct device *dev, void *data)
+{
+   return dev-power.memalloc_noio;
+}
+
+/*
+ * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
+ * @dev: Device to handle.
+ * @enable: True for setting the flag and False for clearing the flag.
+ *
+ * Set the flag for all devices in the path from the device to the
+ * root device in the device tree if @enable is true, otherwise clear
+ * the flag for devices in the path whose siblings don't set the flag.
+ *
+ * The function should only be called by block device, or network
+ * device driver for solving the deadlock problem during runtime
+ * resume/suspend:
+ *
+ * If memory allocation with GFP_KERNEL is called inside runtime
+ * resume/suspend callback of any one of its ancestors(or the
+ * block device itself), the deadlock may be triggered inside the
+ * memory allocation since it might not complete until the block
+ * device becomes active and the involed page I/O finishes. The
+ * situation is pointed out first by Alan Stern. Network device
+ * are involved in iSCSI kind of situation.
+ *
+ * The lock of dev_hotplug_mutex is held in the function for handling
+ * hotplug race because pm_runtime_set_memalloc_noio() may be called
+ * in async probe().
+ *
+ * The function should be called between device_add() and device_del()
+ * on the affected device(block/network device).
+ */
+void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
+{
+   static DEFINE_MUTEX(dev_hotplug_mutex);
+
+   mutex_lock(dev_hotplug_mutex);
+   for (;;) {
+   /* hold power lock since bitfield is not SMP-safe. */
+   spin_lock_irq(dev-power.lock);
+   dev-power.memalloc_noio = enable;
+   spin_unlock_irq(dev-power.lock);
+
+   dev = dev-parent;
+
+   /*
+* clear flag of the parent device only if all the
+* children don't set the flag because ancestor's
+* flag was set by any one of the descendants.
+*/
+   if (!dev || (!enable 
+device_for_each_child(dev, NULL,
+  dev_memalloc_noio)))
+   break;
+   }
+   mutex_unlock(dev_hotplug_mutex);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);
+
 /**
  * rpm_check_suspend_allowed - Test whether a device may be suspended.
  * @dev: Device to test.
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 03d7bb1..1a8a69d 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -538,6 +538,7 @@ struct dev_pm_info {
unsigned intirq_safe:1;
unsigned intuse_autosuspend:1;
unsigned inttimer_autosuspends:1;
+   unsigned intmemalloc_noio:1;
enum rpm_requestrequest;
enum rpm_status runtime_status;
int runtime_error;
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index f271860..775e063 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -47,6 +47,7 @@ extern