subject:"\[PATCH\] ACPI \/ hotplug\: Fix concurrency issues and memory leaks"

On Fri, 2013-02-15 at 16:33 +, Moore, Robert wrote:
> > > > > > > > > Is there a mechanism in ACPICA to ensure that a handle
> > > > > > > > > won't become stale while a notify handler is running for
> > > > > > > > > it or is the OS responsible for ensuring that
> > > > > > > > > _EJ0 won't be run in parallel with notify handlers for
> > > > > > > > > device objects being ejected?
> > > > > > > > >
> > > > > > > >
> > > > > > > > It is up to the host.
> > > > > > >
> > > > > > > I was afraid that that might be the case. :-)
> > > > > > >
> > > > > > > So far the (Linux) host has been happily ignoring that
> > > > > > > potential problem, so I guess it can still be ignored for a
> > > > > > > while, although we'll need to address it eventually at one
> > point.
> > > > > >
> > > > > > I would think it should be fairly simple to setup a mechanism to
> > > > > > either tell the driver or for the driver to figure it out --
> > > > > > such that the driver knows that all handles associated with the
> > > > > > device are now invalid. Another way to look at it is that when
> > > > > > the device is re-installed, the driver should reinitialize such
> > > > > > that it obtains new handles for the devices and subobjects in
> > question.
> > > > >
> > > > > Unfortunately, there is quite strong assumption in our code that
> > > > > ACPI handles will not become stale before the device objects
> > > > > associated with them are removed.  For this reason, we need to
> > > > > know in advance which handles will become stale as a result of a
> > > > > table unload and remove their device objects beforehand.
> > > > >
> > > > > Moreover, when there's a notify handler installed for a given ACPI
> > > > > handle and that handle becomes stale while the notify handler is
> > > > > running, we'll be in trouble.  To avoid that we need to ensure
> > > > > that table unloads and notifies will always be mutually exclusive.
> > > >
> > > > I wonder if we can make acpi_ns_validate_handle() to actually be
> > > > able to verify if a given handle is valid.  This way, ACPICA can
> > > > fail gracefully
> > > > (AE_BAD_PARAMETER) when a stable handle is passed to the interfaces.
> > >
> > > That'd be good, but to implement it, I think, it would be necessary to
> > > introduce some reference counting of namespace objects such that the
> > > given object would only be deleted after the last reference to it had
> > been dropped.
> > > On table unload it would just be marked as invalid, but it would stay
> > > in memory as long as there were any references to it.
> > >
> > > So, for example, a notify handler would start from something like
> > > acpi_add_reference(handle), which would guarantee that the object
> > > pointed to by handle would stay in memory, and it would finish by
> > > doing
> > > acpi_drop_reference(handle) or a work item scheduled by it would do
> > that.
> > >
> > > We do that for objects based on struct device and it works well.
> > 
> > There is other way to implement it.  Since acpi_handle is defined as an
> > opaque value, this could be changed to an index to an array of pointers,
> > instead of a direct pointer.  Then we can safely invalidate an index by
> > invalidating the pointer associated with the index.
> 
> 
> 
> We have of course thought about adding a mechanism to validate/invalidate 
> acpica namespace handles. In fact, this is why the ACPI_HANDLE data type was 
> introduced in the first place, so that if we ever wanted or needed to 
> implement something like this, it would not break a lot of existing code.
> 
> However, we have never had a real need to implement such a mechanism, nor has 
> the ever been a request from any of the operating systems that run ACPICA.
> 
> The existing model is that the host has knowledge of what objects will go 
> away when a table is unloaded, and can simply assume that all related acpi 
> handles will go bad after the unload (and table unload is the only case where 
> parts of the namespace go away, as far as I remember). Upon a reload of the 
> table, the host knows to reinitialize all handles associated with the 
> table/device.
> 
> ACPCI has a table handler mechanism, where the host handler is invoked 
> *before* an ACPI table is actually unloaded, so the host can prepare for the 
> unload. For example, before returning from the table handler, the host may 
> want to synchronize by waiting for any outstanding notifies to complete, then 
> simply ignoring any further notifies from any devices associated with the 
> table.
> 
> In summary, it has always been felt that the fairly large overhead of 
> implementing a mechanism like this is not worth the cost, as well as not 
> really needed.
> 
> I suggest that the implementation of eject support proceed by using the 
> existing mechanisms such as the table handler. If additional 
> support/interfaces are needed in ACPICA, we can discuss it. However, just 
> about the last thing I would like to do is add a level of

RE: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-15 Thread Moore, Robert


> > > > > > > > Is there a mechanism in ACPICA to ensure that a handle
> > > > > > > > won't become stale while a notify handler is running for
> > > > > > > > it or is the OS responsible for ensuring that
> > > > > > > > _EJ0 won't be run in parallel with notify handlers for
> > > > > > > > device objects being ejected?
> > > > > > > >
> > > > > > >
> > > > > > > It is up to the host.
> > > > > >
> > > > > > I was afraid that that might be the case. :-)
> > > > > >
> > > > > > So far the (Linux) host has been happily ignoring that
> > > > > > potential problem, so I guess it can still be ignored for a
> > > > > > while, although we'll need to address it eventually at one
> point.
> > > > >
> > > > > I would think it should be fairly simple to setup a mechanism to
> > > > > either tell the driver or for the driver to figure it out --
> > > > > such that the driver knows that all handles associated with the
> > > > > device are now invalid. Another way to look at it is that when
> > > > > the device is re-installed, the driver should reinitialize such
> > > > > that it obtains new handles for the devices and subobjects in
> question.
> > > >
> > > > Unfortunately, there is quite strong assumption in our code that
> > > > ACPI handles will not become stale before the device objects
> > > > associated with them are removed.  For this reason, we need to
> > > > know in advance which handles will become stale as a result of a
> > > > table unload and remove their device objects beforehand.
> > > >
> > > > Moreover, when there's a notify handler installed for a given ACPI
> > > > handle and that handle becomes stale while the notify handler is
> > > > running, we'll be in trouble.  To avoid that we need to ensure
> > > > that table unloads and notifies will always be mutually exclusive.
> > >
> > > I wonder if we can make acpi_ns_validate_handle() to actually be
> > > able to verify if a given handle is valid.  This way, ACPICA can
> > > fail gracefully
> > > (AE_BAD_PARAMETER) when a stable handle is passed to the interfaces.
> >
> > That'd be good, but to implement it, I think, it would be necessary to
> > introduce some reference counting of namespace objects such that the
> > given object would only be deleted after the last reference to it had
> been dropped.
> > On table unload it would just be marked as invalid, but it would stay
> > in memory as long as there were any references to it.
> >
> > So, for example, a notify handler would start from something like
> > acpi_add_reference(handle), which would guarantee that the object
> > pointed to by handle would stay in memory, and it would finish by
> > doing
> > acpi_drop_reference(handle) or a work item scheduled by it would do
> that.
> >
> > We do that for objects based on struct device and it works well.
> 
> There is other way to implement it.  Since acpi_handle is defined as an
> opaque value, this could be changed to an index to an array of pointers,
> instead of a direct pointer.  Then we can safely invalidate an index by
> invalidating the pointer associated with the index.



We have of course thought about adding a mechanism to validate/invalidate 
acpica namespace handles. In fact, this is why the ACPI_HANDLE data type was 
introduced in the first place, so that if we ever wanted or needed to implement 
something like this, it would not break a lot of existing code.

However, we have never had a real need to implement such a mechanism, nor has 
the ever been a request from any of the operating systems that run ACPICA.

The existing model is that the host has knowledge of what objects will go away 
when a table is unloaded, and can simply assume that all related acpi handles 
will go bad after the unload (and table unload is the only case where parts of 
the namespace go away, as far as I remember). Upon a reload of the table, the 
host knows to reinitialize all handles associated with the table/device.

ACPCI has a table handler mechanism, where the host handler is invoked *before* 
an ACPI table is actually unloaded, so the host can prepare for the unload. For 
example, before returning from the table handler, the host may want to 
synchronize by waiting for any outstanding notifies to complete, then simply 
ignoring any further notifies from any devices associated with the table.

In summary, it has always been felt that the fairly large overhead of 
implementing a mechanism like this is not worth the cost, as well as not really 
needed.

I suggest that the implementation of eject support proceed by using the 
existing mechanisms such as the table handler. If additional support/interfaces 
are needed in ACPICA, we can discuss it. However, just about the last thing I 
would like to do is add a level of indirection between the ACPI_HANDLE and the 
ACPI_NAMESPACE_NODE -- which would require a large, global change to ACPICA 
that would be only applicable for a single rather rare issue, the unloading of 
an ACPI table. Just the fact that we are

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Fri, 2013-02-15 at 13:49 +0100, Rafael J. Wysocki wrote:
> On Thursday, February 14, 2013 05:28:02 PM Toshi Kani wrote:
> > On Fri, 2013-02-15 at 01:23 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, February 14, 2013 11:45:27 PM Moore, Robert wrote:
> > > > 
> > > > > -Original Message-
> > > > > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > > > > Sent: Thursday, February 14, 2013 12:59 PM
> > > > > To: Moore, Robert
> > > > > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang 
> > > > > Liu;
> > > > > Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
> > > > > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues 
> > > > > and
> > > > > memory leaks
> > > > > 
> > > > > On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > > > > > > Sent: Thursday, February 14, 2013 4:04 AM
> > > > > > > To: Moore, Robert
> > > > > > > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang
> > > > > > > Liu; Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe;
> > > > > > > linux-...@vger.kernel.org
> > > > > > > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency 
> > > > > > > issues
> > > > > > > and memory leaks
> > > > > > >
> > > > > > > On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
> > > > > > > > > > > I thought about that, but actually there's no guarantee 
> > > > > > > > > > > that
> > > > > > > > > > > the handle will be valid after _EJ0 as far as I can say.  
> > > > > > > > > > > So
> > > > > > > > > > > the race condition is going to be there anyway and using
> > > > > > > > > > > struct acpi_device just makes it easier to avoid it.
> > > > > > > > > >
> > > > > > > > > > In theory, yes, a stale handle could be a problem, if _EJ0
> > > > > > > > > > performs unload table and if ACPICA frees up its internal 
> > > > > > > > > > data
> > > > > > > > > > structure pointed by the handle as a result.  But we should
> > > > > > > > > > not see such issue now since we do not support dynamic ACPI
> > > > > > > > > > namespace
> > > > > > > yet.
> > > > > > > > >
> > > > > > > > > I'm waiting for information from Bob about that.  If we can
> > > > > > > > > assume ACPI handles to be always valid, that will simplify
> > > > > > > > > things quite a
> > > > > > > bit.
> > > > > > > >
> > > > > > > > If a table is unloaded, all the namespace nodes for that table 
> > > > > > > > are
> > > > > > > > removed from the namespace, and thus any ACPI_HANDLE pointers go
> > > > > > > > stale
> > > > > > > and invalid.
> > > > > > >
> > > > > > > OK, thanks!
> > > > > > >
> > > > > > > To me this means that we cannot assume a handle to stay valid
> > > > > > > between a notify handler and acpi_bus_hot_remove_device() run 
> > > > > > > from a
> > > > > workqueue.
> > > > > > >
> > > > > > > Is there a mechanism in ACPICA to ensure that a handle won't 
> > > > > > > become
> > > > > > > stale while a notify handler is running for it or is the OS
> > > > > > > responsible for ensuring that
> > > > > > > _EJ0 won't be run in parallel with notify handlers for device
> > > > > > > objects being ejected?
> > > > > > >
> > > > > >
> > > > > > It is up to the host.
> > > > > 
> > > > > I was afraid that that might be the case. :-)
> > > > > 
> > > > > So far

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-15 Thread Rafael J. Wysocki

On Thursday, February 14, 2013 05:28:02 PM Toshi Kani wrote:
> On Fri, 2013-02-15 at 01:23 +0100, Rafael J. Wysocki wrote:
> > On Thursday, February 14, 2013 11:45:27 PM Moore, Robert wrote:
> > > 
> > > > -Original Message-
> > > > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > > > Sent: Thursday, February 14, 2013 12:59 PM
> > > > To: Moore, Robert
> > > > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
> > > > Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
> > > > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
> > > > memory leaks
> > > > 
> > > > On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > > > > > Sent: Thursday, February 14, 2013 4:04 AM
> > > > > > To: Moore, Robert
> > > > > > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang
> > > > > > Liu; Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe;
> > > > > > linux-...@vger.kernel.org
> > > > > > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues
> > > > > > and memory leaks
> > > > > >
> > > > > > On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
> > > > > > > > > > I thought about that, but actually there's no guarantee that
> > > > > > > > > > the handle will be valid after _EJ0 as far as I can say.  So
> > > > > > > > > > the race condition is going to be there anyway and using
> > > > > > > > > > struct acpi_device just makes it easier to avoid it.
> > > > > > > > >
> > > > > > > > > In theory, yes, a stale handle could be a problem, if _EJ0
> > > > > > > > > performs unload table and if ACPICA frees up its internal data
> > > > > > > > > structure pointed by the handle as a result.  But we should
> > > > > > > > > not see such issue now since we do not support dynamic ACPI
> > > > > > > > > namespace
> > > > > > yet.
> > > > > > > >
> > > > > > > > I'm waiting for information from Bob about that.  If we can
> > > > > > > > assume ACPI handles to be always valid, that will simplify
> > > > > > > > things quite a
> > > > > > bit.
> > > > > > >
> > > > > > > If a table is unloaded, all the namespace nodes for that table are
> > > > > > > removed from the namespace, and thus any ACPI_HANDLE pointers go
> > > > > > > stale
> > > > > > and invalid.
> > > > > >
> > > > > > OK, thanks!
> > > > > >
> > > > > > To me this means that we cannot assume a handle to stay valid
> > > > > > between a notify handler and acpi_bus_hot_remove_device() run from a
> > > > workqueue.
> > > > > >
> > > > > > Is there a mechanism in ACPICA to ensure that a handle won't become
> > > > > > stale while a notify handler is running for it or is the OS
> > > > > > responsible for ensuring that
> > > > > > _EJ0 won't be run in parallel with notify handlers for device
> > > > > > objects being ejected?
> > > > > >
> > > > >
> > > > > It is up to the host.
> > > > 
> > > > I was afraid that that might be the case. :-)
> > > > 
> > > > So far the (Linux) host has been happily ignoring that potential 
> > > > problem,
> > > > so I guess it can still be ignored for a while, although we'll need to
> > > > address it eventually at one point.
> > > 
> > > I would think it should be fairly simple to setup a mechanism to either 
> > > tell
> > > the driver or for the driver to figure it out -- such that the driver 
> > > knows
> > > that all handles associated with the device are now invalid. Another way
> > > to look at it is that when the device is re-installed, the driver should
> > > reinitialize such that it obtains new handles for the devices and 
> > > sub

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-15 Thread Rafael J. Wysocki

On Thursday, February 14, 2013 05:28:02 PM Toshi Kani wrote:
 On Fri, 2013-02-15 at 01:23 +0100, Rafael J. Wysocki wrote:
  On Thursday, February 14, 2013 11:45:27 PM Moore, Robert wrote:

-Original Message-
From: Rafael J. Wysocki [mailto:r...@sisk.pl]
Sent: Thursday, February 14, 2013 12:59 PM
To: Moore, Robert
Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
memory leaks

On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:

  -Original Message-
  From: Rafael J. Wysocki [mailto:r...@sisk.pl]
  Sent: Thursday, February 14, 2013 4:04 AM
  To: Moore, Robert
  Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang
  Liu; Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe;
  linux-...@vger.kernel.org
  Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues
  and memory leaks

  On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
  I thought about that, but actually there's no guarantee that
  the handle will be valid after _EJ0 as far as I can say.  So
  the race condition is going to be there anyway and using
  struct acpi_device just makes it easier to avoid it.

 In theory, yes, a stale handle could be a problem, if _EJ0
 performs unload table and if ACPICA frees up its internal data
 structure pointed by the handle as a result.  But we should
 not see such issue now since we do not support dynamic ACPI
 namespace
  yet.

I'm waiting for information from Bob about that.  If we can
assume ACPI handles to be always valid, that will simplify
things quite a
  bit.

   If a table is unloaded, all the namespace nodes for that table are
   removed from the namespace, and thus any ACPI_HANDLE pointers go
   stale
  and invalid.

  OK, thanks!

  To me this means that we cannot assume a handle to stay valid
  between a notify handler and acpi_bus_hot_remove_device() run from a
workqueue.

  Is there a mechanism in ACPICA to ensure that a handle won't become
  stale while a notify handler is running for it or is the OS
  responsible for ensuring that
  _EJ0 won't be run in parallel with notify handlers for device
  objects being ejected?

 It is up to the host.

I was afraid that that might be the case. :-)

So far the (Linux) host has been happily ignoring that potential 
problem,
so I guess it can still be ignored for a while, although we'll need to
address it eventually at one point.

   I would think it should be fairly simple to setup a mechanism to either 
   tell
   the driver or for the driver to figure it out -- such that the driver 
   knows
   that all handles associated with the device are now invalid. Another way
   to look at it is that when the device is re-installed, the driver should
   reinitialize such that it obtains new handles for the devices and 
   subobjects
   in question.

  Unfortunately, there is quite strong assumption in our code that ACPI 
  handles
  will not become stale before the device objects associated with them are
  removed.  For this reason, we need to know in advance which handles will
  become stale as a result of a table unload and remove their device objects
  beforehand.

  Moreover, when there's a notify handler installed for a given ACPI handle
  and that handle becomes stale while the notify handler is running, we'll be
  in trouble.  To avoid that we need to ensure that table unloads and notifies
  will always be mutually exclusive.

 I wonder if we can make acpi_ns_validate_handle() to actually be able to
 verify if a given handle is valid.  This way, ACPICA can fail gracefully
 (AE_BAD_PARAMETER) when a stable handle is passed to the interfaces.

That'd be good, but to implement it, I think, it would be necessary to
introduce some reference counting of namespace objects such that the given
object would only be deleted after the last reference to it had been dropped.
On table unload it would just be marked as invalid, but it would stay in
memory as long as there were any references to it.

So, for example, a notify handler would start from something like
acpi_add_reference(handle), which would guarantee that the object pointed to by
handle would stay in memory, and it would finish by doing
acpi_drop_reference(handle) or a work item scheduled by it would do that.

We do that for objects based on struct device and it works well.

Thanks,
Rafael

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Fri, 2013-02-15 at 13:49 +0100, Rafael J. Wysocki wrote:
 On Thursday, February 14, 2013 05:28:02 PM Toshi Kani wrote:
  On Fri, 2013-02-15 at 01:23 +0100, Rafael J. Wysocki wrote:
   On Thursday, February 14, 2013 11:45:27 PM Moore, Robert wrote:

 -Original Message-
 From: Rafael J. Wysocki [mailto:r...@sisk.pl]
 Sent: Thursday, February 14, 2013 12:59 PM
 To: Moore, Robert
 Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang 
 Liu;
 Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
 Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues 
 and
 memory leaks

 On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:

   -Original Message-
   From: Rafael J. Wysocki [mailto:r...@sisk.pl]
   Sent: Thursday, February 14, 2013 4:04 AM
   To: Moore, Robert
   Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang
   Liu; Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe;
   linux-...@vger.kernel.org
   Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency 
   issues
   and memory leaks

   On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
   I thought about that, but actually there's no guarantee 
   that
   the handle will be valid after _EJ0 as far as I can say.  
   So
   the race condition is going to be there anyway and using
   struct acpi_device just makes it easier to avoid it.

  In theory, yes, a stale handle could be a problem, if _EJ0
  performs unload table and if ACPICA frees up its internal 
  data
  structure pointed by the handle as a result.  But we should
  not see such issue now since we do not support dynamic ACPI
  namespace
   yet.

 I'm waiting for information from Bob about that.  If we can
 assume ACPI handles to be always valid, that will simplify
 things quite a
   bit.

If a table is unloaded, all the namespace nodes for that table 
are
removed from the namespace, and thus any ACPI_HANDLE pointers go
stale
   and invalid.

   OK, thanks!

   To me this means that we cannot assume a handle to stay valid
   between a notify handler and acpi_bus_hot_remove_device() run 
   from a
 workqueue.

   Is there a mechanism in ACPICA to ensure that a handle won't 
   become
   stale while a notify handler is running for it or is the OS
   responsible for ensuring that
   _EJ0 won't be run in parallel with notify handlers for device
   objects being ejected?

  It is up to the host.

 I was afraid that that might be the case. :-)

 So far the (Linux) host has been happily ignoring that potential 
 problem,
 so I guess it can still be ignored for a while, although we'll need to
 address it eventually at one point.

I would think it should be fairly simple to setup a mechanism to either 
tell
the driver or for the driver to figure it out -- such that the driver 
knows
that all handles associated with the device are now invalid. Another way
to look at it is that when the device is re-installed, the driver should
reinitialize such that it obtains new handles for the devices and 
subobjects
in question.

   Unfortunately, there is quite strong assumption in our code that ACPI 
   handles
   will not become stale before the device objects associated with them are
   removed.  For this reason, we need to know in advance which handles will
   become stale as a result of a table unload and remove their device objects
   beforehand.

   Moreover, when there's a notify handler installed for a given ACPI handle
   and that handle becomes stale while the notify handler is running, we'll 
   be
   in trouble.  To avoid that we need to ensure that table unloads and 
   notifies
   will always be mutually exclusive.

  I wonder if we can make acpi_ns_validate_handle() to actually be able to
  verify if a given handle is valid.  This way, ACPICA can fail gracefully
  (AE_BAD_PARAMETER) when a stable handle is passed to the interfaces.

 That'd be good, but to implement it, I think, it would be necessary to
 introduce some reference counting of namespace objects such that the given
 object would only be deleted after the last reference to it had been dropped.
 On table unload it would just be marked as invalid, but it would stay in
 memory as long as there were any references to it.

 So, for example, a notify handler would start from something like
 acpi_add_reference(handle), which would guarantee that the object pointed to 
 by
 handle would stay in memory, and it would finish by doing
 acpi_drop_reference(handle) or a work item scheduled by it would do

RE: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-15 Thread Moore, Robert


Is there a mechanism in ACPICA to ensure that a handle
won't become stale while a notify handler is running for
it or is the OS responsible for ensuring that
_EJ0 won't be run in parallel with notify handlers for
device objects being ejected?
   
  
   It is up to the host.
 
  I was afraid that that might be the case. :-)
 
  So far the (Linux) host has been happily ignoring that
  potential problem, so I guess it can still be ignored for a
  while, although we'll need to address it eventually at one
 point.

 I would think it should be fairly simple to setup a mechanism to
 either tell the driver or for the driver to figure it out --
 such that the driver knows that all handles associated with the
 device are now invalid. Another way to look at it is that when
 the device is re-installed, the driver should reinitialize such
 that it obtains new handles for the devices and subobjects in
 question.
   
Unfortunately, there is quite strong assumption in our code that
ACPI handles will not become stale before the device objects
associated with them are removed.  For this reason, we need to
know in advance which handles will become stale as a result of a
table unload and remove their device objects beforehand.
   
Moreover, when there's a notify handler installed for a given ACPI
handle and that handle becomes stale while the notify handler is
running, we'll be in trouble.  To avoid that we need to ensure
that table unloads and notifies will always be mutually exclusive.
  
   I wonder if we can make acpi_ns_validate_handle() to actually be
   able to verify if a given handle is valid.  This way, ACPICA can
   fail gracefully
   (AE_BAD_PARAMETER) when a stable handle is passed to the interfaces.
 
  That'd be good, but to implement it, I think, it would be necessary to
  introduce some reference counting of namespace objects such that the
  given object would only be deleted after the last reference to it had
 been dropped.
  On table unload it would just be marked as invalid, but it would stay
  in memory as long as there were any references to it.
 
  So, for example, a notify handler would start from something like
  acpi_add_reference(handle), which would guarantee that the object
  pointed to by handle would stay in memory, and it would finish by
  doing
  acpi_drop_reference(handle) or a work item scheduled by it would do
 that.
 
  We do that for objects based on struct device and it works well.
 
 There is other way to implement it.  Since acpi_handle is defined as an
 opaque value, this could be changed to an index to an array of pointers,
 instead of a direct pointer.  Then we can safely invalidate an index by
 invalidating the pointer associated with the index.



We have of course thought about adding a mechanism to validate/invalidate 
acpica namespace handles. In fact, this is why the ACPI_HANDLE data type was 
introduced in the first place, so that if we ever wanted or needed to implement 
something like this, it would not break a lot of existing code.

However, we have never had a real need to implement such a mechanism, nor has 
the ever been a request from any of the operating systems that run ACPICA.

The existing model is that the host has knowledge of what objects will go away 
when a table is unloaded, and can simply assume that all related acpi handles 
will go bad after the unload (and table unload is the only case where parts of 
the namespace go away, as far as I remember). Upon a reload of the table, the 
host knows to reinitialize all handles associated with the table/device.

ACPCI has a table handler mechanism, where the host handler is invoked *before* 
an ACPI table is actually unloaded, so the host can prepare for the unload. For 
example, before returning from the table handler, the host may want to 
synchronize by waiting for any outstanding notifies to complete, then simply 
ignoring any further notifies from any devices associated with the table.

In summary, it has always been felt that the fairly large overhead of 
implementing a mechanism like this is not worth the cost, as well as not really 
needed.

I suggest that the implementation of eject support proceed by using the 
existing mechanisms such as the table handler. If additional support/interfaces 
are needed in ACPICA, we can discuss it. However, just about the last thing I 
would like to do is add a level of indirection between the ACPI_HANDLE and the 
ACPI_NAMESPACE_NODE -- which would require a large, global change to ACPICA 
that would be only applicable for a single rather rare issue, the unloading of 
an ACPI table. Just the fact that we are discussing this in 2013 and ACPICA has 
been running since 1999 should confirm the rarity of this case and/or that the 
existing mechanism has been sufficient for other hosts that run ACPICA.

Bob

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Fri, 2013-02-15 at 16:33 +, Moore, Robert wrote:
 Is there a mechanism in ACPICA to ensure that a handle
 won't become stale while a notify handler is running for
 it or is the OS responsible for ensuring that
 _EJ0 won't be run in parallel with notify handlers for
 device objects being ejected?

   
It is up to the host.
  
   I was afraid that that might be the case. :-)
  
   So far the (Linux) host has been happily ignoring that
   potential problem, so I guess it can still be ignored for a
   while, although we'll need to address it eventually at one
  point.
 
  I would think it should be fairly simple to setup a mechanism to
  either tell the driver or for the driver to figure it out --
  such that the driver knows that all handles associated with the
  device are now invalid. Another way to look at it is that when
  the device is re-installed, the driver should reinitialize such
  that it obtains new handles for the devices and subobjects in
  question.

 Unfortunately, there is quite strong assumption in our code that
 ACPI handles will not become stale before the device objects
 associated with them are removed.  For this reason, we need to
 know in advance which handles will become stale as a result of a
 table unload and remove their device objects beforehand.

 Moreover, when there's a notify handler installed for a given ACPI
 handle and that handle becomes stale while the notify handler is
 running, we'll be in trouble.  To avoid that we need to ensure
 that table unloads and notifies will always be mutually exclusive.
   
I wonder if we can make acpi_ns_validate_handle() to actually be
able to verify if a given handle is valid.  This way, ACPICA can
fail gracefully
(AE_BAD_PARAMETER) when a stable handle is passed to the interfaces.
  
   That'd be good, but to implement it, I think, it would be necessary to
   introduce some reference counting of namespace objects such that the
   given object would only be deleted after the last reference to it had
  been dropped.
   On table unload it would just be marked as invalid, but it would stay
   in memory as long as there were any references to it.
  
   So, for example, a notify handler would start from something like
   acpi_add_reference(handle), which would guarantee that the object
   pointed to by handle would stay in memory, and it would finish by
   doing
   acpi_drop_reference(handle) or a work item scheduled by it would do
  that.
  
   We do that for objects based on struct device and it works well.
  
  There is other way to implement it.  Since acpi_handle is defined as an
  opaque value, this could be changed to an index to an array of pointers,
  instead of a direct pointer.  Then we can safely invalidate an index by
  invalidating the pointer associated with the index.
 
 
 
 We have of course thought about adding a mechanism to validate/invalidate 
 acpica namespace handles. In fact, this is why the ACPI_HANDLE data type was 
 introduced in the first place, so that if we ever wanted or needed to 
 implement something like this, it would not break a lot of existing code.
 
 However, we have never had a real need to implement such a mechanism, nor has 
 the ever been a request from any of the operating systems that run ACPICA.
 
 The existing model is that the host has knowledge of what objects will go 
 away when a table is unloaded, and can simply assume that all related acpi 
 handles will go bad after the unload (and table unload is the only case where 
 parts of the namespace go away, as far as I remember). Upon a reload of the 
 table, the host knows to reinitialize all handles associated with the 
 table/device.
 
 ACPCI has a table handler mechanism, where the host handler is invoked 
 *before* an ACPI table is actually unloaded, so the host can prepare for the 
 unload. For example, before returning from the table handler, the host may 
 want to synchronize by waiting for any outstanding notifies to complete, then 
 simply ignoring any further notifies from any devices associated with the 
 table.
 
 In summary, it has always been felt that the fairly large overhead of 
 implementing a mechanism like this is not worth the cost, as well as not 
 really needed.
 
 I suggest that the implementation of eject support proceed by using the 
 existing mechanisms such as the table handler. If additional 
 support/interfaces are needed in ACPICA, we can discuss it. However, just 
 about the last thing I would like to do is add a level of indirection between 
 the ACPI_HANDLE and the ACPI_NAMESPACE_NODE -- which would require a large, 
 global change to ACPICA that would be only applicable for a single rather 
 rare issue, the unloading of an ACPI table. Just the fact that we are 
 discussing this in 2013 and ACPICA has been running since 1999 should

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-14 Thread Toshi Kani

On Fri, 2013-02-15 at 01:23 +0100, Rafael J. Wysocki wrote:
> On Thursday, February 14, 2013 11:45:27 PM Moore, Robert wrote:
> > 
> > > -Original Message-
> > > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > > Sent: Thursday, February 14, 2013 12:59 PM
> > > To: Moore, Robert
> > > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
> > > Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
> > > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
> > > memory leaks
> > > 
> > > On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > > > > Sent: Thursday, February 14, 2013 4:04 AM
> > > > > To: Moore, Robert
> > > > > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang
> > > > > Liu; Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe;
> > > > > linux-...@vger.kernel.org
> > > > > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues
> > > > > and memory leaks
> > > > >
> > > > > On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
> > > > > > > > > I thought about that, but actually there's no guarantee that
> > > > > > > > > the handle will be valid after _EJ0 as far as I can say.  So
> > > > > > > > > the race condition is going to be there anyway and using
> > > > > > > > > struct acpi_device just makes it easier to avoid it.
> > > > > > > >
> > > > > > > > In theory, yes, a stale handle could be a problem, if _EJ0
> > > > > > > > performs unload table and if ACPICA frees up its internal data
> > > > > > > > structure pointed by the handle as a result.  But we should
> > > > > > > > not see such issue now since we do not support dynamic ACPI
> > > > > > > > namespace
> > > > > yet.
> > > > > > >
> > > > > > > I'm waiting for information from Bob about that.  If we can
> > > > > > > assume ACPI handles to be always valid, that will simplify
> > > > > > > things quite a
> > > > > bit.
> > > > > >
> > > > > > If a table is unloaded, all the namespace nodes for that table are
> > > > > > removed from the namespace, and thus any ACPI_HANDLE pointers go
> > > > > > stale
> > > > > and invalid.
> > > > >
> > > > > OK, thanks!
> > > > >
> > > > > To me this means that we cannot assume a handle to stay valid
> > > > > between a notify handler and acpi_bus_hot_remove_device() run from a
> > > workqueue.
> > > > >
> > > > > Is there a mechanism in ACPICA to ensure that a handle won't become
> > > > > stale while a notify handler is running for it or is the OS
> > > > > responsible for ensuring that
> > > > > _EJ0 won't be run in parallel with notify handlers for device
> > > > > objects being ejected?
> > > > >
> > > >
> > > > It is up to the host.
> > > 
> > > I was afraid that that might be the case. :-)
> > > 
> > > So far the (Linux) host has been happily ignoring that potential problem,
> > > so I guess it can still be ignored for a while, although we'll need to
> > > address it eventually at one point.
> > 
> > I would think it should be fairly simple to setup a mechanism to either tell
> > the driver or for the driver to figure it out -- such that the driver knows
> > that all handles associated with the device are now invalid. Another way
> > to look at it is that when the device is re-installed, the driver should
> > reinitialize such that it obtains new handles for the devices and subobjects
> > in question.
> 
> Unfortunately, there is quite strong assumption in our code that ACPI handles
> will not become stale before the device objects associated with them are
> removed.  For this reason, we need to know in advance which handles will
> become stale as a result of a table unload and remove their device objects
> beforehand.
> 
> Moreover, when there's a notify handler installed for a given ACPI handle
> and that handle becomes stale while the notify handler is running, we'll be
> in trouble.  To avoid that we need to ensure that table unloads and notifies
> will always be mutually exclusive.

I wonder if we can make acpi_ns_validate_handle() to actually be able to
verify if a given handle is valid.  This way, ACPICA can fail gracefully
(AE_BAD_PARAMETER) when a stable handle is passed to the interfaces.

Thanks,
-Toshi



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Thursday, February 14, 2013 11:45:27 PM Moore, Robert wrote:
> 
> > -Original Message-
> > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > Sent: Thursday, February 14, 2013 12:59 PM
> > To: Moore, Robert
> > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
> > Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
> > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
> > memory leaks
> > 
> > On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:
> > >
> > > > -Original Message-
> > > > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > > > Sent: Thursday, February 14, 2013 4:04 AM
> > > > To: Moore, Robert
> > > > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang
> > > > Liu; Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe;
> > > > linux-...@vger.kernel.org
> > > > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues
> > > > and memory leaks
> > > >
> > > > On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
> > > > > > > > I thought about that, but actually there's no guarantee that
> > > > > > > > the handle will be valid after _EJ0 as far as I can say.  So
> > > > > > > > the race condition is going to be there anyway and using
> > > > > > > > struct acpi_device just makes it easier to avoid it.
> > > > > > >
> > > > > > > In theory, yes, a stale handle could be a problem, if _EJ0
> > > > > > > performs unload table and if ACPICA frees up its internal data
> > > > > > > structure pointed by the handle as a result.  But we should
> > > > > > > not see such issue now since we do not support dynamic ACPI
> > > > > > > namespace
> > > > yet.
> > > > > >
> > > > > > I'm waiting for information from Bob about that.  If we can
> > > > > > assume ACPI handles to be always valid, that will simplify
> > > > > > things quite a
> > > > bit.
> > > > >
> > > > > If a table is unloaded, all the namespace nodes for that table are
> > > > > removed from the namespace, and thus any ACPI_HANDLE pointers go
> > > > > stale
> > > > and invalid.
> > > >
> > > > OK, thanks!
> > > >
> > > > To me this means that we cannot assume a handle to stay valid
> > > > between a notify handler and acpi_bus_hot_remove_device() run from a
> > workqueue.
> > > >
> > > > Is there a mechanism in ACPICA to ensure that a handle won't become
> > > > stale while a notify handler is running for it or is the OS
> > > > responsible for ensuring that
> > > > _EJ0 won't be run in parallel with notify handlers for device
> > > > objects being ejected?
> > > >
> > >
> > > It is up to the host.
> > 
> > I was afraid that that might be the case. :-)
> > 
> > So far the (Linux) host has been happily ignoring that potential problem,
> > so I guess it can still be ignored for a while, although we'll need to
> > address it eventually at one point.
> 
> I would think it should be fairly simple to setup a mechanism to either tell
> the driver or for the driver to figure it out -- such that the driver knows
> that all handles associated with the device are now invalid. Another way
> to look at it is that when the device is re-installed, the driver should
> reinitialize such that it obtains new handles for the devices and subobjects
> in question.

Unfortunately, there is quite strong assumption in our code that ACPI handles
will not become stale before the device objects associated with them are
removed.  For this reason, we need to know in advance which handles will
become stale as a result of a table unload and remove their device objects
beforehand.

Moreover, when there's a notify handler installed for a given ACPI handle
and that handle becomes stale while the notify handler is running, we'll be
in trouble.  To avoid that we need to ensure that table unloads and notifies
will always be mutually exclusive.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks



> -Original Message-
> From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> Sent: Thursday, February 14, 2013 12:59 PM
> To: Moore, Robert
> Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
> Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
> Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
> memory leaks
> 
> On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:
> >
> > > -Original Message-
> > > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > > Sent: Thursday, February 14, 2013 4:04 AM
> > > To: Moore, Robert
> > > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang
> > > Liu; Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe;
> > > linux-...@vger.kernel.org
> > > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues
> > > and memory leaks
> > >
> > > On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
> > > > > > > I thought about that, but actually there's no guarantee that
> > > > > > > the handle will be valid after _EJ0 as far as I can say.  So
> > > > > > > the race condition is going to be there anyway and using
> > > > > > > struct acpi_device just makes it easier to avoid it.
> > > > > >
> > > > > > In theory, yes, a stale handle could be a problem, if _EJ0
> > > > > > performs unload table and if ACPICA frees up its internal data
> > > > > > structure pointed by the handle as a result.  But we should
> > > > > > not see such issue now since we do not support dynamic ACPI
> > > > > > namespace
> > > yet.
> > > > >
> > > > > I'm waiting for information from Bob about that.  If we can
> > > > > assume ACPI handles to be always valid, that will simplify
> > > > > things quite a
> > > bit.
> > > >
> > > > If a table is unloaded, all the namespace nodes for that table are
> > > > removed from the namespace, and thus any ACPI_HANDLE pointers go
> > > > stale
> > > and invalid.
> > >
> > > OK, thanks!
> > >
> > > To me this means that we cannot assume a handle to stay valid
> > > between a notify handler and acpi_bus_hot_remove_device() run from a
> workqueue.
> > >
> > > Is there a mechanism in ACPICA to ensure that a handle won't become
> > > stale while a notify handler is running for it or is the OS
> > > responsible for ensuring that
> > > _EJ0 won't be run in parallel with notify handlers for device
> > > objects being ejected?
> > >
> >
> > It is up to the host.
> 
> I was afraid that that might be the case. :-)
> 
> So far the (Linux) host has been happily ignoring that potential problem,
> so I guess it can still be ignored for a while, although we'll need to
> address it eventually at one point.

I would think it should be fairly simple to setup a mechanism to either tell 
the driver or for the driver to figure it out -- such that the driver knows 
that all handles associated with the device are now invalid. Another way to 
look at it is that when the device is re-installed, the driver should 
reinitialize such that it obtains new handles for the devices and subobjects in 
question.

Bob






> 
> Thanks,
> Rafael
> 
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:
> 
> > -Original Message-
> > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > Sent: Thursday, February 14, 2013 4:04 AM
> > To: Moore, Robert
> > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
> > Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
> > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
> > memory leaks
> > 
> > On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
> > > > > > I thought about that, but actually there's no guarantee that the
> > > > > > handle will be valid after _EJ0 as far as I can say.  So the
> > > > > > race condition is going to be there anyway and using struct
> > > > > > acpi_device just makes it easier to avoid it.
> > > > >
> > > > > In theory, yes, a stale handle could be a problem, if _EJ0
> > > > > performs unload table and if ACPICA frees up its internal data
> > > > > structure pointed by the handle as a result.  But we should not
> > > > > see such issue now since we do not support dynamic ACPI namespace
> > yet.
> > > >
> > > > I'm waiting for information from Bob about that.  If we can assume
> > > > ACPI handles to be always valid, that will simplify things quite a
> > bit.
> > >
> > > If a table is unloaded, all the namespace nodes for that table are
> > > removed from the namespace, and thus any ACPI_HANDLE pointers go stale
> > and invalid.
> > 
> > OK, thanks!
> > 
> > To me this means that we cannot assume a handle to stay valid between a
> > notify handler and acpi_bus_hot_remove_device() run from a workqueue.
> > 
> > Is there a mechanism in ACPICA to ensure that a handle won't become stale
> > while a notify handler is running for it or is the OS responsible for
> > ensuring that
> > _EJ0 won't be run in parallel with notify handlers for device objects
> > being ejected?
> > 
> 
> It is up to the host.

I was afraid that that might be the case. :-)

So far the (Linux) host has been happily ignoring that potential problem, so
I guess it can still be ignored for a while, although we'll need to address it
eventually at one point.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks



> -Original Message-
> From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> Sent: Thursday, February 14, 2013 4:04 AM
> To: Moore, Robert
> Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
> Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
> Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
> memory leaks
> 
> On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
> > > > > I thought about that, but actually there's no guarantee that the
> > > > > handle will be valid after _EJ0 as far as I can say.  So the
> > > > > race condition is going to be there anyway and using struct
> > > > > acpi_device just makes it easier to avoid it.
> > > >
> > > > In theory, yes, a stale handle could be a problem, if _EJ0
> > > > performs unload table and if ACPICA frees up its internal data
> > > > structure pointed by the handle as a result.  But we should not
> > > > see such issue now since we do not support dynamic ACPI namespace
> yet.
> > >
> > > I'm waiting for information from Bob about that.  If we can assume
> > > ACPI handles to be always valid, that will simplify things quite a
> bit.
> >
> > If a table is unloaded, all the namespace nodes for that table are
> > removed from the namespace, and thus any ACPI_HANDLE pointers go stale
> and invalid.
> 
> OK, thanks!
> 
> To me this means that we cannot assume a handle to stay valid between a
> notify handler and acpi_bus_hot_remove_device() run from a workqueue.
> 
> Is there a mechanism in ACPICA to ensure that a handle won't become stale
> while a notify handler is running for it or is the OS responsible for
> ensuring that
> _EJ0 won't be run in parallel with notify handlers for device objects
> being ejected?
> 

It is up to the host.
Bob


> Rafael
> 
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Thursday, February 14, 2013 12:05:43 PM Yinghai Lu wrote:
> On Wed, Feb 13, 2013 at 5:16 AM, Rafael J. Wysocki  wrote:
> > From: Rafael J. Wysocki 
> >
> > This changeset is aimed at fixing a few different but related
> > problems in the ACPI hotplug infrastructure.
> >
> > First of all, since notify handlers may be run in parallel with
> > acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
> > and some of them are installed for ACPI handles that have no struct
> > acpi_device objects attached (i.e. before those objects are created),
> > those notify handlers have to take acpi_scan_lock to prevent races
> > from taking place (e.g. a struct acpi_device is found to be present
> > for the given ACPI handle, but right after that it is removed by
> > acpi_bus_trim() running in parallel to the given notify handler).
> > Moreover, since some of them call acpi_bus_scan() and
> > acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
> > should be acquired by the callers of these two funtions rather by
> > these functions themselves.
> >
> > For these reasons, make all notify handlers that can handle device
> > addition and eject events take acpi_scan_lock and remove the
> > acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
> > Accordingly, update all of their users to make sure that they
> > are always called under acpi_scan_lock.
> >
> > Furthermore, since eject operations are carried out asynchronously
> > with respect to the notify events that trigger them, with the help
> > of acpi_bus_hot_remove_device(), even if notify handlers take the
> > ACPI scan lock, it still is possible that, for example,
> > acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
> > the notify handler that scheduled its execution and that
> > acpi_bus_trim() will remove the device node passed to
> > acpi_bus_hot_remove_device() for ejection.  In that case, the struct
> > acpi_device object obtained by acpi_bus_hot_remove_device() will be
> > invalid and not-so-funny things will ensue.  To protect agaist that,
> > make the users of acpi_bus_hot_remove_device() run get_device() on
> > ACPI device node objects that are about to be passed to it and make
> > acpi_bus_hot_remove_device() run put_device() on them and check if
> > their ACPI handles are not NULL (make acpi_device_unregister() clear
> > the device nodes' ACPI handles for that check to work).
> >
> > Finally, observe that acpi_os_hotplug_execute() actually can fail,
> > in which case its caller ought to free memory allocated for the
> > context object to prevent leaks from happening.  It also needs to
> > run put_device() on the device node that it ran get_device() on
> > previously in that case.  Modify the code accordingly.
> >
> > Signed-off-by: Rafael J. Wysocki 
> > Acked-by: Yinghai Lu 
> > ---
> >
> > This includes fixes for two issues spotted by Yasuaki Ishimatsu.
> >
> 
> this one will make pci/next and pm/linux-next conflicts
> 
> Please check if attached fix is right.

Looks correct to me.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-14 Thread Yinghai Lu

On Wed, Feb 13, 2013 at 5:16 AM, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> This changeset is aimed at fixing a few different but related
> problems in the ACPI hotplug infrastructure.
>
> First of all, since notify handlers may be run in parallel with
> acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
> and some of them are installed for ACPI handles that have no struct
> acpi_device objects attached (i.e. before those objects are created),
> those notify handlers have to take acpi_scan_lock to prevent races
> from taking place (e.g. a struct acpi_device is found to be present
> for the given ACPI handle, but right after that it is removed by
> acpi_bus_trim() running in parallel to the given notify handler).
> Moreover, since some of them call acpi_bus_scan() and
> acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
> should be acquired by the callers of these two funtions rather by
> these functions themselves.
>
> For these reasons, make all notify handlers that can handle device
> addition and eject events take acpi_scan_lock and remove the
> acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
> Accordingly, update all of their users to make sure that they
> are always called under acpi_scan_lock.
>
> Furthermore, since eject operations are carried out asynchronously
> with respect to the notify events that trigger them, with the help
> of acpi_bus_hot_remove_device(), even if notify handlers take the
> ACPI scan lock, it still is possible that, for example,
> acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
> the notify handler that scheduled its execution and that
> acpi_bus_trim() will remove the device node passed to
> acpi_bus_hot_remove_device() for ejection.  In that case, the struct
> acpi_device object obtained by acpi_bus_hot_remove_device() will be
> invalid and not-so-funny things will ensue.  To protect agaist that,
> make the users of acpi_bus_hot_remove_device() run get_device() on
> ACPI device node objects that are about to be passed to it and make
> acpi_bus_hot_remove_device() run put_device() on them and check if
> their ACPI handles are not NULL (make acpi_device_unregister() clear
> the device nodes' ACPI handles for that check to work).
>
> Finally, observe that acpi_os_hotplug_execute() actually can fail,
> in which case its caller ought to free memory allocated for the
> context object to prevent leaks from happening.  It also needs to
> run put_device() on the device node that it ran get_device() on
> previously in that case.  Modify the code accordingly.
>
> Signed-off-by: Rafael J. Wysocki 
> Acked-by: Yinghai Lu 
> ---
>
> This includes fixes for two issues spotted by Yasuaki Ishimatsu.
>

this one will make pci/next and pm/linux-next conflicts

Please check if attached fix is right.

Thanks

Yinghai

> Thanks,
> Rafael
>
> ---
>  drivers/acpi/acpi_memhotplug.c |   56 +++---
>  drivers/acpi/container.c   |   12 --
>  drivers/acpi/dock.c|   19 --
>  drivers/acpi/processor_driver.c|   24 +---
>  drivers/acpi/scan.c|   69 
> +
>  drivers/pci/hotplug/acpiphp_glue.c |6 +++
>  drivers/pci/hotplug/sgi_hotplug.c  |5 ++
>  include/acpi/acpi_bus.h|3 +
>  8 files changed, 139 insertions(+), 55 deletions(-)
>
> Index: test/drivers/acpi/scan.c
> ===
> --- test.orig/drivers/acpi/scan.c
> +++ test/drivers/acpi/scan.c
> @@ -42,6 +42,18 @@ struct acpi_device_bus_id{
> struct list_head node;
>  };
>
> +void acpi_scan_lock_acquire(void)
> +{
> +   mutex_lock(_scan_lock);
> +}
> +EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
> +
> +void acpi_scan_lock_release(void)
> +{
> +   mutex_unlock(_scan_lock);
> +}
> +EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
> +
>  int acpi_scan_add_handler(struct acpi_scan_handler *handler)
>  {
> if (!handler || !handler->attach)
> @@ -95,8 +107,6 @@ acpi_device_modalias_show(struct device
>  }
>  static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);
>
> -static void __acpi_bus_trim(struct acpi_device *start);
> -
>  /**
>   * acpi_bus_hot_remove_device: hot-remove a device and its children
>   * @context: struct acpi_eject_event pointer (freed in this func)
> @@ -107,7 +117,7 @@ static void __acpi_bus_trim(struct acpi_
>   */
>  void acpi_bus_hot_remove_device(void *context)
>  {
> -   struct acpi_eject_event *ej_event = (struct acpi_eject_event *) 
> context;
> +   struct acpi_eject_event *ej_event = context;
> struct acpi_device *device = ej_event->device;
> acpi_handle handle = device->handle;
> acpi_handle temp;
> @@ -118,11 +128,19 @@ void acpi_bus_hot_remove_device(void *co
>
> mutex_lock(_scan_lock);
>
> +   /* If there is no handle, the device node has been unregistered. */
> +   if

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
> > > > I thought about that, but actually there's no guarantee that the
> > > > handle will be valid after _EJ0 as far as I can say.  So the race
> > > > condition is going to be there anyway and using struct acpi_device
> > > > just makes it easier to avoid it.
> > >
> > > In theory, yes, a stale handle could be a problem, if _EJ0 performs
> > > unload table and if ACPICA frees up its internal data structure
> > > pointed by the handle as a result.  But we should not see such issue
> > > now since we do not support dynamic ACPI namespace yet.
> > 
> > I'm waiting for information from Bob about that.  If we can assume ACPI
> > handles to be always valid, that will simplify things quite a bit.
> 
> If a table is unloaded, all the namespace nodes for that table are removed
> from the namespace, and thus any ACPI_HANDLE pointers go stale and invalid.

OK, thanks!

To me this means that we cannot assume a handle to stay valid between
a notify handler and acpi_bus_hot_remove_device() run from a workqueue.

Is there a mechanism in ACPICA to ensure that a handle won't become stale while
a notify handler is running for it or is the OS responsible for ensuring that
_EJ0 won't be run in parallel with notify handlers for device objects being
ejected?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
I thought about that, but actually there's no guarantee that the
handle will be valid after _EJ0 as far as I can say.  So the race
condition is going to be there anyway and using struct acpi_device
just makes it easier to avoid it.
  
   In theory, yes, a stale handle could be a problem, if _EJ0 performs
   unload table and if ACPICA frees up its internal data structure
   pointed by the handle as a result.  But we should not see such issue
   now since we do not support dynamic ACPI namespace yet.
  
  I'm waiting for information from Bob about that.  If we can assume ACPI
  handles to be always valid, that will simplify things quite a bit.
 
 If a table is unloaded, all the namespace nodes for that table are removed
 from the namespace, and thus any ACPI_HANDLE pointers go stale and invalid.

OK, thanks!

To me this means that we cannot assume a handle to stay valid between
a notify handler and acpi_bus_hot_remove_device() run from a workqueue.

Is there a mechanism in ACPICA to ensure that a handle won't become stale while
a notify handler is running for it or is the OS responsible for ensuring that
_EJ0 won't be run in parallel with notify handlers for device objects being
ejected?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-14 Thread Yinghai Lu

On Wed, Feb 13, 2013 at 5:16 AM, Rafael J. Wysocki r...@sisk.pl wrote:
 From: Rafael J. Wysocki rafael.j.wyso...@intel.com

 This changeset is aimed at fixing a few different but related
 problems in the ACPI hotplug infrastructure.

 First of all, since notify handlers may be run in parallel with
 acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
 and some of them are installed for ACPI handles that have no struct
 acpi_device objects attached (i.e. before those objects are created),
 those notify handlers have to take acpi_scan_lock to prevent races
 from taking place (e.g. a struct acpi_device is found to be present
 for the given ACPI handle, but right after that it is removed by
 acpi_bus_trim() running in parallel to the given notify handler).
 Moreover, since some of them call acpi_bus_scan() and
 acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
 should be acquired by the callers of these two funtions rather by
 these functions themselves.

 For these reasons, make all notify handlers that can handle device
 addition and eject events take acpi_scan_lock and remove the
 acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
 Accordingly, update all of their users to make sure that they
 are always called under acpi_scan_lock.

 Furthermore, since eject operations are carried out asynchronously
 with respect to the notify events that trigger them, with the help
 of acpi_bus_hot_remove_device(), even if notify handlers take the
 ACPI scan lock, it still is possible that, for example,
 acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
 the notify handler that scheduled its execution and that
 acpi_bus_trim() will remove the device node passed to
 acpi_bus_hot_remove_device() for ejection.  In that case, the struct
 acpi_device object obtained by acpi_bus_hot_remove_device() will be
 invalid and not-so-funny things will ensue.  To protect agaist that,
 make the users of acpi_bus_hot_remove_device() run get_device() on
 ACPI device node objects that are about to be passed to it and make
 acpi_bus_hot_remove_device() run put_device() on them and check if
 their ACPI handles are not NULL (make acpi_device_unregister() clear
 the device nodes' ACPI handles for that check to work).

 Finally, observe that acpi_os_hotplug_execute() actually can fail,
 in which case its caller ought to free memory allocated for the
 context object to prevent leaks from happening.  It also needs to
 run put_device() on the device node that it ran get_device() on
 previously in that case.  Modify the code accordingly.

 Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
 Acked-by: Yinghai Lu ying...@kernel.org
 ---

 This includes fixes for two issues spotted by Yasuaki Ishimatsu.


this one will make pci/next and pm/linux-next conflicts

Please check if attached fix is right.

Thanks

Yinghai

 Thanks,
 Rafael

 ---
  drivers/acpi/acpi_memhotplug.c |   56 +++---
  drivers/acpi/container.c   |   12 --
  drivers/acpi/dock.c|   19 --
  drivers/acpi/processor_driver.c|   24 +---
  drivers/acpi/scan.c|   69 
 +
  drivers/pci/hotplug/acpiphp_glue.c |6 +++
  drivers/pci/hotplug/sgi_hotplug.c  |5 ++
  include/acpi/acpi_bus.h|3 +
  8 files changed, 139 insertions(+), 55 deletions(-)

 Index: test/drivers/acpi/scan.c
 ===
 --- test.orig/drivers/acpi/scan.c
 +++ test/drivers/acpi/scan.c
 @@ -42,6 +42,18 @@ struct acpi_device_bus_id{
 struct list_head node;
  };

 +void acpi_scan_lock_acquire(void)
 +{
 +   mutex_lock(acpi_scan_lock);
 +}
 +EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
 +
 +void acpi_scan_lock_release(void)
 +{
 +   mutex_unlock(acpi_scan_lock);
 +}
 +EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
 +
  int acpi_scan_add_handler(struct acpi_scan_handler *handler)
  {
 if (!handler || !handler-attach)
 @@ -95,8 +107,6 @@ acpi_device_modalias_show(struct device
  }
  static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);

 -static void __acpi_bus_trim(struct acpi_device *start);
 -
  /**
   * acpi_bus_hot_remove_device: hot-remove a device and its children
   * @context: struct acpi_eject_event pointer (freed in this func)
 @@ -107,7 +117,7 @@ static void __acpi_bus_trim(struct acpi_
   */
  void acpi_bus_hot_remove_device(void *context)
  {
 -   struct acpi_eject_event *ej_event = (struct acpi_eject_event *) 
 context;
 +   struct acpi_eject_event *ej_event = context;
 struct acpi_device *device = ej_event-device;
 acpi_handle handle = device-handle;
 acpi_handle temp;
 @@ -118,11 +128,19 @@ void acpi_bus_hot_remove_device(void *co

 mutex_lock(acpi_scan_lock);

 +   /* If there is no handle, the device node has been unregistered. */
 +   if (!device-handle) {
 +

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Thursday, February 14, 2013 12:05:43 PM Yinghai Lu wrote:
 On Wed, Feb 13, 2013 at 5:16 AM, Rafael J. Wysocki r...@sisk.pl wrote:
  From: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
  This changeset is aimed at fixing a few different but related
  problems in the ACPI hotplug infrastructure.
 
  First of all, since notify handlers may be run in parallel with
  acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
  and some of them are installed for ACPI handles that have no struct
  acpi_device objects attached (i.e. before those objects are created),
  those notify handlers have to take acpi_scan_lock to prevent races
  from taking place (e.g. a struct acpi_device is found to be present
  for the given ACPI handle, but right after that it is removed by
  acpi_bus_trim() running in parallel to the given notify handler).
  Moreover, since some of them call acpi_bus_scan() and
  acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
  should be acquired by the callers of these two funtions rather by
  these functions themselves.
 
  For these reasons, make all notify handlers that can handle device
  addition and eject events take acpi_scan_lock and remove the
  acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
  Accordingly, update all of their users to make sure that they
  are always called under acpi_scan_lock.
 
  Furthermore, since eject operations are carried out asynchronously
  with respect to the notify events that trigger them, with the help
  of acpi_bus_hot_remove_device(), even if notify handlers take the
  ACPI scan lock, it still is possible that, for example,
  acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
  the notify handler that scheduled its execution and that
  acpi_bus_trim() will remove the device node passed to
  acpi_bus_hot_remove_device() for ejection.  In that case, the struct
  acpi_device object obtained by acpi_bus_hot_remove_device() will be
  invalid and not-so-funny things will ensue.  To protect agaist that,
  make the users of acpi_bus_hot_remove_device() run get_device() on
  ACPI device node objects that are about to be passed to it and make
  acpi_bus_hot_remove_device() run put_device() on them and check if
  their ACPI handles are not NULL (make acpi_device_unregister() clear
  the device nodes' ACPI handles for that check to work).
 
  Finally, observe that acpi_os_hotplug_execute() actually can fail,
  in which case its caller ought to free memory allocated for the
  context object to prevent leaks from happening.  It also needs to
  run put_device() on the device node that it ran get_device() on
  previously in that case.  Modify the code accordingly.
 
  Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
  Acked-by: Yinghai Lu ying...@kernel.org
  ---
 
  This includes fixes for two issues spotted by Yasuaki Ishimatsu.
 
 
 this one will make pci/next and pm/linux-next conflicts
 
 Please check if attached fix is right.

Looks correct to me.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

-Original Message-
From: Rafael J. Wysocki [mailto:r...@sisk.pl]
Sent: Thursday, February 14, 2013 4:04 AM
To: Moore, Robert
Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
memory leaks

On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
I thought about that, but actually there's no guarantee that the
handle will be valid after _EJ0 as far as I can say. So the
race condition is going to be there anyway and using struct
acpi_device just makes it easier to avoid it.

In theory, yes, a stale handle could be a problem, if _EJ0
performs unload table and if ACPICA frees up its internal data
structure pointed by the handle as a result. But we should not
see such issue now since we do not support dynamic ACPI namespace
yet.

I'm waiting for information from Bob about that. If we can assume
ACPI handles to be always valid, that will simplify things quite a
bit.

If a table is unloaded, all the namespace nodes for that table are
removed from the namespace, and thus any ACPI_HANDLE pointers go stale
and invalid.

OK, thanks!

To me this means that we cannot assume a handle to stay valid between a
notify handler and acpi_bus_hot_remove_device() run from a workqueue.

Is there a mechanism in ACPICA to ensure that a handle won't become stale
while a notify handler is running for it or is the OS responsible for
ensuring that
_EJ0 won't be run in parallel with notify handlers for device objects
being ejected?

It is up to the host.
Bob

Rafael

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:

  -Original Message-
  From: Rafael J. Wysocki [mailto:r...@sisk.pl]
  Sent: Thursday, February 14, 2013 4:04 AM
  To: Moore, Robert
  Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
  Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
  Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
  memory leaks

  On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
  I thought about that, but actually there's no guarantee that the
  handle will be valid after _EJ0 as far as I can say.  So the
  race condition is going to be there anyway and using struct
  acpi_device just makes it easier to avoid it.

 In theory, yes, a stale handle could be a problem, if _EJ0
 performs unload table and if ACPICA frees up its internal data
 structure pointed by the handle as a result.  But we should not
 see such issue now since we do not support dynamic ACPI namespace
  yet.

I'm waiting for information from Bob about that.  If we can assume
ACPI handles to be always valid, that will simplify things quite a
  bit.

   If a table is unloaded, all the namespace nodes for that table are
   removed from the namespace, and thus any ACPI_HANDLE pointers go stale
  and invalid.

  OK, thanks!

  To me this means that we cannot assume a handle to stay valid between a
  notify handler and acpi_bus_hot_remove_device() run from a workqueue.

  Is there a mechanism in ACPICA to ensure that a handle won't become stale
  while a notify handler is running for it or is the OS responsible for
  ensuring that
  _EJ0 won't be run in parallel with notify handlers for device objects
  being ejected?

 It is up to the host.

I was afraid that that might be the case. :-)

So far the (Linux) host has been happily ignoring that potential problem, so
I guess it can still be ignored for a while, although we'll need to address it
eventually at one point.

Thanks,
Rafael

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

-Original Message-
From: Rafael J. Wysocki [mailto:r...@sisk.pl]
Sent: Thursday, February 14, 2013 12:59 PM
To: Moore, Robert
Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
memory leaks

On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:

-Original Message-
From: Rafael J. Wysocki [mailto:r...@sisk.pl]
Sent: Thursday, February 14, 2013 4:04 AM
To: Moore, Robert
Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang
Liu; Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe;
linux-...@vger.kernel.org
Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues
and memory leaks

On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
I thought about that, but actually there's no guarantee that
the handle will be valid after _EJ0 as far as I can say. So
the race condition is going to be there anyway and using
struct acpi_device just makes it easier to avoid it.

In theory, yes, a stale handle could be a problem, if _EJ0
performs unload table and if ACPICA frees up its internal data
structure pointed by the handle as a result. But we should
not see such issue now since we do not support dynamic ACPI
namespace
yet.

I'm waiting for information from Bob about that. If we can
assume ACPI handles to be always valid, that will simplify
things quite a
bit.

If a table is unloaded, all the namespace nodes for that table are
removed from the namespace, and thus any ACPI_HANDLE pointers go
stale
and invalid.

OK, thanks!

To me this means that we cannot assume a handle to stay valid
between a notify handler and acpi_bus_hot_remove_device() run from a
workqueue.

Is there a mechanism in ACPICA to ensure that a handle won't become
stale while a notify handler is running for it or is the OS
responsible for ensuring that
_EJ0 won't be run in parallel with notify handlers for device
objects being ejected?

It is up to the host.

I was afraid that that might be the case. :-)

So far the (Linux) host has been happily ignoring that potential problem,
so I guess it can still be ignored for a while, although we'll need to
address it eventually at one point.

I would think it should be fairly simple to setup a mechanism to either tell
the driver or for the driver to figure it out -- such that the driver knows
that all handles associated with the device are now invalid. Another way to
look at it is that when the device is re-installed, the driver should
reinitialize such that it obtains new handles for the devices and subobjects in
question.

Bob

Thanks,
Rafael

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Thursday, February 14, 2013 11:45:27 PM Moore, Robert wrote:

  -Original Message-
  From: Rafael J. Wysocki [mailto:r...@sisk.pl]
  Sent: Thursday, February 14, 2013 12:59 PM
  To: Moore, Robert
  Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
  Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
  Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
  memory leaks

  On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:

-Original Message-
From: Rafael J. Wysocki [mailto:r...@sisk.pl]
Sent: Thursday, February 14, 2013 4:04 AM
To: Moore, Robert
Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang
Liu; Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe;
linux-...@vger.kernel.org
Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues
and memory leaks

On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
I thought about that, but actually there's no guarantee that
the handle will be valid after _EJ0 as far as I can say.  So
the race condition is going to be there anyway and using
struct acpi_device just makes it easier to avoid it.

   In theory, yes, a stale handle could be a problem, if _EJ0
   performs unload table and if ACPICA frees up its internal data
   structure pointed by the handle as a result.  But we should
   not see such issue now since we do not support dynamic ACPI
   namespace
yet.

  I'm waiting for information from Bob about that.  If we can
  assume ACPI handles to be always valid, that will simplify
  things quite a
bit.

 If a table is unloaded, all the namespace nodes for that table are
 removed from the namespace, and thus any ACPI_HANDLE pointers go
 stale
and invalid.

OK, thanks!

To me this means that we cannot assume a handle to stay valid
between a notify handler and acpi_bus_hot_remove_device() run from a
  workqueue.

Is there a mechanism in ACPICA to ensure that a handle won't become
stale while a notify handler is running for it or is the OS
responsible for ensuring that
_EJ0 won't be run in parallel with notify handlers for device
objects being ejected?

   It is up to the host.

  I was afraid that that might be the case. :-)

  So far the (Linux) host has been happily ignoring that potential problem,
  so I guess it can still be ignored for a while, although we'll need to
  address it eventually at one point.

 I would think it should be fairly simple to setup a mechanism to either tell
 the driver or for the driver to figure it out -- such that the driver knows
 that all handles associated with the device are now invalid. Another way
 to look at it is that when the device is re-installed, the driver should
 reinitialize such that it obtains new handles for the devices and subobjects
 in question.

Unfortunately, there is quite strong assumption in our code that ACPI handles
will not become stale before the device objects associated with them are
removed.  For this reason, we need to know in advance which handles will
become stale as a result of a table unload and remove their device objects
beforehand.

Moreover, when there's a notify handler installed for a given ACPI handle
and that handle becomes stale while the notify handler is running, we'll be
in trouble.  To avoid that we need to ensure that table unloads and notifies
will always be mutually exclusive.

Thanks,
Rafael

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-14 Thread Toshi Kani

On Fri, 2013-02-15 at 01:23 +0100, Rafael J. Wysocki wrote:
 On Thursday, February 14, 2013 11:45:27 PM Moore, Robert wrote:

   -Original Message-
   From: Rafael J. Wysocki [mailto:r...@sisk.pl]
   Sent: Thursday, February 14, 2013 12:59 PM
   To: Moore, Robert
   Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
   Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-...@vger.kernel.org
   Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
   memory leaks

   On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:

 -Original Message-
 From: Rafael J. Wysocki [mailto:r...@sisk.pl]
 Sent: Thursday, February 14, 2013 4:04 AM
 To: Moore, Robert
 Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang
 Liu; Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe;
 linux-...@vger.kernel.org
 Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues
 and memory leaks

 On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
 I thought about that, but actually there's no guarantee that
 the handle will be valid after _EJ0 as far as I can say.  So
 the race condition is going to be there anyway and using
 struct acpi_device just makes it easier to avoid it.

In theory, yes, a stale handle could be a problem, if _EJ0
performs unload table and if ACPICA frees up its internal data
structure pointed by the handle as a result.  But we should
not see such issue now since we do not support dynamic ACPI
namespace
 yet.

   I'm waiting for information from Bob about that.  If we can
   assume ACPI handles to be always valid, that will simplify
   things quite a
 bit.

  If a table is unloaded, all the namespace nodes for that table are
  removed from the namespace, and thus any ACPI_HANDLE pointers go
  stale
 and invalid.

 OK, thanks!

 To me this means that we cannot assume a handle to stay valid
 between a notify handler and acpi_bus_hot_remove_device() run from a
   workqueue.

 Is there a mechanism in ACPICA to ensure that a handle won't become
 stale while a notify handler is running for it or is the OS
 responsible for ensuring that
 _EJ0 won't be run in parallel with notify handlers for device
 objects being ejected?

It is up to the host.

   I was afraid that that might be the case. :-)

   So far the (Linux) host has been happily ignoring that potential problem,
   so I guess it can still be ignored for a while, although we'll need to
   address it eventually at one point.

  I would think it should be fairly simple to setup a mechanism to either tell
  the driver or for the driver to figure it out -- such that the driver knows
  that all handles associated with the device are now invalid. Another way
  to look at it is that when the device is re-installed, the driver should
  reinitialize such that it obtains new handles for the devices and subobjects
  in question.

 Unfortunately, there is quite strong assumption in our code that ACPI handles
 will not become stale before the device objects associated with them are
 removed.  For this reason, we need to know in advance which handles will
 become stale as a result of a table unload and remove their device objects
 beforehand.

 Moreover, when there's a notify handler installed for a given ACPI handle
 and that handle becomes stale while the notify handler is running, we'll be
 in trouble.  To avoid that we need to ensure that table unloads and notifies
 will always be mutually exclusive.

I wonder if we can make acpi_ns_validate_handle() to actually be able to
verify if a given handle is valid.  This way, ACPICA can fail gracefully
(AE_BAD_PARAMETER) when a stable handle is passed to the interfaces.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-13 Thread Moore, Robert

> > > I thought about that, but actually there's no guarantee that the
> > > handle will be valid after _EJ0 as far as I can say.  So the race
> > > condition is going to be there anyway and using struct acpi_device
> > > just makes it easier to avoid it.
> >
> > In theory, yes, a stale handle could be a problem, if _EJ0 performs
> > unload table and if ACPICA frees up its internal data structure
> > pointed by the handle as a result.  But we should not see such issue
> > now since we do not support dynamic ACPI namespace yet.
> 
> I'm waiting for information from Bob about that.  If we can assume ACPI
> handles to be always valid, that will simplify things quite a bit.

If a table is unloaded, all the namespace nodes for that table are removed from 
the namespace, and thus any ACPI_HANDLE pointers go stale and invalid.

Bob

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Thu, 2013-02-14 at 00:42 +0100, Rafael J. Wysocki wrote:
> On Wednesday, February 13, 2013 04:09:29 PM Toshi Kani wrote:
> > On Wed, 2013-02-13 at 21:52 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, February 13, 2013 10:43:58 AM Toshi Kani wrote:
> > > > On Wed, 2013-02-13 at 14:16 +0100, Rafael J. Wysocki wrote:
> > > > > From: Rafael J. Wysocki 
> > > > > 
> > > > > This changeset is aimed at fixing a few different but related
> > > > > problems in the ACPI hotplug infrastructure.
> > > > > 
> > > > > First of all, since notify handlers may be run in parallel with
> > > > > acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
> > > > > and some of them are installed for ACPI handles that have no struct
> > > > > acpi_device objects attached (i.e. before those objects are created),
> > > > > those notify handlers have to take acpi_scan_lock to prevent races
> > > > > from taking place (e.g. a struct acpi_device is found to be present
> > > > > for the given ACPI handle, but right after that it is removed by
> > > > > acpi_bus_trim() running in parallel to the given notify handler).
> > > > > Moreover, since some of them call acpi_bus_scan() and
> > > > > acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
> > > > > should be acquired by the callers of these two funtions rather by
> > > > > these functions themselves.
> > > > > 
> > > > > For these reasons, make all notify handlers that can handle device
> > > > > addition and eject events take acpi_scan_lock and remove the
> > > > > acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
> > > > > Accordingly, update all of their users to make sure that they
> > > > > are always called under acpi_scan_lock.
> > > > > 
> > > > > Furthermore, since eject operations are carried out asynchronously
> > > > > with respect to the notify events that trigger them, with the help
> > > > > of acpi_bus_hot_remove_device(), even if notify handlers take the
> > > > > ACPI scan lock, it still is possible that, for example,
> > > > > acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
> > > > > the notify handler that scheduled its execution and that
> > > > > acpi_bus_trim() will remove the device node passed to
> > > > > acpi_bus_hot_remove_device() for ejection.  In that case, the struct
> > > > > acpi_device object obtained by acpi_bus_hot_remove_device() will be
> > > > > invalid and not-so-funny things will ensue.  To protect agaist that,
> > > > > make the users of acpi_bus_hot_remove_device() run get_device() on
> > > > > ACPI device node objects that are about to be passed to it and make
> > > > > acpi_bus_hot_remove_device() run put_device() on them and check if
> > > > > their ACPI handles are not NULL (make acpi_device_unregister() clear
> > > > > the device nodes' ACPI handles for that check to work).
> > > > > 
> > > > > Finally, observe that acpi_os_hotplug_execute() actually can fail,
> > > > > in which case its caller ought to free memory allocated for the
> > > > > context object to prevent leaks from happening.  It also needs to
> > > > > run put_device() on the device node that it ran get_device() on
> > > > > previously in that case.  Modify the code accordingly.
> > > > 
> > > > I am concerned with this approach.  ACPICA calls notify handlers through
> > > > kacpi_notify_wq, which has the max active set to 1.  We then use
> > > > kacpi_hotplug_wq (which also has the max active set to 1) so that a
> > > > hotplug procedure does not block the notify handlers since they can be
> > > > used for non-hotplug events as well.
> > > 
> > > In fact we use kacpi_hotplug_wq for a different reason.  Please read the
> > > comment in __acpi_os_execute() for more details.
> > 
> > Yes, I am aware of the issue as well.
> > 
> > > > Acquiring the scan lock in a notify handler means that a hotplug 
> > > > procedure
> > > > can block any notify events.
> > > 
> > > Yes, it can.
> > > 
> > > > So, I'd prefer the following approach.
> > > > 
> > > >  - Change all hot-plug procedures (i.e. both add and delete) to proceed
> > > > under kacpi_hotplug_wq by calling acpi_os_hotplug_execute(). This
> > > > serializes all hotplug procedures, and prevents blocking other notify
> > > > events.
> > > 
> > > Yes, we can do that.  I was thinking about doing that change, but not in 
> > > v3.9.
> > > There are simply too many notify handlers already there to do that so 
> > > late in
> > > the cycle.  And doing that for acpiphp, for example, won't be 
> > > straightforward
> > > at all.
> > 
> > Right.  I was not suggesting this approach for v3.9.
> 
> OK
> 
> > > Please think about the $subject patch as a temporary measure until we can 
> > > do
> > > something better (which we need to do anyway to reduce code duplication 
> > > among
> > > other things).
> > 
> > I am fine with the scan lock as long as it is internal.  This patch
> > publishes the locking interfaces to other modules, which made me worried
> > that this

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Wednesday, February 13, 2013 04:09:29 PM Toshi Kani wrote:
> On Wed, 2013-02-13 at 21:52 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, February 13, 2013 10:43:58 AM Toshi Kani wrote:
> > > On Wed, 2013-02-13 at 14:16 +0100, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki 
> > > > 
> > > > This changeset is aimed at fixing a few different but related
> > > > problems in the ACPI hotplug infrastructure.
> > > > 
> > > > First of all, since notify handlers may be run in parallel with
> > > > acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
> > > > and some of them are installed for ACPI handles that have no struct
> > > > acpi_device objects attached (i.e. before those objects are created),
> > > > those notify handlers have to take acpi_scan_lock to prevent races
> > > > from taking place (e.g. a struct acpi_device is found to be present
> > > > for the given ACPI handle, but right after that it is removed by
> > > > acpi_bus_trim() running in parallel to the given notify handler).
> > > > Moreover, since some of them call acpi_bus_scan() and
> > > > acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
> > > > should be acquired by the callers of these two funtions rather by
> > > > these functions themselves.
> > > > 
> > > > For these reasons, make all notify handlers that can handle device
> > > > addition and eject events take acpi_scan_lock and remove the
> > > > acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
> > > > Accordingly, update all of their users to make sure that they
> > > > are always called under acpi_scan_lock.
> > > > 
> > > > Furthermore, since eject operations are carried out asynchronously
> > > > with respect to the notify events that trigger them, with the help
> > > > of acpi_bus_hot_remove_device(), even if notify handlers take the
> > > > ACPI scan lock, it still is possible that, for example,
> > > > acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
> > > > the notify handler that scheduled its execution and that
> > > > acpi_bus_trim() will remove the device node passed to
> > > > acpi_bus_hot_remove_device() for ejection.  In that case, the struct
> > > > acpi_device object obtained by acpi_bus_hot_remove_device() will be
> > > > invalid and not-so-funny things will ensue.  To protect agaist that,
> > > > make the users of acpi_bus_hot_remove_device() run get_device() on
> > > > ACPI device node objects that are about to be passed to it and make
> > > > acpi_bus_hot_remove_device() run put_device() on them and check if
> > > > their ACPI handles are not NULL (make acpi_device_unregister() clear
> > > > the device nodes' ACPI handles for that check to work).
> > > > 
> > > > Finally, observe that acpi_os_hotplug_execute() actually can fail,
> > > > in which case its caller ought to free memory allocated for the
> > > > context object to prevent leaks from happening.  It also needs to
> > > > run put_device() on the device node that it ran get_device() on
> > > > previously in that case.  Modify the code accordingly.
> > > 
> > > I am concerned with this approach.  ACPICA calls notify handlers through
> > > kacpi_notify_wq, which has the max active set to 1.  We then use
> > > kacpi_hotplug_wq (which also has the max active set to 1) so that a
> > > hotplug procedure does not block the notify handlers since they can be
> > > used for non-hotplug events as well.
> > 
> > In fact we use kacpi_hotplug_wq for a different reason.  Please read the
> > comment in __acpi_os_execute() for more details.
> 
> Yes, I am aware of the issue as well.
> 
> > > Acquiring the scan lock in a notify handler means that a hotplug procedure
> > > can block any notify events.
> > 
> > Yes, it can.
> > 
> > > So, I'd prefer the following approach.
> > > 
> > >  - Change all hot-plug procedures (i.e. both add and delete) to proceed
> > > under kacpi_hotplug_wq by calling acpi_os_hotplug_execute(). This
> > > serializes all hotplug procedures, and prevents blocking other notify
> > > events.
> > 
> > Yes, we can do that.  I was thinking about doing that change, but not in 
> > v3.9.
> > There are simply too many notify handlers already there to do that so late 
> > in
> > the cycle.  And doing that for acpiphp, for example, won't be 
> > straightforward
> > at all.
> 
> Right.  I was not suggesting this approach for v3.9.

OK

> > Please think about the $subject patch as a temporary measure until we can do
> > something better (which we need to do anyway to reduce code duplication 
> > among
> > other things).
> 
> I am fine with the scan lock as long as it is internal.  This patch
> publishes the locking interfaces to other modules, which made me worried
> that this might become a long term solution.  If we need to fix this
> issue for v3.9, I am OK with it as you clarified this as a temporary
> solution.

Yes, I'm not going to allow anyone to use acpi_scan_lock anywhere else. :-)
I'm also going to make it internal again in

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Wed, 2013-02-13 at 21:52 +0100, Rafael J. Wysocki wrote:
> On Wednesday, February 13, 2013 10:43:58 AM Toshi Kani wrote:
> > On Wed, 2013-02-13 at 14:16 +0100, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki 
> > > 
> > > This changeset is aimed at fixing a few different but related
> > > problems in the ACPI hotplug infrastructure.
> > > 
> > > First of all, since notify handlers may be run in parallel with
> > > acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
> > > and some of them are installed for ACPI handles that have no struct
> > > acpi_device objects attached (i.e. before those objects are created),
> > > those notify handlers have to take acpi_scan_lock to prevent races
> > > from taking place (e.g. a struct acpi_device is found to be present
> > > for the given ACPI handle, but right after that it is removed by
> > > acpi_bus_trim() running in parallel to the given notify handler).
> > > Moreover, since some of them call acpi_bus_scan() and
> > > acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
> > > should be acquired by the callers of these two funtions rather by
> > > these functions themselves.
> > > 
> > > For these reasons, make all notify handlers that can handle device
> > > addition and eject events take acpi_scan_lock and remove the
> > > acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
> > > Accordingly, update all of their users to make sure that they
> > > are always called under acpi_scan_lock.
> > > 
> > > Furthermore, since eject operations are carried out asynchronously
> > > with respect to the notify events that trigger them, with the help
> > > of acpi_bus_hot_remove_device(), even if notify handlers take the
> > > ACPI scan lock, it still is possible that, for example,
> > > acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
> > > the notify handler that scheduled its execution and that
> > > acpi_bus_trim() will remove the device node passed to
> > > acpi_bus_hot_remove_device() for ejection.  In that case, the struct
> > > acpi_device object obtained by acpi_bus_hot_remove_device() will be
> > > invalid and not-so-funny things will ensue.  To protect agaist that,
> > > make the users of acpi_bus_hot_remove_device() run get_device() on
> > > ACPI device node objects that are about to be passed to it and make
> > > acpi_bus_hot_remove_device() run put_device() on them and check if
> > > their ACPI handles are not NULL (make acpi_device_unregister() clear
> > > the device nodes' ACPI handles for that check to work).
> > > 
> > > Finally, observe that acpi_os_hotplug_execute() actually can fail,
> > > in which case its caller ought to free memory allocated for the
> > > context object to prevent leaks from happening.  It also needs to
> > > run put_device() on the device node that it ran get_device() on
> > > previously in that case.  Modify the code accordingly.
> > 
> > I am concerned with this approach.  ACPICA calls notify handlers through
> > kacpi_notify_wq, which has the max active set to 1.  We then use
> > kacpi_hotplug_wq (which also has the max active set to 1) so that a
> > hotplug procedure does not block the notify handlers since they can be
> > used for non-hotplug events as well.
> 
> In fact we use kacpi_hotplug_wq for a different reason.  Please read the
> comment in __acpi_os_execute() for more details.

Yes, I am aware of the issue as well.

> > Acquiring the scan lock in a notify handler means that a hotplug procedure
> > can block any notify events.
> 
> Yes, it can.
> 
> > So, I'd prefer the following approach.
> > 
> >  - Change all hot-plug procedures (i.e. both add and delete) to proceed
> > under kacpi_hotplug_wq by calling acpi_os_hotplug_execute(). This
> > serializes all hotplug procedures, and prevents blocking other notify
> > events.
> 
> Yes, we can do that.  I was thinking about doing that change, but not in v3.9.
> There are simply too many notify handlers already there to do that so late in
> the cycle.  And doing that for acpiphp, for example, won't be straightforward
> at all.

Right.  I was not suggesting this approach for v3.9.

> Please think about the $subject patch as a temporary measure until we can do
> something better (which we need to do anyway to reduce code duplication among
> other things).

I am fine with the scan lock as long as it is internal.  This patch
publishes the locking interfaces to other modules, which made me worried
that this might become a long term solution.  If we need to fix this
issue for v3.9, I am OK with it as you clarified this as a temporary
solution.

> > (Ideally, we should also run all online/offline procedures
> > under a same work-queue, just like my RFC patchset did, but this is a
> > different topic for now.)
> 
> No, I don't think it is appropriate to run online/offline from _any_
> workqueue.  In my opinion they should be run from user space.

I think there are pros and cons for this.  If we use a user thread to
run

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Wednesday, February 13, 2013 10:43:58 AM Toshi Kani wrote:
> On Wed, 2013-02-13 at 14:16 +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki 
> > 
> > This changeset is aimed at fixing a few different but related
> > problems in the ACPI hotplug infrastructure.
> > 
> > First of all, since notify handlers may be run in parallel with
> > acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
> > and some of them are installed for ACPI handles that have no struct
> > acpi_device objects attached (i.e. before those objects are created),
> > those notify handlers have to take acpi_scan_lock to prevent races
> > from taking place (e.g. a struct acpi_device is found to be present
> > for the given ACPI handle, but right after that it is removed by
> > acpi_bus_trim() running in parallel to the given notify handler).
> > Moreover, since some of them call acpi_bus_scan() and
> > acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
> > should be acquired by the callers of these two funtions rather by
> > these functions themselves.
> > 
> > For these reasons, make all notify handlers that can handle device
> > addition and eject events take acpi_scan_lock and remove the
> > acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
> > Accordingly, update all of their users to make sure that they
> > are always called under acpi_scan_lock.
> > 
> > Furthermore, since eject operations are carried out asynchronously
> > with respect to the notify events that trigger them, with the help
> > of acpi_bus_hot_remove_device(), even if notify handlers take the
> > ACPI scan lock, it still is possible that, for example,
> > acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
> > the notify handler that scheduled its execution and that
> > acpi_bus_trim() will remove the device node passed to
> > acpi_bus_hot_remove_device() for ejection.  In that case, the struct
> > acpi_device object obtained by acpi_bus_hot_remove_device() will be
> > invalid and not-so-funny things will ensue.  To protect agaist that,
> > make the users of acpi_bus_hot_remove_device() run get_device() on
> > ACPI device node objects that are about to be passed to it and make
> > acpi_bus_hot_remove_device() run put_device() on them and check if
> > their ACPI handles are not NULL (make acpi_device_unregister() clear
> > the device nodes' ACPI handles for that check to work).
> > 
> > Finally, observe that acpi_os_hotplug_execute() actually can fail,
> > in which case its caller ought to free memory allocated for the
> > context object to prevent leaks from happening.  It also needs to
> > run put_device() on the device node that it ran get_device() on
> > previously in that case.  Modify the code accordingly.
> 
> I am concerned with this approach.  ACPICA calls notify handlers through
> kacpi_notify_wq, which has the max active set to 1.  We then use
> kacpi_hotplug_wq (which also has the max active set to 1) so that a
> hotplug procedure does not block the notify handlers since they can be
> used for non-hotplug events as well.

In fact we use kacpi_hotplug_wq for a different reason.  Please read the
comment in __acpi_os_execute() for more details.

> Acquiring the scan lock in a notify handler means that a hotplug procedure
> can block any notify events.

Yes, it can.

> So, I'd prefer the following approach.
> 
>  - Change all hot-plug procedures (i.e. both add and delete) to proceed
> under kacpi_hotplug_wq by calling acpi_os_hotplug_execute(). This
> serializes all hotplug procedures, and prevents blocking other notify
> events.

Yes, we can do that.  I was thinking about doing that change, but not in v3.9.
There are simply too many notify handlers already there to do that so late in
the cycle.  And doing that for acpiphp, for example, won't be straightforward
at all.

Please think about the $subject patch as a temporary measure until we can do
something better (which we need to do anyway to reduce code duplication among
other things).

> (Ideally, we should also run all online/offline procedures
> under a same work-queue, just like my RFC patchset did, but this is a
> different topic for now.)

No, I don't think it is appropriate to run online/offline from _any_
workqueue.  In my opinion they should be run from user space.

>  - Revert 5993c4670 unless this change is absolutely necessary.  From
> the change log, it is not clear to me why this change was needed.  It
> changed acpi_bus_hot_remove_device() to take an acpi_device, instead of
> an acpi_handle, which introduced a race condition with acpi_device.
> acpi_bus_hot_remove_device() should take an acpi_handle, and then obtain
> its acpi_device from the acpi_handle since this function is serialized.

I thought about that, but actually there's no guarantee that the handle
will be valid after _EJ0 as far as I can say.  So the race condition is
going to be there anyway and using struct acpi_device just makes it easier
to avoid it.

>  -

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Wed, 2013-02-13 at 14:16 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> This changeset is aimed at fixing a few different but related
> problems in the ACPI hotplug infrastructure.
> 
> First of all, since notify handlers may be run in parallel with
> acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
> and some of them are installed for ACPI handles that have no struct
> acpi_device objects attached (i.e. before those objects are created),
> those notify handlers have to take acpi_scan_lock to prevent races
> from taking place (e.g. a struct acpi_device is found to be present
> for the given ACPI handle, but right after that it is removed by
> acpi_bus_trim() running in parallel to the given notify handler).
> Moreover, since some of them call acpi_bus_scan() and
> acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
> should be acquired by the callers of these two funtions rather by
> these functions themselves.
> 
> For these reasons, make all notify handlers that can handle device
> addition and eject events take acpi_scan_lock and remove the
> acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
> Accordingly, update all of their users to make sure that they
> are always called under acpi_scan_lock.
> 
> Furthermore, since eject operations are carried out asynchronously
> with respect to the notify events that trigger them, with the help
> of acpi_bus_hot_remove_device(), even if notify handlers take the
> ACPI scan lock, it still is possible that, for example,
> acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
> the notify handler that scheduled its execution and that
> acpi_bus_trim() will remove the device node passed to
> acpi_bus_hot_remove_device() for ejection.  In that case, the struct
> acpi_device object obtained by acpi_bus_hot_remove_device() will be
> invalid and not-so-funny things will ensue.  To protect agaist that,
> make the users of acpi_bus_hot_remove_device() run get_device() on
> ACPI device node objects that are about to be passed to it and make
> acpi_bus_hot_remove_device() run put_device() on them and check if
> their ACPI handles are not NULL (make acpi_device_unregister() clear
> the device nodes' ACPI handles for that check to work).
> 
> Finally, observe that acpi_os_hotplug_execute() actually can fail,
> in which case its caller ought to free memory allocated for the
> context object to prevent leaks from happening.  It also needs to
> run put_device() on the device node that it ran get_device() on
> previously in that case.  Modify the code accordingly.

I am concerned with this approach.  ACPICA calls notify handlers through
kacpi_notify_wq, which has the max active set to 1.  We then use
kacpi_hotplug_wq (which also has the max active set to 1) so that a
hotplug procedure does not block the notify handlers since they can be
used for non-hotplug events as well.  Acquiring the scan lock in a
notify handler means that a hotplug procedure can block any notify
events.

So, I'd prefer the following approach.

 - Change all hot-plug procedures (i.e. both add and delete) to proceed
under kacpi_hotplug_wq by calling acpi_os_hotplug_execute(). This
serializes all hotplug procedures, and prevents blocking other notify
events.  (Ideally, we should also run all online/offline procedures
under a same work-queue, just like my RFC patchset did, but this is a
different topic for now.)

 - Revert 5993c4670 unless this change is absolutely necessary.  From
the change log, it is not clear to me why this change was needed.  It
changed acpi_bus_hot_remove_device() to take an acpi_device, instead of
an acpi_handle, which introduced a race condition with acpi_device.
acpi_bus_hot_remove_device() should take an acpi_handle, and then obtain
its acpi_device from the acpi_handle since this function is serialized.

 - Remove sanity checks with an acpi_device in the notify handlers,
which have a race condition with acpi_device.  These type-specific
checks will need to be removed when we have a common notify handler
anyway.  The notify handler can continue to check the status of ACPI
device object with an acpi_handle.  Type-specific sanity checks /
validations can be performed within a hotplug procedure, instead.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

From: Rafael J. Wysocki 

This changeset is aimed at fixing a few different but related
problems in the ACPI hotplug infrastructure.

First of all, since notify handlers may be run in parallel with
acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
and some of them are installed for ACPI handles that have no struct
acpi_device objects attached (i.e. before those objects are created),
those notify handlers have to take acpi_scan_lock to prevent races
from taking place (e.g. a struct acpi_device is found to be present
for the given ACPI handle, but right after that it is removed by
acpi_bus_trim() running in parallel to the given notify handler).
Moreover, since some of them call acpi_bus_scan() and
acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
should be acquired by the callers of these two funtions rather by
these functions themselves.

For these reasons, make all notify handlers that can handle device
addition and eject events take acpi_scan_lock and remove the
acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
Accordingly, update all of their users to make sure that they
are always called under acpi_scan_lock.

Furthermore, since eject operations are carried out asynchronously
with respect to the notify events that trigger them, with the help
of acpi_bus_hot_remove_device(), even if notify handlers take the
ACPI scan lock, it still is possible that, for example,
acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
the notify handler that scheduled its execution and that
acpi_bus_trim() will remove the device node passed to
acpi_bus_hot_remove_device() for ejection.  In that case, the struct
acpi_device object obtained by acpi_bus_hot_remove_device() will be
invalid and not-so-funny things will ensue.  To protect agaist that,
make the users of acpi_bus_hot_remove_device() run get_device() on
ACPI device node objects that are about to be passed to it and make
acpi_bus_hot_remove_device() run put_device() on them and check if
their ACPI handles are not NULL (make acpi_device_unregister() clear
the device nodes' ACPI handles for that check to work).

Finally, observe that acpi_os_hotplug_execute() actually can fail,
in which case its caller ought to free memory allocated for the
context object to prevent leaks from happening.  It also needs to
run put_device() on the device node that it ran get_device() on
previously in that case.  Modify the code accordingly.

Signed-off-by: Rafael J. Wysocki 
Acked-by: Yinghai Lu 
---

This includes fixes for two issues spotted by Yasuaki Ishimatsu.

Thanks,
Rafael

---
 drivers/acpi/acpi_memhotplug.c |   56 +++---
 drivers/acpi/container.c   |   12 --
 drivers/acpi/dock.c|   19 --
 drivers/acpi/processor_driver.c|   24 +---
 drivers/acpi/scan.c|   69 +
 drivers/pci/hotplug/acpiphp_glue.c |6 +++
 drivers/pci/hotplug/sgi_hotplug.c  |5 ++
 include/acpi/acpi_bus.h|3 +
 8 files changed, 139 insertions(+), 55 deletions(-)

Index: test/drivers/acpi/scan.c
===
--- test.orig/drivers/acpi/scan.c
+++ test/drivers/acpi/scan.c
@@ -42,6 +42,18 @@ struct acpi_device_bus_id{
struct list_head node;
 };
 
+void acpi_scan_lock_acquire(void)
+{
+   mutex_lock(_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
+
+void acpi_scan_lock_release(void)
+{
+   mutex_unlock(_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
+
 int acpi_scan_add_handler(struct acpi_scan_handler *handler)
 {
if (!handler || !handler->attach)
@@ -95,8 +107,6 @@ acpi_device_modalias_show(struct device
 }
 static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);
 
-static void __acpi_bus_trim(struct acpi_device *start);
-
 /**
  * acpi_bus_hot_remove_device: hot-remove a device and its children
  * @context: struct acpi_eject_event pointer (freed in this func)
@@ -107,7 +117,7 @@ static void __acpi_bus_trim(struct acpi_
  */
 void acpi_bus_hot_remove_device(void *context)
 {
-   struct acpi_eject_event *ej_event = (struct acpi_eject_event *) context;
+   struct acpi_eject_event *ej_event = context;
struct acpi_device *device = ej_event->device;
acpi_handle handle = device->handle;
acpi_handle temp;
@@ -118,11 +128,19 @@ void acpi_bus_hot_remove_device(void *co
 
mutex_lock(_scan_lock);
 
+   /* If there is no handle, the device node has been unregistered. */
+   if (!device->handle) {
+   dev_dbg(>dev, "ACPI handle missing\n");
+   put_device(>dev);
+   goto out;
+   }
+
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
"Hot-removing device %s...\n", dev_name(>dev)));
 
-   __acpi_bus_trim(device);
-   /* Device node has been released. */
+   acpi_bus_trim(device);
+   /* Device node has

Re: [PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Wednesday, February 13, 2013 12:31:05 PM Yasuaki Ishimatsu wrote:
> Hi Rafael,
> 
> I have another comment at container.c.
> 
> 2013/02/13 12:08, Yasuaki Ishimatsu wrote:
> > Hi Rafael,
> >
> > The patch seems good.
> > There is a comment below.
> >
> > 2013/02/13 9:19, Rafael J. Wysocki wrote:
> >> From: Rafael J. Wysocki 
> >>
> >> This changeset is aimed at fixing a few different but related
> >> problems in the ACPI hotplug infrastructure.
> >>
> >> First of all, since notify handlers may be run in parallel with
> >> acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
> >> and some of them are installed for ACPI handles that have no struct
> >> acpi_device objects attached (i.e. before those objects are created),
> >> those notify handlers have to take acpi_scan_lock to prevent races
> >> from taking place (e.g. a struct acpi_device is found to be present
> >> for the given ACPI handle, but right after that it is removed by
> >> acpi_bus_trim() running in parallel to the given notify handler).
> >> Moreover, since some of them call acpi_bus_scan() and
> >> acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
> >> should be acquired by the callers of these two funtions rather by
> >> these functions themselves.
> >>
> >> For these reasons, make all notify handlers that can handle device
> >> addition and eject events take acpi_scan_lock and remove the
> >> acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
> >> Accordingly, update all of their users to make sure that they
> >> are always called under acpi_scan_lock.
> >>
> >> Furthermore, since eject operations are carried out asynchronously
> >> with respect to the notify events that trigger them, with the help
> >> of acpi_bus_hot_remove_device(), even if notify handlers take the
> >> ACPI scan lock, it still is possible that, for example,
> >> acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
> >> the notify handler that scheduled its execution and that
> >> acpi_bus_trim() will remove the device node passed to
> >> acpi_bus_hot_remove_device() for ejection.  In that case, the struct
> >> acpi_device object obtained by acpi_bus_hot_remove_device() will be
> >> invalid and not-so-funny things will ensue.  To protect agaist that,
> >> make the users of acpi_bus_hot_remove_device() run get_device() on
> >> ACPI device node objects that are about to be passed to it and make
> >> acpi_bus_hot_remove_device() run put_device() on them and check if
> >> their ACPI handles are not NULL (make acpi_device_unregister() clear
> >> the device nodes' ACPI handles for that check to work).
> >>
> >> Finally, observe that acpi_os_hotplug_execute() actually can fail,
> >> in which case its caller ought to free memory allocated for the
> >> context object to prevent leaks from happening.  It also needs to
> >> run put_device() on the device node that it ran get_device() on
> >> previously in that case.  Modify the code accordingly.
> >>
> >> Signed-off-by: Rafael J. Wysocki 
> >> ---
> >>
> >> On top of linux-pm.git/linux-next.
> >>
> >> Thanks,
> >> Rafael
> >>
> >> ---
> >>   drivers/acpi/acpi_memhotplug.c |   56 +++---
> >>   drivers/acpi/container.c   |   10 +++--
> >>   drivers/acpi/dock.c|   19 --
> >>   drivers/acpi/processor_driver.c|   24 +---
> >>   drivers/acpi/scan.c|   69 
> >> +
> >>   drivers/pci/hotplug/acpiphp_glue.c |6 +++
> >>   drivers/pci/hotplug/sgi_hotplug.c  |5 ++
> >>   include/acpi/acpi_bus.h|3 +
> >>   8 files changed, 138 insertions(+), 54 deletions(-)
> >>
> >> Index: test/drivers/acpi/scan.c
> >> ===
> >> --- test.orig/drivers/acpi/scan.c
> >> +++ test/drivers/acpi/scan.c
> >> @@ -42,6 +42,18 @@ struct acpi_device_bus_id{
> >>   struct list_head node;
> >>   };
> >>
> >> +void acpi_scan_lock_acquire(void)
> >> +{
> >> +mutex_lock(_scan_lock);
> >> +}
> >> +EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
> >> +
> >> +void acpi_scan_lock_release(void)
> >> +{
> >> +mutex_unlock(_scan_lock);
> >> +}
> >> +EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
> >> +
> >>   int acpi_scan_add_handler(struct acpi_scan_handler *handler)
> >>   {
> >>   if (!handler || !handler->attach)
> >> @@ -95,8 +107,6 @@ acpi_device_modalias_show(struct device
> >>   }
> >>   static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);
> >>
> >> -static void __acpi_bus_trim(struct acpi_device *start);
> >> -
> >>   /**
> >>* acpi_bus_hot_remove_device: hot-remove a device and its children
> >>* @context: struct acpi_eject_event pointer (freed in this func)
> >> @@ -107,7 +117,7 @@ static void __acpi_bus_trim(struct acpi_
> >>*/
> >>   void acpi_bus_hot_remove_device(void *context)
> >>   {
> >> -struct acpi_eject_event *ej_event = (struct acpi_eject_event *) 
> >>

Re: [PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Tuesday, February 12, 2013 05:55:26 PM Yinghai Lu wrote:
> On Tue, Feb 12, 2013 at 4:19 PM, Rafael J. Wysocki  wrote:
> > From: Rafael J. Wysocki 
> >
> > This changeset is aimed at fixing a few different but related
> > problems in the ACPI hotplug infrastructure.
> >
> > First of all, since notify handlers may be run in parallel with
> > acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
> > and some of them are installed for ACPI handles that have no struct
> > acpi_device objects attached (i.e. before those objects are created),
> > those notify handlers have to take acpi_scan_lock to prevent races
> > from taking place (e.g. a struct acpi_device is found to be present
> > for the given ACPI handle, but right after that it is removed by
> > acpi_bus_trim() running in parallel to the given notify handler).
> > Moreover, since some of them call acpi_bus_scan() and
> > acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
> > should be acquired by the callers of these two funtions rather by
> > these functions themselves.
> >
> > For these reasons, make all notify handlers that can handle device
> > addition and eject events take acpi_scan_lock and remove the
> > acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
> > Accordingly, update all of their users to make sure that they
> > are always called under acpi_scan_lock.
> >
> > Furthermore, since eject operations are carried out asynchronously
> > with respect to the notify events that trigger them, with the help
> > of acpi_bus_hot_remove_device(), even if notify handlers take the
> > ACPI scan lock, it still is possible that, for example,
> > acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
> > the notify handler that scheduled its execution and that
> > acpi_bus_trim() will remove the device node passed to
> > acpi_bus_hot_remove_device() for ejection.  In that case, the struct
> > acpi_device object obtained by acpi_bus_hot_remove_device() will be
> > invalid and not-so-funny things will ensue.  To protect agaist that,
> > make the users of acpi_bus_hot_remove_device() run get_device() on
> > ACPI device node objects that are about to be passed to it and make
> > acpi_bus_hot_remove_device() run put_device() on them and check if
> > their ACPI handles are not NULL (make acpi_device_unregister() clear
> > the device nodes' ACPI handles for that check to work).
> >
> > Finally, observe that acpi_os_hotplug_execute() actually can fail,
> > in which case its caller ought to free memory allocated for the
> > context object to prevent leaks from happening.  It also needs to
> > run put_device() on the device node that it ran get_device() on
> > previously in that case.  Modify the code accordingly.
> >
> > Signed-off-by: Rafael J. Wysocki 
> 
> Acked-by: Yinghai Lu 

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Tuesday, February 12, 2013 05:55:26 PM Yinghai Lu wrote:
 On Tue, Feb 12, 2013 at 4:19 PM, Rafael J. Wysocki r...@sisk.pl wrote:
  From: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
  This changeset is aimed at fixing a few different but related
  problems in the ACPI hotplug infrastructure.
 
  First of all, since notify handlers may be run in parallel with
  acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
  and some of them are installed for ACPI handles that have no struct
  acpi_device objects attached (i.e. before those objects are created),
  those notify handlers have to take acpi_scan_lock to prevent races
  from taking place (e.g. a struct acpi_device is found to be present
  for the given ACPI handle, but right after that it is removed by
  acpi_bus_trim() running in parallel to the given notify handler).
  Moreover, since some of them call acpi_bus_scan() and
  acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
  should be acquired by the callers of these two funtions rather by
  these functions themselves.
 
  For these reasons, make all notify handlers that can handle device
  addition and eject events take acpi_scan_lock and remove the
  acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
  Accordingly, update all of their users to make sure that they
  are always called under acpi_scan_lock.
 
  Furthermore, since eject operations are carried out asynchronously
  with respect to the notify events that trigger them, with the help
  of acpi_bus_hot_remove_device(), even if notify handlers take the
  ACPI scan lock, it still is possible that, for example,
  acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
  the notify handler that scheduled its execution and that
  acpi_bus_trim() will remove the device node passed to
  acpi_bus_hot_remove_device() for ejection.  In that case, the struct
  acpi_device object obtained by acpi_bus_hot_remove_device() will be
  invalid and not-so-funny things will ensue.  To protect agaist that,
  make the users of acpi_bus_hot_remove_device() run get_device() on
  ACPI device node objects that are about to be passed to it and make
  acpi_bus_hot_remove_device() run put_device() on them and check if
  their ACPI handles are not NULL (make acpi_device_unregister() clear
  the device nodes' ACPI handles for that check to work).
 
  Finally, observe that acpi_os_hotplug_execute() actually can fail,
  in which case its caller ought to free memory allocated for the
  context object to prevent leaks from happening.  It also needs to
  run put_device() on the device node that it ran get_device() on
  previously in that case.  Modify the code accordingly.
 
  Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
 Acked-by: Yinghai Lu ying...@kernel.org

Thanks!

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Wednesday, February 13, 2013 12:31:05 PM Yasuaki Ishimatsu wrote:
 Hi Rafael,
 
 I have another comment at container.c.
 
 2013/02/13 12:08, Yasuaki Ishimatsu wrote:
  Hi Rafael,
 
  The patch seems good.
  There is a comment below.
 
  2013/02/13 9:19, Rafael J. Wysocki wrote:
  From: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
  This changeset is aimed at fixing a few different but related
  problems in the ACPI hotplug infrastructure.
 
  First of all, since notify handlers may be run in parallel with
  acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
  and some of them are installed for ACPI handles that have no struct
  acpi_device objects attached (i.e. before those objects are created),
  those notify handlers have to take acpi_scan_lock to prevent races
  from taking place (e.g. a struct acpi_device is found to be present
  for the given ACPI handle, but right after that it is removed by
  acpi_bus_trim() running in parallel to the given notify handler).
  Moreover, since some of them call acpi_bus_scan() and
  acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
  should be acquired by the callers of these two funtions rather by
  these functions themselves.
 
  For these reasons, make all notify handlers that can handle device
  addition and eject events take acpi_scan_lock and remove the
  acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
  Accordingly, update all of their users to make sure that they
  are always called under acpi_scan_lock.
 
  Furthermore, since eject operations are carried out asynchronously
  with respect to the notify events that trigger them, with the help
  of acpi_bus_hot_remove_device(), even if notify handlers take the
  ACPI scan lock, it still is possible that, for example,
  acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
  the notify handler that scheduled its execution and that
  acpi_bus_trim() will remove the device node passed to
  acpi_bus_hot_remove_device() for ejection.  In that case, the struct
  acpi_device object obtained by acpi_bus_hot_remove_device() will be
  invalid and not-so-funny things will ensue.  To protect agaist that,
  make the users of acpi_bus_hot_remove_device() run get_device() on
  ACPI device node objects that are about to be passed to it and make
  acpi_bus_hot_remove_device() run put_device() on them and check if
  their ACPI handles are not NULL (make acpi_device_unregister() clear
  the device nodes' ACPI handles for that check to work).
 
  Finally, observe that acpi_os_hotplug_execute() actually can fail,
  in which case its caller ought to free memory allocated for the
  context object to prevent leaks from happening.  It also needs to
  run put_device() on the device node that it ran get_device() on
  previously in that case.  Modify the code accordingly.
 
  Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
  ---
 
  On top of linux-pm.git/linux-next.
 
  Thanks,
  Rafael
 
  ---
drivers/acpi/acpi_memhotplug.c |   56 +++---
drivers/acpi/container.c   |   10 +++--
drivers/acpi/dock.c|   19 --
drivers/acpi/processor_driver.c|   24 +---
drivers/acpi/scan.c|   69 
  +
drivers/pci/hotplug/acpiphp_glue.c |6 +++
drivers/pci/hotplug/sgi_hotplug.c  |5 ++
include/acpi/acpi_bus.h|3 +
8 files changed, 138 insertions(+), 54 deletions(-)
 
  Index: test/drivers/acpi/scan.c
  ===
  --- test.orig/drivers/acpi/scan.c
  +++ test/drivers/acpi/scan.c
  @@ -42,6 +42,18 @@ struct acpi_device_bus_id{
struct list_head node;
};
 
  +void acpi_scan_lock_acquire(void)
  +{
  +mutex_lock(acpi_scan_lock);
  +}
  +EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
  +
  +void acpi_scan_lock_release(void)
  +{
  +mutex_unlock(acpi_scan_lock);
  +}
  +EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
  +
int acpi_scan_add_handler(struct acpi_scan_handler *handler)
{
if (!handler || !handler-attach)
  @@ -95,8 +107,6 @@ acpi_device_modalias_show(struct device
}
static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);
 
  -static void __acpi_bus_trim(struct acpi_device *start);
  -
/**
 * acpi_bus_hot_remove_device: hot-remove a device and its children
 * @context: struct acpi_eject_event pointer (freed in this func)
  @@ -107,7 +117,7 @@ static void __acpi_bus_trim(struct acpi_
 */
void acpi_bus_hot_remove_device(void *context)
{
  -struct acpi_eject_event *ej_event = (struct acpi_eject_event *) 
  context;
  +struct acpi_eject_event *ej_event = context;
struct acpi_device *device = ej_event-device;
acpi_handle handle = device-handle;
acpi_handle temp;
  @@ -118,11 +128,19 @@ void acpi_bus_hot_remove_device(void *co

[Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

From: Rafael J. Wysocki rafael.j.wyso...@intel.com

This changeset is aimed at fixing a few different but related
problems in the ACPI hotplug infrastructure.

First of all, since notify handlers may be run in parallel with
acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
and some of them are installed for ACPI handles that have no struct
acpi_device objects attached (i.e. before those objects are created),
those notify handlers have to take acpi_scan_lock to prevent races
from taking place (e.g. a struct acpi_device is found to be present
for the given ACPI handle, but right after that it is removed by
acpi_bus_trim() running in parallel to the given notify handler).
Moreover, since some of them call acpi_bus_scan() and
acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
should be acquired by the callers of these two funtions rather by
these functions themselves.

For these reasons, make all notify handlers that can handle device
addition and eject events take acpi_scan_lock and remove the
acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
Accordingly, update all of their users to make sure that they
are always called under acpi_scan_lock.

Furthermore, since eject operations are carried out asynchronously
with respect to the notify events that trigger them, with the help
of acpi_bus_hot_remove_device(), even if notify handlers take the
ACPI scan lock, it still is possible that, for example,
acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
the notify handler that scheduled its execution and that
acpi_bus_trim() will remove the device node passed to
acpi_bus_hot_remove_device() for ejection.  In that case, the struct
acpi_device object obtained by acpi_bus_hot_remove_device() will be
invalid and not-so-funny things will ensue.  To protect agaist that,
make the users of acpi_bus_hot_remove_device() run get_device() on
ACPI device node objects that are about to be passed to it and make
acpi_bus_hot_remove_device() run put_device() on them and check if
their ACPI handles are not NULL (make acpi_device_unregister() clear
the device nodes' ACPI handles for that check to work).

Finally, observe that acpi_os_hotplug_execute() actually can fail,
in which case its caller ought to free memory allocated for the
context object to prevent leaks from happening.  It also needs to
run put_device() on the device node that it ran get_device() on
previously in that case.  Modify the code accordingly.

Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
Acked-by: Yinghai Lu ying...@kernel.org
---

This includes fixes for two issues spotted by Yasuaki Ishimatsu.

Thanks,
Rafael

---
 drivers/acpi/acpi_memhotplug.c |   56 +++---
 drivers/acpi/container.c   |   12 --
 drivers/acpi/dock.c|   19 --
 drivers/acpi/processor_driver.c|   24 +---
 drivers/acpi/scan.c|   69 +
 drivers/pci/hotplug/acpiphp_glue.c |6 +++
 drivers/pci/hotplug/sgi_hotplug.c  |5 ++
 include/acpi/acpi_bus.h|3 +
 8 files changed, 139 insertions(+), 55 deletions(-)

Index: test/drivers/acpi/scan.c
===
--- test.orig/drivers/acpi/scan.c
+++ test/drivers/acpi/scan.c
@@ -42,6 +42,18 @@ struct acpi_device_bus_id{
struct list_head node;
 };
 
+void acpi_scan_lock_acquire(void)
+{
+   mutex_lock(acpi_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
+
+void acpi_scan_lock_release(void)
+{
+   mutex_unlock(acpi_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
+
 int acpi_scan_add_handler(struct acpi_scan_handler *handler)
 {
if (!handler || !handler-attach)
@@ -95,8 +107,6 @@ acpi_device_modalias_show(struct device
 }
 static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);
 
-static void __acpi_bus_trim(struct acpi_device *start);
-
 /**
  * acpi_bus_hot_remove_device: hot-remove a device and its children
  * @context: struct acpi_eject_event pointer (freed in this func)
@@ -107,7 +117,7 @@ static void __acpi_bus_trim(struct acpi_
  */
 void acpi_bus_hot_remove_device(void *context)
 {
-   struct acpi_eject_event *ej_event = (struct acpi_eject_event *) context;
+   struct acpi_eject_event *ej_event = context;
struct acpi_device *device = ej_event-device;
acpi_handle handle = device-handle;
acpi_handle temp;
@@ -118,11 +128,19 @@ void acpi_bus_hot_remove_device(void *co
 
mutex_lock(acpi_scan_lock);
 
+   /* If there is no handle, the device node has been unregistered. */
+   if (!device-handle) {
+   dev_dbg(device-dev, ACPI handle missing\n);
+   put_device(device-dev);
+   goto out;
+   }
+
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
Hot-removing device %s...\n, dev_name(device-dev)));
 
-   __acpi_bus_trim(device);
-   /*

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Wed, 2013-02-13 at 14:16 +0100, Rafael J. Wysocki wrote:
 From: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
 This changeset is aimed at fixing a few different but related
 problems in the ACPI hotplug infrastructure.
 
 First of all, since notify handlers may be run in parallel with
 acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
 and some of them are installed for ACPI handles that have no struct
 acpi_device objects attached (i.e. before those objects are created),
 those notify handlers have to take acpi_scan_lock to prevent races
 from taking place (e.g. a struct acpi_device is found to be present
 for the given ACPI handle, but right after that it is removed by
 acpi_bus_trim() running in parallel to the given notify handler).
 Moreover, since some of them call acpi_bus_scan() and
 acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
 should be acquired by the callers of these two funtions rather by
 these functions themselves.
 
 For these reasons, make all notify handlers that can handle device
 addition and eject events take acpi_scan_lock and remove the
 acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
 Accordingly, update all of their users to make sure that they
 are always called under acpi_scan_lock.
 
 Furthermore, since eject operations are carried out asynchronously
 with respect to the notify events that trigger them, with the help
 of acpi_bus_hot_remove_device(), even if notify handlers take the
 ACPI scan lock, it still is possible that, for example,
 acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
 the notify handler that scheduled its execution and that
 acpi_bus_trim() will remove the device node passed to
 acpi_bus_hot_remove_device() for ejection.  In that case, the struct
 acpi_device object obtained by acpi_bus_hot_remove_device() will be
 invalid and not-so-funny things will ensue.  To protect agaist that,
 make the users of acpi_bus_hot_remove_device() run get_device() on
 ACPI device node objects that are about to be passed to it and make
 acpi_bus_hot_remove_device() run put_device() on them and check if
 their ACPI handles are not NULL (make acpi_device_unregister() clear
 the device nodes' ACPI handles for that check to work).
 
 Finally, observe that acpi_os_hotplug_execute() actually can fail,
 in which case its caller ought to free memory allocated for the
 context object to prevent leaks from happening.  It also needs to
 run put_device() on the device node that it ran get_device() on
 previously in that case.  Modify the code accordingly.

I am concerned with this approach.  ACPICA calls notify handlers through
kacpi_notify_wq, which has the max active set to 1.  We then use
kacpi_hotplug_wq (which also has the max active set to 1) so that a
hotplug procedure does not block the notify handlers since they can be
used for non-hotplug events as well.  Acquiring the scan lock in a
notify handler means that a hotplug procedure can block any notify
events.

So, I'd prefer the following approach.

 - Change all hot-plug procedures (i.e. both add and delete) to proceed
under kacpi_hotplug_wq by calling acpi_os_hotplug_execute(). This
serializes all hotplug procedures, and prevents blocking other notify
events.  (Ideally, we should also run all online/offline procedures
under a same work-queue, just like my RFC patchset did, but this is a
different topic for now.)

 - Revert 5993c4670 unless this change is absolutely necessary.  From
the change log, it is not clear to me why this change was needed.  It
changed acpi_bus_hot_remove_device() to take an acpi_device, instead of
an acpi_handle, which introduced a race condition with acpi_device.
acpi_bus_hot_remove_device() should take an acpi_handle, and then obtain
its acpi_device from the acpi_handle since this function is serialized.

 - Remove sanity checks with an acpi_device in the notify handlers,
which have a race condition with acpi_device.  These type-specific
checks will need to be removed when we have a common notify handler
anyway.  The notify handler can continue to check the status of ACPI
device object with an acpi_handle.  Type-specific sanity checks /
validations can be performed within a hotplug procedure, instead.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Wednesday, February 13, 2013 10:43:58 AM Toshi Kani wrote:
 On Wed, 2013-02-13 at 14:16 +0100, Rafael J. Wysocki wrote:
  From: Rafael J. Wysocki rafael.j.wyso...@intel.com
  
  This changeset is aimed at fixing a few different but related
  problems in the ACPI hotplug infrastructure.
  
  First of all, since notify handlers may be run in parallel with
  acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
  and some of them are installed for ACPI handles that have no struct
  acpi_device objects attached (i.e. before those objects are created),
  those notify handlers have to take acpi_scan_lock to prevent races
  from taking place (e.g. a struct acpi_device is found to be present
  for the given ACPI handle, but right after that it is removed by
  acpi_bus_trim() running in parallel to the given notify handler).
  Moreover, since some of them call acpi_bus_scan() and
  acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
  should be acquired by the callers of these two funtions rather by
  these functions themselves.
  
  For these reasons, make all notify handlers that can handle device
  addition and eject events take acpi_scan_lock and remove the
  acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
  Accordingly, update all of their users to make sure that they
  are always called under acpi_scan_lock.
  
  Furthermore, since eject operations are carried out asynchronously
  with respect to the notify events that trigger them, with the help
  of acpi_bus_hot_remove_device(), even if notify handlers take the
  ACPI scan lock, it still is possible that, for example,
  acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
  the notify handler that scheduled its execution and that
  acpi_bus_trim() will remove the device node passed to
  acpi_bus_hot_remove_device() for ejection.  In that case, the struct
  acpi_device object obtained by acpi_bus_hot_remove_device() will be
  invalid and not-so-funny things will ensue.  To protect agaist that,
  make the users of acpi_bus_hot_remove_device() run get_device() on
  ACPI device node objects that are about to be passed to it and make
  acpi_bus_hot_remove_device() run put_device() on them and check if
  their ACPI handles are not NULL (make acpi_device_unregister() clear
  the device nodes' ACPI handles for that check to work).
  
  Finally, observe that acpi_os_hotplug_execute() actually can fail,
  in which case its caller ought to free memory allocated for the
  context object to prevent leaks from happening.  It also needs to
  run put_device() on the device node that it ran get_device() on
  previously in that case.  Modify the code accordingly.
 
 I am concerned with this approach.  ACPICA calls notify handlers through
 kacpi_notify_wq, which has the max active set to 1.  We then use
 kacpi_hotplug_wq (which also has the max active set to 1) so that a
 hotplug procedure does not block the notify handlers since they can be
 used for non-hotplug events as well.

In fact we use kacpi_hotplug_wq for a different reason.  Please read the
comment in __acpi_os_execute() for more details.

 Acquiring the scan lock in a notify handler means that a hotplug procedure
 can block any notify events.

Yes, it can.

 So, I'd prefer the following approach.
 
  - Change all hot-plug procedures (i.e. both add and delete) to proceed
 under kacpi_hotplug_wq by calling acpi_os_hotplug_execute(). This
 serializes all hotplug procedures, and prevents blocking other notify
 events.

Yes, we can do that.  I was thinking about doing that change, but not in v3.9.
There are simply too many notify handlers already there to do that so late in
the cycle.  And doing that for acpiphp, for example, won't be straightforward
at all.

Please think about the $subject patch as a temporary measure until we can do
something better (which we need to do anyway to reduce code duplication among
other things).

 (Ideally, we should also run all online/offline procedures
 under a same work-queue, just like my RFC patchset did, but this is a
 different topic for now.)

No, I don't think it is appropriate to run online/offline from _any_
workqueue.  In my opinion they should be run from user space.

  - Revert 5993c4670 unless this change is absolutely necessary.  From
 the change log, it is not clear to me why this change was needed.  It
 changed acpi_bus_hot_remove_device() to take an acpi_device, instead of
 an acpi_handle, which introduced a race condition with acpi_device.
 acpi_bus_hot_remove_device() should take an acpi_handle, and then obtain
 its acpi_device from the acpi_handle since this function is serialized.

I thought about that, but actually there's no guarantee that the handle
will be valid after _EJ0 as far as I can say.  So the race condition is
going to be there anyway and using struct acpi_device just makes it easier
to avoid it.

  - Remove sanity checks with an acpi_device in the notify handlers,
 which have a race condition

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Wed, 2013-02-13 at 21:52 +0100, Rafael J. Wysocki wrote:
 On Wednesday, February 13, 2013 10:43:58 AM Toshi Kani wrote:
  On Wed, 2013-02-13 at 14:16 +0100, Rafael J. Wysocki wrote:
   From: Rafael J. Wysocki rafael.j.wyso...@intel.com
   
   This changeset is aimed at fixing a few different but related
   problems in the ACPI hotplug infrastructure.
   
   First of all, since notify handlers may be run in parallel with
   acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
   and some of them are installed for ACPI handles that have no struct
   acpi_device objects attached (i.e. before those objects are created),
   those notify handlers have to take acpi_scan_lock to prevent races
   from taking place (e.g. a struct acpi_device is found to be present
   for the given ACPI handle, but right after that it is removed by
   acpi_bus_trim() running in parallel to the given notify handler).
   Moreover, since some of them call acpi_bus_scan() and
   acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
   should be acquired by the callers of these two funtions rather by
   these functions themselves.
   
   For these reasons, make all notify handlers that can handle device
   addition and eject events take acpi_scan_lock and remove the
   acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
   Accordingly, update all of their users to make sure that they
   are always called under acpi_scan_lock.
   
   Furthermore, since eject operations are carried out asynchronously
   with respect to the notify events that trigger them, with the help
   of acpi_bus_hot_remove_device(), even if notify handlers take the
   ACPI scan lock, it still is possible that, for example,
   acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
   the notify handler that scheduled its execution and that
   acpi_bus_trim() will remove the device node passed to
   acpi_bus_hot_remove_device() for ejection.  In that case, the struct
   acpi_device object obtained by acpi_bus_hot_remove_device() will be
   invalid and not-so-funny things will ensue.  To protect agaist that,
   make the users of acpi_bus_hot_remove_device() run get_device() on
   ACPI device node objects that are about to be passed to it and make
   acpi_bus_hot_remove_device() run put_device() on them and check if
   their ACPI handles are not NULL (make acpi_device_unregister() clear
   the device nodes' ACPI handles for that check to work).
   
   Finally, observe that acpi_os_hotplug_execute() actually can fail,
   in which case its caller ought to free memory allocated for the
   context object to prevent leaks from happening.  It also needs to
   run put_device() on the device node that it ran get_device() on
   previously in that case.  Modify the code accordingly.
  
  I am concerned with this approach.  ACPICA calls notify handlers through
  kacpi_notify_wq, which has the max active set to 1.  We then use
  kacpi_hotplug_wq (which also has the max active set to 1) so that a
  hotplug procedure does not block the notify handlers since they can be
  used for non-hotplug events as well.
 
 In fact we use kacpi_hotplug_wq for a different reason.  Please read the
 comment in __acpi_os_execute() for more details.

Yes, I am aware of the issue as well.

  Acquiring the scan lock in a notify handler means that a hotplug procedure
  can block any notify events.
 
 Yes, it can.
 
  So, I'd prefer the following approach.
  
   - Change all hot-plug procedures (i.e. both add and delete) to proceed
  under kacpi_hotplug_wq by calling acpi_os_hotplug_execute(). This
  serializes all hotplug procedures, and prevents blocking other notify
  events.
 
 Yes, we can do that.  I was thinking about doing that change, but not in v3.9.
 There are simply too many notify handlers already there to do that so late in
 the cycle.  And doing that for acpiphp, for example, won't be straightforward
 at all.

Right.  I was not suggesting this approach for v3.9.

 Please think about the $subject patch as a temporary measure until we can do
 something better (which we need to do anyway to reduce code duplication among
 other things).

I am fine with the scan lock as long as it is internal.  This patch
publishes the locking interfaces to other modules, which made me worried
that this might become a long term solution.  If we need to fix this
issue for v3.9, I am OK with it as you clarified this as a temporary
solution.

  (Ideally, we should also run all online/offline procedures
  under a same work-queue, just like my RFC patchset did, but this is a
  different topic for now.)
 
 No, I don't think it is appropriate to run online/offline from _any_
 workqueue.  In my opinion they should be run from user space.

I think there are pros and cons for this.  If we use a user thread to
run online/offline procedure, we can return a result directly.  However,
if an operation takes a long time, it will block the user thread until
it is done.  In

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Wednesday, February 13, 2013 04:09:29 PM Toshi Kani wrote:
 On Wed, 2013-02-13 at 21:52 +0100, Rafael J. Wysocki wrote:
  On Wednesday, February 13, 2013 10:43:58 AM Toshi Kani wrote:
   On Wed, 2013-02-13 at 14:16 +0100, Rafael J. Wysocki wrote:
From: Rafael J. Wysocki rafael.j.wyso...@intel.com

This changeset is aimed at fixing a few different but related
problems in the ACPI hotplug infrastructure.

First of all, since notify handlers may be run in parallel with
acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
and some of them are installed for ACPI handles that have no struct
acpi_device objects attached (i.e. before those objects are created),
those notify handlers have to take acpi_scan_lock to prevent races
from taking place (e.g. a struct acpi_device is found to be present
for the given ACPI handle, but right after that it is removed by
acpi_bus_trim() running in parallel to the given notify handler).
Moreover, since some of them call acpi_bus_scan() and
acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
should be acquired by the callers of these two funtions rather by
these functions themselves.

For these reasons, make all notify handlers that can handle device
addition and eject events take acpi_scan_lock and remove the
acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
Accordingly, update all of their users to make sure that they
are always called under acpi_scan_lock.

Furthermore, since eject operations are carried out asynchronously
with respect to the notify events that trigger them, with the help
of acpi_bus_hot_remove_device(), even if notify handlers take the
ACPI scan lock, it still is possible that, for example,
acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
the notify handler that scheduled its execution and that
acpi_bus_trim() will remove the device node passed to
acpi_bus_hot_remove_device() for ejection.  In that case, the struct
acpi_device object obtained by acpi_bus_hot_remove_device() will be
invalid and not-so-funny things will ensue.  To protect agaist that,
make the users of acpi_bus_hot_remove_device() run get_device() on
ACPI device node objects that are about to be passed to it and make
acpi_bus_hot_remove_device() run put_device() on them and check if
their ACPI handles are not NULL (make acpi_device_unregister() clear
the device nodes' ACPI handles for that check to work).

Finally, observe that acpi_os_hotplug_execute() actually can fail,
in which case its caller ought to free memory allocated for the
context object to prevent leaks from happening.  It also needs to
run put_device() on the device node that it ran get_device() on
previously in that case.  Modify the code accordingly.
   
   I am concerned with this approach.  ACPICA calls notify handlers through
   kacpi_notify_wq, which has the max active set to 1.  We then use
   kacpi_hotplug_wq (which also has the max active set to 1) so that a
   hotplug procedure does not block the notify handlers since they can be
   used for non-hotplug events as well.
  
  In fact we use kacpi_hotplug_wq for a different reason.  Please read the
  comment in __acpi_os_execute() for more details.
 
 Yes, I am aware of the issue as well.
 
   Acquiring the scan lock in a notify handler means that a hotplug procedure
   can block any notify events.
  
  Yes, it can.
  
   So, I'd prefer the following approach.
   
- Change all hot-plug procedures (i.e. both add and delete) to proceed
   under kacpi_hotplug_wq by calling acpi_os_hotplug_execute(). This
   serializes all hotplug procedures, and prevents blocking other notify
   events.
  
  Yes, we can do that.  I was thinking about doing that change, but not in 
  v3.9.
  There are simply too many notify handlers already there to do that so late 
  in
  the cycle.  And doing that for acpiphp, for example, won't be 
  straightforward
  at all.
 
 Right.  I was not suggesting this approach for v3.9.

OK

  Please think about the $subject patch as a temporary measure until we can do
  something better (which we need to do anyway to reduce code duplication 
  among
  other things).
 
 I am fine with the scan lock as long as it is internal.  This patch
 publishes the locking interfaces to other modules, which made me worried
 that this might become a long term solution.  If we need to fix this
 issue for v3.9, I am OK with it as you clarified this as a temporary
 solution.

Yes, I'm not going to allow anyone to use acpi_scan_lock anywhere else. :-)
I'm also going to make it internal again in v3.10 if possible.

   (Ideally, we should also run all online/offline procedures
   under a same work-queue, just like my RFC patchset did, but this is a
   different topic for now.)
  
  No, I don't think it is appropriate to run online/offline

Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

On Thu, 2013-02-14 at 00:42 +0100, Rafael J. Wysocki wrote:
 On Wednesday, February 13, 2013 04:09:29 PM Toshi Kani wrote:
  On Wed, 2013-02-13 at 21:52 +0100, Rafael J. Wysocki wrote:
   On Wednesday, February 13, 2013 10:43:58 AM Toshi Kani wrote:
On Wed, 2013-02-13 at 14:16 +0100, Rafael J. Wysocki wrote:
 From: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
 This changeset is aimed at fixing a few different but related
 problems in the ACPI hotplug infrastructure.
 
 First of all, since notify handlers may be run in parallel with
 acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
 and some of them are installed for ACPI handles that have no struct
 acpi_device objects attached (i.e. before those objects are created),
 those notify handlers have to take acpi_scan_lock to prevent races
 from taking place (e.g. a struct acpi_device is found to be present
 for the given ACPI handle, but right after that it is removed by
 acpi_bus_trim() running in parallel to the given notify handler).
 Moreover, since some of them call acpi_bus_scan() and
 acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
 should be acquired by the callers of these two funtions rather by
 these functions themselves.
 
 For these reasons, make all notify handlers that can handle device
 addition and eject events take acpi_scan_lock and remove the
 acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
 Accordingly, update all of their users to make sure that they
 are always called under acpi_scan_lock.
 
 Furthermore, since eject operations are carried out asynchronously
 with respect to the notify events that trigger them, with the help
 of acpi_bus_hot_remove_device(), even if notify handlers take the
 ACPI scan lock, it still is possible that, for example,
 acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
 the notify handler that scheduled its execution and that
 acpi_bus_trim() will remove the device node passed to
 acpi_bus_hot_remove_device() for ejection.  In that case, the struct
 acpi_device object obtained by acpi_bus_hot_remove_device() will be
 invalid and not-so-funny things will ensue.  To protect agaist that,
 make the users of acpi_bus_hot_remove_device() run get_device() on
 ACPI device node objects that are about to be passed to it and make
 acpi_bus_hot_remove_device() run put_device() on them and check if
 their ACPI handles are not NULL (make acpi_device_unregister() clear
 the device nodes' ACPI handles for that check to work).
 
 Finally, observe that acpi_os_hotplug_execute() actually can fail,
 in which case its caller ought to free memory allocated for the
 context object to prevent leaks from happening.  It also needs to
 run put_device() on the device node that it ran get_device() on
 previously in that case.  Modify the code accordingly.

I am concerned with this approach.  ACPICA calls notify handlers through
kacpi_notify_wq, which has the max active set to 1.  We then use
kacpi_hotplug_wq (which also has the max active set to 1) so that a
hotplug procedure does not block the notify handlers since they can be
used for non-hotplug events as well.
   
   In fact we use kacpi_hotplug_wq for a different reason.  Please read the
   comment in __acpi_os_execute() for more details.
  
  Yes, I am aware of the issue as well.
  
Acquiring the scan lock in a notify handler means that a hotplug 
procedure
can block any notify events.
   
   Yes, it can.
   
So, I'd prefer the following approach.

 - Change all hot-plug procedures (i.e. both add and delete) to proceed
under kacpi_hotplug_wq by calling acpi_os_hotplug_execute(). This
serializes all hotplug procedures, and prevents blocking other notify
events.
   
   Yes, we can do that.  I was thinking about doing that change, but not in 
   v3.9.
   There are simply too many notify handlers already there to do that so 
   late in
   the cycle.  And doing that for acpiphp, for example, won't be 
   straightforward
   at all.
  
  Right.  I was not suggesting this approach for v3.9.
 
 OK
 
   Please think about the $subject patch as a temporary measure until we can 
   do
   something better (which we need to do anyway to reduce code duplication 
   among
   other things).
  
  I am fine with the scan lock as long as it is internal.  This patch
  publishes the locking interfaces to other modules, which made me worried
  that this might become a long term solution.  If we need to fix this
  issue for v3.9, I am OK with it as you clarified this as a temporary
  solution.
 
 Yes, I'm not going to allow anyone to use acpi_scan_lock anywhere else. :-)
 I'm also going to make it internal again in v3.10 if possible.

Sounds good. :-)

(Ideally, we should also run all

RE: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-13 Thread Moore, Robert

   I thought about that, but actually there's no guarantee that the
   handle will be valid after _EJ0 as far as I can say.  So the race
   condition is going to be there anyway and using struct acpi_device
   just makes it easier to avoid it.
 
  In theory, yes, a stale handle could be a problem, if _EJ0 performs
  unload table and if ACPICA frees up its internal data structure
  pointed by the handle as a result.  But we should not see such issue
  now since we do not support dynamic ACPI namespace yet.
 
 I'm waiting for information from Bob about that.  If we can assume ACPI
 handles to be always valid, that will simplify things quite a bit.

If a table is unloaded, all the namespace nodes for that table are removed from 
the namespace, and thus any ACPI_HANDLE pointers go stale and invalid.

Bob

Re: [PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks


Hi Rafael,

I have another comment at container.c.

2013/02/13 12:08, Yasuaki Ishimatsu wrote:

Hi Rafael,

The patch seems good.
There is a comment below.

2013/02/13 9:19, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

This changeset is aimed at fixing a few different but related
problems in the ACPI hotplug infrastructure.

First of all, since notify handlers may be run in parallel with
acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
and some of them are installed for ACPI handles that have no struct
acpi_device objects attached (i.e. before those objects are created),
those notify handlers have to take acpi_scan_lock to prevent races
from taking place (e.g. a struct acpi_device is found to be present
for the given ACPI handle, but right after that it is removed by
acpi_bus_trim() running in parallel to the given notify handler).
Moreover, since some of them call acpi_bus_scan() and
acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
should be acquired by the callers of these two funtions rather by
these functions themselves.

For these reasons, make all notify handlers that can handle device
addition and eject events take acpi_scan_lock and remove the
acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
Accordingly, update all of their users to make sure that they
are always called under acpi_scan_lock.

Furthermore, since eject operations are carried out asynchronously
with respect to the notify events that trigger them, with the help
of acpi_bus_hot_remove_device(), even if notify handlers take the
ACPI scan lock, it still is possible that, for example,
acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
the notify handler that scheduled its execution and that
acpi_bus_trim() will remove the device node passed to
acpi_bus_hot_remove_device() for ejection.  In that case, the struct
acpi_device object obtained by acpi_bus_hot_remove_device() will be
invalid and not-so-funny things will ensue.  To protect agaist that,
make the users of acpi_bus_hot_remove_device() run get_device() on
ACPI device node objects that are about to be passed to it and make
acpi_bus_hot_remove_device() run put_device() on them and check if
their ACPI handles are not NULL (make acpi_device_unregister() clear
the device nodes' ACPI handles for that check to work).

Finally, observe that acpi_os_hotplug_execute() actually can fail,
in which case its caller ought to free memory allocated for the
context object to prevent leaks from happening.  It also needs to
run put_device() on the device node that it ran get_device() on
previously in that case.  Modify the code accordingly.

Signed-off-by: Rafael J. Wysocki 
---

On top of linux-pm.git/linux-next.

Thanks,
Rafael

---
  drivers/acpi/acpi_memhotplug.c |   56 +++---
  drivers/acpi/container.c   |   10 +++--
  drivers/acpi/dock.c|   19 --
  drivers/acpi/processor_driver.c|   24 +---
  drivers/acpi/scan.c|   69 
+
  drivers/pci/hotplug/acpiphp_glue.c |6 +++
  drivers/pci/hotplug/sgi_hotplug.c  |5 ++
  include/acpi/acpi_bus.h|3 +
  8 files changed, 138 insertions(+), 54 deletions(-)

Index: test/drivers/acpi/scan.c
===
--- test.orig/drivers/acpi/scan.c
+++ test/drivers/acpi/scan.c
@@ -42,6 +42,18 @@ struct acpi_device_bus_id{
  struct list_head node;
  };

+void acpi_scan_lock_acquire(void)
+{
+mutex_lock(_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
+
+void acpi_scan_lock_release(void)
+{
+mutex_unlock(_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
+
  int acpi_scan_add_handler(struct acpi_scan_handler *handler)
  {
  if (!handler || !handler->attach)
@@ -95,8 +107,6 @@ acpi_device_modalias_show(struct device
  }
  static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);

-static void __acpi_bus_trim(struct acpi_device *start);
-
  /**
   * acpi_bus_hot_remove_device: hot-remove a device and its children
   * @context: struct acpi_eject_event pointer (freed in this func)
@@ -107,7 +117,7 @@ static void __acpi_bus_trim(struct acpi_
   */
  void acpi_bus_hot_remove_device(void *context)
  {
-struct acpi_eject_event *ej_event = (struct acpi_eject_event *) context;
+struct acpi_eject_event *ej_event = context;
  struct acpi_device *device = ej_event->device;
  acpi_handle handle = device->handle;
  acpi_handle temp;
@@ -118,11 +128,19 @@ void acpi_bus_hot_remove_device(void *co

  mutex_lock(_scan_lock);

+/* If there is no handle, the device node has been unregistered. */
+if (!device->handle) {
+dev_dbg(>dev, "ACPI handle missing\n");
+put_device(>dev);
+goto out;
+}
+
  ACPI_DEBUG_PRINT((ACPI_DB_INFO,
  "Hot-removing device %s...\n", dev_name(>dev)));

-__acpi_bus_trim(device);
-

Re: [PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks


Hi Rafael,

The patch seems good.
There is a comment below.

2013/02/13 9:19, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

This changeset is aimed at fixing a few different but related
problems in the ACPI hotplug infrastructure.

First of all, since notify handlers may be run in parallel with
acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
and some of them are installed for ACPI handles that have no struct
acpi_device objects attached (i.e. before those objects are created),
those notify handlers have to take acpi_scan_lock to prevent races
from taking place (e.g. a struct acpi_device is found to be present
for the given ACPI handle, but right after that it is removed by
acpi_bus_trim() running in parallel to the given notify handler).
Moreover, since some of them call acpi_bus_scan() and
acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
should be acquired by the callers of these two funtions rather by
these functions themselves.

For these reasons, make all notify handlers that can handle device
addition and eject events take acpi_scan_lock and remove the
acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
Accordingly, update all of their users to make sure that they
are always called under acpi_scan_lock.

Furthermore, since eject operations are carried out asynchronously
with respect to the notify events that trigger them, with the help
of acpi_bus_hot_remove_device(), even if notify handlers take the
ACPI scan lock, it still is possible that, for example,
acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
the notify handler that scheduled its execution and that
acpi_bus_trim() will remove the device node passed to
acpi_bus_hot_remove_device() for ejection.  In that case, the struct
acpi_device object obtained by acpi_bus_hot_remove_device() will be
invalid and not-so-funny things will ensue.  To protect agaist that,
make the users of acpi_bus_hot_remove_device() run get_device() on
ACPI device node objects that are about to be passed to it and make
acpi_bus_hot_remove_device() run put_device() on them and check if
their ACPI handles are not NULL (make acpi_device_unregister() clear
the device nodes' ACPI handles for that check to work).

Finally, observe that acpi_os_hotplug_execute() actually can fail,
in which case its caller ought to free memory allocated for the
context object to prevent leaks from happening.  It also needs to
run put_device() on the device node that it ran get_device() on
previously in that case.  Modify the code accordingly.

Signed-off-by: Rafael J. Wysocki 
---

On top of linux-pm.git/linux-next.

Thanks,
Rafael

---
  drivers/acpi/acpi_memhotplug.c |   56 +++---
  drivers/acpi/container.c   |   10 +++--
  drivers/acpi/dock.c|   19 --
  drivers/acpi/processor_driver.c|   24 +---
  drivers/acpi/scan.c|   69 
+
  drivers/pci/hotplug/acpiphp_glue.c |6 +++
  drivers/pci/hotplug/sgi_hotplug.c  |5 ++
  include/acpi/acpi_bus.h|3 +
  8 files changed, 138 insertions(+), 54 deletions(-)

Index: test/drivers/acpi/scan.c
===
--- test.orig/drivers/acpi/scan.c
+++ test/drivers/acpi/scan.c
@@ -42,6 +42,18 @@ struct acpi_device_bus_id{
struct list_head node;
  };

+void acpi_scan_lock_acquire(void)
+{
+   mutex_lock(_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
+
+void acpi_scan_lock_release(void)
+{
+   mutex_unlock(_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
+
  int acpi_scan_add_handler(struct acpi_scan_handler *handler)
  {
if (!handler || !handler->attach)
@@ -95,8 +107,6 @@ acpi_device_modalias_show(struct device
  }
  static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);

-static void __acpi_bus_trim(struct acpi_device *start);
-
  /**
   * acpi_bus_hot_remove_device: hot-remove a device and its children
   * @context: struct acpi_eject_event pointer (freed in this func)
@@ -107,7 +117,7 @@ static void __acpi_bus_trim(struct acpi_
   */
  void acpi_bus_hot_remove_device(void *context)
  {
-   struct acpi_eject_event *ej_event = (struct acpi_eject_event *) context;
+   struct acpi_eject_event *ej_event = context;
struct acpi_device *device = ej_event->device;
acpi_handle handle = device->handle;
acpi_handle temp;
@@ -118,11 +128,19 @@ void acpi_bus_hot_remove_device(void *co

mutex_lock(_scan_lock);

+   /* If there is no handle, the device node has been unregistered. */
+   if (!device->handle) {
+   dev_dbg(>dev, "ACPI handle missing\n");
+   put_device(>dev);
+   goto out;
+   }
+
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
"Hot-removing device %s...\n", dev_name(>dev)));

-   __acpi_bus_trim(device);
-   /* Device node has been

Re: [PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-12 Thread Yinghai Lu

On Tue, Feb 12, 2013 at 4:19 PM, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> This changeset is aimed at fixing a few different but related
> problems in the ACPI hotplug infrastructure.
>
> First of all, since notify handlers may be run in parallel with
> acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
> and some of them are installed for ACPI handles that have no struct
> acpi_device objects attached (i.e. before those objects are created),
> those notify handlers have to take acpi_scan_lock to prevent races
> from taking place (e.g. a struct acpi_device is found to be present
> for the given ACPI handle, but right after that it is removed by
> acpi_bus_trim() running in parallel to the given notify handler).
> Moreover, since some of them call acpi_bus_scan() and
> acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
> should be acquired by the callers of these two funtions rather by
> these functions themselves.
>
> For these reasons, make all notify handlers that can handle device
> addition and eject events take acpi_scan_lock and remove the
> acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
> Accordingly, update all of their users to make sure that they
> are always called under acpi_scan_lock.
>
> Furthermore, since eject operations are carried out asynchronously
> with respect to the notify events that trigger them, with the help
> of acpi_bus_hot_remove_device(), even if notify handlers take the
> ACPI scan lock, it still is possible that, for example,
> acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
> the notify handler that scheduled its execution and that
> acpi_bus_trim() will remove the device node passed to
> acpi_bus_hot_remove_device() for ejection.  In that case, the struct
> acpi_device object obtained by acpi_bus_hot_remove_device() will be
> invalid and not-so-funny things will ensue.  To protect agaist that,
> make the users of acpi_bus_hot_remove_device() run get_device() on
> ACPI device node objects that are about to be passed to it and make
> acpi_bus_hot_remove_device() run put_device() on them and check if
> their ACPI handles are not NULL (make acpi_device_unregister() clear
> the device nodes' ACPI handles for that check to work).
>
> Finally, observe that acpi_os_hotplug_execute() actually can fail,
> in which case its caller ought to free memory allocated for the
> context object to prevent leaks from happening.  It also needs to
> run put_device() on the device node that it ran get_device() on
> previously in that case.  Modify the code accordingly.
>
> Signed-off-by: Rafael J. Wysocki 

Acked-by: Yinghai Lu 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-12 Thread Rafael J. Wysocki

From: Rafael J. Wysocki 

This changeset is aimed at fixing a few different but related
problems in the ACPI hotplug infrastructure.

First of all, since notify handlers may be run in parallel with
acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
and some of them are installed for ACPI handles that have no struct
acpi_device objects attached (i.e. before those objects are created),
those notify handlers have to take acpi_scan_lock to prevent races
from taking place (e.g. a struct acpi_device is found to be present
for the given ACPI handle, but right after that it is removed by
acpi_bus_trim() running in parallel to the given notify handler).
Moreover, since some of them call acpi_bus_scan() and
acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
should be acquired by the callers of these two funtions rather by
these functions themselves.

For these reasons, make all notify handlers that can handle device
addition and eject events take acpi_scan_lock and remove the
acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
Accordingly, update all of their users to make sure that they
are always called under acpi_scan_lock.

Furthermore, since eject operations are carried out asynchronously
with respect to the notify events that trigger them, with the help
of acpi_bus_hot_remove_device(), even if notify handlers take the
ACPI scan lock, it still is possible that, for example,
acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
the notify handler that scheduled its execution and that
acpi_bus_trim() will remove the device node passed to
acpi_bus_hot_remove_device() for ejection.  In that case, the struct
acpi_device object obtained by acpi_bus_hot_remove_device() will be
invalid and not-so-funny things will ensue.  To protect agaist that,
make the users of acpi_bus_hot_remove_device() run get_device() on
ACPI device node objects that are about to be passed to it and make
acpi_bus_hot_remove_device() run put_device() on them and check if
their ACPI handles are not NULL (make acpi_device_unregister() clear
the device nodes' ACPI handles for that check to work).

Finally, observe that acpi_os_hotplug_execute() actually can fail,
in which case its caller ought to free memory allocated for the
context object to prevent leaks from happening.  It also needs to
run put_device() on the device node that it ran get_device() on
previously in that case.  Modify the code accordingly.

Signed-off-by: Rafael J. Wysocki 
---

On top of linux-pm.git/linux-next.

Thanks,
Rafael

---
 drivers/acpi/acpi_memhotplug.c |   56 +++---
 drivers/acpi/container.c   |   10 +++--
 drivers/acpi/dock.c|   19 --
 drivers/acpi/processor_driver.c|   24 +---
 drivers/acpi/scan.c|   69 +
 drivers/pci/hotplug/acpiphp_glue.c |6 +++
 drivers/pci/hotplug/sgi_hotplug.c  |5 ++
 include/acpi/acpi_bus.h|3 +
 8 files changed, 138 insertions(+), 54 deletions(-)

Index: test/drivers/acpi/scan.c
===
--- test.orig/drivers/acpi/scan.c
+++ test/drivers/acpi/scan.c
@@ -42,6 +42,18 @@ struct acpi_device_bus_id{
struct list_head node;
 };
 
+void acpi_scan_lock_acquire(void)
+{
+   mutex_lock(_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
+
+void acpi_scan_lock_release(void)
+{
+   mutex_unlock(_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
+
 int acpi_scan_add_handler(struct acpi_scan_handler *handler)
 {
if (!handler || !handler->attach)
@@ -95,8 +107,6 @@ acpi_device_modalias_show(struct device
 }
 static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);
 
-static void __acpi_bus_trim(struct acpi_device *start);
-
 /**
  * acpi_bus_hot_remove_device: hot-remove a device and its children
  * @context: struct acpi_eject_event pointer (freed in this func)
@@ -107,7 +117,7 @@ static void __acpi_bus_trim(struct acpi_
  */
 void acpi_bus_hot_remove_device(void *context)
 {
-   struct acpi_eject_event *ej_event = (struct acpi_eject_event *) context;
+   struct acpi_eject_event *ej_event = context;
struct acpi_device *device = ej_event->device;
acpi_handle handle = device->handle;
acpi_handle temp;
@@ -118,11 +128,19 @@ void acpi_bus_hot_remove_device(void *co
 
mutex_lock(_scan_lock);
 
+   /* If there is no handle, the device node has been unregistered. */
+   if (!device->handle) {
+   dev_dbg(>dev, "ACPI handle missing\n");
+   put_device(>dev);
+   goto out;
+   }
+
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
"Hot-removing device %s...\n", dev_name(>dev)));
 
-   __acpi_bus_trim(device);
-   /* Device node has been released. */
+   acpi_bus_trim(device);
+   /* Device node has been unregistered. */
+   put_device(>dev);

[PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-12 Thread Rafael J. Wysocki

From: Rafael J. Wysocki rafael.j.wyso...@intel.com

This changeset is aimed at fixing a few different but related
problems in the ACPI hotplug infrastructure.

First of all, since notify handlers may be run in parallel with
acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
and some of them are installed for ACPI handles that have no struct
acpi_device objects attached (i.e. before those objects are created),
those notify handlers have to take acpi_scan_lock to prevent races
from taking place (e.g. a struct acpi_device is found to be present
for the given ACPI handle, but right after that it is removed by
acpi_bus_trim() running in parallel to the given notify handler).
Moreover, since some of them call acpi_bus_scan() and
acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
should be acquired by the callers of these two funtions rather by
these functions themselves.

For these reasons, make all notify handlers that can handle device
addition and eject events take acpi_scan_lock and remove the
acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
Accordingly, update all of their users to make sure that they
are always called under acpi_scan_lock.

Furthermore, since eject operations are carried out asynchronously
with respect to the notify events that trigger them, with the help
of acpi_bus_hot_remove_device(), even if notify handlers take the
ACPI scan lock, it still is possible that, for example,
acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
the notify handler that scheduled its execution and that
acpi_bus_trim() will remove the device node passed to
acpi_bus_hot_remove_device() for ejection.  In that case, the struct
acpi_device object obtained by acpi_bus_hot_remove_device() will be
invalid and not-so-funny things will ensue.  To protect agaist that,
make the users of acpi_bus_hot_remove_device() run get_device() on
ACPI device node objects that are about to be passed to it and make
acpi_bus_hot_remove_device() run put_device() on them and check if
their ACPI handles are not NULL (make acpi_device_unregister() clear
the device nodes' ACPI handles for that check to work).

Finally, observe that acpi_os_hotplug_execute() actually can fail,
in which case its caller ought to free memory allocated for the
context object to prevent leaks from happening.  It also needs to
run put_device() on the device node that it ran get_device() on
previously in that case.  Modify the code accordingly.

Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
---

On top of linux-pm.git/linux-next.

Thanks,
Rafael

---
 drivers/acpi/acpi_memhotplug.c |   56 +++---
 drivers/acpi/container.c   |   10 +++--
 drivers/acpi/dock.c|   19 --
 drivers/acpi/processor_driver.c|   24 +---
 drivers/acpi/scan.c|   69 +
 drivers/pci/hotplug/acpiphp_glue.c |6 +++
 drivers/pci/hotplug/sgi_hotplug.c  |5 ++
 include/acpi/acpi_bus.h|3 +
 8 files changed, 138 insertions(+), 54 deletions(-)

Index: test/drivers/acpi/scan.c
===
--- test.orig/drivers/acpi/scan.c
+++ test/drivers/acpi/scan.c
@@ -42,6 +42,18 @@ struct acpi_device_bus_id{
struct list_head node;
 };
 
+void acpi_scan_lock_acquire(void)
+{
+   mutex_lock(acpi_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
+
+void acpi_scan_lock_release(void)
+{
+   mutex_unlock(acpi_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
+
 int acpi_scan_add_handler(struct acpi_scan_handler *handler)
 {
if (!handler || !handler-attach)
@@ -95,8 +107,6 @@ acpi_device_modalias_show(struct device
 }
 static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);
 
-static void __acpi_bus_trim(struct acpi_device *start);
-
 /**
  * acpi_bus_hot_remove_device: hot-remove a device and its children
  * @context: struct acpi_eject_event pointer (freed in this func)
@@ -107,7 +117,7 @@ static void __acpi_bus_trim(struct acpi_
  */
 void acpi_bus_hot_remove_device(void *context)
 {
-   struct acpi_eject_event *ej_event = (struct acpi_eject_event *) context;
+   struct acpi_eject_event *ej_event = context;
struct acpi_device *device = ej_event-device;
acpi_handle handle = device-handle;
acpi_handle temp;
@@ -118,11 +128,19 @@ void acpi_bus_hot_remove_device(void *co
 
mutex_lock(acpi_scan_lock);
 
+   /* If there is no handle, the device node has been unregistered. */
+   if (!device-handle) {
+   dev_dbg(device-dev, ACPI handle missing\n);
+   put_device(device-dev);
+   goto out;
+   }
+
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
Hot-removing device %s...\n, dev_name(device-dev)));
 
-   __acpi_bus_trim(device);
-   /* Device node has been released. */
+   acpi_bus_trim(device);
+

Re: [PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

2013-02-12 Thread Yinghai Lu

On Tue, Feb 12, 2013 at 4:19 PM, Rafael J. Wysocki r...@sisk.pl wrote:
 From: Rafael J. Wysocki rafael.j.wyso...@intel.com

 This changeset is aimed at fixing a few different but related
 problems in the ACPI hotplug infrastructure.

 First of all, since notify handlers may be run in parallel with
 acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
 and some of them are installed for ACPI handles that have no struct
 acpi_device objects attached (i.e. before those objects are created),
 those notify handlers have to take acpi_scan_lock to prevent races
 from taking place (e.g. a struct acpi_device is found to be present
 for the given ACPI handle, but right after that it is removed by
 acpi_bus_trim() running in parallel to the given notify handler).
 Moreover, since some of them call acpi_bus_scan() and
 acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
 should be acquired by the callers of these two funtions rather by
 these functions themselves.

 For these reasons, make all notify handlers that can handle device
 addition and eject events take acpi_scan_lock and remove the
 acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
 Accordingly, update all of their users to make sure that they
 are always called under acpi_scan_lock.

 Furthermore, since eject operations are carried out asynchronously
 with respect to the notify events that trigger them, with the help
 of acpi_bus_hot_remove_device(), even if notify handlers take the
 ACPI scan lock, it still is possible that, for example,
 acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
 the notify handler that scheduled its execution and that
 acpi_bus_trim() will remove the device node passed to
 acpi_bus_hot_remove_device() for ejection.  In that case, the struct
 acpi_device object obtained by acpi_bus_hot_remove_device() will be
 invalid and not-so-funny things will ensue.  To protect agaist that,
 make the users of acpi_bus_hot_remove_device() run get_device() on
 ACPI device node objects that are about to be passed to it and make
 acpi_bus_hot_remove_device() run put_device() on them and check if
 their ACPI handles are not NULL (make acpi_device_unregister() clear
 the device nodes' ACPI handles for that check to work).

 Finally, observe that acpi_os_hotplug_execute() actually can fail,
 in which case its caller ought to free memory allocated for the
 context object to prevent leaks from happening.  It also needs to
 run put_device() on the device node that it ran get_device() on
 previously in that case.  Modify the code accordingly.

 Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com

Acked-by: Yinghai Lu ying...@kernel.org
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks


Hi Rafael,

The patch seems good.
There is a comment below.

2013/02/13 9:19, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki rafael.j.wyso...@intel.com

This changeset is aimed at fixing a few different but related
problems in the ACPI hotplug infrastructure.

First of all, since notify handlers may be run in parallel with
acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
and some of them are installed for ACPI handles that have no struct
acpi_device objects attached (i.e. before those objects are created),
those notify handlers have to take acpi_scan_lock to prevent races
from taking place (e.g. a struct acpi_device is found to be present
for the given ACPI handle, but right after that it is removed by
acpi_bus_trim() running in parallel to the given notify handler).
Moreover, since some of them call acpi_bus_scan() and
acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
should be acquired by the callers of these two funtions rather by
these functions themselves.

For these reasons, make all notify handlers that can handle device
addition and eject events take acpi_scan_lock and remove the
acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
Accordingly, update all of their users to make sure that they
are always called under acpi_scan_lock.

Furthermore, since eject operations are carried out asynchronously
with respect to the notify events that trigger them, with the help
of acpi_bus_hot_remove_device(), even if notify handlers take the
ACPI scan lock, it still is possible that, for example,
acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
the notify handler that scheduled its execution and that
acpi_bus_trim() will remove the device node passed to
acpi_bus_hot_remove_device() for ejection.  In that case, the struct
acpi_device object obtained by acpi_bus_hot_remove_device() will be
invalid and not-so-funny things will ensue.  To protect agaist that,
make the users of acpi_bus_hot_remove_device() run get_device() on
ACPI device node objects that are about to be passed to it and make
acpi_bus_hot_remove_device() run put_device() on them and check if
their ACPI handles are not NULL (make acpi_device_unregister() clear
the device nodes' ACPI handles for that check to work).

Finally, observe that acpi_os_hotplug_execute() actually can fail,
in which case its caller ought to free memory allocated for the
context object to prevent leaks from happening.  It also needs to
run put_device() on the device node that it ran get_device() on
previously in that case.  Modify the code accordingly.

Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
---

On top of linux-pm.git/linux-next.

Thanks,
Rafael

---
  drivers/acpi/acpi_memhotplug.c |   56 +++---
  drivers/acpi/container.c   |   10 +++--
  drivers/acpi/dock.c|   19 --
  drivers/acpi/processor_driver.c|   24 +---
  drivers/acpi/scan.c|   69 
+
  drivers/pci/hotplug/acpiphp_glue.c |6 +++
  drivers/pci/hotplug/sgi_hotplug.c  |5 ++
  include/acpi/acpi_bus.h|3 +
  8 files changed, 138 insertions(+), 54 deletions(-)

Index: test/drivers/acpi/scan.c
===
--- test.orig/drivers/acpi/scan.c
+++ test/drivers/acpi/scan.c
@@ -42,6 +42,18 @@ struct acpi_device_bus_id{
struct list_head node;
  };

+void acpi_scan_lock_acquire(void)
+{
+   mutex_lock(acpi_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
+
+void acpi_scan_lock_release(void)
+{
+   mutex_unlock(acpi_scan_lock);
+}
+EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
+
  int acpi_scan_add_handler(struct acpi_scan_handler *handler)
  {
if (!handler || !handler-attach)
@@ -95,8 +107,6 @@ acpi_device_modalias_show(struct device
  }
  static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);

-static void __acpi_bus_trim(struct acpi_device *start);
-
  /**
   * acpi_bus_hot_remove_device: hot-remove a device and its children
   * @context: struct acpi_eject_event pointer (freed in this func)
@@ -107,7 +117,7 @@ static void __acpi_bus_trim(struct acpi_
   */
  void acpi_bus_hot_remove_device(void *context)
  {
-   struct acpi_eject_event *ej_event = (struct acpi_eject_event *) context;
+   struct acpi_eject_event *ej_event = context;
struct acpi_device *device = ej_event-device;
acpi_handle handle = device-handle;
acpi_handle temp;
@@ -118,11 +128,19 @@ void acpi_bus_hot_remove_device(void *co

mutex_lock(acpi_scan_lock);

+   /* If there is no handle, the device node has been unregistered. */
+   if (!device-handle) {
+   dev_dbg(device-dev, ACPI handle missing\n);
+   put_device(device-dev);
+   goto out;
+   }
+
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
Hot-removing device %s...\n,

Re: [PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks