Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-14 Thread Toshi Kani
On Thu, 2012-12-13 at 23:15 +0800, Jiang Liu wrote:
> On 12/13/2012 10:42 PM, Toshi Kani wrote:
> > On Tue, 2012-12-11 at 22:34 +0800, Jiang Liu wrote:
> >> On 12/08/2012 09:08 AM, Toshi Kani wrote:
> >>> On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
> >>>> On 2012-12-7 10:57, Toshi Kani wrote:
> >>>>> On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
 :
> >>>
> >>>> 2) an ACPI based hotplug manager driver, which is a platform independent
> >>>>    driver and manages all hotplug slot created by the slot driver.
> >>>
> >>> It is surely impressive work, but I think it is a bit overdoing.  I
> >>> expect hot-pluggable servers come with management console and/or GUI
> >>> where a user can manage hardware units and initiate hot-plug operations.
> >>> I do not think the kernel needs to step into such area since it tends to
> >>> be platform-specific. 
> >> One of the major usages of this feature is for testing. 
> >> It will be hard for OSVs and OEMs to verify hotplug functionalities if it
> >> could only be tested by physical hotplug or through management console.
> >> So to pave the way for hotplug, we need to provide a mechanism for OEMs
> >> and OSVs to execute auto stress tests for hotplug functionalities.
> > 
> > Yes, but such OS->FW interface is platform-specific.  Some platforms use
> > IPMI for the OS to communicate with the management console.  In this
> > case, an OEM-specific command can be used to request a hotplug through
> > IPMI.  Some platforms may also support test programs to run on the
> > management console for validations.
> > 
> > For early development testing, Yinghai's SCI emulation patch can be used
> > to emulate hotplug events from the OS.  It would be part of the kernel
> > debugging features once this patch is accepted. 
> Hi Toshi,
>   ACPI 5.0 has provided some mechanism to normalize the way to issue
> RAS related requests to firmware. I hope ACPI 5.x will define some
> standardized ways based on the PCC defined in 5.0. If needed, we may
> provide platform specific methods for them too.

Thanks for the pointer!  Yeah, the spec purposely does not define the
command.  When we support PCC, we will need to provide a way for a user
app or OEM module to supply a payload.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-13 Thread Jiang Liu
On 12/13/2012 10:42 PM, Toshi Kani wrote:
> On Tue, 2012-12-11 at 22:34 +0800, Jiang Liu wrote:
>> On 12/08/2012 09:08 AM, Toshi Kani wrote:
>>> On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
>>>> On 2012-12-7 10:57, Toshi Kani wrote:
>>>>> On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
>>>>>> On 12/04/2012 08:10 AM, Toshi Kani wrote:
>>>>>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>>>>>>>> On 2012/11/30 6:27, Toshi Kani wrote:
>>>>>>>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>  :
>>>>> Yes, the framework should allow such future work.  I also think that the
>>>>> framework itself should be independent from such ACPI issue.  Ideally,
>>>>> it should be able to support non-ACPI platforms.
>>>> The same point here. The ACPI based hotplug framework is designed as:
>>>> 1) an ACPI based hotplug slot driver to handle platform specific logic.
>>>>    Platform may provide platform specific slot drivers to discover, manage
>>>>    hotplug slots. We have provided a default implementation of slot driver
>>>>    according to the ACPI spec.
>>>
>>> The ACPI spec does not define that _EJ0 is required to receive a hot-add
>>> request, i.e. bus/device check.  This is a major issue.  Since Windows
>>> only supports hot-add, I think there are platforms that only support
>>> hot-add today.
>>>
>>>> 2) an ACPI based hotplug manager driver, which is a platform independent
>>>>    driver and manages all hotplug slot created by the slot driver.
>>>
>>> It is surely impressive work, but I think it is a bit overdoing.  I
>>> expect hot-pluggable servers come with management console and/or GUI
>>> where a user can manage hardware units and initiate hot-plug operations.
>>> I do not think the kernel needs to step into such area since it tends to
>>> be platform-specific. 
>> One of the major usages of this feature is for testing. 
>> It will be hard for OSVs and OEMs to verify hotplug functionalities if it
>> could only be tested by physical hotplug or through management console.
>> So to pave the way for hotplug, we need to provide a mechanism for OEMs
>> and OSVs to execute auto stress tests for hotplug functionalities.
> 
> Yes, but such OS->FW interface is platform-specific.  Some platforms use
> IPMI for the OS to communicate with the management console.  In this
> case, an OEM-specific command can be used to request a hotplug through
> IPMI.  Some platforms may also support test programs to run on the
> management console for validations.
> 
> For early development testing, Yinghai's SCI emulation patch can be used
> to emulate hotplug events from the OS.  It would be part of the kernel
> debugging features once this patch is accepted. 
Hi Toshi,
ACPI 5.0 has provided some mechanism to normalize the way to issue
RAS related requests to firmware. I hope ACPI 5.x will define some standardized
ways based on the PCC defined in 5.0. If needed, we may provide platform
specific methods for them too.
Regards!
Gerry

> 
>  
>>>> We haven't gone far enough to provide an ACPI independent hotplug
>>>> framework because we only have experience with x86 and Itanium, both are
>>>> ACPI based. We may try to implement an ACPI independent hotplug framework
>>>> by pushing all ACPI specific logic into the slot driver, I think it's
>>>> doable. But we need suggestions from experts of other architectures, such
>>>> as SPARC and Power. But seems Power already have some sorts of hotplug
>>>> framework, right?
>>>
>>> I do not know about the Linux hot-plug support on other architectures.
>>> PA-RISC SuperDome also supports Node hot-plug, but it is not supported
>>> by Linux.  Since ARM is getting used by servers, I would not be surprised if
>>> there will be an ARM based server with hot-plug support in future.
>> Seems ARM is on the way to adopt ACPI, so maybe we could support ARM servers
>> in the future.
> 
> That's good to know.
> 
>  :
>>>>>> So in our framework, we have an option to relay hotplug event from
>>>>>> firmware to userspace, so the userspace has a chance to reject the
>>>>>> hotplug operations if it may cause unacceptable disturbance to
>>>>>> userspace services.
>>>>>
>>>>> I think validation from user-space is necessary for deleting I/O
>>>>> devices.  For CPU and memory, the kernel check works fine.
>>>> Agreed. But we may need help from userspace to handle cgroup/cpuset/cpuisol
>>>> etc for cpu and memory hot-removal. Especially for telecom applications,
>>>> they have strong dependency on cgroup/cpuisol to guarantee latency.
>>>
>>> I have not looked at the code, but aren't these cpu attributes managed in
>>> the kernel?
>> Some Telecom applications want to run in a deterministic environment, so
>> they depend on cpuisol/cpuset to provide such an environment. If hotplug
>> event happens, these Telecom applications should be notified so they have
>> a chance to redistribute the workload.
> 
> I agree that we need to generate an event that 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-13 Thread Toshi Kani
On Tue, 2012-12-11 at 22:34 +0800, Jiang Liu wrote:
> On 12/08/2012 09:08 AM, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
> >> On 2012-12-7 10:57, Toshi Kani wrote:
> >>> On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
> >>>> On 12/04/2012 08:10 AM, Toshi Kani wrote:
> >>>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> >>>>>> On 2012/11/30 6:27, Toshi Kani wrote:
> >>>>>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 :
> >>> Yes, the framework should allow such future work.  I also think that the
> >>> framework itself should be independent from such ACPI issue.  Ideally,
> >>> it should be able to support non-ACPI platforms.
> >> The same point here. The ACPI based hotplug framework is designed as:
> >> 1) an ACPI based hotplug slot driver to handle platform specific logic.
> >>    Platform may provide platform specific slot drivers to discover, manage
> >>    hotplug slots. We have provided a default implementation of slot driver
> >>    according to the ACPI spec.
> > 
> > The ACPI spec does not define that _EJ0 is required to receive a hot-add
> > request, i.e. bus/device check.  This is a major issue.  Since Windows
> > only supports hot-add, I think there are platforms that only support
> > hot-add today.
> > 
> >> 2) an ACPI based hotplug manager driver, which is a platform independent
> >>    driver and manages all hotplug slot created by the slot driver.
> >
> > It is surely impressive work, but I think it is a bit overdoing.  I
> > expect hot-pluggable servers come with management console and/or GUI
> > where a user can manage hardware units and initiate hot-plug operations.
> > I do not think the kernel needs to step into such area since it tends to
> > be platform-specific. 
> One of the major usages of this feature is for testing. 
> It will be hard for OSVs and OEMs to verify hotplug functionalities if it
> could only be tested by physical hotplug or through management console.
> So to pave the way for hotplug, we need to provide a mechanism for OEMs
> and OSVs to execute auto stress tests for hotplug functionalities.

Yes, but such OS->FW interface is platform-specific.  Some platforms use
IPMI for the OS to communicate with the management console.  In this
case, an OEM-specific command can be used to request a hotplug through
IPMI.  Some platforms may also support test programs to run on the
management console for validations.

For early development testing, Yinghai's SCI emulation patch can be used
to emulate hotplug events from the OS.  It would be part of the kernel
debugging features once this patch is accepted. 

 
> >> We haven't gone far enough to provide an ACPI independent hotplug
> >> framework because we only have experience with x86 and Itanium, both are
> >> ACPI based. We may try to implement an ACPI independent hotplug framework
> >> by pushing all ACPI specific logic into the slot driver, I think it's
> >> doable. But we need suggestions from experts of other architectures, such
> >> as SPARC and Power. But seems Power already have some sorts of hotplug
> >> framework, right?
> > 
> > I do not know about the Linux hot-plug support on other architectures.
> > PA-RISC SuperDome also supports Node hot-plug, but it is not supported
> > by Linux.  Since ARM is getting used by servers, I would not be surprised if
> > there will be an ARM based server with hot-plug support in future.
> Seems ARM is on the way to adopt ACPI, so maybe we could support ARM servers
> in the future.

That's good to know.

 :
> >>>> So in our framework, we have an option to relay hotplug event from
> >>>> firmware to userspace, so the userspace has a chance to reject the
> >>>> hotplug operations if it may cause unacceptable disturbance to
> >>>> userspace services.
> >>>
> >>> I think validation from user-space is necessary for deleting I/O
> >>> devices.  For CPU and memory, the kernel check works fine.
> >> Agreed. But we may need help from userspace to handle cgroup/cpuset/cpuisol
> >> etc for cpu and memory hot-removal. Especially for telecom applications,
> >> they have strong dependency on cgroup/cpuisol to guarantee latency.
> > 
> > I have not looked at the code, but aren't these cpu attributes managed in
> > the kernel?
> Some Telecom applications want to run in a deterministic environment, so
> they depend on cpuisol/cpuset to provide such an environment. If hotplug
> event happens, these Telecom applications should be notified so they have
> a chance to redistribute the workload.

I agree that we need to generate an event that those applications can
subscribe to, so that they can react quickly to the change.

Thanks,
-Toshi
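
To make the subscription idea in the message above concrete, here is a
minimal userspace sketch.  The thread does not say how the event would be
delivered; this sketch assumes it arrives as a kobject uevent on a
NETLINK_KOBJECT_UEVENT socket, whose payload is a run of NUL-separated
"KEY=value" strings.  The helper names (open_uevent_socket, uevent_get,
cpu_topology_changed) and the choice of the "cpu" subsystem are
illustrative assumptions, not something proposed in the patch set.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Open a socket subscribed to kernel uevents.  Returns the fd, or -1. */
int open_uevent_socket(void)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = 1;   /* multicast group used for kernel uevents */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Look up `key` in a uevent payload made of NUL-separated "KEY=value"
 * fields.  Returns a pointer to the value inside `buf`, or NULL.
 * The caller must make sure the last field in `buf` is NUL-terminated.
 */
const char *uevent_get(const char *buf, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t pos = 0;

    while (pos < len) {
        const char *field = buf + pos;
        const char *end = memchr(field, '\0', len - pos);
        size_t flen = end ? (size_t)(end - field) : len - pos;

        if (flen > klen && field[klen] == '=' &&
            memcmp(field, key, klen) == 0)
            return field + klen + 1;
        pos += flen + 1;   /* skip the field and its terminator */
    }
    return NULL;
}

/* Hypothetical policy: react to CPUs going online or offline. */
int cpu_topology_changed(const char *buf, size_t len)
{
    const char *subsys = uevent_get(buf, len, "SUBSYSTEM");
    const char *action = uevent_get(buf, len, "ACTION");

    if (!subsys || !action || strcmp(subsys, "cpu") != 0)
        return 0;
    return strcmp(action, "online") == 0 ||
           strcmp(action, "offline") == 0;
}
```

An application would recv() each message into a buffer, write a NUL after
the last received byte, and call cpu_topology_changed() to decide whether
to redistribute its workload.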



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-11 Thread Jiang Liu
On 12/08/2012 09:08 AM, Toshi Kani wrote:
> On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
>> On 2012-12-7 10:57, Toshi Kani wrote:
>>> On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
>>>> On 12/04/2012 08:10 AM, Toshi Kani wrote:
>>>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>>>>>> On 2012/11/30 6:27, Toshi Kani wrote:
>>>>>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>>>>>  :
>>>>>
>>>>> If I read the code right, the framework calls ACPI drivers differently
>>>>> at boot-time and hot-add as follows.  That is, the new entry points are
>>>>> called at hot-add only, but .add() is called at both cases.  This
>>>>> requires .add() to work differently.
>>>>>
>>>>> Boot: .add()
>>>>> Hot-Add : .add(), .pre_configure(), configure(), etc.
>>>>>
>>>>> I think the boot-time and hot-add initialization should be done
>>>>> consistently.  While there is difficulty with the current boot sequence,
>>>>> the framework should be designed to allow them consistent, not make them
>>>>> diverged.
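
The two call sequences quoted above can be modelled in a few lines of
plain C.  This is only a sketch of the divergence being discussed: the
struct and function names (hp_dev, hp_ops, hp_boot_enumerate, hp_hot_add)
and the callback signatures are assumptions made for illustration and are
not taken from the patch set.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical device; `trace` records which callbacks ran, in order. */
struct hp_dev {
    const char *name;
    char trace[128];
};

/* Hypothetical driver operations, named after the thread's callbacks. */
struct hp_ops {
    int (*add)(struct hp_dev *dev);
    int (*pre_configure)(struct hp_dev *dev);
    int (*configure)(struct hp_dev *dev);
};

/* Append one step name to the device's comma-separated trace. */
static void record_step(struct hp_dev *dev, const char *step)
{
    size_t used = strlen(dev->trace);

    snprintf(dev->trace + used, sizeof(dev->trace) - used,
             "%s%s", used ? "," : "", step);
}

static int demo_add(struct hp_dev *dev)
{
    record_step(dev, "add");
    return 0;
}

static int demo_pre_configure(struct hp_dev *dev)
{
    record_step(dev, "pre_configure");
    return 0;
}

static int demo_configure(struct hp_dev *dev)
{
    record_step(dev, "configure");
    return 0;
}

const struct hp_ops demo_ops = {
    .add = demo_add,
    .pre_configure = demo_pre_configure,
    .configure = demo_configure,
};

/* Boot path: only .add() runs, the device is already in use. */
int hp_boot_enumerate(const struct hp_ops *ops, struct hp_dev *dev)
{
    return ops->add(dev);
}

/* Hot-add path: .add() first, then the new entry points. */
int hp_hot_add(const struct hp_ops *ops, struct hp_dev *dev)
{
    int ret = ops->add(dev);

    if (ret)
        return ret;
    ret = ops->pre_configure(dev);
    if (ret)
        return ret;
    return ops->configure(dev);
}
```

With these definitions, a device enumerated at boot ends up with the
trace "add", while a hot-added one ends up with
"add,pre_configure,configure".  That is the asymmetry .add() has to cope
with in the design under review.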
>>>> Hi Toshi,
>>>>    We have separated hotplug operations from driver binding/unbinding
>>>> interface due to following considerations.
>>>> 1) Physical CPU and memory devices are initialized/used before the ACPI
>>>>    subsystem is initialized. So under normal case, .add() of processor and
>>>>    acpi_memhotplug only figures out information about device already in
>>>>    working state instead of starting the device.
>>>
>>> I agree that the current boot sequence is not very hot-plug friendly...
>>>
>>>> 2) It's impossible to rmmod the processor and acpi_memhotplug driver at
>>>>    runtime if .remove() of CPU and memory drivers do really remove the
>>>>    CPU/memory device from the system. And the ACPI processor driver also
>>>>    implements CPU PM functionality other than hotplug.
>>>
>>> Agreed.
>>>
>>>> And recently Rafael has mentioned that he has a long term view to get rid
>>>> of the concept of "ACPI device". If that happens, we could easily move the
>>>> hotplug logic from ACPI device drivers into the hotplug framework if the
>>>> hotplug logic is separated from the .add()/.remove() callbacks. Actually
>>>> we could even move all hotplug only logic into the hotplug framework and
>>>> don't rely on any ACPI device driver any more. So we could get rid of all
>>>> these messy things. We could achieve that by:
>>>> 1) moving code shared by ACPI device drivers and the hotplug framework
>>>>    into the core.
>>>> 2) moving hotplug only code to the framework.
>>>
>>> Yes, the framework should allow such future work.  I also think that the
>>> framework itself should be independent from such ACPI issue.  Ideally,
>>> it should be able to support non-ACPI platforms.
>> The same point here. The ACPI based hotplug framework is designed as:
>> 1) an ACPI based hotplug slot driver to handle platform specific logic.
>>    Platform may provide platform specific slot drivers to discover, manage
>>    hotplug slots. We have provided a default implementation of slot driver
>>    according to the ACPI spec.
> 
> The ACPI spec does not define that _EJ0 is required to receive a hot-add
> request, i.e. bus/device check.  This is a major issue.  Since Windows
> only supports hot-add, I think there are platforms that only support
> hot-add today.
> 
>> 2) an ACPI based hotplug manager driver, which is a platform independent
>>    driver and manages all hotplug slot created by the slot driver.
>
> It is surely impressive work, but I think it is a bit overdoing.  I
> expect hot-pluggable servers come with management console and/or GUI
> where a user can manage hardware units and initiate hot-plug operations.
> I do not think the kernel needs to step into such area since it tends to
> be platform-specific. 
One of the major usages of this feature is for testing. 
It will be hard for OSVs and OEMs to verify hotplug functionalities if it could
only be tested by physical hotplug or through management console. So to pave the
way for hotplug, we need to provide a mechanism for OEMs and OSVs to execute 
auto stress tests for hotplug functionalities.

> 
>> We haven't gone far enough to provide an ACPI independent hotplug
>> framework because we only have experience with x86 and Itanium, both are
>> ACPI based. We may try to implement an ACPI independent hotplug framework
>> by pushing all ACPI specific logic into the slot driver, I think it's
>> doable. But we need suggestions from experts of other architectures, such
>> as SPARC and Power. But seems Power already have some sorts of hotplug
>> framework, right?
> 
> I do not know about the Linux hot-plug support on other architectures.
> PA-RISC SuperDome also supports Node hot-plug, but it is not supported
> by Linux.  Since ARM is getting used by servers, I would not be surprised if
> there will be an ARM based server with hot-plug support in future.
Seems ARM is on the way to adopt ACPI, so 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-11 Thread Jiang Liu
On 12/08/2012 09:08 AM, Toshi Kani wrote:
 On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
 On 2012-12-7 10:57, Toshi Kani wrote:
 On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
 On 12/04/2012 08:10 AM, Toshi Kani wrote:
 On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
 On 2012/11/30 6:27, Toshi Kani wrote:
 On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
  :

 If I read the code right, the framework calls ACPI drivers differently
 at boot-time and hot-add as follows.  That is, the new entry points are
 called at hot-add only, but .add() is called at both cases.  This
 requires .add() to work differently.

 Boot: .add()
 Hot-Add : .add(), .pre_configure(), configure(), etc.

 I think the boot-time and hot-add initialization should be done
 consistently.  While there is difficulty with the current boot sequence,
 the framework should be designed to allow them consistent, not make them
 diverged.
 Hi Toshi,
We have separated hotplug operations from driver binding/unbinding 
 interface
 due to following considerations.
 1) Physical CPU and memory devices are initialized/used before the ACPI 
 subsystem
is initialized. So under normal case, .add() of processor and 
 acpi_memhotplug only
figures out information about device already in working state instead 
 of starting
the device.

 I agree that the current boot sequence is not very hot-plug friendly...

 2) It's impossible to rmmod the processor and acpi_memhotplug driver at 
 runtime 
if .remove() of CPU and memory drivers do really remove the CPU/memory 
 device
from the system. And the ACPI processor driver also implements CPU PM 
 funcitonality
other than hotplug.

 Agreed.

 And recently Rafael has mentioned that he has a long term view to get rid 
 of the
 concept of ACPI device. If that happens, we could easily move the hotplug
 logic from ACPI device drivers into the hotplug framework if the hotplug 
 logic
 is separated from the .add()/.remove() callbacks. Actually we could even 
 move all
 hotplug only logic into the hotplug framework and don't rely on any ACPI 
 device
 driver any more. So we could get rid of all these messy things. We could 
 achieve
 that by:
 1) moving code shared by ACPI device drivers and the hotplug framework 
 into the core.
 2) moving hotplug only code to the framework.

 Yes, the framework should allow such future work.  I also think that the
 framework itself should be independent from such ACPI issue.  Ideally,
 it should be able to support non-ACPI platforms.
 The same point here. The ACPI based hotplug framework is designed as:
 1) an ACPI based hotplug slot driver to handle platform specific logic.
Platform may provide platform specific slot drivers to discover, manage
hotplug slots. We have provided a default implementation of slot driver
according to the ACPI spec.
 
 The ACPI spec does not define that _EJ0 is required to receive a hot-add
 request, i.e. bus/device check.  This is a major issue.  Since Windows
 only supports hot-add, I think there are platforms that only support
 hot-add today.
 
 2) an ACPI based hotplug manager driver, which is a platform independent
driver and manages all hotplug slot created by the slot driver.
 
 It is surely impressive work, but I think it is a bit overdoing.  I
 expect hot-pluggable servers come with management console and/or GUI
 where a user can manage hardware units and initiate hot-plug operations.
 I do not think the kernel needs to step into such area since it tends to
 be platform-specific. 
One of the major usages of this feature is for testing. 
It will be hard for OSVs and OEMs to verify hotplug functionalities if it could
only be tested by physical hotplug or through management console. So to pave the
way for hotplug, we need to provide a mechanism for OEMs and OSVs to execute 
auto stress tests for hotplug functionalities.

 
 We haven't gone further enough to provide an ACPI independent hotplug 
 framework
 because we only have experience with x86 and Itanium, both are ACPI based.
 We may try to implement an ACPI independent hotplug framework by pushing all
 ACPI specific logic into the slot driver, I think it's doable. But we need
 suggestions from experts of other architectures, such as SPARC and Power.
 But seems Power already have some sorts of hotplug framework, right?
 
 I do not know about the Linux hot-plug support on other architectures.
 PA-RISC SuperDome also supports Node hot-plug, but it is not supported
 by Linux.  Since ARM is getting used by servers, I would not be surprised if
 there will be an ARM based server with hot-plug support in future.
Seems ARM is on the way to adopt ACPI, so may be we could support ARM servers
in the future.

 
 Hi Rafael, what's your thoughts here?


 1. Validate phase - Verify if the request is a supported operation.  
 All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-07 Thread Toshi Kani
On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
> On 2012-12-7 10:57, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
> >> On 12/04/2012 08:10 AM, Toshi Kani wrote:
> >>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>  On 2012/11/30 6:27, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 :
> >>>
> >>> If I read the code right, the framework calls ACPI drivers differently
> >>> at boot-time and hot-add as follows.  That is, the new entry points are
> >>> called at hot-add only, but .add() is called at both cases.  This
> >>> requires .add() to work differently.
> >>>
> >>> Boot: .add()
> >>> Hot-Add : .add(), .pre_configure(), configure(), etc.
> >>>
> >>> I think the boot-time and hot-add initialization should be done
> >>> consistently.  While there is difficulty with the current boot sequence,
> >>> the framework should be designed to allow them consistent, not make them
> >>> diverged.
> >> Hi Toshi,
> >>    We have separated hotplug operations from driver binding/unbinding
> >> interface due to following considerations.
> >> 1) Physical CPU and memory devices are initialized/used before the ACPI
> >>    subsystem is initialized. So under normal case, .add() of processor
> >>    and acpi_memhotplug only figures out information about device already
> >>    in working state instead of starting the device.
> > 
> > I agree that the current boot sequence is not very hot-plug friendly...
> > 
> >> 2) It's impossible to rmmod the processor and acpi_memhotplug driver at
> >>    runtime if .remove() of CPU and memory drivers do really remove the
> >>    CPU/memory device from the system. And the ACPI processor driver also
> >>    implements CPU PM functionality other than hotplug.
> > 
> > Agreed.
> > 
> >> And recently Rafael has mentioned that he has a long term view to get rid
> >> of the concept of "ACPI device". If that happens, we could easily move the
> >> hotplug logic from ACPI device drivers into the hotplug framework if the
> >> hotplug logic is separated from the .add()/.remove() callbacks. Actually we
> >> could even move all hotplug only logic into the hotplug framework and don't
> >> rely on any ACPI device driver any more. So we could get rid of all these
> >> messy things. We could achieve that by:
> >> 1) moving code shared by ACPI device drivers and the hotplug framework
> >>    into the core.
> >> 2) moving hotplug only code to the framework.
> > 
> > Yes, the framework should allow such future work.  I also think that the
> > framework itself should be independent from such ACPI issue.  Ideally,
> > it should be able to support non-ACPI platforms.
> The same point here. The ACPI based hotplug framework is designed as:
> 1) an ACPI based hotplug slot driver to handle platform specific logic.
>    Platform may provide platform specific slot drivers to discover, manage
>    hotplug slots. We have provided a default implementation of slot driver
>    according to the ACPI spec.

The ACPI spec does not define that _EJ0 is required to receive a hot-add
request, i.e. bus/device check.  This is a major issue.  Since Windows
only supports hot-add, I think there are platforms that only support
hot-add today.

> 2) an ACPI based hotplug manager driver, which is a platform independent
>    driver and manages all hotplug slot created by the slot driver.

It is surely impressive work, but I think it is a bit overdoing.  I
expect hot-pluggable servers come with management console and/or GUI
where a user can manage hardware units and initiate hot-plug operations.
I do not think the kernel needs to step into such area since it tends to
be platform-specific. 

> We haven't gone far enough to provide an ACPI independent hotplug framework
> because we only have experience with x86 and Itanium, both are ACPI based.
> We may try to implement an ACPI independent hotplug framework by pushing all
> ACPI specific logic into the slot driver, I think it's doable. But we need
> suggestions from experts of other architectures, such as SPARC and Power.
> But seems Power already have some sorts of hotplug framework, right?

I do not know about the Linux hot-plug support on other architectures.
PA-RISC SuperDome also supports Node hot-plug, but it is not supported
by Linux.  Since ARM is getting used by servers, I would not be surprised if
there will be an ARM based server with hot-plug support in future.

> >> Hi Rafael, what's your thoughts here?
> >>
> >>>
> >>> 1. Validate phase - Verify if the request is a supported operation.  
> >>> All
> >>> known restrictions are verified at this phase.  For instance, if a
> >>> hot-remove request involves kernel memory, it is failed in this phase.
> >>> Since this phase makes no change, no rollback is necessary to fail. 
> >>
> >> Yes, we have done this in acpihp_drv_pre_execute, and check the
> >> following things:
> >>
> >> 1) Hot-pluggable or not. the instance kernel memory you mentioned is
> >>    also checked when memory device remove;
> >
> > Agreed.
> >
> >> 2) Dependency check involved. For instance, if hot-add a memory device,
> >>    processor should be added first, otherwise it's not 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 2012-12-7 10:57, Toshi Kani wrote:
> On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
>> On 12/04/2012 08:10 AM, Toshi Kani wrote:
>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
 On 2012/11/30 6:27, Toshi Kani wrote:
> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>> On 2012/11/29 2:41, Toshi Kani wrote:
>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As you may know, the ACPI based hotplug framework we are working on
 already addressed this problem, and the way we solve this problem is a
 bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
         acpi_op_add add;
         acpi_op_remove remove;
         acpi_op_start start;
         acpi_op_bind bind;
         acpi_op_unbind unbind;
         acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
         struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can
 achieve a better way for sharing ideas. :)
>>>
>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>> operation should be composed with the following 3 phases.
>>
>> Good idea ! we also implement a hot-plug operation in 3 phases:
>> 1) acpihp_drv_pre_execute
>> 2) acpihp_drv_execute
>> 3) acpihp_drv_post_execute
>> you may refer to :
>> https://lkml.org/lkml/2012/11/4/79
>
> Great.  Yes, I will take a look.

 Thanks, any comments are welcomed :)
>>>
>>> If I read the code right, the framework calls ACPI drivers differently
>>> at boot-time and hot-add as follows.  That is, the new entry points are
>>> called at hot-add only, but .add() is called at both cases.  This
>>> requires .add() to work differently.
>>>
>>> Boot: .add()
>>> Hot-Add : .add(), .pre_configure(), configure(), etc.
>>>
>>> I think the boot-time and hot-add initialization should be done
>>> consistently.  While there is difficulty with the current boot sequence,
>>> the framework should be designed to allow them consistent, not make them
>>> diverged.
>> Hi Toshi,
>>    We have separated hotplug operations from driver binding/unbinding
>> interface due to following considerations.
>> 1) Physical CPU and memory devices are initialized/used before the ACPI
>>    subsystem is initialized. So under normal case, .add() of processor
>>    and acpi_memhotplug only figures out information about device already
>>    in working state instead of starting the device.
> 
> I agree that the current boot sequence is not very hot-plug friendly...
> 
>> 2) It's impossible to rmmod the processor and acpi_memhotplug driver at
>>    runtime if .remove() of CPU and memory drivers do really remove the
>>    CPU/memory device from the system. And the ACPI processor driver also
>>    implements CPU PM functionality other than hotplug.
> 
> Agreed.
> 
>> And recently Rafael has mentioned that he has a long term view to get rid
>> of the concept of "ACPI device". If that happens, we could easily move the
>> hotplug logic from ACPI device drivers into the hotplug framework if the
>> hotplug logic is separated from the .add()/.remove() callbacks. Actually we
>> could even move all hotplug only logic into the hotplug framework and don't
>> rely on any ACPI device driver any more. So we could get rid of all these
>> messy things. We could achieve that by:
>> 1) moving code shared by ACPI device drivers and the hotplug framework
>>    into the core.
>> 2) moving hotplug only code to the framework.
> 
> Yes, the framework should allow such future work.  I also think that the
> framework itself should be independent from such ACPI issue.  Ideally,
> it should be able to support non-ACPI platforms.
The same point here. The ACPI based hotplug framework is designed as:
1) an ACPI based hotplug slot driver to handle platform specific logic.
   Platform may provide platform specific slot drivers to discover, manage
   hotplug slots. We have provided a default implementation of slot driver
   according to the ACPI spec.
2) an 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
> On 12/04/2012 08:10 AM, Toshi Kani wrote:
> > On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> >> On 2012/11/30 6:27, Toshi Kani wrote:
> >>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>  On 2012/11/29 2:41, Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >> As you may know, the ACPI based hotplug framework we are working on
> >> already addressed this problem, and the way we solve this problem is a
> >> bit like yours.
> >>
> >> We introduce hp_ops in struct acpi_device_ops:
> >> struct acpi_device_ops {
> >>         acpi_op_add add;
> >>         acpi_op_remove remove;
> >>         acpi_op_start start;
> >>         acpi_op_bind bind;
> >>         acpi_op_unbind unbind;
> >>         acpi_op_notify notify;
> >> #ifdef CONFIG_ACPI_HOTPLUG
> >>         struct acpihp_dev_ops *hp_ops;
> >> #endif /* CONFIG_ACPI_HOTPLUG */
> >> };
> >>
> >> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >> 1) pre_release(): optional step to mark device going to be removed/busy
> >> 2) release(): reclaim device from running system
> >> 3) post_release(): rollback if cancelled by user or error happened
> >> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >> 5) unconfigure(): remove devices from running system
> >> 6) post_unconfigure(): free resources used by devices
> >>
> >> In this way, we can easily rollback if error happens.
> >> How do you think of this solution, any suggestion ? I think we can
> >> achieve a better way for sharing ideas. :)
> >
> > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> 
>  Good idea ! we also implement a hot-plug operation in 3 phases:
>  1) acpihp_drv_pre_execute
>  2) acpihp_drv_execute
>  3) acpihp_drv_post_execute
>  you may refer to :
>  https://lkml.org/lkml/2012/11/4/79
> >>>
> >>> Great.  Yes, I will take a look.
> >>
> >> Thanks, any comments are welcomed :)
> > 
> > If I read the code right, the framework calls ACPI drivers differently
> > at boot-time and hot-add as follows.  That is, the new entry points are
> > called at hot-add only, but .add() is called at both cases.  This
> > requires .add() to work differently.
> > 
> > Boot: .add()
> > Hot-Add : .add(), .pre_configure(), configure(), etc.
> > 
> > I think the boot-time and hot-add initialization should be done
> > consistently.  While there is difficulty with the current boot sequence,
> > the framework should be designed to allow them consistent, not make them
> > diverged.
> Hi Toshi,
>    We have separated hotplug operations from driver binding/unbinding
> interface due to following considerations.
> 1) Physical CPU and memory devices are initialized/used before the ACPI
>    subsystem is initialized. So under normal case, .add() of processor
>    and acpi_memhotplug only figures out information about device already
>    in working state instead of starting the device.

I agree that the current boot sequence is not very hot-plug friendly...
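
To make the divergence concrete, here is a small user-space sketch (not
kernel code) of the two call paths discussed above.  The function and type
names are made up for illustration; only the callback names .add(),
.pre_configure() and configure() come from this thread.  It models the
callback ordering only, not what each callback would have to do.

```c
#include <assert.h>
#include <string.h>

/* Illustrative model only; these names are not taken from the patch set. */
enum init_path { PATH_BOOT, PATH_HOT_ADD };

struct call_log {
	char buf[64];	/* comma-separated list of callbacks invoked */
};

static void log_call(struct call_log *log, const char *name)
{
	assert(log != NULL);
	if (log->buf[0] != '\0')
		strncat(log->buf, ",", sizeof(log->buf) - strlen(log->buf) - 1);
	strncat(log->buf, name, sizeof(log->buf) - strlen(log->buf) - 1);
}

/* Divergent scheme: .add() runs on both paths, the rest on hot-add only. */
static void init_divergent(enum init_path path, struct call_log *log)
{
	log_call(log, "add");
	if (path == PATH_HOT_ADD) {
		log_call(log, "pre_configure");
		log_call(log, "configure");
	}
}

/* Consistent scheme: both paths walk the same callback sequence. */
static void init_consistent(enum init_path path, struct call_log *log)
{
	(void)path;	/* the path no longer selects which callbacks run */
	log_call(log, "add");
	log_call(log, "pre_configure");
	log_call(log, "configure");
}
```

In the divergent scheme .add() cannot tell from the sequence alone which
path it is on, which is why it has to behave differently in each case.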

> 2) It's impossible to rmmod the processor and acpi_memhotplug driver at
>    runtime if .remove() of CPU and memory drivers do really remove the
>    CPU/memory device from the system. And the ACPI processor driver also
>    implements CPU PM functionality other than hotplug.

Agreed.

> And recently Rafael has mentioned that he has a long term view to get rid
> of the concept of "ACPI device". If that happens, we could easily move the
> hotplug logic from ACPI device drivers into the hotplug framework if the
> hotplug logic is separated from the .add()/.remove() callbacks. Actually we
> could even move all hotplug only logic into the hotplug framework and don't
> rely on any ACPI device driver any more. So we could get rid of all these
> messy things. We could achieve that by:
> 1) moving code shared by ACPI device drivers and the hotplug framework
>    into the core.
> 2) moving hotplug only code to the framework.

Yes, the framework should allow such future work.  I also think that the
framework itself should be independent from such ACPI issue.  Ideally,
it should be able to support non-ACPI platforms.
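
One way to picture hotplug logic that lives outside .add()/.remove() is a
separate callback table that carries no ACPI types.  The sketch below is a
user-space model under my own assumptions, not the acpihp code: the six
callback names follow the hp_ops steps quoted earlier in this mail, while
struct hp_dev, hp_remove(), the demo_* callbacks and the exact rollback
rule are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct hp_dev;

/* Hypothetical hot-remove callbacks, named after the six quoted steps. */
struct hp_ops {
	int  (*pre_release)(struct hp_dev *dev);	/* optional: mark busy */
	int  (*release)(struct hp_dev *dev);		/* reclaim from system */
	void (*post_release)(struct hp_dev *dev, int err); /* rollback if err */
	int  (*pre_unconfigure)(struct hp_dev *dev);	/* optional: dependencies */
	int  (*unconfigure)(struct hp_dev *dev);	/* remove from system */
	void (*post_unconfigure)(struct hp_dev *dev);	/* free resources */
};

struct hp_dev {
	const struct hp_ops *ops;
	int busy;		/* set while a removal is in flight */
	int in_use;		/* 1 while the running system uses the device */
	int present;		/* 1 until the device is unconfigured */
	int fail_release;	/* test knob: make release() fail */
};

/* Run the six steps in order; assumed rule: stop at the first error. */
static int hp_remove(struct hp_dev *dev)
{
	const struct hp_ops *ops = dev->ops;
	int err = 0;

	assert(ops && ops->release && ops->unconfigure);

	if (ops->pre_release)
		err = ops->pre_release(dev);
	if (!err)
		err = ops->release(dev);
	if (ops->post_release)
		ops->post_release(dev, err);	/* undoes the release on error */
	if (err)
		return err;

	if (ops->pre_unconfigure)
		err = ops->pre_unconfigure(dev);
	if (!err)
		err = ops->unconfigure(dev);
	if (!err && ops->post_unconfigure)
		ops->post_unconfigure(dev);
	return err;
}

/* Demo callbacks for a fake device; pre_unconfigure is left unset. */
static int demo_pre_release(struct hp_dev *d) { d->busy = 1; return 0; }
static int demo_release(struct hp_dev *d)
{
	if (d->fail_release)
		return -1;
	d->in_use = 0;
	return 0;
}
static void demo_post_release(struct hp_dev *d, int err)
{
	if (err) {
		d->in_use = 1;
		d->busy = 0;
	}
}
static int demo_unconfigure(struct hp_dev *d) { d->present = 0; return 0; }
static void demo_post_unconfigure(struct hp_dev *d) { d->busy = 0; }

static const struct hp_ops demo_ops = {
	.pre_release = demo_pre_release,
	.release = demo_release,
	.post_release = demo_post_release,
	.unconfigure = demo_unconfigure,
	.post_unconfigure = demo_post_unconfigure,
};
```

Nothing in struct hp_ops refers to ACPI, so a non-ACPI platform could in
principle supply its own table; whether that split is practical in the
kernel is a separate question.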

> Hi Rafael, what's your thoughts here?
> 
> > 
> > 1. Validate phase - Verify if the request is a supported operation.  All
> > known restrictions are verified at this phase.  For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail. 
> 
>  Yes, we have done this in 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 00:47 +0800, Jiang Liu wrote:
> On 12/05/2012 07:23 AM, Toshi Kani wrote:
> > On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
> >> On 2012/12/4 8:10, Toshi Kani wrote:
> >>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>  On 2012/11/30 6:27, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
> >> On 2012/11/29 2:41, Toshi Kani wrote:
:
>  The ACPI specification provides _EDL method to
>  tell OS the eject device list, but still has no method to tell OS the
>  add device list now.
> >>>
> >>> Yes, but I do not think the OS needs special handling for add...
> >>
> >> Hmm, how about trigger a hot add operation by OS ? we have eject interface
> >> for OS, but have no add interface now, do you think this feature is
> >> useful? If it is, I think OS should analyze the dependency first and tell
> >> the user.
> > 
> > The OS can eject an ACPI device because a target device is owned by the
> > OS (i.e. enabled).  For hot-add, a target ACPI device is not owned by
> > the OS (i.e. disabled).  Therefore, the OS is not supposed to change its
> > state.  So, I do not think we should support a hot-add operation by the
> > OS.
> We depend on the firmware to provide an interface to actually hot-add the
> device.
> The sequence is:
> 1) user trigger hot-add request by sysfs interfaces.
> 2) hotplug framework validates conditions for hot-adding (dependency)
> 3) hotplug framework invokes firmware interfaces to request a hot-adding
>    operation.
> 4) firmware sends an ACPI notification after powering on/initializing the
>    device
> 5) OS adds the devices into running system.

Interesting...  In this sequence, I think FW must validate and check the
dependency before sending an SCI.  FW owns unassigned resources and is
responsible for the procedure necessary to enable resources on the
platform.  Such steps are basically platform-specific.  So, I do not
think the common OS code should step into such business.
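
For reference, the quoted five-step sequence can be written down as a small
transition table.  This is only a user-space sketch with made-up state and
event names; it records the proposed ordering and says nothing about
whether step 2 belongs in the OS or in FW, which is the point under
discussion.

```c
#include <assert.h>
#include <stddef.h>

enum hotadd_state {
	HA_IDLE,	/* nothing requested yet */
	HA_REQUESTED,	/* 1) user triggered hot-add through sysfs */
	HA_VALIDATED,	/* 2) dependency conditions checked */
	HA_FW_PENDING,	/* 3) firmware interface invoked */
	HA_NOTIFIED,	/* 4) ACPI notification received */
	HA_ADDED	/* 5) device added to the running system */
};

enum hotadd_event {
	EV_SYSFS_REQUEST,
	EV_DEPS_OK,
	EV_FW_INVOKED,
	EV_ACPI_NOTIFY,
	EV_OS_ADD_DONE
};

struct hotadd_transition {
	enum hotadd_state from;
	enum hotadd_event ev;
	enum hotadd_state to;
};

static const struct hotadd_transition hotadd_table[] = {
	{ HA_IDLE,       EV_SYSFS_REQUEST, HA_REQUESTED  },
	{ HA_REQUESTED,  EV_DEPS_OK,       HA_VALIDATED  },
	{ HA_VALIDATED,  EV_FW_INVOKED,    HA_FW_PENDING },
	{ HA_FW_PENDING, EV_ACPI_NOTIFY,   HA_NOTIFIED   },
	{ HA_NOTIFIED,   EV_OS_ADD_DONE,   HA_ADDED      },
};

/* Returns the next state, or -1 if the event is not valid in this state. */
static int hotadd_next(enum hotadd_state s, enum hotadd_event ev)
{
	size_t i;

	assert(s <= HA_ADDED);
	for (i = 0; i < sizeof(hotadd_table) / sizeof(hotadd_table[0]); i++)
		if (hotadd_table[i].from == s && hotadd_table[i].ev == ev)
			return (int)hotadd_table[i].to;
	return -1;
}
```

An ACPI notification that arrives without a prior OS request has no entry
in this table, so a purely FW-initiated hot-add would need its own path.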

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Rafael J. Wysocki
On Friday, December 07, 2012 12:40:48 AM Jiang Liu wrote:
> On 12/04/2012 08:10 AM, Toshi Kani wrote:
> > On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> >> On 2012/11/30 6:27, Toshi Kani wrote:
> >>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>  On 2012/11/29 2:41, Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >> As you may know, the ACPI based hotplug framework we are working on
> >> already addressed this problem, and the way we solve this problem is a
> >> bit like yours.
> >>
> >> We introduce hp_ops in struct acpi_device_ops:
> >> struct acpi_device_ops {
> >>         acpi_op_add add;
> >>         acpi_op_remove remove;
> >>         acpi_op_start start;
> >>         acpi_op_bind bind;
> >>         acpi_op_unbind unbind;
> >>         acpi_op_notify notify;
> >> #ifdef CONFIG_ACPI_HOTPLUG
> >>         struct acpihp_dev_ops *hp_ops;
> >> #endif /* CONFIG_ACPI_HOTPLUG */
> >> };
> >>
> >> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >> 1) pre_release(): optional step to mark device going to be removed/busy
> >> 2) release(): reclaim device from running system
> >> 3) post_release(): rollback if cancelled by user or error happened
> >> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >> 5) unconfigure(): remove devices from running system
> >> 6) post_unconfigure(): free resources used by devices
> >>
> >> In this way, we can easily rollback if error happens.
> >> How do you think of this solution, any suggestion ? I think we can
> >> achieve a better way for sharing ideas. :)
> >
> > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> 
>  Good idea ! we also implement a hot-plug operation in 3 phases:
>  1) acpihp_drv_pre_execute
>  2) acpihp_drv_execute
>  3) acpihp_drv_post_execute
>  you may refer to :
>  https://lkml.org/lkml/2012/11/4/79
> >>>
> >>> Great.  Yes, I will take a look.
> >>
> >> Thanks, any comments are welcomed :)
> > 
> > If I read the code right, the framework calls ACPI drivers differently
> > at boot-time and hot-add as follows.  That is, the new entry points are
> > called at hot-add only, but .add() is called at both cases.  This
> > requires .add() to work differently.
> > 
> > Boot: .add()
> > Hot-Add : .add(), .pre_configure(), configure(), etc.
> > 
> > I think the boot-time and hot-add initialization should be done
> > consistently.  While there is difficulty with the current boot sequence,
> > the framework should be designed to allow them consistent, not make them
> > diverged.
> Hi Toshi,
>    We have separated hotplug operations from driver binding/unbinding
> interface due to following considerations.
> 1) Physical CPU and memory devices are initialized/used before the ACPI
>    subsystem is initialized. So under normal case, .add() of processor
>    and acpi_memhotplug only figures out information about device already
>    in working state instead of starting the device.
> 2) It's impossible to rmmod the processor and acpi_memhotplug driver at
>    runtime if .remove() of CPU and memory drivers do really remove the
>    CPU/memory device from the system. And the ACPI processor driver also
>    implements CPU PM functionality other than hotplug.
> 
> And recently Rafael has mentioned that he has a long term view to get rid
> of the concept of "ACPI device". If that happens, we could easily move the
> hotplug logic from ACPI device drivers into the hotplug framework if the
> hotplug logic is separated from the .add()/.remove() callbacks. Actually we
> could even move all hotplug only logic into the hotplug framework and don't
> rely on any ACPI device driver any more. So we could get rid of all these
> messy things. We could achieve that by:
> 1) moving code shared by ACPI device drivers and the hotplug framework
>    into the core.
> 2) moving hotplug only code to the framework.
> 
> Hi Rafael, what's your thoughts here?

I think that sounds good at the high level, but we need to get there
incrementally.  This way it will be easier to maintain backwards
compatibility and follow the changes.  Also, it will be easier for all of
the interested people from different companies to participate in the
development and make sure that everyone's needs are going to be met this
way.

At this point, I'd like to see where Toshi Kani's proposal is going to
take us.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 01:30 +0800, Jiang Liu wrote:
> On 12/07/2012 01:09 AM, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 00:52 +0800, Jiang Liu wrote:
> >> On 12/07/2012 12:31 AM, Toshi Kani wrote:
> >>> On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
>  On 12/07/2012 12:03 AM, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
> >> On 11/29/2012 02:41 AM, Toshi Kani wrote:
> >>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >>>  : 
> >>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> >>> have not looked at all your changes yet..), but in my mind, a hot-plug
> >>> operation should be composed with the following 3 phases.
> >>>
> >>> 1. Validate phase - Verify if the request is a supported operation.  All
> >>> known restrictions are verified at this phase.  For instance, if a
> >>> hot-remove request involves kernel memory, it is failed in this phase.
> >>> Since this phase makes no change, no rollback is necessary to fail.
> >>>
> >>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> >>> rolled-back in case of error or cancel.
> >>>
> >>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> >>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> >>> instance, eject operation is performed at this phase.
> >> Hi Toshi,
> >>    There is one more step needed. Linux provides sysfs interfaces to
> >> online/offline CPU/memory sections, so we need to protect from concurrent
> >> operations from those interfaces when doing physical hotplug. Think about
> >> following sequence:
> >> Thread 1
> >> 1. validate conditions for hot-removal
> >> 2. offline memory section A
> >> 3. online memory section A 
> >> 
> >> 4. offline memory section B
> >> 5. hot-remove memory device hosting A and B.
> >
> > Hi Gerry,
> >
> > I agree.  And I am working on a proposal that tries to address this
> > issue by integrating both sysfs and hotplug operations into a framework.
>  Hi Toshi,
>     But the sysfs for CPU and memory online/offline are platform independent
>  interfaces, and the ACPI based hotplug is platform dependent interfaces. I'm
>  not sure whether it's feasible to merge them. For example we still need
>  offline interface to stop using faulty CPUs on platform without physical
>  hotplug capabilities.
>     We have solved this by adding a "busy" flag to the device, so the sysfs
>  will just return -EBUSY if the busy flag is set.
> >>>
> >>> I am making the framework code platform-independent so that it can
> >>> handle both cases.  Well, I am still prototyping, so hopefully it will
> >>> work. :)
> >> Do you mean implementing a framework to manage hotplug of any type of
> >> devices?
> >> That sounds like a huge plan:)
> >>
> >> Otherwise there may be a gap. CPU online/offline interface deals with
> >> logical CPU, and hotplug driver deals with physical devices(processor).
> >> They may be different but related objects.
> > 
> > Actually it is not a huge plan.  The framework I am thinking of is to
> > enable a hotplug sequencer something analogous to do_initcalls() at the
> > boot sequence.  I am not doing any huge re-work.  That said, I am
> > currently testing my theory, so I won't promise anything, either. :)
> Please do give us an update when you get any progress:)

Yes, will do.

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 12/07/2012 01:09 AM, Toshi Kani wrote:
> On Fri, 2012-12-07 at 00:52 +0800, Jiang Liu wrote:
>> On 12/07/2012 12:31 AM, Toshi Kani wrote:
>>> On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
>>>> On 12/07/2012 12:03 AM, Toshi Kani wrote:
>>>>> On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
>>>>>> On 11/29/2012 02:41 AM, Toshi Kani wrote:
>>>>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>>>>>  :
>>>>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>>>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>>>>>> operation should be composed with the following 3 phases.
>>>>>>>
>>>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>>>> Since this phase makes no change, no rollback is necessary to fail.
>>>>>>>
>>>>>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>>>>>> rolled-back in case of error or cancel.
>>>>>>>
>>>>>>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
>>>>>>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
>>>>>>> instance, eject operation is performed at this phase.
>>>>>> Hi Toshi,
>>>>>>   There are one more step needed. Linux provides sysfs interfaces to
>>>>>> online/offline CPU/memory sections, so we need to protect from concurrent
>>>>>> operations from those interfaces when doing physical hotplug. Think about
>>>>>> following sequence:
>>>>>> Thread 1                                 Thread 2
>>>>>> 1. validate conditions for hot-removal
>>>>>> 2. offline memory section A
>>>>>>                                          3. online memory section A
>>>>>> 4. offline memory section B
>>>>>> 5. hot-remove memory device hosting A and B.
>>>>>
>>>>> Hi Gerry,
>>>>>
>>>>> I agree.  And I am working on a proposal that tries to address this
>>>>> issue by integrating both sysfs and hotplug operations into a framework.
>>>> Hi Toshi,
>>>>   But the sysfs for CPU and memory online/offline are platform independent
>>>> interfaces, and the ACPI based hotplug is platform dependent interfaces.
>>>> I'm not sure whether it's feasible to merge them. For example we still need
>>>> offline interface to stop using faulty CPUs on platform without physical
>>>> hotplug capabilities.
>>>>   We have solved this by adding a "busy" flag to the device, so the sysfs
>>>> will just return -EBUSY if the busy flag is set.
>>>
>>> I am making the framework code platform-independent so that it can
>>> handle both cases.  Well, I am still prototyping, so hopefully it will
>>> work. :)
>> Do you mean implementing a framework to manage hotplug of any type of
>> devices?
>> That sounds like a huge plan:)
>>
>> Otherwise there may be a gap. CPU online/offline interface deals with
>> logical CPU, and hotplug driver deals with physical devices(processor).
>> They may be different but related objects.
> 
> Actually it is not a huge plan.  The framework I am thinking of is to
> enable a hotplug sequencer something analogous to do_initcalls() at the
> boot sequence.  I am not doing any huge re-work.  That said, I am
> currently testing my theory, so I won't promise anything, either. :)
Please do give us an update when you get any progress:)

> 
> Thanks,
> -Toshi
> 
> 



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 00:52 +0800, Jiang Liu wrote:
> On 12/07/2012 12:31 AM, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
> >> On 12/07/2012 12:03 AM, Toshi Kani wrote:
> >>> On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
> >>>> On 11/29/2012 02:41 AM, Toshi Kani wrote:
> >>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >>>>>  :
> >>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> >>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
> >>>>> operation should be composed with the following 3 phases.
> >>>>>
> >>>>> 1. Validate phase - Verify if the request is a supported operation.  All
> >>>>> known restrictions are verified at this phase.  For instance, if a
> >>>>> hot-remove request involves kernel memory, it is failed in this phase.
> >>>>> Since this phase makes no change, no rollback is necessary to fail.
> >>>>>
> >>>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> >>>>> rolled-back in case of error or cancel.
> >>>>>
> >>>>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> >>>>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> >>>>> instance, eject operation is performed at this phase.
> >>>> Hi Toshi,
> >>>>   There are one more step needed. Linux provides sysfs interfaces to
> >>>> online/offline CPU/memory sections, so we need to protect from concurrent
> >>>> operations from those interfaces when doing physical hotplug. Think about
> >>>> following sequence:
> >>>> Thread 1                                 Thread 2
> >>>> 1. validate conditions for hot-removal
> >>>> 2. offline memory section A
> >>>>                                          3. online memory section A
> >>>> 4. offline memory section B
> >>>> 5. hot-remove memory device hosting A and B.
> >>>
> >>> Hi Gerry,
> >>>
> >>> I agree.  And I am working on a proposal that tries to address this
> >>> issue by integrating both sysfs and hotplug operations into a framework.
> >> Hi Toshi,
> >>   But the sysfs for CPU and memory online/offline are platform independent
> >> interfaces, and the ACPI based hotplug is platform dependent interfaces.
> >> I'm not sure whether it's feasible to merge them. For example we still need
> >> offline interface to stop using faulty CPUs on platform without physical
> >> hotplug capabilities.
> >>   We have solved this by adding a "busy" flag to the device, so the sysfs
> >> will just return -EBUSY if the busy flag is set.
> > 
> > I am making the framework code platform-independent so that it can
> > handle both cases.  Well, I am still prototyping, so hopefully it will
> > work. :)
> Do you mean implementing a framework to manage hotplug of any type of devices?
> That sounds like a huge plan:)
> 
> Otherwise there may be a gap. CPU online/offline interface deals with logical
> CPU, and hotplug driver deals with physical devices(processor). They may be
> different but related objects.

Actually it is not a huge plan.  The framework I am thinking of is to
enable a hotplug sequencer, something analogous to do_initcalls() in the
boot sequence.  I am not doing any huge re-work.  That said, I am
currently testing my theory, so I won't promise anything, either. :)
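
To give a rough idea of what I mean by a sequencer, here is a small
user-space sketch in C.  It is not the prototype itself.  The names
(struct hp_handler, hp_run_sequence()) and the undo callback are
assumptions made only for illustration, and the table is assumed to be
already sorted by its order value, much like initcalls are grouped by
level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler entry, loosely modelled on an initcall table. */
struct hp_handler {
	int order;			/* table is assumed sorted by this value */
	int (*run)(void *ctx);		/* returns 0 on success, negative on error */
	void (*undo)(void *ctx);	/* may be NULL if there is nothing to undo */
};

/*
 * Run the handlers in table order.  On the first failure, undo the
 * handlers that already succeeded, in reverse order, and return the error.
 */
static int hp_run_sequence(const struct hp_handler *tbl, size_t n, void *ctx)
{
	size_t i;
	int err;

	assert(tbl != NULL || n == 0);

	for (i = 0; i < n; i++) {
		err = tbl[i].run(ctx);
		if (err) {
			/* handlers 0..i-1 succeeded; roll them back */
			while (i-- > 0)
				if (tbl[i].undo)
					tbl[i].undo(ctx);
			return err;
		}
	}
	return 0;
}

/* Example handlers that record what happened in a small trace buffer. */
struct hp_trace {
	char log[16];
	size_t len;
};

static void trace(void *ctx, char c)
{
	struct hp_trace *t = ctx;

	if (t->len < sizeof(t->log) - 1)
		t->log[t->len++] = c;
}

static int validate(void *ctx)    { trace(ctx, 'v'); return 0; }
static int offline_mem(void *ctx) { trace(ctx, 'o'); return 0; }
static void online_mem(void *ctx) { trace(ctx, 'O'); }
static int fail_step(void *ctx)   { trace(ctx, 'x'); return -1; }
```

A caller would build one table per operation (for instance validate,
offline, then a step that may fail) and pass it to hp_run_sequence();
a failing step triggers the undo callbacks of the earlier steps, which
gives the all-or-nothing behaviour discussed in this thread.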

Thanks,
-Toshi




Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 11/30/2012 05:25 AM, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 01:56:17 PM Toshi Kani wrote:
>> On Thu, 2012-11-29 at 13:39 -0700, Toshi Kani wrote:
>>> On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
>>>> On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
>>>>> On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
>>>>>> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
>>>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>>>> Since this phase makes no change, no rollback is necessary to fail.
>>>>>>
>>>>>> Actually, we can't do it this way, because the conditions may change
>>>>>> between the check and the execution.  So the first phase needs to involve
>>>>>> execution to some extent, although only as far as it remains reversible.
>>>>>
>>>>> For memory hot-remove, we can check if the target memory ranges are
>>>>> within ZONE_MOVABLE.  We should not allow user to change this setup
>>>>> during hot-remove operation.  Other things may be to check if a target
>>>>> node contains cpu0 (until it is supported), the console UART (assuming
>>>>> we cannot delete it), etc.  We should avoid doing rollback as much as we
>>>>> can.
>>>>
>>>> Yes, we can make some checks upfront as an optimization and fail early if
>>>> the conditions are not met, but for correctness we need to repeat those
>>>> checks later anyway.  Once we've decided to go for the eject, the
>>>> conditions must hold whatever happens.
>>>
>>> Agreed.
>>
>> BTW, it is not an optimization I am after for this phase.  There are
>> many error cases during hot-plug operations.  It is difficult to assure
>> that rollback is successful for every error condition in terms of
>> testing and maintaining the code.  So, it is easier to fail beforehand
>> when possible.
> 
> OK, but as I said it is necessary to ensure that the conditions will be met
> in the next phases as well if we don't fail.
Yes, that's absolutely a requirement. Otherwise QA people will call you
when doing stress tests.

> 
> Thanks,
> Rafael
> 
> 



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 11/30/2012 04:30 AM, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
>> On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
>>> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>>>>> the driver core remove function needs to always succeed. This means we
>>>>>> need to know that the device can be successfully removed before
>>>>>> acpi_bus_trim / acpi_bus_hot_remove_device are called. This can cause
>>>>>> panics when OSPM-initiated or SCI-initiated eject of memory devices
>>>>>> fail e.g with:
>>>>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>>>>
>>>>>> since the ACPI core goes ahead and ejects the device regardless of
>>>>>> whether the memory is still in use or not.
>>>>>>
>>>>>> For this reason a new acpi_device operation called prepare_remove is
>>>>>> introduced. This operation should be registered for acpi devices whose
>>>>>> removal (from kernel perspective) can fail.  Memory devices fall in
>>>>>> this category.
>>>>>>
>>>>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>>>>> - preparation for removal i.e. perform part of removal that can fail.
>>>>>>   Should succeed for device and all its children.
>>>>>> - if above step was successful, proceed to actual device removal
>>>>>
>>>>> Hi Vasilis,
>>>>> We met the same problem when we doing computer node hotplug, It is a
>>>>> good idea to introduce prepare_remove before actual device removal.
>>>>>
>>>>> I think we could do more in prepare_remove, such as rollback. In most
>>>>> cases, we can offline most of memory sections except kernel used pages
>>>>> now, should we rollback and online the memory sections when
>>>>> prepare_remove failed ?
>>>>
>>>> I think hot-plug operation should have all-or-nothing semantics.  That
>>>> is, an operation should either complete successfully, or rollback to the
>>>> original state.
>>>
>>> That's correct.
>>>
>>>>> As you may know, the ACPI based hotplug framework we are working on
>>>>> already addressed this problem, and the way we solve this problem is a
>>>>> bit like yours.
>>>>>
>>>>> We introduce hp_ops in struct acpi_device_ops:
>>>>> struct acpi_device_ops {
>>>>> 	acpi_op_add add;
>>>>> 	acpi_op_remove remove;
>>>>> 	acpi_op_start start;
>>>>> 	acpi_op_bind bind;
>>>>> 	acpi_op_unbind unbind;
>>>>> 	acpi_op_notify notify;
>>>>> #ifdef CONFIG_ACPI_HOTPLUG
>>>>> 	struct acpihp_dev_ops *hp_ops;
>>>>> #endif /* CONFIG_ACPI_HOTPLUG */
>>>>> };
>>>>>
>>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>>> 2) release(): reclaim device from running system
>>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>>> 5) unconfigure(): remove devices from running system
>>>>> 6) post_unconfigure(): free resources used by devices
>>>>>
>>>>> In this way, we can easily rollback if error happens.
>>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>>> a better way for sharing ideas. :)
>>>>
>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>>> operation should be composed with the following 3 phases.
>>>>
>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>> known restrictions are verified at this phase.  For instance, if a
>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>> Since this phase makes no change, no rollback is necessary to fail.
>>>
>>> Actually, we can't do it this way, because the conditions may change between
>>> the check and the execution.  So the first phase needs to involve execution
>>> to some extent, although only as far as it remains reversible.
>>
>> For memory hot-remove, we can check if the target memory ranges are
>> within ZONE_MOVABLE.  We should not allow user to change this setup
>> during hot-remove operation.  Other things may be to check if a target
>> node contains cpu0 (until it is supported), the console UART (assuming
>> we cannot delete it), etc.  We should avoid doing rollback as much as we
>> can.
> 
> Yes, we can make some checks upfront as an optimization and fail early if
> the conditions are not met, but for correctness we need to repeat those
> checks later anyway.  Once we've decided to go for the eject, the conditions
> must hold whatever happens.
Hi Rafael,
Another reason for us to split hotplug operations into minor/tiny
steps is to support cancellation in addition to error handling. Theoretically
it may take infinite time to hot-remove a

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 11/30/2012 01:03 AM, Toshi Kani wrote:
> On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
>> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>>>> the driver core remove function needs to always succeed. This means we
>>>>> need to know that the device can be successfully removed before
>>>>> acpi_bus_trim / acpi_bus_hot_remove_device are called. This can cause
>>>>> panics when OSPM-initiated or SCI-initiated eject of memory devices
>>>>> fail e.g with:
>>>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>>>
>>>>> since the ACPI core goes ahead and ejects the device regardless of
>>>>> whether the memory is still in use or not.
>>>>>
>>>>> For this reason a new acpi_device operation called prepare_remove is
>>>>> introduced. This operation should be registered for acpi devices whose
>>>>> removal (from kernel perspective) can fail.  Memory devices fall in
>>>>> this category.
>>>>>
>>>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>>>> - preparation for removal i.e. perform part of removal that can fail.
>>>>>   Should succeed for device and all its children.
>>>>> - if above step was successful, proceed to actual device removal
>>>>
>>>> Hi Vasilis,
>>>> We met the same problem when we doing computer node hotplug, It is a
>>>> good idea to introduce prepare_remove before actual device removal.
>>>>
>>>> I think we could do more in prepare_remove, such as rollback. In most
>>>> cases, we can offline most of memory sections except kernel used pages
>>>> now, should we rollback and online the memory sections when
>>>> prepare_remove failed ?
>>>
>>> I think hot-plug operation should have all-or-nothing semantics.  That
>>> is, an operation should either complete successfully, or rollback to the
>>> original state.
>>
>> That's correct.
>>
>>>> As you may know, the ACPI based hotplug framework we are working on
>>>> already addressed this problem, and the way we solve this problem is a
>>>> bit like yours.
>>>>
>>>> We introduce hp_ops in struct acpi_device_ops:
>>>> struct acpi_device_ops {
>>>> 	acpi_op_add add;
>>>> 	acpi_op_remove remove;
>>>> 	acpi_op_start start;
>>>> 	acpi_op_bind bind;
>>>> 	acpi_op_unbind unbind;
>>>> 	acpi_op_notify notify;
>>>> #ifdef CONFIG_ACPI_HOTPLUG
>>>> 	struct acpihp_dev_ops *hp_ops;
>>>> #endif /* CONFIG_ACPI_HOTPLUG */
>>>> };
>>>>
>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>> 2) release(): reclaim device from running system
>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>> 5) unconfigure(): remove devices from running system
>>>> 6) post_unconfigure(): free resources used by devices
>>>>
>>>> In this way, we can easily rollback if error happens.
>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>> a better way for sharing ideas. :)
>>>
>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>> operation should be composed with the following 3 phases.
>>>
>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>> known restrictions are verified at this phase.  For instance, if a
>>> hot-remove request involves kernel memory, it is failed in this phase.
>>> Since this phase makes no change, no rollback is necessary to fail.  
>>
>> Actually, we can't do it this way, because the conditions may change between
>> the check and the execution.  So the first phase needs to involve execution
>> to some extent, although only as far as it remains reversible.
> 
> For memory hot-remove, we can check if the target memory ranges are
> within ZONE_MOVABLE.  We should not allow user to change this setup
> during hot-remove operation.  Other things may be to check if a target
> node contains cpu0 (until it is supported), the console UART (assuming
> we cannot delete it), etc.  We should avoid doing rollback as much as we
> can.
Fengguang from Intel is working on a patchset to hot-remove CPU0 (BSP)
on x86 platforms and has posted several versions. Maybe we could eventually
remove CPU0 on x86.

> 
> Thanks,
> -Toshi
> 
> 
>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>> rolled-back in case of error or cancel.
>>
>> I would just merge 1 and 2.
>>
>>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
>>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
>>> instance, eject operation is performed at this phase.  
>>
>> Yup.
>>
>> Thanks,
>> Rafael
>>
>>
> 
> 
> --
> 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 11/29/2012 07:36 PM, Vasilis Liaskovitis wrote:
> On Thu, Nov 29, 2012 at 11:15:31AM +0100, Rafael J. Wysocki wrote:
>> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>> We met the same problem when we doing computer node hotplug, It is a
>>>> good idea to introduce prepare_remove before actual device removal.
>>>>
>>>> I think we could do more in prepare_remove, such as rollback. In most
>>>> cases, we can offline most of memory sections except kernel used pages
>>>> now, should we rollback and online the memory sections when
>>>> prepare_remove failed ?
>>>
>>> I think hot-plug operation should have all-or-nothing semantics.  That
>>> is, an operation should either complete successfully, or rollback to the
>>> original state.
>>
>> That's correct.
>>
>>>> As you may know, the ACPI based hotplug framework we are working on
>>>> already addressed this problem, and the way we solve this problem is a
>>>> bit like yours.
>>>>
>>>> We introduce hp_ops in struct acpi_device_ops:
>>>> struct acpi_device_ops {
>>>> 	acpi_op_add add;
>>>> 	acpi_op_remove remove;
>>>> 	acpi_op_start start;
>>>> 	acpi_op_bind bind;
>>>> 	acpi_op_unbind unbind;
>>>> 	acpi_op_notify notify;
>>>> #ifdef CONFIG_ACPI_HOTPLUG
>>>> 	struct acpihp_dev_ops *hp_ops;
>>>> #endif /* CONFIG_ACPI_HOTPLUG */
>>>> };
>>>>
>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>> 2) release(): reclaim device from running system
>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>> 5) unconfigure(): remove devices from running system
>>>> 6) post_unconfigure(): free resources used by devices
>>>>
>>>> In this way, we can easily rollback if error happens.
>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>> a better way for sharing ideas. :)
>>>
>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>> operation should be composed with the following 3 phases.
>>>
>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>> known restrictions are verified at this phase.  For instance, if a
>>> hot-remove request involves kernel memory, it is failed in this phase.
>>> Since this phase makes no change, no rollback is necessary to fail.  
>>
>> Actually, we can't do it this way, because the conditions may change between
>> the check and the execution.  So the first phase needs to involve execution
>> to some extent, although only as far as it remains reversible.
>>
>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>> rolled-back in case of error or cancel.
>>
>> I would just merge 1 and 2.
> 
> I agree steps 1 and 2 can be merged, at least for the current ACPI framework.
> E.g. for memory hotplug, the mm function we call for memory removal
> (remove_memory) handles both these steps.
> 
> The new ACPI framework could perhaps expand the operations as Hanjun
> described, if it makes sense.
Hi Vasilis,
We have worked on some prototypes to split the memory hotplug logic in
mem_hotplug.c into minor steps, so it would be easier to handle errors and
cancellation. But we still need to improve the code quality and merge with
changes from Fujitsu.
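
As a rough illustration of why the small steps help, the sketch below
offlines memory sections one at a time and checks a cancel flag between
steps. It is only a user-space model in C, not the prototype code:
struct mem_section_model, offline_sections() and the movable flag are
names made up here, and all sections are assumed to start online.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one memory section. */
struct mem_section_model {
	bool online;
	bool movable;	/* false models kernel-used pages that cannot be offlined */
};

/*
 * Offline the sections one by one.  Between steps, honour a cancel
 * request.  On cancel or error, bring the sections offlined so far back
 * online, so the caller sees all-or-nothing behaviour.  Assumes every
 * section was online when the call started.
 */
static int offline_sections(struct mem_section_model *sec, size_t n,
			    const volatile bool *cancel)
{
	size_t i;
	int err = 0;

	assert(sec != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (cancel && *cancel) {
			err = -ECANCELED;
			break;
		}
		if (!sec[i].movable) {
			err = -EBUSY;
			break;
		}
		sec[i].online = false;
	}

	if (err) {
		/* roll back: sections 0..i-1 were offlined by this call */
		while (i-- > 0)
			sec[i].online = true;
	}
	return err;
}
```

With one big step there is no point at which a cancel request could be
checked; with per-section steps the loop can stop after any section and
undo only what it has done so far.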
Regards!

> 
> thanks,
> 
> - Vasilis


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 11/29/2012 06:15 PM, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>>> the driver core remove function needs to always succeed. This means we need
>>>> to know that the device can be successfully removed before acpi_bus_trim /
>>>> acpi_bus_hot_remove_device are called. This can cause panics when
>>>> OSPM-initiated or SCI-initiated eject of memory devices fail e.g with:
>>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>>
>>>> since the ACPI core goes ahead and ejects the device regardless of whether
>>>> the memory is still in use or not.
>>>>
>>>> For this reason a new acpi_device operation called prepare_remove is
>>>> introduced. This operation should be registered for acpi devices whose
>>>> removal (from kernel perspective) can fail.  Memory devices fall in this
>>>> category.
>>>>
>>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>>> - preparation for removal i.e. perform part of removal that can fail.
>>>>   Should succeed for device and all its children.
>>>> - if above step was successful, proceed to actual device removal
>>>
>>> Hi Vasilis,
>>> We met the same problem when we doing computer node hotplug, It is a good
>>> idea to introduce prepare_remove before actual device removal.
>>>
>>> I think we could do more in prepare_remove, such as rollback. In most
>>> cases, we can offline most of memory sections except kernel used pages now,
>>> should we rollback and online the memory sections when prepare_remove
>>> failed ?
>>
>> I think hot-plug operation should have all-or-nothing semantics.  That
>> is, an operation should either complete successfully, or rollback to the
>> original state.
>
> That's correct.
>
>>> As you may know, the ACPI based hotplug framework we are working on already
>>> addressed this problem, and the way we solve this problem is a bit like
>>> yours.
>>>
>>> We introduce hp_ops in struct acpi_device_ops:
>>> struct acpi_device_ops {
>>> 	acpi_op_add add;
>>> 	acpi_op_remove remove;
>>> 	acpi_op_start start;
>>> 	acpi_op_bind bind;
>>> 	acpi_op_unbind unbind;
>>> 	acpi_op_notify notify;
>>> #ifdef CONFIG_ACPI_HOTPLUG
>>> 	struct acpihp_dev_ops *hp_ops;
>>> #endif /* CONFIG_ACPI_HOTPLUG */
>>> };
>>>
>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>> 2) release(): reclaim device from running system
>>> 3) post_release(): rollback if cancelled by user or error happened
>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>> 5) unconfigure(): remove devices from running system
>>> 6) post_unconfigure(): free resources used by devices
>>>
>>> In this way, we can easily rollback if error happens.
>>> How do you think of this solution, any suggestion ? I think we can achieve
>>> a better way for sharing ideas. :)
>>
>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>> have not looked at all your changes yet..), but in my mind, a hot-plug
>> operation should be composed with the following 3 phases.
>>
>> 1. Validate phase - Verify if the request is a supported operation.  All
>> known restrictions are verified at this phase.  For instance, if a
>> hot-remove request involves kernel memory, it is failed in this phase.
>> Since this phase makes no change, no rollback is necessary to fail.  
> 
> Actually, we can't do it this way, because the conditions may change between
> the check and the execution.  So the first phase needs to involve execution
> to some extent, although only as far as it remains reversible.
Hi Rafael,
A possible way to solve this issue is:
1) mark device busy
2) check condition and mark device as normal if condition check fails.
3) reclaim the device and mark device as normal if reclaim fails.
4) remove the device.
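
To make the four steps above concrete, here is a minimal user-space
sketch in C. It is not the code from our patchset: struct hp_device,
hp_try_remove() and sysfs_offline() are hypothetical names, the error
codes are chosen only for illustration, and a plain flag stands in for
whatever locking a real driver would need.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; all names are illustrative only. */
struct hp_device {
	bool busy;		/* set while a physical hot-remove is in flight */
	bool removable;		/* outcome of the condition check */
	bool reclaim_ok;	/* whether reclaiming (offlining) succeeds */
	bool present;		/* cleared once the device has been removed */
};

/* sysfs-style online/offline request: refused while hot-remove runs. */
static int sysfs_offline(struct hp_device *dev)
{
	assert(dev != NULL);
	if (dev->busy)
		return -EBUSY;
	return 0;
}

static int hp_try_remove(struct hp_device *dev)
{
	assert(dev != NULL);

	/* 1) mark device busy (a real driver would do this under a lock) */
	if (dev->busy)
		return -EBUSY;
	dev->busy = true;

	/* 2) check conditions; mark device normal again if the check fails */
	if (!dev->removable) {
		dev->busy = false;
		return -EPERM;
	}

	/* 3) reclaim the device; mark device normal again if reclaim fails */
	if (!dev->reclaim_ok) {
		dev->busy = false;
		return -EAGAIN;
	}

	/* 4) remove the device; no failure is allowed past this point */
	dev->present = false;
	return 0;
}
```

Because the busy flag is set before the conditions are checked, a
concurrent sysfs request cannot change the state between the check and
the removal; it simply gets -EBUSY.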

> 
>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>> rolled-back in case of error or cancel.
> 
> I would just merge 1 and 2.
> 
>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
>> instance, eject operation is performed at this phase.  
> 
> Yup.
> 
> Thanks,
> Rafael
> 
> 



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 12/07/2012 12:31 AM, Toshi Kani wrote:
> On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
>> On 12/07/2012 12:03 AM, Toshi Kani wrote:
>>> On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
>>>> On 11/29/2012 02:41 AM, Toshi Kani wrote:
>>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>>>  :
>>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>>>> operation should be composed with the following 3 phases.
>>>>>
>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>> Since this phase makes no change, no rollback is necessary to fail.
>>>>>
>>>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>>>> rolled-back in case of error or cancel.
>>>>>
>>>>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
>>>>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
>>>>> instance, eject operation is performed at this phase.
>>>> Hi Toshi,
>>>>   There are one more step needed. Linux provides sysfs interfaces to
>>>> online/offline CPU/memory sections, so we need to protect from concurrent
>>>> operations from those interfaces when doing physical hotplug. Think about
>>>> following sequence:
>>>> Thread 1                                 Thread 2
>>>> 1. validate conditions for hot-removal
>>>> 2. offline memory section A
>>>>                                          3. online memory section A
>>>> 4. offline memory section B
>>>> 5. hot-remove memory device hosting A and B.
>>>
>>> Hi Gerry,
>>>
>>> I agree.  And I am working on a proposal that tries to address this
>>> issue by integrating both sysfs and hotplug operations into a framework.
>> Hi Toshi,
>>   But the sysfs for CPU and memory online/offline are platform independent
>> interfaces, and the ACPI based hotplug is platform dependent interfaces.
>> I'm not sure whether it's feasible to merge them. For example we still need
>> offline interface to stop using faulty CPUs on platform without physical
>> hotplug capabilities.
>>   We have solved this by adding a "busy" flag to the device, so the sysfs
>> will just return -EBUSY if the busy flag is set.
> 
> I am making the framework code platform-independent so that it can
> handle both cases.  Well, I am still prototyping, so hopefully it will
> work. :)
Do you mean implementing a framework to manage hotplug of any type of devices?
That sounds like a huge plan:)

Otherwise there may be a gap. The CPU online/offline interface deals with
logical CPUs, and the hotplug driver deals with physical devices (processors).
They may be different but related objects.

> 
> Thanks,
> -Toshi
> 



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 12/05/2012 07:23 AM, Toshi Kani wrote:
> On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
>> On 2012/12/4 8:10, Toshi Kani wrote:
>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
 On 2012/11/30 6:27, Toshi Kani wrote:
> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>> On 2012/11/29 2:41, Toshi Kani wrote:
>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As you may know, the ACPI based hotplug framework we are working on 
 already addressed
 this problem, and the way we solve this problem is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
acpi_op_add add;
acpi_op_remove remove;
acpi_op_start start;
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can 
 achieve
 a better way for sharing ideas. :)
>>>
>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>> operation should be composed with the following 3 phases.
>>
>> Good idea ! we also implement a hot-plug operation in 3 phases:
>> 1) acpihp_drv_pre_execute
>> 2) acpihp_drv_execute
>> 3) acpihp_drv_post_execute
>> you may refer to :
>> https://lkml.org/lkml/2012/11/4/79
>
> Great.  Yes, I will take a look.

 Thanks, any comments are welcomed :)
>>>
>>> If I read the code right, the framework calls ACPI drivers differently
>>> at boot-time and hot-add as follows.  That is, the new entry points are
>>> called at hot-add only, but .add() is called at both cases.  This
>>> requires .add() to work differently.
>>
>> Hi Toshi,
>> Thanks for your comments!
>>
>>>
>>> Boot: .add()
>>
>> Actually, at boot time: .add(), .start()
> 
> Right.
> 
>>> Hot-Add : .add(), .pre_configure(), configure(), etc.
>>
>> Yes, we did it as you said in the framework. We use .pre_configure(), 
>> configure(),
>> and post_configure() to instead of .start() for better error handling and 
>> recovery.
> 
> I think we should have hot-plug interfaces at the module level, not at
> the ACPI-internal level.  In this way, the interfaces can be
> platform-neutral and allow any modules to register, which makes it more
> consistent with the boot-up sequence.  It can also allow ordering of the
> sequence among the registered modules.  Right now, we initiate all
> procedures from ACPI during hot-plug, which I think is inflexible and
> steps into other module's role.
> 
> I am also concerned about the slot handling, which is the core piece of
> the infrastructure and only allows hot-plug operations on ACPI objects
> where slot objects are previously created by checking _EJ0.  The
> infrastructure should allow hot-plug operations on any objects, and it
> should not be dependent on the slot design.
> 
> I have some rough idea, and it may be easier to review / explain if I
> make some code changes.  So, let me prototype it, and send it you all if
> that works out.  Hopefully, it won't take too long.
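A minimal user-space sketch of the module-level registration idea described above, not taken from any posted patch: modules register a hot-add handler with an order value, and the framework calls the handlers in ascending order. All names here (`hp_handler`, `hp_register`, `hp_run_add`, the demo handlers) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch: names and layout are illustrative only. */
#define HP_MAX_HANDLERS 8

struct hp_handler {
	const char *name;       /* registering module, e.g. "acpi", "cpu", "mem" */
	int order;              /* lower values run first on hot-add */
	int (*add)(void *ctx);  /* returns 0 on success, negative on error */
};

struct hp_handler hp_handlers[HP_MAX_HANDLERS];
size_t hp_count;

/* Register a handler, keeping the table sorted by ascending order. */
int hp_register(const char *name, int order, int (*add)(void *ctx))
{
	size_t i = hp_count;

	if (hp_count == HP_MAX_HANDLERS || add == NULL)
		return -1;

	/* Shift entries with a larger order value up by one slot. */
	while (i > 0 && hp_handlers[i - 1].order > order) {
		hp_handlers[i] = hp_handlers[i - 1];
		i--;
	}
	hp_handlers[i].name = name;
	hp_handlers[i].order = order;
	hp_handlers[i].add = add;
	hp_count++;
	return 0;
}

/* Call every registered handler in order; stop at the first failure. */
int hp_run_add(void *ctx)
{
	size_t i;
	int ret;

	for (i = 0; i < hp_count; i++) {
		ret = hp_handlers[i].add(ctx);
		if (ret)
			return ret;
	}
	return 0;
}

/* Demo handlers that record the order in which they ran. */
char hp_trace[HP_MAX_HANDLERS + 1];
size_t hp_trace_len;

static void hp_trace_put(char c)
{
	if (hp_trace_len < HP_MAX_HANDLERS)
		hp_trace[hp_trace_len++] = c;
}

int acpi_add(void *ctx) { (void)ctx; hp_trace_put('A'); return 0; }
int cpu_add(void *ctx)  { (void)ctx; hp_trace_put('C'); return 0; }
int mem_add(void *ctx)  { (void)ctx; hp_trace_put('M'); return 0; }
```

Because the table is sorted at registration time, the call order does not depend on the order in which modules load, which is the property the boot-time and hot-add paths would need to share.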
> 
>>> I think the boot-time and hot-add initialization should be done
>>> consistently.  While there is difficulty with the current boot sequence,
>>> the framework should be designed to allow them consistent, not make them
>>> diverged.
>>>
>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>> known restrictions are verified at this phase.  For instance, if a
>>> hot-remove request involves kernel memory, it is failed in this phase.
>>> Since this phase makes no change, no rollback is necessary to fail. 
>>
>> Yes, we have done this in acpihp_drv_pre_execute, and check following 
>> things:
>>
>> 1) Hot-plugble or not. the instance kernel memory you mentioned is also 
>> checked
>>when memory device remove;
>
> Agreed.
>
>> 2) Dependency check involved. For instance, if hot-add a memory device,
>>processor should be added first, otherwise it's not valid 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 12/04/2012 08:10 AM, Toshi Kani wrote:
> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>> On 2012/11/30 6:27, Toshi Kani wrote:
>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 On 2012/11/29 2:41, Toshi Kani wrote:
> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>> As you may know, the ACPI based hotplug framework we are working on 
>> already addressed
>> this problem, and the way we solve this problem is a bit like yours.
>>
>> We introduce hp_ops in struct acpi_device_ops:
>> struct acpi_device_ops {
>>  acpi_op_add add;
>>  acpi_op_remove remove;
>>  acpi_op_start start;
>>  acpi_op_bind bind;
>>  acpi_op_unbind unbind;
>>  acpi_op_notify notify;
>> #ifdef   CONFIG_ACPI_HOTPLUG
>>  struct acpihp_dev_ops *hp_ops;
>> #endif   /* CONFIG_ACPI_HOTPLUG */
>> };
>>
>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>> 1) pre_release(): optional step to mark device going to be removed/busy
>> 2) release(): reclaim device from running system
>> 3) post_release(): rollback if cancelled by user or error happened
>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>> 5) unconfigure(): remove devices from running system
>> 6) post_unconfigure(): free resources used by devices
>>
>> In this way, we can easily rollback if error happens.
>> How do you think of this solution, any suggestion ? I think we can 
>> achieve
>> a better way for sharing ideas. :)
>
> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> have not looked at all your changes yet..), but in my mind, a hot-plug
> operation should be composed with the following 3 phases.

 Good idea ! we also implement a hot-plug operation in 3 phases:
 1) acpihp_drv_pre_execute
 2) acpihp_drv_execute
 3) acpihp_drv_post_execute
 you may refer to :
 https://lkml.org/lkml/2012/11/4/79
>>>
>>> Great.  Yes, I will take a look.
>>
>> Thanks, any comments are welcomed :)
> 
> If I read the code right, the framework calls ACPI drivers differently
> at boot-time and hot-add as follows.  That is, the new entry points are
> called at hot-add only, but .add() is called at both cases.  This
> requires .add() to work differently.
> 
> Boot: .add()
> Hot-Add : .add(), .pre_configure(), configure(), etc.
> 
> I think the boot-time and hot-add initialization should be done
> consistently.  While there is difficulty with the current boot sequence,
> the framework should be designed to allow them consistent, not make them
> diverged.
Hi Toshi,
	We have separated hotplug operations from the driver binding/unbinding
interface due to the following considerations.
1) Physical CPU and memory devices are initialized and used before the ACPI
   subsystem is initialized. So in the normal case, .add() of the processor
   and acpi_memhotplug drivers only gathers information about devices that
   are already in a working state instead of starting the devices.
2) It would be impossible to rmmod the processor and acpi_memhotplug drivers
   at runtime if .remove() of the CPU and memory drivers really removed the
   CPU/memory device from the system. And the ACPI processor driver also
   implements CPU PM functionality other than hotplug.

And recently Rafael has mentioned that he has a long-term view to get rid of the
concept of "ACPI device". If that happens, we could easily move the hotplug
logic from the ACPI device drivers into the hotplug framework, provided the
hotplug logic is separated from the .add()/.remove() callbacks. Actually we
could even move all hotplug-only logic into the hotplug framework and not rely
on any ACPI device driver any more. So we could get rid of all these messy
things. We could achieve that by:
1) moving code shared by ACPI device drivers and the hotplug framework into the
   core.
2) moving hotplug-only code to the framework.
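As a rough illustration of keeping hotplug callbacks apart from .add()/.remove(), here is a user-space sketch of an ops table built from the six step names quoted earlier in this thread. The real layout of struct acpihp_dev_ops is not shown in the thread, so the struct name, signatures, return types, error flow and demo callbacks below are all assumptions.

```c
#include <errno.h>
#include <stddef.h>

struct hp_dev;

/* Hypothetical ops table: step names are from the thread, the rest is assumed. */
struct hp_dev_ops_sketch {
	int  (*pre_release)(struct hp_dev *dev);      /* optional: mark device busy */
	int  (*release)(struct hp_dev *dev);          /* reclaim device from the system */
	void (*post_release)(struct hp_dev *dev);     /* roll back after error or cancel */
	int  (*pre_unconfigure)(struct hp_dev *dev);  /* optional: resolve dependencies */
	void (*unconfigure)(struct hp_dev *dev);      /* remove device from the system */
	void (*post_unconfigure)(struct hp_dev *dev); /* free resources */
};

struct hp_dev {
	const struct hp_dev_ops_sketch *ops;
	int busy;     /* set while a hot-remove is in progress */
	int in_use;   /* nonzero makes the demo release() fail */
	int online;
	int present;
};

/* Drive the six steps; any failure before unconfigure() is rolled back. */
int hp_remove(struct hp_dev *dev)
{
	const struct hp_dev_ops_sketch *ops = dev->ops;
	int ret;

	if (ops->pre_release) {
		ret = ops->pre_release(dev);
		if (ret)
			return ret;
	}
	ret = ops->release(dev);
	if (ret)
		goto rollback;
	if (ops->pre_unconfigure) {
		ret = ops->pre_unconfigure(dev);
		if (ret)
			goto rollback;
	}
	/* Point of no return: these two steps are assumed not to fail. */
	ops->unconfigure(dev);
	ops->post_unconfigure(dev);
	return 0;

rollback:
	ops->post_release(dev);
	return ret;
}

/* Demo callbacks for a memory-like device. */
static int demo_pre_release(struct hp_dev *dev) { dev->busy = 1; return 0; }
static int demo_release(struct hp_dev *dev)
{
	if (dev->in_use)
		return -EBUSY;
	dev->online = 0;
	return 0;
}
static void demo_post_release(struct hp_dev *dev) { dev->online = 1; dev->busy = 0; }
static void demo_unconfigure(struct hp_dev *dev) { dev->present = 0; }
static void demo_post_unconfigure(struct hp_dev *dev) { dev->busy = 0; }

const struct hp_dev_ops_sketch demo_ops = {
	.pre_release = demo_pre_release,
	.release = demo_release,
	.post_release = demo_post_release,
	.pre_unconfigure = NULL,
	.unconfigure = demo_unconfigure,
	.post_unconfigure = demo_post_unconfigure,
};
```

Since nothing in the table refers to driver binding, a table like this could live in a separate framework while .add()/.remove() stay limited to binding and unbinding.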

Hi Rafael, what are your thoughts here?

> 
> 1. Validate phase - Verify if the request is a supported operation.  All
> known restrictions are verified at this phase.  For instance, if a
> hot-remove request involves kernel memory, it is failed in this phase.
> Since this phase makes no change, no rollback is necessary to fail. 

 Yes, we have done this in acpihp_drv_pre_execute, and check following 
 things:

 1) Hot-plugble or not. the instance kernel memory you mentioned is also 
 checked
when memory device remove;
>>>
>>> Agreed.
>>>
 2) Dependency check involved. For instance, if hot-add a memory device,
processor should be added first, otherwise it's not valid to this 
 operation.
>>>
>>> I think FW should be the one that assures such dependency.  That is,
>>> when a memory device object is marked as present/enabled/functioning, it
>>> should be ready for the OS to use.
>>
>> Yes, BIOS should do 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
> On 12/07/2012 12:03 AM, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
> >> On 11/29/2012 02:41 AM, Toshi Kani wrote:
> >>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 : 
> >>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> >>> have not looked at all your changes yet..), but in my mind, a hot-plug
> >>> operation should be composed with the following 3 phases.
> >>>
> >>> 1. Validate phase - Verify if the request is a supported operation.  All
> >>> known restrictions are verified at this phase.  For instance, if a
> >>> hot-remove request involves kernel memory, it is failed in this phase.
> >>> Since this phase makes no change, no rollback is necessary to fail.  
> >>>
> >>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> >>> rolled-back in case of error or cancel.
> >>>
> >>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> >>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> >>> instance, eject operation is performed at this phase.  
> >> Hi Toshi,
> >>There are one more step needed. Linux provides sysfs interfaces to
> >> online/offline CPU/memory sections, so we need to protect from concurrent
> >> operations from those interfaces when doing physical hotplug. Think about
> >> following sequence:
> >> Thread 1
> >> 1. validate conditions for hot-removal
> >> 2. offline memory section A
> >> 3. online memory section A 
> >> 
> >> 4. offline memory section B
> >> 5 hot-remove memory device hosting A and B.
> > 
> > Hi Gerry,
> > 
> > I agree.  And I am working on a proposal that tries to address this
> > issue by integrating both sysfs and hotplug operations into a framework.
> Hi Toshi,
>   But the sysfs for CPU and memory online/offline are platform independent
> interfaces, and the ACPI based hotplug is platform dependent interfaces. I'm 
> not
> sure whether it's feasible to merge them. For example we still need offline 
> interface
> to stop using faulty CPUs on platform without physical hotplug capabilities.
>   We have solved this by adding a "busy" flag to the device, so the sysfs
> will just return -EBUSY if the busy flag is set.

I am making the framework code platform-independent so that it can
handle both cases.  Well, I am still prototyping, so hopefully it will
work. :)
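For reference, the "busy" flag approach quoted above could look roughly like the following user-space sketch using C11 atomics. It is an assumption-laden illustration, not the posted code: the struct, the function names, and the choice to have the sysfs-style path claim the same flag for the duration of its own operation are all hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SECTIONS 2   /* sections A and B from the example sequence */

struct mem_device {
	atomic_bool busy;                 /* set while one operation owns the device */
	bool section_online[NR_SECTIONS];
};

/* Try to take ownership of the device; -EBUSY if someone else holds it. */
static int mem_device_claim(struct mem_device *dev)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&dev->busy, &expected, true))
		return -EBUSY;
	return 0;
}

static void mem_device_unclaim(struct mem_device *dev)
{
	atomic_store(&dev->busy, false);
}

/* sysfs-style online/offline request for a single section. */
int sysfs_set_online(struct mem_device *dev, int section, bool online)
{
	int ret;

	if (section < 0 || section >= NR_SECTIONS)
		return -EINVAL;
	ret = mem_device_claim(dev);
	if (ret)
		return ret;     /* a physical hot-remove is in progress */
	dev->section_online[section] = online;
	mem_device_unclaim(dev);
	return 0;
}

/* Start a physical hot-remove: hold the flag across the whole sequence. */
int hot_remove_begin(struct mem_device *dev)
{
	int i, ret;

	ret = mem_device_claim(dev);
	if (ret)
		return ret;
	for (i = 0; i < NR_SECTIONS; i++)
		dev->section_online[i] = false;
	return 0;
}

/* Cancel the hot-remove: bring the sections back and drop the flag. */
void hot_remove_cancel(struct mem_device *dev)
{
	int i;

	for (i = 0; i < NR_SECTIONS; i++)
		dev->section_online[i] = true;
	mem_device_unclaim(dev);
}
```

With the flag held from validation to eject, the "online memory section A" request in the problem sequence would fail with -EBUSY instead of undoing step 2.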

Thanks,
-Toshi



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 12/07/2012 12:03 AM, Toshi Kani wrote:
> On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
>> On 11/29/2012 02:41 AM, Toshi Kani wrote:
>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> As discussed in https://patchwork.kernel.org/patch/1581581/
> the driver core remove function needs to always succeed. This means we 
> need
> to know that the device can be successfully removed before acpi_bus_trim 
> / 
> acpi_bus_hot_remove_device are called. This can cause panics when 
> OSPM-initiated
> or SCI-initiated eject of memory devices fail e.g with:
> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>
> since the ACPI core goes ahead and ejects the device regardless of 
> whether the
> the memory is still in use or not.
>
> For this reason a new acpi_device operation called prepare_remove is 
> introduced.
> This operation should be registered for acpi devices whose removal (from 
> kernel
> perspective) can fail.  Memory devices fall in this category.
>
> acpi_bus_remove() is changed to handle removal in 2 steps:
> - preparation for removal i.e. perform part of removal that can fail. 
> Should
>   succeed for device and all its children.
> - if above step was successful, proceed to actual device removal

 Hi Vasilis,
 We met the same problem when we doing computer node hotplug, It is a good 
 idea
 to introduce prepare_remove before actual device removal.

 I think we could do more in prepare_remove, such as rollback. In most 
 cases, we can
 offline most of memory sections except kernel used pages now, should we 
 rollback
 and online the memory sections when prepare_remove failed ?
>>>
>>> I think hot-plug operation should have all-or-nothing semantics.  That
>>> is, an operation should either complete successfully, or rollback to the
>>> original state.
>>>
 As you may know, the ACPI based hotplug framework we are working on 
 already addressed
 this problem, and the way we solve this problem is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
acpi_op_add add;
acpi_op_remove remove;
acpi_op_start start;
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can achieve
 a better way for sharing ideas. :)
>>>
>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>> operation should be composed with the following 3 phases.
>>>
>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>> known restrictions are verified at this phase.  For instance, if a
>>> hot-remove request involves kernel memory, it is failed in this phase.
>>> Since this phase makes no change, no rollback is necessary to fail.  
>>>
>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>> rolled-back in case of error or cancel.
>>>
>>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
>>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
>>> instance, eject operation is performed at this phase.  
>> Hi Toshi,
>>  There are one more step needed. Linux provides sysfs interfaces to
>> online/offline CPU/memory sections, so we need to protect from concurrent
>> operations from those interfaces when doing physical hotplug. Think about
>> following sequence:
>> Thread 1
>> 1. validate conditions for hot-removal
>> 2. offline memory section A
>> 3.   online memory section A 
>> 
>> 4. offline memory section B
>> 5 hot-remove memory device hosting A and B.
> 
> Hi Gerry,
> 
> I agree.  And I am working on a proposal that tries to address this
> issue by integrating both sysfs and hotplug operations into a framework.
Hi Toshi,
But the sysfs for CPU and memory online/offline are platform independent
interfaces, and the ACPI based hotplug is platform dependent interfaces. I'm not
sure whether it's feasible to merge them. For example we still need 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
> On 11/29/2012 02:41 AM, Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >>> As discussed in https://patchwork.kernel.org/patch/1581581/
> >>> the driver core remove function needs to always succeed. This means we 
> >>> need
> >>> to know that the device can be successfully removed before acpi_bus_trim 
> >>> / 
> >>> acpi_bus_hot_remove_device are called. This can cause panics when 
> >>> OSPM-initiated
> >>> or SCI-initiated eject of memory devices fail e.g with:
> >>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> >>>
> >>> since the ACPI core goes ahead and ejects the device regardless of 
> >>> whether the
> >>> the memory is still in use or not.
> >>>
> >>> For this reason a new acpi_device operation called prepare_remove is 
> >>> introduced.
> >>> This operation should be registered for acpi devices whose removal (from 
> >>> kernel
> >>> perspective) can fail.  Memory devices fall in this category.
> >>>
> >>> acpi_bus_remove() is changed to handle removal in 2 steps:
> >>> - preparation for removal i.e. perform part of removal that can fail. 
> >>> Should
> >>>   succeed for device and all its children.
> >>> - if above step was successful, proceed to actual device removal
> >>
> >> Hi Vasilis,
> >> We met the same problem when we doing computer node hotplug, It is a good 
> >> idea
> >> to introduce prepare_remove before actual device removal.
> >>
> >> I think we could do more in prepare_remove, such as rollback. In most 
> >> cases, we can
> >> offline most of memory sections except kernel used pages now, should we 
> >> rollback
> >> and online the memory sections when prepare_remove failed ?
> > 
> > I think hot-plug operation should have all-or-nothing semantics.  That
> > is, an operation should either complete successfully, or rollback to the
> > original state.
> > 
> >> As you may know, the ACPI based hotplug framework we are working on 
> >> already addressed
> >> this problem, and the way we solve this problem is a bit like yours.
> >>
> >> We introduce hp_ops in struct acpi_device_ops:
> >> struct acpi_device_ops {
> >>acpi_op_add add;
> >>acpi_op_remove remove;
> >>acpi_op_start start;
> >>acpi_op_bind bind;
> >>acpi_op_unbind unbind;
> >>acpi_op_notify notify;
> >> #ifdef CONFIG_ACPI_HOTPLUG
> >>struct acpihp_dev_ops *hp_ops;
> >> #endif /* CONFIG_ACPI_HOTPLUG */
> >> };
> >>
> >> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >> 1) pre_release(): optional step to mark device going to be removed/busy
> >> 2) release(): reclaim device from running system
> >> 3) post_release(): rollback if cancelled by user or error happened
> >> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >> 5) unconfigure(): remove devices from running system
> >> 6) post_unconfigure(): free resources used by devices
> >>
> >> In this way, we can easily rollback if error happens.
> >> How do you think of this solution, any suggestion ? I think we can achieve
> >> a better way for sharing ideas. :)
> > 
> > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> > 
> > 1. Validate phase - Verify if the request is a supported operation.  All
> > known restrictions are verified at this phase.  For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail.  
> > 
> > 2. Execute phase - Perform hot-add / hot-remove operation that can be
> > rolled-back in case of error or cancel.
> > 
> > 3. Commit phase - Perform the final hot-add / hot-remove operation that
> > cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> > instance, eject operation is performed at this phase.  
> Hi Toshi,
>   There are one more step needed. Linux provides sysfs interfaces to
> online/offline CPU/memory sections, so we need to protect from concurrent
> operations from those interfaces when doing physical hotplug. Think about
> following sequence:
> Thread 1
> 1. validate conditions for hot-removal
> 2. offline memory section A
> 3.online memory section A 
> 
> 4. offline memory section B
> 5 hot-remove memory device hosting A and B.

Hi Gerry,

I agree.  And I am working on a proposal that tries to address this
issue by integrating both sysfs and hotplug operations into a framework.
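The three phases quoted above (validate, execute, commit) together with the all-or-nothing rule can be sketched in user-space C as follows. This is only an illustration under assumptions: the struct names, the separate `rollback` callback, the error codes and the demo callbacks are hypothetical and do not come from any proposal in this thread.

```c
#include <errno.h>

/* Hypothetical request state, just enough to observe each phase. */
struct hp_request {
	int uses_kernel_memory; /* makes validate fail */
	int fail_execute;       /* injects a failure in execute */
	int offlined;           /* changed by execute, undone by rollback */
	int ejected;            /* changed by commit, cannot be undone */
};

struct hp_phases {
	int  (*validate)(const struct hp_request *req); /* makes no changes */
	int  (*execute)(struct hp_request *req);        /* changes can be undone */
	void (*rollback)(struct hp_request *req);       /* undoes execute */
	void (*commit)(struct hp_request *req);         /* must not fail */
};

/* Either the whole operation completes, or the original state is restored. */
int hp_run(const struct hp_phases *p, struct hp_request *req)
{
	int ret;

	ret = p->validate(req);
	if (ret)
		return ret;             /* nothing changed, nothing to undo */

	ret = p->execute(req);
	if (ret) {
		p->rollback(req);       /* back to the original state */
		return ret;
	}

	p->commit(req);                 /* no error or cancel from here on */
	return 0;
}

/* Demo callbacks modelling a memory hot-remove. */
static int demo_validate(const struct hp_request *req)
{
	return req->uses_kernel_memory ? -EINVAL : 0;
}

static int demo_execute(struct hp_request *req)
{
	req->offlined = 1;
	return req->fail_execute ? -EIO : 0;
}

static void demo_rollback(struct hp_request *req) { req->offlined = 0; }
static void demo_commit(struct hp_request *req) { req->ejected = 1; }

const struct hp_phases demo_phases = {
	.validate = demo_validate,
	.execute = demo_execute,
	.rollback = demo_rollback,
	.commit = demo_commit,
};
```

Making commit return void is one way to encode "no error / cancel is allowed in this phase" in the interface itself.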


Thanks,
-Toshi



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 11/29/2012 02:41 AM, Toshi Kani wrote:
> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>> the driver core remove function needs to always succeed. This means we need
>>> to know that the device can be successfully removed before acpi_bus_trim / 
>>> acpi_bus_hot_remove_device are called. This can cause panics when 
>>> OSPM-initiated
>>> or SCI-initiated eject of memory devices fail e.g with:
>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>
>>> since the ACPI core goes ahead and ejects the device regardless of whether 
>>> the
>>> the memory is still in use or not.
>>>
>>> For this reason a new acpi_device operation called prepare_remove is 
>>> introduced.
>>> This operation should be registered for acpi devices whose removal (from 
>>> kernel
>>> perspective) can fail.  Memory devices fall in this category.
>>>
>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>> - preparation for removal i.e. perform part of removal that can fail. Should
>>>   succeed for device and all its children.
>>> - if above step was successful, proceed to actual device removal
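The two-step removal described above can be pictured with a small user-space sketch: a first pass over the device and all its children that may fail, and a second pass that only runs if the first succeeded. This is not the code from the patch; `struct dev_node` and the function names are hypothetical, and the prepare pass here only checks an `in_use` flag, whereas a real prepare step for memory would try to offline it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device tree node, first-child / next-sibling layout. */
struct dev_node {
	struct dev_node *child;
	struct dev_node *sibling;
	int in_use;    /* nonzero: preparation for removal must fail */
	int removed;
};

/* Step 1: the part of removal that can fail, for the device and its children. */
int prepare_remove_tree(const struct dev_node *dev)
{
	const struct dev_node *c;
	int ret;

	if (dev->in_use)
		return -EBUSY;
	for (c = dev->child; c; c = c->sibling) {
		ret = prepare_remove_tree(c);
		if (ret)
			return ret;
	}
	return 0;
}

/* Step 2: the actual removal, children before their parent; cannot fail. */
void remove_tree(struct dev_node *dev)
{
	struct dev_node *c;

	for (c = dev->child; c; c = c->sibling)
		remove_tree(c);
	dev->removed = 1;
}

/* Only touch the tree once every node has agreed to be removed. */
int hot_remove(struct dev_node *dev)
{
	int ret;

	ret = prepare_remove_tree(dev);
	if (ret)
		return ret;
	remove_tree(dev);
	return 0;
}
```

Running the fallible pass over the whole subtree first is what lets the second pass keep the "remove always succeeds" contract of the driver core.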
>>
>> Hi Vasilis,
>> We met the same problem when we doing computer node hotplug, It is a good 
>> idea
>> to introduce prepare_remove before actual device removal.
>>
>> I think we could do more in prepare_remove, such as rollback. In most cases, 
>> we can
>> offline most of memory sections except kernel used pages now, should we 
>> rollback
>> and online the memory sections when prepare_remove failed ?
> 
> I think hot-plug operation should have all-or-nothing semantics.  That
> is, an operation should either complete successfully, or rollback to the
> original state.
> 
>> As you may know, the ACPI based hotplug framework we are working on already 
>> addressed
>> this problem, and the way we solve this problem is a bit like yours.
>>
>> We introduce hp_ops in struct acpi_device_ops:
>> struct acpi_device_ops {
>>  acpi_op_add add;
>>  acpi_op_remove remove;
>>  acpi_op_start start;
>>  acpi_op_bind bind;
>>  acpi_op_unbind unbind;
>>  acpi_op_notify notify;
>> #ifdef   CONFIG_ACPI_HOTPLUG
>>  struct acpihp_dev_ops *hp_ops;
>> #endif   /* CONFIG_ACPI_HOTPLUG */
>> };
>>
>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>> 1) pre_release(): optional step to mark device going to be removed/busy
>> 2) release(): reclaim device from running system
>> 3) post_release(): rollback if cancelled by user or error happened
>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>> 5) unconfigure(): remove devices from running system
>> 6) post_unconfigure(): free resources used by devices
>>
>> In this way, we can easily rollback if error happens.
>> How do you think of this solution, any suggestion ? I think we can achieve
>> a better way for sharing ideas. :)
> 
> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> have not looked at all your changes yet..), but in my mind, a hot-plug
> operation should be composed with the following 3 phases.
> 
> 1. Validate phase - Verify if the request is a supported operation.  All
> known restrictions are verified at this phase.  For instance, if a
> hot-remove request involves kernel memory, it is failed in this phase.
> Since this phase makes no change, no rollback is necessary to fail.  
> 
> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> rolled-back in case of error or cancel.
> 
> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> instance, eject operation is performed at this phase.  
Hi Toshi,
	There is one more step needed. Linux provides sysfs interfaces to
online/offline CPUs and memory sections, so we need to protect against concurrent
operations from those interfaces when doing physical hotplug. Think about the
following sequence:
Thread 1                                     Thread 2
1. validate conditions for hot-removal
2. offline memory section A
                                             3. online memory section A
4. offline memory section B
5. hot-remove memory device hosting A and B.
Regards!
Gerry
> 
> 
> Thanks,
> -Toshi
> 


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
 On 11/29/2012 02:41 AM, Toshi Kani wrote:
  On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
  On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
  As discussed in https://patchwork.kernel.org/patch/1581581/
  the driver core remove function needs to always succeed. This means we 
  need
  to know that the device can be successfully removed before acpi_bus_trim 
  / 
  acpi_bus_hot_remove_device are called. This can cause panics when 
  OSPM-initiated
  or SCI-initiated eject of memory devices fail e.g with:
  echo 1 > /sys/bus/pci/devices/PNP0C80:XX/eject
 
  since the ACPI core goes ahead and ejects the device regardless of 
  whether the
  the memory is still in use or not.
 
  For this reason a new acpi_device operation called prepare_remove is 
  introduced.
  This operation should be registered for acpi devices whose removal (from 
  kernel
  perspective) can fail.  Memory devices fall in this category.
 
  acpi_bus_remove() is changed to handle removal in 2 steps:
  - preparation for removal i.e. perform part of removal that can fail. 
  Should
succeed for device and all its children.
  - if above step was successfull, proceed to actual device removal
 
  Hi Vasilis,
  We met the same problem when we doing computer node hotplug, It is a good 
  idea
  to introduce prepare_remove before actual device removal.
 
  I think we could do more in prepare_remove, such as rollback. In most 
  cases, we can
  offline most of memory sections except kernel used pages now, should we 
  rollback
  and online the memory sections when prepare_remove failed ?
  
  I think hot-plug operation should have all-or-nothing semantics.  That
  is, an operation should either complete successfully, or rollback to the
  original state.
  
  As you may know, the ACPI based hotplug framework we are working on 
  already addressed
  this problem, and the way we slove this problem is a bit like yours.
 
  We introduce hp_ops in struct acpi_device_ops:
  struct acpi_device_ops {
 acpi_op_add add;
 acpi_op_remove remove;
 acpi_op_start start;
 acpi_op_bind bind;
 acpi_op_unbind unbind;
 acpi_op_notify notify;
  #ifdef CONFIG_ACPI_HOTPLUG
 struct acpihp_dev_ops *hp_ops;
  #endif /* CONFIG_ACPI_HOTPLUG */
  };
 
  in hp_ops, we divide the prepare_remove into six small steps, that is:
  1) pre_release(): optional step to mark device going to be removed/busy
  2) release(): reclaim device from running system
  3) post_release(): rollback if cancelled by user or error happened
  4) pre_unconfigure(): optional step to solve possible dependency issue
  5) unconfigure(): remove devices from running system
  6) post_unconfigure(): free resources used by devices
 
  In this way, we can easily rollback if error happens.
  How do you think of this solution, any suggestion ? I think we can achieve
  a better way for sharing ideas. :)
  
  Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
  have not looked at all your changes yet..), but in my mind, a hot-plug
  operation should be composed with the following 3 phases.
  
  1. Validate phase - Verify if the request is a supported operation.  All
  known restrictions are verified at this phase.  For instance, if a
  hot-remove request involves kernel memory, it is failed in this phase.
  Since this phase makes no change, no rollback is necessary to fail.  
  
  2. Execute phase - Perform hot-add / hot-remove operation that can be
  rolled-back in case of error or cancel.
  
  3. Commit phase - Perform the final hot-add / hot-remove operation that
  cannot be rolled-back.  No error / cancel is allowed in this phase.  For
  instance, eject operation is performed at this phase.  
 Hi Toshi,
   There are one more step needed. Linux provides sysfs interfaces to
 online/offline CPU/memory sections, so we need to protect from concurrent
 operations from those interfaces when doing physical hotplug. Think about
 following sequence:
 Thread 1
 1. validate conditions for hot-removal
 2. offline memory section A
 3.online memory section A 
 
 4. offline memory section B
 5 hot-remove memory device hosting A and B.

Hi Gerry,

I agree.  And I am working on a proposal that tries to address this
issue by integrating both sysfs and hotplug operations into a framework.


Thanks,
-Toshi



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 12/07/2012 12:03 AM, Toshi Kani wrote:
 On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
 On 11/29/2012 02:41 AM, Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As discussed in https://patchwork.kernel.org/patch/1581581/
 the driver core remove function needs to always succeed. This means we 
 need
 to know that the device can be successfully removed before acpi_bus_trim 
 / 
 acpi_bus_hot_remove_device are called. This can cause panics when 
 OSPM-initiated
 or SCI-initiated eject of memory devices fail e.g with:
 echo 1 > /sys/bus/pci/devices/PNP0C80:XX/eject

 since the ACPI core goes ahead and ejects the device regardless of 
 whether the
 the memory is still in use or not.

 For this reason a new acpi_device operation called prepare_remove is 
 introduced.
 This operation should be registered for acpi devices whose removal (from 
 kernel
 perspective) can fail.  Memory devices fall in this category.

 acpi_bus_remove() is changed to handle removal in 2 steps:
 - preparation for removal i.e. perform part of removal that can fail. 
 Should
   succeed for device and all its children.
 - if above step was successfull, proceed to actual device removal

 Hi Vasilis,
 We met the same problem when we doing computer node hotplug, It is a good 
 idea
 to introduce prepare_remove before actual device removal.

 I think we could do more in prepare_remove, such as rollback. In most 
 cases, we can
 offline most of memory sections except kernel used pages now, should we 
 rollback
 and online the memory sections when prepare_remove failed ?

 I think hot-plug operation should have all-or-nothing semantics.  That
 is, an operation should either complete successfully, or rollback to the
 original state.

 As you may know, the ACPI based hotplug framework we are working on 
 already addressed
 this problem, and the way we slove this problem is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
acpi_op_add add;
acpi_op_remove remove;
acpi_op_start start;
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can achieve
 a better way for sharing ideas. :)

 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail.  

 2. Execute phase - Perform hot-add / hot-remove operation that can be
 rolled-back in case of error or cancel.

 3. Commit phase - Perform the final hot-add / hot-remove operation that
 cannot be rolled-back.  No error / cancel is allowed in this phase.  For
 instance, eject operation is performed at this phase.  
 Hi Toshi,
  There are one more step needed. Linux provides sysfs interfaces to
 online/offline CPU/memory sections, so we need to protect from concurrent
 operations from those interfaces when doing physical hotplug. Think about
 following sequence:
 Thread 1
 1. validate conditions for hot-removal
 2. offline memory section A
 3.   online memory section A 
 
 4. offline memory section B
 5 hot-remove memory device hosting A and B.
 
 Hi Gerry,
 
 I agree.  And I am working on a proposal that tries to address this
 issue by integrating both sysfs and hotplug operations into a framework.
Hi Toshi,
But the sysfs interfaces for CPU and memory online/offline are platform
independent, while the ACPI based hotplug interfaces are platform dependent.
I'm not sure whether it's feasible to merge them. For example, we still need
the offline interface to stop using faulty CPUs on platforms without physical
hotplug capabilities.
We have solved this by adding a busy flag to the device, so the sysfs
interface will just return -EBUSY if the busy flag is set.
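
A rough userspace sketch of the busy-flag idea follows. The names
(struct hp_device, hp_device_mark_busy(), sysfs_set_online()) are made up for
illustration and are not taken from the actual patches; the only behaviour
assumed from the description above is that the sysfs path returns -EBUSY while
the flag is set.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device model, not the real ACPI structures. */
struct hp_device {
	atomic_bool busy;	/* set while a physical hotplug operation runs */
	bool online;
};

/* Claim the device for hotplug; -EBUSY if another operation holds it. */
static int hp_device_mark_busy(struct hp_device *dev)
{
	bool expected = false;

	assert(dev);
	if (!atomic_compare_exchange_strong(&dev->busy, &expected, true))
		return -EBUSY;
	return 0;
}

/* Release the claim, e.g. after rollback or on completion. */
static void hp_device_clear_busy(struct hp_device *dev)
{
	assert(dev);
	atomic_store(&dev->busy, false);
}

/*
 * Stand-in for the sysfs online/offline store handler.
 * A real implementation would have to do the check and the state change
 * under the same lock; this sketch only shows the -EBUSY policy.
 */
static int sysfs_set_online(struct hp_device *dev, bool online)
{
	assert(dev);
	if (atomic_load(&dev->busy))
		return -EBUSY;
	dev->online = online;
	return 0;
}
```

With this in place the hotplug path calls hp_device_mark_busy() before it
validates anything, and sysfs writers are turned away until
hp_device_clear_busy() is called.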

Regards!
Gerry

 
 
 Thanks,
 -Toshi
 


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
 On 12/07/2012 12:03 AM, Toshi Kani wrote:
  On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
  On 11/29/2012 02:41 AM, Toshi Kani wrote:
  On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 : 
  Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
  have not looked at all your changes yet..), but in my mind, a hot-plug
  operation should be composed with the following 3 phases.
 
  1. Validate phase - Verify if the request is a supported operation.  All
  known restrictions are verified at this phase.  For instance, if a
  hot-remove request involves kernel memory, it is failed in this phase.
  Since this phase makes no change, no rollback is necessary to fail.  
 
  2. Execute phase - Perform hot-add / hot-remove operation that can be
  rolled-back in case of error or cancel.
 
  3. Commit phase - Perform the final hot-add / hot-remove operation that
  cannot be rolled-back.  No error / cancel is allowed in this phase.  For
  instance, eject operation is performed at this phase.  
  Hi Toshi,
 There are one more step needed. Linux provides sysfs interfaces to
  online/offline CPU/memory sections, so we need to protect from concurrent
  operations from those interfaces when doing physical hotplug. Think about
  following sequence:
  Thread 1
  1. validate conditions for hot-removal
  2. offline memory section A
  3. online memory section A 
  
  4. offline memory section B
  5 hot-remove memory device hosting A and B.
  
  Hi Gerry,
  
  I agree.  And I am working on a proposal that tries to address this
  issue by integrating both sysfs and hotplug operations into a framework.
 Hi Toshi,
   But the sysfs for CPU and memory online/offline are platform independent
 interfaces, and the ACPI based hotplug is platform dependent interfaces. I'm 
 not
 sure whether it's feasible to merge them. For example we still need offline 
 interface
 to stop using faulty CPUs on platform without physical hotplug capabilities.
   We have solved this by adding a busy flag to the device, so the sysfs
 will just return -EBUSY if the busy flag is set.

I am making the framework code platform-independent so that it can
handle both cases.  Well, I am still prototyping, so hopefully it will
work. :)

Thanks,
-Toshi



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 12/04/2012 08:10 AM, Toshi Kani wrote:
 On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
 On 2012/11/30 6:27, Toshi Kani wrote:
 On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 On 2012/11/29 2:41, Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As you may know, the ACPI based hotplug framework we are working on 
 already addressed
 this problem, and the way we slove this problem is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
  acpi_op_add add;
  acpi_op_remove remove;
  acpi_op_start start;
  acpi_op_bind bind;
  acpi_op_unbind unbind;
  acpi_op_notify notify;
 #ifdef   CONFIG_ACPI_HOTPLUG
  struct acpihp_dev_ops *hp_ops;
 #endif   /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can 
 achieve
 a better way for sharing ideas. :)

 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 Good idea ! we also implement a hot-plug operation in 3 phases:
 1) acpihp_drv_pre_execute
 2) acpihp_drv_execute
 3) acpihp_drv_post_execute
 you may refer to :
 https://lkml.org/lkml/2012/11/4/79

 Great.  Yes, I will take a look.

 Thanks, any comments are welcomed :)
 
 If I read the code right, the framework calls ACPI drivers differently
 at boot-time and hot-add as follows.  That is, the new entry points are
 called at hot-add only, but .add() is called at both cases.  This
 requires .add() to work differently.
 
 Boot: .add()
 Hot-Add : .add(), .pre_configure(), configure(), etc.
 
 I think the boot-time and hot-add initialization should be done
 consistently.  While there is difficulty with the current boot sequence,
 the framework should be designed to allow them consistent, not make them
 diverged.
Hi Toshi,
We have separated hotplug operations from the driver binding/unbinding
interface due to the following considerations.
1) Physical CPU and memory devices are initialized/used before the ACPI
   subsystem is initialized. So in the normal case, .add() of the processor
   and acpi_memhotplug drivers only gathers information about a device that
   is already in working state instead of starting the device.
2) It's impossible to rmmod the processor and acpi_memhotplug drivers at
   runtime if .remove() of the CPU and memory drivers really removes the
   CPU/memory device from the system. And the ACPI processor driver also
   implements CPU PM functionality other than hotplug.

And recently Rafael has mentioned that he has a long term view to get rid of the
concept of ACPI device. If that happens, we could easily move the hotplug
logic from ACPI device drivers into the hotplug framework if the hotplug logic
is separated from the .add()/.remove() callbacks. Actually we could even move
all hotplug only logic into the hotplug framework and not rely on any ACPI
device driver any more. So we could get rid of all these messy things. We could
achieve that by:
1) moving code shared by ACPI device drivers and the hotplug framework into the
   core.
2) moving hotplug only code to the framework.

Hi Rafael, what are your thoughts here?

 
 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail. 

 Yes, we have done this in acpihp_drv_pre_execute, and check following 
 things:

 1) Hot-plugble or not. the instance kernel memory you mentioned is also 
 checked
when memory device remove;

 Agreed.

 2) Dependency check involved. For instance, if hot-add a memory device,
processor should be added first, otherwise it's not valid to this 
 operation.

 I think FW should be the one that assures such dependency.  That is,
 when a memory device object is marked as present/enabled/functioning, it
 should be ready for the OS to use.

 Yes, BIOS should do something for the dependency, because BIOS knows the
 actual hardware topology. 
 
 Right.
 
 The ACPI specification provides _EDL method to
 tell OS the eject device list, but still has no method to tell OS the add 
 device
 list now.
 
 Yes, but I do not think the OS needs special handling for add...
We have a plan to support triggering hot-adding events 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 12/05/2012 07:23 AM, Toshi Kani wrote:
 On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
 On 2012/12/4 8:10, Toshi Kani wrote:
 On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
 On 2012/11/30 6:27, Toshi Kani wrote:
 On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 On 2012/11/29 2:41, Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As you may know, the ACPI based hotplug framework we are working on 
 already addressed
 this problem, and the way we slove this problem is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
acpi_op_add add;
acpi_op_remove remove;
acpi_op_start start;
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can 
 achieve
 a better way for sharing ideas. :)

 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 Good idea ! we also implement a hot-plug operation in 3 phases:
 1) acpihp_drv_pre_execute
 2) acpihp_drv_execute
 3) acpihp_drv_post_execute
 you may refer to :
 https://lkml.org/lkml/2012/11/4/79

 Great.  Yes, I will take a look.

 Thanks, any comments are welcomed :)

 If I read the code right, the framework calls ACPI drivers differently
 at boot-time and hot-add as follows.  That is, the new entry points are
 called at hot-add only, but .add() is called at both cases.  This
 requires .add() to work differently.

 Hi Toshi,
 Thanks for your comments!


 Boot: .add()

 Actually, at boot time: .add(), .start()
 
 Right.
 
 Hot-Add : .add(), .pre_configure(), configure(), etc.

 Yes, we did it as you said in the framework. We use .pre_configure(), 
 configure(),
 and post_configure() to instead of .start() for better error handling and 
 recovery.
 
 I think we should have hot-plug interfaces at the module level, not at
 the ACPI-internal level.  In this way, the interfaces can be
 platform-neutral and allow any modules to register, which makes it more
 consistent with the boot-up sequence.  It can also allow ordering of the
 sequence among the registered modules.  Right now, we initiate all
 procedures from ACPI during hot-plug, which I think is inflexible and
 steps into other module's role.
 
 I am also concerned about the slot handling, which is the core piece of
 the infrastructure and only allows hot-plug operations on ACPI objects
 where slot objects are previously created by checking _EJ0.  The
 infrastructure should allow hot-plug operations on any objects, and it
 should not be dependent on the slot design.
 
 I have some rough idea, and it may be easier to review / explain if I
 make some code changes.  So, let me prototype it, and send it you all if
 that works out.  Hopefully, it won't take too long.
 
 I think the boot-time and hot-add initialization should be done
 consistently.  While there is difficulty with the current boot sequence,
 the framework should be designed to allow them consistent, not make them
 diverged.

 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail. 

 Yes, we have done this in acpihp_drv_pre_execute, and check following 
 things:

 1) Hot-plugble or not. the instance kernel memory you mentioned is also 
 checked
when memory device remove;

 Agreed.

 2) Dependency check involved. For instance, if hot-add a memory device,
processor should be added first, otherwise it's not valid to this 
 operation.

 I think FW should be the one that assures such dependency.  That is,
 when a memory device object is marked as present/enabled/functioning, it
 should be ready for the OS to use.

 Yes, BIOS should do something for the dependency, because BIOS knows the
 actual hardware topology. 

 Right.

 The ACPI specification provides _EDL method to
 tell OS the eject device list, but still has no method to tell OS the add 
 device
 list now.

 Yes, but I do not think the OS needs special handling for add...

 Hmm, how 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 12/07/2012 12:31 AM, Toshi Kani wrote:
 On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
 On 12/07/2012 12:03 AM, Toshi Kani wrote:
 On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
 On 11/29/2012 02:41 AM, Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
  : 
 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail.  

 2. Execute phase - Perform hot-add / hot-remove operation that can be
 rolled-back in case of error or cancel.

 3. Commit phase - Perform the final hot-add / hot-remove operation that
 cannot be rolled-back.  No error / cancel is allowed in this phase.  For
 instance, eject operation is performed at this phase.  
 Hi Toshi,
There are one more step needed. Linux provides sysfs interfaces to
 online/offline CPU/memory sections, so we need to protect from concurrent
 operations from those interfaces when doing physical hotplug. Think about
 following sequence:
 Thread 1
 1. validate conditions for hot-removal
 2. offline memory section A
 3. online memory section A 
 
 4. offline memory section B
 5 hot-remove memory device hosting A and B.

 Hi Gerry,

 I agree.  And I am working on a proposal that tries to address this
 issue by integrating both sysfs and hotplug operations into a framework.
 Hi Toshi,
  But the sysfs for CPU and memory online/offline are platform independent
 interfaces, and the ACPI based hotplug is platform dependent interfaces. I'm 
 not
 sure whether it's feasible to merge them. For example we still need offline 
 interface
 to stop using faulty CPUs on platform without physical hotplug capabilities.
  We have solved this by adding a busy flag to the device, so the sysfs
 will just return -EBUSY if the busy flag is set.
 
 I am making the framework code platform-independent so that it can
 handle both cases.  Well, I am still prototyping, so hopefully it will
 work. :)
Do you mean implementing a framework to manage hotplug of any type of device?
That sounds like a huge plan :)

Otherwise there may be a gap. The CPU online/offline interface deals with
logical CPUs, and the hotplug driver deals with physical devices (processors).
They may be different but related objects.

 
 Thanks,
 -Toshi
 



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 11/29/2012 06:15 PM, Rafael J. Wysocki wrote:
 On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As discussed in https://patchwork.kernel.org/patch/1581581/
 the driver core remove function needs to always succeed. This means we need
 to know that the device can be successfully removed before acpi_bus_trim / 
 acpi_bus_hot_remove_device are called. This can cause panics when 
 OSPM-initiated
 or SCI-initiated eject of memory devices fail e.g with:
 echo 1 > /sys/bus/pci/devices/PNP0C80:XX/eject

 since the ACPI core goes ahead and ejects the device regardless of whether 
 the
 the memory is still in use or not.

 For this reason a new acpi_device operation called prepare_remove is 
 introduced.
 This operation should be registered for acpi devices whose removal (from 
 kernel
 perspective) can fail.  Memory devices fall in this category.

 acpi_bus_remove() is changed to handle removal in 2 steps:
 - preparation for removal i.e. perform part of removal that can fail. 
 Should
   succeed for device and all its children.
 - if above step was successfull, proceed to actual device removal

 Hi Vasilis,
 We met the same problem when we doing computer node hotplug, It is a good 
 idea
 to introduce prepare_remove before actual device removal.

 I think we could do more in prepare_remove, such as rollback. In most 
 cases, we can
 offline most of memory sections except kernel used pages now, should we 
 rollback
 and online the memory sections when prepare_remove failed ?

 I think hot-plug operation should have all-or-nothing semantics.  That
 is, an operation should either complete successfully, or rollback to the
 original state.
 
 That's correct.
 
 As you may know, the ACPI based hotplug framework we are working on already 
 addressed
 this problem, and the way we slove this problem is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
 acpi_op_add add;
 acpi_op_remove remove;
 acpi_op_start start;
 acpi_op_bind bind;
 acpi_op_unbind unbind;
 acpi_op_notify notify;
 #ifdef  CONFIG_ACPI_HOTPLUG
 struct acpihp_dev_ops *hp_ops;
 #endif  /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can achieve
 a better way for sharing ideas. :)

 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail.  
 
 Actually, we can't do it this way, because the conditions may change between
 the check and the execution.  So the first phase needs to involve execution
 to some extent, although only as far as it remains reversible.
Hi Rafael,
A possible way to solve this issue is:
1) mark device busy
2) check condition and mark device as normal if condition check fails.
3) reclaim the device and mark device as normal if reclaim fails.
4) remove the device.
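
As a sketch only, the four steps could be strung together like this. The
struct, the state names and the error codes are hypothetical stand-ins chosen
for the example; the two boolean fields simply simulate a failed condition
check and a failed reclaim.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum hp_state { HP_NORMAL, HP_BUSY, HP_REMOVED };

/* Hypothetical model of a removable device. */
struct removable_device {
	enum hp_state state;
	bool has_kernel_memory;	/* makes the condition check fail */
	bool reclaim_fails;	/* makes the reclaim (offline) step fail */
};

static int hp_remove(struct removable_device *dev)
{
	assert(dev);

	/* 1) mark device busy so that other users back off */
	if (dev->state != HP_NORMAL)
		return -EBUSY;
	dev->state = HP_BUSY;

	/* 2) check conditions; on failure mark the device as normal again */
	if (dev->has_kernel_memory) {
		dev->state = HP_NORMAL;
		return -EINVAL;
	}

	/* 3) reclaim the device; on failure mark the device as normal again */
	if (dev->reclaim_fails) {
		dev->state = HP_NORMAL;
		return -EAGAIN;
	}

	/* 4) remove the device: no rollback from here on */
	dev->state = HP_REMOVED;
	return 0;
}
```

Because the busy mark is taken before the check, the conditions cannot change
underneath the check as long as every other path honours the mark, and every
failure before step 4 leaves the device in its original state.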

 
 2. Execute phase - Perform hot-add / hot-remove operation that can be
 rolled-back in case of error or cancel.
 
 I would just merge 1 and 2.
 
 3. Commit phase - Perform the final hot-add / hot-remove operation that
 cannot be rolled-back.  No error / cancel is allowed in this phase.  For
 instance, eject operation is performed at this phase.  
 
 Yup.
 
 Thanks,
 Rafael
 
 



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 11/29/2012 07:36 PM, Vasilis Liaskovitis wrote:
 On Thu, Nov 29, 2012 at 11:15:31AM +0100, Rafael J. Wysocki wrote:
 On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 We met the same problem when we doing computer node hotplug, It is a good 
 idea
 to introduce prepare_remove before actual device removal.

 I think we could do more in prepare_remove, such as rollback. In most 
 cases, we can
 offline most of memory sections except kernel used pages now, should we 
 rollback
 and online the memory sections when prepare_remove failed ?

 I think hot-plug operation should have all-or-nothing semantics.  That
 is, an operation should either complete successfully, or rollback to the
 original state.

 That's correct.

 As you may know, the ACPI based hotplug framework we are working on 
 already addressed
 this problem, and the way we slove this problem is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
acpi_op_add add;
acpi_op_remove remove;
acpi_op_start start;
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can achieve
 a better way for sharing ideas. :)

 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail.  

 Actually, we can't do it this way, because the conditions may change between
 the check and the execution.  So the first phase needs to involve execution
 to some extent, although only as far as it remains reversible.

 2. Execute phase - Perform hot-add / hot-remove operation that can be
 rolled-back in case of error or cancel.

 I would just merge 1 and 2.
 
 I agree steps 1 and 2 can be merged, at least for the current ACPI framework.
 E.g. for memory hotplug, the mm function we call for memory removal
 (remove_memory) handles both these steps.
 
 The new ACPI framework could perhaps expand the operations as Hanjun
 described, if it makes sense.
Hi Vasilis,
We have prototyped splitting the memory hotplug logic in mem_hotplug.c into
smaller steps, which makes error handling and cancellation easier. But we still
need to improve the code quality and merge with the changes from Fujitsu.
Regards!

 
 thanks,
 
 - Vasilis
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 11/30/2012 01:03 AM, Toshi Kani wrote:
 On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
 On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As discussed in https://patchwork.kernel.org/patch/1581581/
 the driver core remove function needs to always succeed. This means we need
 to know that the device can be successfully removed before acpi_bus_trim /
 acpi_bus_hot_remove_device are called. This can cause panics when
 OSPM-initiated or SCI-initiated eject of memory devices fail e.g with:
 echo 1 > /sys/bus/pci/devices/PNP0C80:XX/eject

 since the ACPI core goes ahead and ejects the device regardless of whether
 the memory is still in use or not.

 For this reason a new acpi_device operation called prepare_remove is
 introduced. This operation should be registered for acpi devices whose
 removal (from kernel perspective) can fail.  Memory devices fall in this
 category.

 acpi_bus_remove() is changed to handle removal in 2 steps:
 - preparation for removal i.e. perform part of removal that can fail. Should
   succeed for device and all its children.
 - if above step was successful, proceed to actual device removal

 Hi Vasilis,
 We met the same problem when we were doing computer node hotplug. It is a good
 idea to introduce prepare_remove before actual device removal.

 I think we could do more in prepare_remove, such as rollback. In most cases,
 we can offline most of the memory sections except kernel-used pages now, so
 should we roll back and online the memory sections when prepare_remove fails?

 I think hot-plug operation should have all-or-nothing semantics.  That
 is, an operation should either complete successfully, or rollback to the
 original state.

 That's correct.

 As you may know, the ACPI based hotplug framework we are working on already
 addressed this problem, and the way we solve this problem is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
acpi_op_add add;
acpi_op_remove remove;
acpi_op_start start;
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily roll back if an error happens.
 What do you think of this solution? Any suggestions? I think we can arrive at
 a better approach by sharing ideas. :)

 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail.  

 Actually, we can't do it this way, because the conditions may change between
 the check and the execution.  So the first phase needs to involve execution
 to some extent, although only as far as it remains reversible.
 
 For memory hot-remove, we can check if the target memory ranges are
 within ZONE_MOVABLE.  We should not allow user to change this setup
 during hot-remove operation.  Other things may be to check if a target
 node contains cpu0 (until it is supported), the console UART (assuming
 we cannot delete it), etc.  We should avoid doing rollback as much as we
 can.
Fengguang from Intel is working on a patchset to hot-remove CPU0(BSP)
on x86 platforms and he has posted several versions. Maybe we could eventually
remove CPU0 on x86.

 
 Thanks,
 -Toshi
 
 
 2. Execute phase - Perform hot-add / hot-remove operation that can be
 rolled-back in case of error or cancel.

 I would just merge 1 and 2.

 3. Commit phase - Perform the final hot-add / hot-remove operation that
 cannot be rolled-back.  No error / cancel is allowed in this phase.  For
 instance, eject operation is performed at this phase.  

 Yup.

 Thanks,
 Rafael
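
The three phases discussed above (validate, execute, commit) can be sketched as below. This is only an illustration under stated assumptions: struct mem_dev, its fields and the hp_* function names are hypothetical, and the "sections" are plain counters rather than real memory sections.

```c
#include <assert.h>
#include <errno.h>

struct mem_dev {
	int nsections;       /* sections backed by this device */
	int online;          /* sections currently online */
	int has_kernel_mem;  /* known restriction, checked up front */
	int fail_at;         /* test knob: index that refuses to offline, -1 = none */
	int ejected;
};

/* Phase 1: read-only checks, so failing here needs no rollback. */
static int hp_validate(const struct mem_dev *d)
{
	return d->has_kernel_mem ? -EBUSY : 0;
}

/* Phase 2: reversible work; on error, put back what was already offlined. */
static int hp_execute(struct mem_dev *d)
{
	int done;

	for (done = 0; done < d->nsections; done++) {
		if (done == d->fail_at) {
			d->online += done;  /* re-online the sections we offlined */
			return -EBUSY;
		}
		d->online--;
	}
	return 0;
}

/* Phase 3: point of no return; it must not fail or be cancelled. */
static void hp_commit(struct mem_dev *d)
{
	assert(d->online == 0);
	d->ejected = 1;
}

int hp_hot_remove(struct mem_dev *d)
{
	int err = hp_validate(d);

	if (!err)
		err = hp_execute(d);
	if (err)
		return err;
	hp_commit(d);
	return 0;
}
```

Merging phases 1 and 2, as suggested in the reply, would amount to folding the hp_validate() checks into hp_execute() so that they run at the moment the state is changed.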


 
 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 11/30/2012 04:30 AM, Rafael J. Wysocki wrote:
 On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
 On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
 On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As discussed in https://patchwork.kernel.org/patch/1581581/
 the driver core remove function needs to always succeed. This means we need
 to know that the device can be successfully removed before acpi_bus_trim /
 acpi_bus_hot_remove_device are called. This can cause panics when
 OSPM-initiated or SCI-initiated eject of memory devices fail e.g with:
 echo 1 > /sys/bus/pci/devices/PNP0C80:XX/eject

 since the ACPI core goes ahead and ejects the device regardless of whether
 the memory is still in use or not.

 For this reason a new acpi_device operation called prepare_remove is
 introduced. This operation should be registered for acpi devices whose
 removal (from kernel perspective) can fail.  Memory devices fall in this
 category.

 acpi_bus_remove() is changed to handle removal in 2 steps:
 - preparation for removal i.e. perform part of removal that can fail. Should
   succeed for device and all its children.
 - if above step was successful, proceed to actual device removal

 Hi Vasilis,
 We met the same problem when we were doing computer node hotplug. It is a good
 idea to introduce prepare_remove before actual device removal.

 I think we could do more in prepare_remove, such as rollback. In most cases,
 we can offline most of the memory sections except kernel-used pages now, so
 should we roll back and online the memory sections when prepare_remove fails?

 I think hot-plug operation should have all-or-nothing semantics.  That
 is, an operation should either complete successfully, or rollback to the
 original state.

 That's correct.

 As you may know, the ACPI based hotplug framework we are working on already
 addressed this problem, and the way we solve this problem is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
   acpi_op_add add;
   acpi_op_remove remove;
   acpi_op_start start;
   acpi_op_bind bind;
   acpi_op_unbind unbind;
   acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
   struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily roll back if an error happens.
 What do you think of this solution? Any suggestions? I think we can arrive at
 a better approach by sharing ideas. :)

 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail.  

 Actually, we can't do it this way, because the conditions may change between
 the check and the execution.  So the first phase needs to involve execution
 to some extent, although only as far as it remains reversible.

 For memory hot-remove, we can check if the target memory ranges are
 within ZONE_MOVABLE.  We should not allow user to change this setup
 during hot-remove operation.  Other things may be to check if a target
 node contains cpu0 (until it is supported), the console UART (assuming
 we cannot delete it), etc.  We should avoid doing rollback as much as we
 can.
 
 Yes, we can make some checks upfront as an optimization and fail early if
 the conditions are not met, but for correctness we need to repeat those
 checks later anyway.  Once we've decided to go for the eject, the conditions
 must hold whatever happens.
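
One way to picture the point about repeating the checks: an early check is only advisory, and the check that counts has to be made atomically with the state change it guards. The sketch below uses C11 atomics in userspace; struct hp_range, HP_FROZEN and the hp_range_* names are assumptions made for illustration and do not correspond to any kernel interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define HP_FROZEN (-1)  /* sentinel: range is reserved for hot-remove */

struct hp_range {
	atomic_int users;   /* >= 0: active users, HP_FROZEN: being removed */
};

/* Take a reference unless the range is frozen for removal. */
bool hp_range_get(struct hp_range *r)
{
	int cur = atomic_load(&r->users);

	while (cur != HP_FROZEN) {
		/* On failure, cur is reloaded and the frozen test is repeated. */
		if (atomic_compare_exchange_weak(&r->users, &cur, cur + 1))
			return true;
	}
	return false;
}

void hp_range_put(struct hp_range *r)
{
	atomic_fetch_sub(&r->users, 1);
}

/* Early check: cheap, lets us fail fast, but may be stale immediately. */
bool hp_range_looks_idle(struct hp_range *r)
{
	return atomic_load(&r->users) == 0;
}

/* Check and state change in one atomic step; reversible via unfreeze. */
bool hp_range_freeze(struct hp_range *r)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&r->users, &expected, HP_FROZEN);
}

void hp_range_unfreeze(struct hp_range *r)
{
	atomic_store(&r->users, 0);
}
```

Here hp_range_looks_idle() plays the role of the up-front check, while hp_range_freeze() is the part of the first phase that "involves execution" yet can still be undone with hp_range_unfreeze().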
Hi Rafael,
Another reason for us to split hotplug operations into small steps is to
support cancellation in addition to error handling. Theoretically it may take
infinite time to hot-remove a memory device, so we should provide an interface
for the user to cancel ongoing hot-removal operations. Currently that's done by
a timeout in the memory hot-remove code path, but I think it is not the best
solution. We should provide choices to users:
1) wait forever for a hot-removal operation to complete
2) cancel an ongoing hot-removal operation if it takes too long
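
A minimal sketch of those two choices, assuming the removal is broken into small steps with a cancellation check between them. struct hp_request, hp_offline() and the step budget are hypothetical names for illustration; a real implementation would migrate pages rather than decrement a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct hp_request {
	atomic_bool cancelled;   /* set by a user-facing cancel interface */
	unsigned int pages_left; /* work still to do */
	unsigned int migrated;   /* work done so far, needed for rollback */
};

/* Stand-in for migrating one page off the device. */
static void hp_migrate_one(struct hp_request *req)
{
	req->pages_left--;
	req->migrated++;
}

/* Roll back everything done so far. */
static void hp_undo(struct hp_request *req)
{
	req->pages_left += req->migrated;
	req->migrated = 0;
}

/*
 * max_steps == 0 means "wait forever"; otherwise give up after that many
 * steps. Cancellation is honoured between steps in both modes.
 */
int hp_offline(struct hp_request *req, unsigned int max_steps)
{
	unsigned int steps = 0;

	while (req->pages_left > 0) {
		bool cancelled = atomic_load(&req->cancelled);

		if (cancelled || (max_steps && steps >= max_steps)) {
			hp_undo(req);
			return cancelled ? -ECANCELED : -ETIMEDOUT;
		}
		hp_migrate_one(req);
		steps++;
	}
	return 0;
}
```

Another thread would request cancellation with atomic_store(&req->cancelled, true); the smaller each step is, the sooner the request is noticed.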

Regards!
Gerry
 
 Thanks,
 Rafael
 
 


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 11/30/2012 05:25 AM, Rafael J. Wysocki wrote:
 On Thursday, November 29, 2012 01:56:17 PM Toshi Kani wrote:
 On Thu, 2012-11-29 at 13:39 -0700, Toshi Kani wrote:
 On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
 On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
 On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
 On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail.  

 Actually, we can't do it this way, because the conditions may change between
 the check and the execution.  So the first phase needs to involve execution
 to some extent, although only as far as it remains reversible.

 For memory hot-remove, we can check if the target memory ranges are
 within ZONE_MOVABLE.  We should not allow user to change this setup
 during hot-remove operation.  Other things may be to check if a target
 node contains cpu0 (until it is supported), the console UART (assuming
 we cannot delete it), etc.  We should avoid doing rollback as much as we
 can.

 Yes, we can make some checks upfront as an optimization and fail early if
 the conditions are not met, but for correctness we need to repeat those
 checks later anyway.  Once we've decided to go for the eject, the conditions
 must hold whatever happens.

 Agreed.

 BTW, it is not an optimization I am after for this phase.  There are
 many error cases during hot-plug operations.  It is difficult to assure
 that rollback is successful for every error condition in terms of
 testing and maintaining the code.  So, it is easier to fail beforehand
 when possible.
 
 OK, but as I said it is necessary to ensure that the conditions will be met
 in the next phases as well if we don't fail.
Yes, that's absolutely a requirement. Otherwise QA people will call you
when doing stress tests.

 
 Thanks,
 Rafael
 
 



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 00:52 +0800, Jiang Liu wrote:
 On 12/07/2012 12:31 AM, Toshi Kani wrote:
  On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
  On 12/07/2012 12:03 AM, Toshi Kani wrote:
  On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
  On 11/29/2012 02:41 AM, Toshi Kani wrote:
  On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
   : 
  Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
  have not looked at all your changes yet..), but in my mind, a hot-plug
  operation should be composed with the following 3 phases.
 
  1. Validate phase - Verify if the request is a supported operation.  All
  known restrictions are verified at this phase.  For instance, if a
  hot-remove request involves kernel memory, it is failed in this phase.
  Since this phase makes no change, no rollback is necessary to fail.  
 
  2. Execute phase - Perform hot-add / hot-remove operation that can be
  rolled-back in case of error or cancel.
 
  3. Commit phase - Perform the final hot-add / hot-remove operation that
  cannot be rolled-back.  No error / cancel is allowed in this phase.  For
  instance, eject operation is performed at this phase.  
  Hi Toshi,
   There is one more step needed. Linux provides sysfs interfaces to
  online/offline CPU/memory sections, so we need to protect against concurrent
  operations from those interfaces when doing physical hotplug. Think about the
  following sequence:
  Thread 1                                  Thread 2
  1. validate conditions for hot-removal
  2. offline memory section A
                                            3. online memory section A
  4. offline memory section B
  5. hot-remove memory device hosting A and B.
 
  Hi Gerry,
 
  I agree.  And I am working on a proposal that tries to address this
  issue by integrating both sysfs and hotplug operations into a framework.
  Hi Toshi,
 But the sysfs interfaces for CPU and memory online/offline are platform
  independent, and the ACPI based hotplug interfaces are platform dependent.
  I'm not sure whether it's feasible to merge them. For example we still need
  the offline interface to stop using faulty CPUs on platforms without physical
  hotplug capabilities.
 We have solved this by adding a busy flag to the device, so the sysfs
  interface will just return -EBUSY if the busy flag is set.
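
The busy-flag idea can be sketched as a try-lock that both the sysfs path and the hotplug path must win before touching the device. This is a userspace illustration with C11 atomics, not the code from the patch set: struct hp_device and all hp_* function names are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct hp_device {
	atomic_bool busy;  /* set while one path owns the device */
	bool online;
};

/* Atomically claim the device; fails if someone else already holds it. */
static bool hp_claim(struct hp_device *d)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&d->busy, &expected, true);
}

static void hp_unclaim(struct hp_device *d)
{
	atomic_store(&d->busy, false);
}

/* What a sysfs "online" store handler would do in this model. */
int hp_sysfs_store_online(struct hp_device *d, bool online)
{
	if (!hp_claim(d))
		return -EBUSY;  /* a hotplug operation is in flight */
	d->online = online;
	hp_unclaim(d);
	return 0;
}

/* The hotplug path holds the claim for the whole operation. */
int hp_hotplug_begin(struct hp_device *d)
{
	return hp_claim(d) ? 0 : -EBUSY;
}

void hp_hotplug_end(struct hp_device *d)
{
	hp_unclaim(d);
}
```

With this in place, the "online memory section A" step in the sequence above would get -EBUSY between hp_hotplug_begin() and hp_hotplug_end() instead of silently undoing step 2.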
  
  I am making the framework code platform-independent so that it can
  handle both cases.  Well, I am still prototyping, so hopefully it will
  work. :)
 Do you mean implementing a framework to manage hotplug of any type of device?
 That sounds like a huge plan :)
 
 Otherwise there may be a gap. The CPU online/offline interface deals with
 logical CPUs, and the hotplug driver deals with physical devices (processors).
 They may be different but related objects.

Actually it is not a huge plan.  The framework I am thinking of is to
enable a hotplug sequencer, something analogous to do_initcalls() in the
boot sequence.  I am not doing any huge re-work.  That said, I am
currently testing my theory, so I won't promise anything, either. :)
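
To make the do_initcalls() analogy concrete, here is one possible shape for such a sequencer: handlers register with a numeric level and are then run in ascending level order. This is a guess at the general idea only, not the proposal itself; struct hp_entry, hp_run_sequence(), the levels and the demo handlers are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef int (*hp_handler_t)(void *ctx);

struct hp_entry {
	int level;        /* lower levels run first, like initcall levels */
	hp_handler_t fn;
};

static int hp_cmp_level(const void *a, const void *b)
{
	const struct hp_entry *x = a, *y = b;

	return (x->level > y->level) - (x->level < y->level);
}

/*
 * Sort the table by level and call each handler in turn, stopping at the
 * first error. qsort() is not stable, so handlers that share a level run
 * in unspecified order.
 */
int hp_run_sequence(struct hp_entry *tbl, size_t n, void *ctx)
{
	size_t i;

	qsort(tbl, n, sizeof(*tbl), hp_cmp_level);
	for (i = 0; i < n; i++) {
		int err = tbl[i].fn(ctx);

		if (err)
			return err;
	}
	return 0;
}

/* Demo handlers that record the order in which they were called. */
struct hp_trace {
	char buf[8];
	size_t len;
};

static int hp_note(struct hp_trace *t, char c)
{
	if (t->len + 1 >= sizeof(t->buf))
		return -1;
	t->buf[t->len++] = c;
	t->buf[t->len] = '\0';
	return 0;
}

int demo_validate(void *ctx) { return hp_note(ctx, 'V'); }
int demo_execute(void *ctx)  { return hp_note(ctx, 'E'); }
int demo_commit(void *ctx)   { return hp_note(ctx, 'C'); }
```

Registering demo_commit at level 30, demo_validate at 10 and demo_execute at 20 in any order and calling hp_run_sequence() runs them as validate, execute, commit.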

Thanks,
-Toshi




Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 12/07/2012 01:09 AM, Toshi Kani wrote:
 On Fri, 2012-12-07 at 00:52 +0800, Jiang Liu wrote:
 On 12/07/2012 12:31 AM, Toshi Kani wrote:
 On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
 On 12/07/2012 12:03 AM, Toshi Kani wrote:
 On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
 On 11/29/2012 02:41 AM, Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
  : 
 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail.  

 2. Execute phase - Perform hot-add / hot-remove operation that can be
 rolled-back in case of error or cancel.

 3. Commit phase - Perform the final hot-add / hot-remove operation that
 cannot be rolled-back.  No error / cancel is allowed in this phase.  For
 instance, eject operation is performed at this phase.  
 Hi Toshi,
  There is one more step needed. Linux provides sysfs interfaces to
 online/offline CPU/memory sections, so we need to protect against concurrent
 operations from those interfaces when doing physical hotplug. Think about the
 following sequence:
 Thread 1                                  Thread 2
 1. validate conditions for hot-removal
 2. offline memory section A
                                           3. online memory section A
 4. offline memory section B
 5. hot-remove memory device hosting A and B.

 Hi Gerry,

 I agree.  And I am working on a proposal that tries to address this
 issue by integrating both sysfs and hotplug operations into a framework.
 Hi Toshi,
But the sysfs interfaces for CPU and memory online/offline are platform
 independent, and the ACPI based hotplug interfaces are platform dependent.
 I'm not sure whether it's feasible to merge them. For example we still need
 the offline interface to stop using faulty CPUs on platforms without physical
 hotplug capabilities.
We have solved this by adding a busy flag to the device, so the sysfs
 interface will just return -EBUSY if the busy flag is set.

 I am making the framework code platform-independent so that it can
 handle both cases.  Well, I am still prototyping, so hopefully it will
 work. :)
 Do you mean implementing a framework to manage hotplug of any type of device?
 That sounds like a huge plan :)

 Otherwise there may be a gap. The CPU online/offline interface deals with
 logical CPUs, and the hotplug driver deals with physical devices (processors).
 They may be different but related objects.
 
 Actually it is not a huge plan.  The framework I am thinking of is to
 enable a hotplug sequencer something analogous to do_initcalls() at the
 boot sequence.  I am not doing any huge re-work.  That said, I am
 currently testing my theory, so I won't promise anything, either. :)
Please do give us an update when you make any progress :)

 
 Thanks,
 -Toshi
 
 



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 01:30 +0800, Jiang Liu wrote:
 On 12/07/2012 01:09 AM, Toshi Kani wrote:
  On Fri, 2012-12-07 at 00:52 +0800, Jiang Liu wrote:
  On 12/07/2012 12:31 AM, Toshi Kani wrote:
  On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
  On 12/07/2012 12:03 AM, Toshi Kani wrote:
  On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
  On 11/29/2012 02:41 AM, Toshi Kani wrote:
  On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
   : 
  Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
  have not looked at all your changes yet..), but in my mind, a hot-plug
  operation should be composed with the following 3 phases.
 
  1. Validate phase - Verify if the request is a supported operation.  All
  known restrictions are verified at this phase.  For instance, if a
  hot-remove request involves kernel memory, it is failed in this phase.
  Since this phase makes no change, no rollback is necessary to fail.  
 
  2. Execute phase - Perform hot-add / hot-remove operation that can be
  rolled-back in case of error or cancel.
 
  3. Commit phase - Perform the final hot-add / hot-remove operation that
  cannot be rolled-back.  No error / cancel is allowed in this phase.  For
  instance, eject operation is performed at this phase.  
  Hi Toshi,
   There is one more step needed. Linux provides sysfs interfaces to
  online/offline CPU/memory sections, so we need to protect against concurrent
  operations from those interfaces when doing physical hotplug. Think about the
  following sequence:
  Thread 1                                  Thread 2
  1. validate conditions for hot-removal
  2. offline memory section A
                                            3. online memory section A
  4. offline memory section B
  5. hot-remove memory device hosting A and B.
 
  Hi Gerry,
 
  I agree.  And I am working on a proposal that tries to address this
  issue by integrating both sysfs and hotplug operations into a framework.
  Hi Toshi,
   But the sysfs interfaces for CPU and memory online/offline are platform
  independent, and the ACPI based hotplug interfaces are platform dependent.
  I'm not sure whether it's feasible to merge them. For example we still need
  the offline interface to stop using faulty CPUs on platforms without physical
  hotplug capabilities.
   We have solved this by adding a busy flag to the device, so the sysfs
  interface will just return -EBUSY if the busy flag is set.
 
  I am making the framework code platform-independent so that it can
  handle both cases.  Well, I am still prototyping, so hopefully it will
  work. :)
  Do you mean implementing a framework to manage hotplug of any type of device?
  That sounds like a huge plan :)
 
  Otherwise there may be a gap. The CPU online/offline interface deals with
  logical CPUs, and the hotplug driver deals with physical devices (processors).
  They may be different but related objects.
  
  Actually it is not a huge plan.  The framework I am thinking of is to
  enable a hotplug sequencer something analogous to do_initcalls() at the
  boot sequence.  I am not doing any huge re-work.  That said, I am
  currently testing my theory, so I won't promise anything, either. :)
 Please do give us an update when you get any progress:)

Yes, will do.

Thanks,
-Toshi




Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Rafael J. Wysocki
On Friday, December 07, 2012 12:40:48 AM Jiang Liu wrote:
 On 12/04/2012 08:10 AM, Toshi Kani wrote:
  On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
  On 2012/11/30 6:27, Toshi Kani wrote:
  On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
  On 2012/11/29 2:41, Toshi Kani wrote:
  On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
  On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
  As you may know, the ACPI based hotplug framework we are working on already
  addressed this problem, and the way we solve this problem is a bit like yours.
 
  We introduce hp_ops in struct acpi_device_ops:
  struct acpi_device_ops {
 acpi_op_add add;
 acpi_op_remove remove;
 acpi_op_start start;
 acpi_op_bind bind;
 acpi_op_unbind unbind;
 acpi_op_notify notify;
  #ifdef CONFIG_ACPI_HOTPLUG
 struct acpihp_dev_ops *hp_ops;
  #endif /* CONFIG_ACPI_HOTPLUG */
  };
 
  in hp_ops, we divide the prepare_remove into six small steps, that is:
  1) pre_release(): optional step to mark device going to be removed/busy
  2) release(): reclaim device from running system
  3) post_release(): rollback if cancelled by user or error happened
  4) pre_unconfigure(): optional step to solve possible dependency issue
  5) unconfigure(): remove devices from running system
  6) post_unconfigure(): free resources used by devices
 
  In this way, we can easily roll back if an error happens.
  What do you think of this solution? Any suggestions? I think we can arrive at
  a better approach by sharing ideas. :)
 
  Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
  have not looked at all your changes yet..), but in my mind, a hot-plug
  operation should be composed with the following 3 phases.
 
  Good idea ! we also implement a hot-plug operation in 3 phases:
  1) acpihp_drv_pre_execute
  2) acpihp_drv_execute
  3) acpihp_drv_post_execute
  you may refer to :
  https://lkml.org/lkml/2012/11/4/79
 
  Great.  Yes, I will take a look.
 
  Thanks, any comments are welcomed :)
  
  If I read the code right, the framework calls ACPI drivers differently
  at boot-time and hot-add as follows.  That is, the new entry points are
  called at hot-add only, but .add() is called at both cases.  This
  requires .add() to work differently.
  
  Boot: .add()
  Hot-Add : .add(), .pre_configure(), .configure(), etc.
  
  I think the boot-time and hot-add initialization should be done
  consistently.  While there is difficulty with the current boot sequence,
  the framework should be designed to keep them consistent, not to make them
  diverge.
 Hi Toshi,
   We have separated hotplug operations from the driver binding/unbinding
 interface due to the following considerations.
 1) Physical CPU and memory devices are initialized/used before the ACPI
    subsystem is initialized. So in the normal case, .add() of processor and
    acpi_memhotplug only gathers information about a device already in working
    state instead of starting the device.
 2) It's impossible to rmmod the processor and acpi_memhotplug drivers at
    runtime if .remove() of the CPU and memory drivers really removes the
    CPU/memory device from the system. And the ACPI processor driver also
    implements CPU PM functionality in addition to hotplug.
 
 And recently Rafael has mentioned that he has a long term view to get rid of
 the concept of ACPI device. If that happens, we could easily move the hotplug
 logic from ACPI device drivers into the hotplug framework if the hotplug
 logic is separated from the .add()/.remove() callbacks. Actually we could
 even move all hotplug-only logic into the hotplug framework and not rely on
 any ACPI device driver any more. So we could get rid of all these messy
 things. We could achieve that by:
 1) moving code shared by ACPI device drivers and the hotplug framework into
    the core.
 2) moving hotplug-only code to the framework.
 
 Hi Rafael, what are your thoughts here?

I think that sounds good at the high level, but we need to get there
incrementally.  This way it will be easier to maintain backwards
compatibility and follow the changes.  Also, it will be easier for all of
the interested people from different companies to participate in the
development and make sure that everyone's needs are going to be met this
way.

At this point, I'd like to see where Toshi Kani's proposal is going to
take us.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 00:47 +0800, Jiang Liu wrote:
 On 12/05/2012 07:23 AM, Toshi Kani wrote:
  On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
  On 2012/12/4 8:10, Toshi Kani wrote:
  On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
  On 2012/11/30 6:27, Toshi Kani wrote:
  On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
  On 2012/11/29 2:41, Toshi Kani wrote:
:
  The ACPI specification provides the _EDL method to tell the OS the eject
  device list, but still has no method to tell the OS the add device list.
 
  Yes, but I do not think the OS needs special handling for add...
 
  Hmm, how about triggering a hot-add operation from the OS? We have an eject
  interface for the OS, but have no add interface now. Do you think this
  feature is useful? If it is, I think the OS should analyze the dependency
  first and tell the user.
  
  The OS can eject an ACPI device because a target device is owned by the
  OS (i.e. enabled).  For hot-add, a target ACPI device is not owned by
  the OS (i.e. disabled).  Therefore, the OS is not supposed to change its
  state.  So, I do not think we should support a hot-add operation by the
  OS.
 We depend on the firmware to provide an interface to actually hot-add the
 device. The sequence is:
 1) user triggers a hot-add request through the sysfs interfaces.
 2) hotplug framework validates conditions for hot-adding (dependency)
 3) hotplug framework invokes firmware interfaces to request a hot-add
    operation.
 4) firmware sends an ACPI notification after powering on/initializing the
    device
 5) OS adds the devices into the running system.

Interesting...  In this sequence, I think FW must validate and check the
dependency before sending an SCI.  FW owns unassigned resources and is
responsible for the procedure necessary to enable resources on the
platform.  Such steps are basically platform-specific.  So, I do not
think the common OS code should step into such business.

Thanks,
-Toshi



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Toshi Kani
On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
 On 12/04/2012 08:10 AM, Toshi Kani wrote:
  On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
  On 2012/11/30 6:27, Toshi Kani wrote:
  On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
  On 2012/11/29 2:41, Toshi Kani wrote:
  On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
  On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
  As you may know, the ACPI based hotplug framework we are working on already
  addressed this problem, and the way we solve this problem is a bit like yours.
 
  We introduce hp_ops in struct acpi_device_ops:
  struct acpi_device_ops {
 acpi_op_add add;
 acpi_op_remove remove;
 acpi_op_start start;
 acpi_op_bind bind;
 acpi_op_unbind unbind;
 acpi_op_notify notify;
  #ifdef CONFIG_ACPI_HOTPLUG
 struct acpihp_dev_ops *hp_ops;
  #endif /* CONFIG_ACPI_HOTPLUG */
  };
 
  in hp_ops, we divide the prepare_remove into six small steps, that is:
  1) pre_release(): optional step to mark device going to be removed/busy
  2) release(): reclaim device from running system
  3) post_release(): rollback if cancelled by user or error happened
  4) pre_unconfigure(): optional step to solve possible dependency issue
  5) unconfigure(): remove devices from running system
  6) post_unconfigure(): free resources used by devices
 
  In this way, we can easily rollback if error happens.
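
For illustration only: one way the six steps listed above could be chained
so that a failure rolls back. The real struct acpihp_dev_ops is not shown in
this thread, so demo_hp_ops, demo_hot_remove and the stub device below are
hypothetical. The sketch also assumes that a failing pre_release() cleans up
after itself and that post_release() is the single rollback hook.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical callback table modelled on the six steps listed above. */
struct demo_hp_ops {
	int  (*pre_release)(void *dev);      /* optional: mark device busy */
	int  (*release)(void *dev);          /* reclaim device from the system */
	void (*post_release)(void *dev);     /* rollback on cancel or error */
	int  (*pre_unconfigure)(void *dev);  /* optional: dependency handling */
	int  (*unconfigure)(void *dev);      /* remove device from the system */
	void (*post_unconfigure)(void *dev); /* free resources */
};

/* Run the steps in order; on error after pre_release, call post_release. */
int demo_hot_remove(const struct demo_hp_ops *ops, void *dev)
{
	int ret;

	assert(ops && ops->release && ops->unconfigure);

	if (ops->pre_release) {
		ret = ops->pre_release(dev);
		if (ret)
			return ret; /* assumed: nothing to roll back yet */
	}
	ret = ops->release(dev);
	if (ret)
		goto rollback;
	if (ops->pre_unconfigure) {
		ret = ops->pre_unconfigure(dev);
		if (ret)
			goto rollback;
	}
	ret = ops->unconfigure(dev);
	if (ret)
		goto rollback; /* assumed: a failed unconfigure is recoverable */
	if (ops->post_unconfigure)
		ops->post_unconfigure(dev);
	return 0;

rollback:
	if (ops->post_release)
		ops->post_release(dev);
	return ret;
}

/* Stub device that records which steps ran and can fail at one of them. */
struct demo_dev {
	int fail_at;   /* step number 1..6 that should fail, 0 for none */
	char trace[8]; /* one digit per callback invoked */
	int len;
};

static int demo_step(void *dev, int step)
{
	struct demo_dev *d = dev;

	assert(d->len < (int)sizeof(d->trace) - 1);
	d->trace[d->len++] = (char)('0' + step);
	d->trace[d->len] = '\0';
	return d->fail_at == step ? -1 : 0;
}

static int  demo_pre_release(void *dev)      { return demo_step(dev, 1); }
static int  demo_release(void *dev)          { return demo_step(dev, 2); }
static void demo_post_release(void *dev)     { (void)demo_step(dev, 3); }
static int  demo_pre_unconfigure(void *dev)  { return demo_step(dev, 4); }
static int  demo_unconfigure(void *dev)      { return demo_step(dev, 5); }
static void demo_post_unconfigure(void *dev) { (void)demo_step(dev, 6); }

const struct demo_hp_ops demo_ops = {
	.pre_release = demo_pre_release,
	.release = demo_release,
	.post_release = demo_post_release,
	.pre_unconfigure = demo_pre_unconfigure,
	.unconfigure = demo_unconfigure,
	.post_unconfigure = demo_post_unconfigure,
};

/* Returns nonzero when the recorded call order equals `expected`. */
int demo_trace_is(const struct demo_dev *d, const char *expected)
{
	return strcmp(d->trace, expected) == 0;
}
```

With the stub device, a clean run records steps 1, 2, 4, 5, 6, while a
failure injected at unconfigure() records 1, 2, 4, 5 and then the rollback
step 3.
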
  What do you think of this solution? Any suggestions? I think we can
  find a better way by sharing ideas. :)
 
  Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
  have not looked at all your changes yet..), but in my mind, a hot-plug
  operation should be composed with the following 3 phases.
 
  Good idea ! we also implement a hot-plug operation in 3 phases:
  1) acpihp_drv_pre_execute
  2) acpihp_drv_execute
  3) acpihp_drv_post_execute
  you may refer to :
  https://lkml.org/lkml/2012/11/4/79
 
  Great.  Yes, I will take a look.
 
  Thanks, any comments are welcomed :)
  
  If I read the code right, the framework calls ACPI drivers differently
  at boot-time and hot-add as follows.  That is, the new entry points are
  called at hot-add only, but .add() is called at both cases.  This
  requires .add() to work differently.
  
  Boot: .add()
  Hot-Add : .add(), .pre_configure(), configure(), etc.
  
  I think the boot-time and hot-add initialization should be done
  consistently.  While there is difficulty with the current boot sequence,
  the framework should be designed to allow them to be consistent, not make
  them diverge.
 Hi Toshi,
   We have separated hotplug operations from the driver binding/unbinding
 interface due to the following considerations.
 1) Physical CPU and memory devices are initialized/used before the ACPI
 subsystem is initialized. So in the normal case, .add() of processor and
 acpi_memhotplug only figures out information about a device already in
 working state instead of starting the device.

I agree that the current boot sequence is not very hot-plug friendly...

 2) It's impossible to rmmod the processor and acpi_memhotplug drivers at
 runtime if .remove() of the CPU and memory drivers really removes the
 CPU/memory device from the system. And the ACPI processor driver also
 implements CPU PM functionality other than hotplug.

Agreed.

 And recently Rafael has mentioned that he has a long-term view to get rid of
 the concept of ACPI device. If that happens, we could easily move the hotplug
 logic from ACPI device drivers into the hotplug framework if the hotplug logic
 is separated from the .add()/.remove() callbacks. Actually we could even move
 all hotplug-only logic into the hotplug framework and not rely on any ACPI
 device driver any more. So we could get rid of all these messy things. We
 could achieve that by:
 1) moving code shared by ACPI device drivers and the hotplug framework into
 the core.
 2) moving hotplug-only code to the framework.

Yes, the framework should allow such future work.  I also think that the
framework itself should be independent from such ACPI issue.  Ideally,
it should be able to support non-ACPI platforms.

 Hi Rafael, what are your thoughts here?
 
  
  1. Validate phase - Verify if the request is a supported operation.  All
  known restrictions are verified at this phase.  For instance, if a
  hot-remove request involves kernel memory, it is failed in this phase.
  Since this phase makes no change, no rollback is necessary to fail. 
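
For illustration only: a validate phase that inspects a request without
changing anything, so that failing needs no rollback. The hp_request fields
and the hp_verdict codes are hypothetical and only mirror the kernel-memory
example given above.

```c
#include <assert.h>
#include <stdbool.h>

enum hp_op { HP_OP_ADD, HP_OP_REMOVE };

/* Hypothetical result codes for the validate phase. */
enum hp_verdict {
	HP_OK = 0,
	HP_ERR_NOT_HOTPLUGGABLE,
	HP_ERR_KERNEL_MEMORY
};

/* Hypothetical description of one hot-plug request. */
struct hp_request {
	enum hp_op op;
	bool hotpluggable;        /* target supports the operation at all */
	bool holds_kernel_memory; /* target range contains kernel memory */
};

/*
 * The request is taken by const pointer: validation only reads it, so a
 * rejected request leaves nothing behind to undo.
 */
enum hp_verdict hp_validate(const struct hp_request *req)
{
	assert(req);

	if (!req->hotpluggable)
		return HP_ERR_NOT_HOTPLUGGABLE;
	if (req->op == HP_OP_REMOVE && req->holds_kernel_memory)
		return HP_ERR_KERNEL_MEMORY;
	return HP_OK;
}
```

Only a request that passes every check would be handed on to the phases
that actually change system state.
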
 
  Yes, we have done this in acpihp_drv_pre_execute, and check the following
  things:
 
  1) Hot-pluggable or not. The kernel memory case you mentioned is also
  checked when a memory device is removed;
 
  Agreed.
 
  2) Dependency check. For instance, when hot-adding a memory device, the
  processor should be added first; otherwise the operation is not valid.
 
  I think FW should be the one that assures such dependency.  That is,
  

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-06 Thread Jiang Liu
On 2012-12-7 10:57, Toshi Kani wrote:
 On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
 On 12/04/2012 08:10 AM, Toshi Kani wrote:
 On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
 On 2012/11/30 6:27, Toshi Kani wrote:
 On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 On 2012/11/29 2:41, Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As you may know, the ACPI based hotplug framework we are working on has
 already addressed this problem, and the way we solve it is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
acpi_op_add add;
acpi_op_remove remove;
acpi_op_start start;
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can 
 achieve
 a better way for sharing ideas. :)

 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 Good idea ! we also implement a hot-plug operation in 3 phases:
 1) acpihp_drv_pre_execute
 2) acpihp_drv_execute
 3) acpihp_drv_post_execute
 you may refer to :
 https://lkml.org/lkml/2012/11/4/79

 Great.  Yes, I will take a look.

 Thanks, any comments are welcomed :)

 If I read the code right, the framework calls ACPI drivers differently
 at boot-time and hot-add as follows.  That is, the new entry points are
 called at hot-add only, but .add() is called at both cases.  This
 requires .add() to work differently.

 Boot: .add()
 Hot-Add : .add(), .pre_configure(), configure(), etc.

 I think the boot-time and hot-add initialization should be done
 consistently.  While there is difficulty with the current boot sequence,
 the framework should be designed to allow them consistent, not make them
 diverged.
 Hi Toshi,
  We have separated hotplug operations from driver binding/unbinding 
 interface
 due to following considerations.
 1) Physical CPU and memory devices are initialized/used before the ACPI 
 subsystem
is initialized. So under normal case, .add() of processor and 
 acpi_memhotplug only
figures out information about device already in working state instead of 
 starting
the device.
 
 I agree that the current boot sequence is not very hot-plug friendly...
 
 2) It's impossible to rmmod the processor and acpi_memhotplug drivers at
 runtime if .remove() of the CPU and memory drivers really removes the
 CPU/memory device from the system. And the ACPI processor driver also
 implements CPU PM functionality other than hotplug.
 
 Agreed.
 
 And recently Rafael has mentioned that he has a long term view to get rid of 
 the
 concept of ACPI device. If that happens, we could easily move the hotplug
 logic from ACPI device drivers into the hotplug framework if the hotplug 
 logic
 is separated from the .add()/.remove() callbacks. Actually we could even 
 move all
 hotplug only logic into the hotplug framework and don't rely on any ACPI 
 device
 driver any more. So we could get rid of all these messy things. We could 
 achieve
 that by:
 1) moving code shared by ACPI device drivers and the hotplug framework into 
 the core.
 2) moving hotplug only code to the framework.
 
 Yes, the framework should allow such future work.  I also think that the
 framework itself should be independent from such ACPI issue.  Ideally,
 it should be able to support non-ACPI platforms.
The same point here. The ACPI based hotplug framework is designed as:
1) an ACPI based hotplug slot driver to handle platform-specific logic.
   A platform may provide platform-specific slot drivers to discover and
   manage hotplug slots. We have provided a default implementation of the
   slot driver according to the ACPI spec.
2) an ACPI based hotplug manager driver, which is a platform-independent
   driver and manages all hotplug slots created by the slot driver.

We haven't gone far enough to provide an ACPI-independent hotplug framework
because we only have experience with x86 and Itanium, both of which are ACPI
based. We may try to implement an ACPI-independent hotplug framework by
pushing all ACPI-specific logic into the slot driver; I think it's doable.
But we need
suggestions from experts of 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-05 Thread Toshi Kani
On Wed, 2012-12-05 at 20:10 +0800, Hanjun Guo wrote:
> On 2012/12/5 7:23, Toshi Kani wrote:
> > On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
> >> On 2012/12/4 8:10, Toshi Kani wrote:
> >>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>  On 2012/11/30 6:27, Toshi Kani wrote:
> >>>
> >>> If I read the code right, the framework calls ACPI drivers differently
> >>> at boot-time and hot-add as follows.  That is, the new entry points are
> >>> called at hot-add only, but .add() is called at both cases.  This
> >>> requires .add() to work differently.
> >>
> >> Hi Toshi,
> >> Thanks for your comments!
> >>
> >>>
> >>> Boot: .add()
> >>
> >> Actually, at boot time: .add(), .start()
> > 
> > Right.
> > 
> >>> Hot-Add : .add(), .pre_configure(), configure(), etc.
> >>
> >> Yes, we did it as you said in the framework. We use .pre_configure(),
> >> .configure(), and .post_configure() instead of .start() for better error
> >> handling and recovery.
> > 
> > I think we should have hot-plug interfaces at the module level, not at
> > the ACPI-internal level.  In this way, the interfaces can be
> > platform-neutral and allow any modules to register, which makes it more
> > consistent with the boot-up sequence.  It can also allow ordering of the
> > sequence among the registered modules.  Right now, we initiate all
> > procedures from ACPI during hot-plug, which I think is inflexible and
> > steps into other module's role.
> > 
> > I am also concerned about the slot handling, which is the core piece of
> > the infrastructure and only allows hot-plug operations on ACPI objects
> > where slot objects are previously created by checking _EJ0.  The
> > infrastructure should allow hot-plug operations on any objects, and it
> > should not be dependent on the slot design.
> > 
> > I have some rough idea, and it may be easier to review / explain if I
> > make some code changes.  So, let me prototype it, and send it you all if
> > that works out.  Hopefully, it won't take too long.
> 
> Great! If there is anything I can do, please let me know.

Cool.  Yes, if the prototype turns out to be a good one, we can work
together to improve it. :)
 
Thanks,
-Toshi



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-05 Thread Hanjun Guo
On 2012/12/5 7:23, Toshi Kani wrote:
> On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
>> On 2012/12/4 8:10, Toshi Kani wrote:
>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
 On 2012/11/30 6:27, Toshi Kani wrote:
> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>> On 2012/11/29 2:41, Toshi Kani wrote:
>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As you may know, the ACPI based hotplug framework we are working on has
 already addressed this problem, and the way we solve it is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
acpi_op_add add;
acpi_op_remove remove;
acpi_op_start start;
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can 
 achieve
 a better way for sharing ideas. :)
>>>
>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>> operation should be composed with the following 3 phases.
>>
>> Good idea ! we also implement a hot-plug operation in 3 phases:
>> 1) acpihp_drv_pre_execute
>> 2) acpihp_drv_execute
>> 3) acpihp_drv_post_execute
>> you may refer to :
>> https://lkml.org/lkml/2012/11/4/79
>
> Great.  Yes, I will take a look.

 Thanks, any comments are welcomed :)
>>>
>>> If I read the code right, the framework calls ACPI drivers differently
>>> at boot-time and hot-add as follows.  That is, the new entry points are
>>> called at hot-add only, but .add() is called at both cases.  This
>>> requires .add() to work differently.
>>
>> Hi Toshi,
>> Thanks for your comments!
>>
>>>
>>> Boot: .add()
>>
>> Actually, at boot time: .add(), .start()
> 
> Right.
> 
>>> Hot-Add : .add(), .pre_configure(), configure(), etc.
>>
>> Yes, we did it as you said in the framework. We use .pre_configure(),
>> .configure(), and .post_configure() instead of .start() for better error
>> handling and recovery.
> 
> I think we should have hot-plug interfaces at the module level, not at
> the ACPI-internal level.  In this way, the interfaces can be
> platform-neutral and allow any modules to register, which makes it more
> consistent with the boot-up sequence.  It can also allow ordering of the
> sequence among the registered modules.  Right now, we initiate all
> procedures from ACPI during hot-plug, which I think is inflexible and
> steps into other module's role.
> 
> I am also concerned about the slot handling, which is the core piece of
> the infrastructure and only allows hot-plug operations on ACPI objects
> where slot objects are previously created by checking _EJ0.  The
> infrastructure should allow hot-plug operations on any objects, and it
> should not be dependent on the slot design.
> 
> I have some rough idea, and it may be easier to review / explain if I
> make some code changes.  So, let me prototype it, and send it you all if
> that works out.  Hopefully, it won't take too long.

Great! If there is anything I can do, please let me know.

> 
>>> I think the boot-time and hot-add initialization should be done
>>> consistently.  While there is difficulty with the current boot sequence,
>>> the framework should be designed to allow them consistent, not make them
>>> diverged.
>>>
>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>> known restrictions are verified at this phase.  For instance, if a
>>> hot-remove request involves kernel memory, it is failed in this phase.
>>> Since this phase makes no change, no rollback is necessary to fail. 
>>
>> Yes, we have done this in acpihp_drv_pre_execute, and check following 
>> things:
>>
>> 1) Hot-plugble or not. the instance kernel memory you mentioned is also 
>> checked
>>when memory device remove;
>
> Agreed.
>
>> 2) Dependency check involved. For instance, if hot-add a memory device,
>>processor 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-05 Thread Hanjun Guo
On 2012/12/5 7:23, Toshi Kani wrote:
 On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
 On 2012/12/4 8:10, Toshi Kani wrote:
 On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
 On 2012/11/30 6:27, Toshi Kani wrote:
 On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 On 2012/11/29 2:41, Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As you may know, the ACPI based hotplug framework we are working on has
 already addressed this problem, and the way we solve it is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
acpi_op_add add;
acpi_op_remove remove;
acpi_op_start start;
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can 
 achieve
 a better way for sharing ideas. :)

 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 Good idea ! we also implement a hot-plug operation in 3 phases:
 1) acpihp_drv_pre_execute
 2) acpihp_drv_execute
 3) acpihp_drv_post_execute
 you may refer to :
 https://lkml.org/lkml/2012/11/4/79

 Great.  Yes, I will take a look.

 Thanks, any comments are welcomed :)

 If I read the code right, the framework calls ACPI drivers differently
 at boot-time and hot-add as follows.  That is, the new entry points are
 called at hot-add only, but .add() is called at both cases.  This
 requires .add() to work differently.

 Hi Toshi,
 Thanks for your comments!


 Boot: .add()

 Actually, at boot time: .add(), .start()
 
 Right.
 
 Hot-Add : .add(), .pre_configure(), configure(), etc.

 Yes, we did it as you said in the framework. We use .pre_configure(),
 .configure(), and .post_configure() instead of .start() for better error
 handling and recovery.
 
 I think we should have hot-plug interfaces at the module level, not at
 the ACPI-internal level.  In this way, the interfaces can be
 platform-neutral and allow any modules to register, which makes it more
 consistent with the boot-up sequence.  It can also allow ordering of the
 sequence among the registered modules.  Right now, we initiate all
 procedures from ACPI during hot-plug, which I think is inflexible and
 steps into other module's role.
 
 I am also concerned about the slot handling, which is the core piece of
 the infrastructure and only allows hot-plug operations on ACPI objects
 where slot objects are previously created by checking _EJ0.  The
 infrastructure should allow hot-plug operations on any objects, and it
 should not be dependent on the slot design.
 
 I have some rough idea, and it may be easier to review / explain if I
 make some code changes.  So, let me prototype it, and send it you all if
 that works out.  Hopefully, it won't take too long.

Great! If there is anything I can do, please let me know.

 
 I think the boot-time and hot-add initialization should be done
 consistently.  While there is difficulty with the current boot sequence,
 the framework should be designed to allow them consistent, not make them
 diverged.

 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail. 

 Yes, we have done this in acpihp_drv_pre_execute, and check following 
 things:

 1) Hot-plugble or not. the instance kernel memory you mentioned is also 
 checked
when memory device remove;

 Agreed.

 2) Dependency check involved. For instance, if hot-add a memory device,
processor should be added first, otherwise it's not valid to this 
 operation.

 I think FW should be the one that assures such dependency.  That is,
 when a memory device object is marked as present/enabled/functioning, it
 should be ready for the OS to use.

 Yes, BIOS should do something for the dependency, because BIOS knows the
 actual hardware topology. 

 Right.

 The ACPI specification provides _EDL method to
 tell OS the eject device list, but still has no method to tell OS the add 
 device
 list now.

 Yes, but I do not think the 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-04 Thread Toshi Kani
On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
> On 2012/12/4 8:10, Toshi Kani wrote:
> > On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> >> On 2012/11/30 6:27, Toshi Kani wrote:
> >>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>  On 2012/11/29 2:41, Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >> As you may know, the ACPI based hotplug framework we are working on has
> >> already addressed this problem, and the way we solve it is a bit like
> >> yours.
> >>
> >> We introduce hp_ops in struct acpi_device_ops:
> >> struct acpi_device_ops {
> >>acpi_op_add add;
> >>acpi_op_remove remove;
> >>acpi_op_start start;
> >>acpi_op_bind bind;
> >>acpi_op_unbind unbind;
> >>acpi_op_notify notify;
> >> #ifdef CONFIG_ACPI_HOTPLUG
> >>struct acpihp_dev_ops *hp_ops;
> >> #endif /* CONFIG_ACPI_HOTPLUG */
> >> };
> >>
> >> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >> 1) pre_release(): optional step to mark device going to be removed/busy
> >> 2) release(): reclaim device from running system
> >> 3) post_release(): rollback if cancelled by user or error happened
> >> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >> 5) unconfigure(): remove devices from running system
> >> 6) post_unconfigure(): free resources used by devices
> >>
> >> In this way, we can easily rollback if error happens.
> >> How do you think of this solution, any suggestion ? I think we can 
> >> achieve
> >> a better way for sharing ideas. :)
> >
> > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> 
>  Good idea ! we also implement a hot-plug operation in 3 phases:
>  1) acpihp_drv_pre_execute
>  2) acpihp_drv_execute
>  3) acpihp_drv_post_execute
>  you may refer to :
>  https://lkml.org/lkml/2012/11/4/79
> >>>
> >>> Great.  Yes, I will take a look.
> >>
> >> Thanks, any comments are welcomed :)
> > 
> > If I read the code right, the framework calls ACPI drivers differently
> > at boot-time and hot-add as follows.  That is, the new entry points are
> > called at hot-add only, but .add() is called at both cases.  This
> > requires .add() to work differently.
> 
> Hi Toshi,
> Thanks for your comments!
> 
> > 
> > Boot: .add()
> 
> Actually, at boot time: .add(), .start()

Right.

> > Hot-Add : .add(), .pre_configure(), configure(), etc.
> 
> Yes, we did it as you said in the framework. We use .pre_configure(),
> .configure(), and .post_configure() instead of .start() for better error
> handling and recovery.

I think we should have hot-plug interfaces at the module level, not at
the ACPI-internal level.  In this way, the interfaces can be
platform-neutral and allow any modules to register, which makes it more
consistent with the boot-up sequence.  It can also allow ordering of the
sequence among the registered modules.  Right now, we initiate all
procedures from ACPI during hot-plug, which I think is inflexible and
steps into other modules' roles.
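
For illustration only: a rough picture of what module-level registration
with ordering could look like. This is not the prototype mentioned below;
hp_register_handler, hp_run_add and the two demo handlers are hypothetical
names, and the fixed-size table is just to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define HP_MAX_HANDLERS 8

typedef int (*hp_handler_fn)(void *target);

struct hp_handler {
	hp_handler_fn fn;
	int order; /* lower values run earlier on hot-add */
};

static struct hp_handler hp_handlers[HP_MAX_HANDLERS];
static size_t hp_nr_handlers;

/* Register a handler, keeping the table sorted by ascending order. */
int hp_register_handler(hp_handler_fn fn, int order)
{
	size_t i;

	if (!fn || hp_nr_handlers == HP_MAX_HANDLERS)
		return -1;

	/* Shift later entries up by one slot (insertion sort step). */
	i = hp_nr_handlers;
	while (i > 0 && hp_handlers[i - 1].order > order) {
		hp_handlers[i] = hp_handlers[i - 1];
		i--;
	}
	hp_handlers[i].fn = fn;
	hp_handlers[i].order = order;
	hp_nr_handlers++;
	return 0;
}

/* Call every registered handler in order; stop at the first error. */
int hp_run_add(void *target)
{
	size_t i;

	for (i = 0; i < hp_nr_handlers; i++) {
		int ret = hp_handlers[i].fn(target);

		if (ret)
			return ret;
	}
	return 0;
}

/* Demo target that records which handlers ran, in call order. */
struct demo_log {
	int seen[HP_MAX_HANDLERS];
	int n;
};

static int demo_record(void *target, int id)
{
	struct demo_log *log = target;

	assert(log->n < HP_MAX_HANDLERS);
	log->seen[log->n++] = id;
	return 0;
}

int demo_cpu_handler(void *target) { return demo_record(target, 1); }
int demo_mem_handler(void *target) { return demo_record(target, 2); }
```

Registering demo_mem_handler with order 20 and demo_cpu_handler with order
10 makes the CPU handler run first, whatever the registration order was.
A hot-remove path would presumably walk the same table in reverse.
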

I am also concerned about the slot handling, which is the core piece of
the infrastructure and only allows hot-plug operations on ACPI objects
where slot objects are previously created by checking _EJ0.  The
infrastructure should allow hot-plug operations on any objects, and it
should not be dependent on the slot design.

I have a rough idea, and it may be easier to review / explain if I
make some code changes.  So, let me prototype it, and send it to you all if
that works out.  Hopefully, it won't take too long.

> > I think the boot-time and hot-add initialization should be done
> > consistently.  While there is difficulty with the current boot sequence,
> > the framework should be designed to allow them consistent, not make them
> > diverged.
> > 
> > 1. Validate phase - Verify if the request is a supported operation.  All
> > known restrictions are verified at this phase.  For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail. 
> 
>  Yes, we have done this in acpihp_drv_pre_execute, and check following 
>  things:
> 
>  1) Hot-plugble or not. the instance kernel memory you mentioned is also 
>  checked
> when memory device remove;
> >>>
> >>> Agreed.
> >>>
>  2) Dependency check involved. For instance, if hot-add a memory device,
> processor should be added first, otherwise it's not valid to this 
>  operation.
> >>>
> >>> I think FW should be the one that assures such 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-04 Thread Hanjun Guo
On 2012/12/4 8:10, Toshi Kani wrote:
> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>> On 2012/11/30 6:27, Toshi Kani wrote:
>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 On 2012/11/29 2:41, Toshi Kani wrote:
> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>> As you may know, the ACPI based hotplug framework we are working on has
>> already addressed this problem, and the way we solve it is a bit like yours.
>>
>> We introduce hp_ops in struct acpi_device_ops:
>> struct acpi_device_ops {
>>  acpi_op_add add;
>>  acpi_op_remove remove;
>>  acpi_op_start start;
>>  acpi_op_bind bind;
>>  acpi_op_unbind unbind;
>>  acpi_op_notify notify;
>> #ifdef   CONFIG_ACPI_HOTPLUG
>>  struct acpihp_dev_ops *hp_ops;
>> #endif   /* CONFIG_ACPI_HOTPLUG */
>> };
>>
>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>> 1) pre_release(): optional step to mark device going to be removed/busy
>> 2) release(): reclaim device from running system
>> 3) post_release(): rollback if cancelled by user or error happened
>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>> 5) unconfigure(): remove devices from running system
>> 6) post_unconfigure(): free resources used by devices
>>
>> In this way, we can easily rollback if error happens.
>> How do you think of this solution, any suggestion ? I think we can 
>> achieve
>> a better way for sharing ideas. :)
>
> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> have not looked at all your changes yet..), but in my mind, a hot-plug
> operation should be composed with the following 3 phases.

 Good idea ! we also implement a hot-plug operation in 3 phases:
 1) acpihp_drv_pre_execute
 2) acpihp_drv_execute
 3) acpihp_drv_post_execute
 you may refer to :
 https://lkml.org/lkml/2012/11/4/79
>>>
>>> Great.  Yes, I will take a look.
>>
>> Thanks, any comments are welcomed :)
> 
> If I read the code right, the framework calls ACPI drivers differently
> at boot-time and hot-add as follows.  That is, the new entry points are
> called at hot-add only, but .add() is called at both cases.  This
> requires .add() to work differently.

Hi Toshi,
Thanks for your comments!

> 
> Boot: .add()

Actually, at boot time: .add(), .start()

> Hot-Add : .add(), .pre_configure(), configure(), etc.

Yes, we did it as you said in the framework. We use .pre_configure(),
.configure(), and .post_configure() instead of .start() for better error
handling and recovery.

> 
> I think the boot-time and hot-add initialization should be done
> consistently.  While there is difficulty with the current boot sequence,
> the framework should be designed to allow them consistent, not make them
> diverged.
> 
> 1. Validate phase - Verify if the request is a supported operation.  All
> known restrictions are verified at this phase.  For instance, if a
> hot-remove request involves kernel memory, it is failed in this phase.
> Since this phase makes no change, no rollback is necessary to fail. 

 Yes, we have done this in acpihp_drv_pre_execute, and check the following
 things:

 1) Hot-pluggable or not. The kernel memory case you mentioned is also
 checked when a memory device is removed;
>>>
>>> Agreed.
>>>
 2) Dependency check involved. For instance, if hot-add a memory device,
processor should be added first, otherwise it's not valid to this 
 operation.
>>>
>>> I think FW should be the one that assures such dependency.  That is,
>>> when a memory device object is marked as present/enabled/functioning, it
>>> should be ready for the OS to use.
>>
>> Yes, BIOS should do something for the dependency, because BIOS knows the
>> actual hardware topology. 
> 
> Right.
> 
>> The ACPI specification provides _EDL method to
>> tell OS the eject device list, but still has no method to tell OS the add 
>> device
>> list now.
> 
> Yes, but I do not think the OS needs special handling for add...

Hmm, how about triggering a hot-add operation from the OS? We have an eject
interface for the OS, but no add interface yet. Do you think this feature
would be useful? If so, I think the OS should analyze the dependency first
and tell the user.

> 
>> For some cases, OS should analyze the dependency in the validate phase. For 
>> example,
>> when hot remove a node (container device), OS should analyze the dependency 
>> to get
>> the remove order as following:
>> 1) Host bridge;
>> 2) Memory devices;
>> 3) Processor devices;
>> 4) Container device itself;
> 
> This may be off-topic, but how do you plan to delete I/O devices under a
> node?  Are you planning to delete all I/O devices along with the node?

Yes, we delete all I/O devices under the node. we delete I/O devices as

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-04 Thread Hanjun Guo
On 2012/12/4 8:10, Toshi Kani wrote:
 On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
 On 2012/11/30 6:27, Toshi Kani wrote:
 On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 On 2012/11/29 2:41, Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As you may know, the ACPI based hotplug framework we are working on 
 already addressed
 this problem, and the way we slove this problem is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
  acpi_op_add add;
  acpi_op_remove remove;
  acpi_op_start start;
  acpi_op_bind bind;
  acpi_op_unbind unbind;
  acpi_op_notify notify;
 #ifdef   CONFIG_ACPI_HOTPLUG
  struct acpihp_dev_ops *hp_ops;
 #endif   /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can 
 achieve
 a better way for sharing ideas. :)

 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 Good idea ! we also implement a hot-plug operation in 3 phases:
 1) acpihp_drv_pre_execute
 2) acpihp_drv_execute
 3) acpihp_drv_post_execute
 you may refer to :
 https://lkml.org/lkml/2012/11/4/79

 Great.  Yes, I will take a look.

 Thanks, any comments are welcomed :)
 
 If I read the code right, the framework calls ACPI drivers differently
 at boot-time and hot-add as follows.  That is, the new entry points are
 called at hot-add only, but .add() is called at both cases.  This
 requires .add() to work differently.

Hi Toshi,
Thanks for your comments!

 
 Boot: .add()

Actually, at boot time: .add(), .start()

 Hot-Add : .add(), .pre_configure(), configure(), etc.

Yes, the framework does what you describe. We use .pre_configure(),
.configure(), and .post_configure() instead of .start() for better error
handling and recovery.

 
 I think the boot-time and hot-add initialization should be done
 consistently.  While there is difficulty with the current boot sequence,
 the framework should be designed to allow them consistent, not make them
 diverged.
 
 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail. 

 Yes, we have done this in acpihp_drv_pre_execute, and check following 
 things:

 1) Hot-plugble or not. the instance kernel memory you mentioned is also 
 checked
when memory device remove;

 Agreed.

 2) Dependency check involved. For instance, if hot-add a memory device,
processor should be added first, otherwise it's not valid to this 
 operation.

 I think FW should be the one that assures such dependency.  That is,
 when a memory device object is marked as present/enabled/functioning, it
 should be ready for the OS to use.

 Yes, BIOS should do something for the dependency, because BIOS knows the
 actual hardware topology. 
 
 Right.
 
 The ACPI specification provides _EDL method to
 tell OS the eject device list, but still has no method to tell OS the add 
 device
 list now.
 
 Yes, but I do not think the OS needs special handling for add...

Hmm, how about triggering a hot-add operation from the OS? We have an eject
interface for the OS, but no add interface yet. Do you think this feature
would be useful? If so, I think the OS should analyze the dependency first
and tell the user.

 
 For some cases, OS should analyze the dependency in the validate phase. For 
 example,
 when hot remove a node (container device), OS should analyze the dependency 
 to get
 the remove order as following:
 1) Host bridge;
 2) Memory devices;
 3) Processor devices;
 4) Container device itself;
 
 This may be off-topic, but how do you plan to delete I/O devices under a
 node?  Are you planning to delete all I/O devices along with the node?

Yes, we delete all I/O devices under the node, in the following steps:
1) Offline PCI devices;
2) Offline IOAPIC and IOMMU;
I/O devices are taken offline whether or not they are in use.

 
 On other OS, we made a separate step called I/O chassis delete, which
 off-lines all I/O devices under the node, and is required before a node
 hot-remove.  It basically triggers PCIe hot-remove to detach drivers
 from all devices.  It does not eject the devices so that 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-04 Thread Toshi Kani
On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
 On 2012/12/4 8:10, Toshi Kani wrote:
  On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
  On 2012/11/30 6:27, Toshi Kani wrote:
  On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
  On 2012/11/29 2:41, Toshi Kani wrote:
  On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
  On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
  As you may know, the ACPI based hotplug framework we are working on 
  already addressed
  this problem, and the way we slove this problem is a bit like yours.
 
  We introduce hp_ops in struct acpi_device_ops:
  struct acpi_device_ops {
      acpi_op_add add;
      acpi_op_remove remove;
      acpi_op_start start;
      acpi_op_bind bind;
      acpi_op_unbind unbind;
      acpi_op_notify notify;
  #ifdef CONFIG_ACPI_HOTPLUG
      struct acpihp_dev_ops *hp_ops;
  #endif /* CONFIG_ACPI_HOTPLUG */
  };
 
  in hp_ops, we divide the prepare_remove into six small steps, that is:
  1) pre_release(): optional step to mark device going to be removed/busy
  2) release(): reclaim device from running system
  3) post_release(): rollback if cancelled by user or error happened
  4) pre_unconfigure(): optional step to solve possible dependency issue
  5) unconfigure(): remove devices from running system
  6) post_unconfigure(): free resources used by devices
 
  In this way, we can easily rollback if error happens.
  How do you think of this solution, any suggestion ? I think we can 
  achieve
  a better way for sharing ideas. :)
 
  Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
  have not looked at all your changes yet..), but in my mind, a hot-plug
  operation should be composed with the following 3 phases.
 
  Good idea ! we also implement a hot-plug operation in 3 phases:
  1) acpihp_drv_pre_execute
  2) acpihp_drv_execute
  3) acpihp_drv_post_execute
  you may refer to :
  https://lkml.org/lkml/2012/11/4/79
 
  Great.  Yes, I will take a look.
 
  Thanks, any comments are welcomed :)
  
  If I read the code right, the framework calls ACPI drivers differently
  at boot-time and hot-add as follows.  That is, the new entry points are
  called at hot-add only, but .add() is called at both cases.  This
  requires .add() to work differently.
 
 Hi Toshi,
 Thanks for your comments!
 
  
  Boot: .add()
 
 Actually, at boot time: .add(), .start()

Right.

  Hot-Add : .add(), .pre_configure(), configure(), etc.
 
 Yes, we did it as you said in the framework. We use .pre_configure(), 
 configure(),
 and post_configure() to instead of .start() for better error handling and 
 recovery.

I think we should have hot-plug interfaces at the module level, not at
the ACPI-internal level.  In this way, the interfaces can be
platform-neutral and allow any module to register, which makes hot-plug
more consistent with the boot-up sequence.  It also allows the sequence
to be ordered among the registered modules.  Right now, we initiate all
procedures from ACPI during hot-plug, which I think is inflexible and
steps into other modules' roles.
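
A minimal sketch of the module-level registration idea, assuming a small
fixed-size table kept sorted by an ordering value.  None of these names
(hp_handler, hp_register, hp_run_add, the demo_* handlers) come from the
patches under discussion; they are hypothetical and only illustrate how
registered modules could be called in a defined order, independent of ACPI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HP_MAX_HANDLERS 16

/* Hypothetical handler type: returns 0 on success, nonzero on failure. */
typedef int (*hp_fn)(void *ctx);

struct hp_handler {
    const char *name;   /* module name, for diagnostics only */
    int order;          /* lower values run first on hot-add */
    hp_fn add;
};

static struct hp_handler hp_table[HP_MAX_HANDLERS];
static size_t hp_count;

/* Register a handler, keeping the table sorted by 'order'. */
int hp_register(const char *name, int order, hp_fn add)
{
    size_t i;

    assert(name && add);
    if (hp_count == HP_MAX_HANDLERS)
        return -1;

    /* Insertion sort: shift later entries up to make room. */
    i = hp_count++;
    while (i > 0 && hp_table[i - 1].order > order) {
        hp_table[i] = hp_table[i - 1];
        i--;
    }
    hp_table[i].name = name;
    hp_table[i].order = order;
    hp_table[i].add = add;
    return 0;
}

/* Call every handler in order and stop at the first failure.
 * Returns the number of handlers that completed successfully. */
size_t hp_run_add(void *ctx)
{
    size_t i;

    for (i = 0; i < hp_count; i++)
        if (hp_table[i].add(ctx) != 0)
            break;
    return i;
}

/* Demo handlers: each appends one letter to a trace string in ctx. */
int demo_append(void *ctx, char c)
{
    char *trace = ctx;
    size_t n = strlen(trace);

    trace[n] = c;
    trace[n + 1] = '\0';
    return 0;
}

int demo_cpu_add(void *ctx) { return demo_append(ctx, 'C'); }
int demo_mem_add(void *ctx) { return demo_append(ctx, 'M'); }
```

Registering "memory" with order 20 and then "processor" with order 10
still runs the processor handler first, because the call sequence follows
the ordering value and not the registration order.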

I am also concerned about the slot handling, which is the core piece of
the infrastructure and only allows hot-plug operations on ACPI objects
for which slot objects were previously created by checking _EJ0.  The
infrastructure should allow hot-plug operations on any object, and it
should not depend on the slot design.

I have a rough idea, and it may be easier to review and explain if I
make some code changes.  So, let me prototype it and send it to you all
if that works out.  Hopefully, it won't take too long.

  I think the boot-time and hot-add initialization should be done
  consistently.  While there is difficulty with the current boot sequence,
  the framework should be designed to allow them consistent, not make them
  diverged.
  
  1. Validate phase - Verify if the request is a supported operation.  All
  known restrictions are verified at this phase.  For instance, if a
  hot-remove request involves kernel memory, it is failed in this phase.
  Since this phase makes no change, no rollback is necessary to fail. 
 
  Yes, we have done this in acpihp_drv_pre_execute, and check following 
  things:
 
  1) Hot-plugble or not. the instance kernel memory you mentioned is also 
  checked
 when memory device remove;
 
  Agreed.
 
  2) Dependency check involved. For instance, if hot-add a memory device,
 processor should be added first, otherwise it's not valid to this 
  operation.
 
  I think FW should be the one that assures such dependency.  That is,
  when a memory device object is marked as present/enabled/functioning, it
  should be ready for the OS to use.
 
  Yes, BIOS should do something for the dependency, because BIOS knows the
  actual hardware topology. 
  
  Right.
  
  The ACPI specification provides _EDL method to
  tell OS the eject device list, but still has no method to tell OS the add 
  device
  list now.
  
  Yes, but I do not think the OS needs 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-03 Thread Toshi Kani
On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> On 2012/11/30 6:27, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
> >> On 2012/11/29 2:41, Toshi Kani wrote:
> >>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >>>> As you may know, the ACPI based hotplug framework we are working on
> >>>> already addressed this problem, and the way we slove this problem is
> >>>> a bit like yours.
> >>>>
> >>>> We introduce hp_ops in struct acpi_device_ops:
> >>>> struct acpi_device_ops {
> >>>>     acpi_op_add add;
> >>>>     acpi_op_remove remove;
> >>>>     acpi_op_start start;
> >>>>     acpi_op_bind bind;
> >>>>     acpi_op_unbind unbind;
> >>>>     acpi_op_notify notify;
> >>>> #ifdef CONFIG_ACPI_HOTPLUG
> >>>>     struct acpihp_dev_ops *hp_ops;
> >>>> #endif /* CONFIG_ACPI_HOTPLUG */
> >>>> };
> >>>>
> >>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >>>> 1) pre_release(): optional step to mark device going to be removed/busy
> >>>> 2) release(): reclaim device from running system
> >>>> 3) post_release(): rollback if cancelled by user or error happened
> >>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >>>> 5) unconfigure(): remove devices from running system
> >>>> 6) post_unconfigure(): free resources used by devices
> >>>>
> >>>> In this way, we can easily rollback if error happens.
> >>>> How do you think of this solution, any suggestion ? I think we can
> >>>> achieve a better way for sharing ideas. :)
> >>>
> >>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> >>> have not looked at all your changes yet..), but in my mind, a hot-plug
> >>> operation should be composed with the following 3 phases.
> >>
> >> Good idea ! we also implement a hot-plug operation in 3 phases:
> >> 1) acpihp_drv_pre_execute
> >> 2) acpihp_drv_execute
> >> 3) acpihp_drv_post_execute
> >> you may refer to :
> >> https://lkml.org/lkml/2012/11/4/79
> > 
> > Great.  Yes, I will take a look.
> 
> Thanks, any comments are welcomed :)

If I read the code right, the framework calls ACPI drivers differently
at boot-time and at hot-add, as follows.  That is, the new entry points
are called at hot-add only, but .add() is called in both cases.  This
requires .add() to work differently.

Boot: .add()
Hot-Add : .add(), .pre_configure(), configure(), etc.

I think the boot-time and hot-add initialization should be done
consistently.  While there is difficulty with the current boot sequence,
the framework should be designed to allow them to be consistent, not
make them diverge.

> >>> 1. Validate phase - Verify if the request is a supported operation.  All
> >>> known restrictions are verified at this phase.  For instance, if a
> >>> hot-remove request involves kernel memory, it is failed in this phase.
> >>> Since this phase makes no change, no rollback is necessary to fail. 
> >>
> >> Yes, we have done this in acpihp_drv_pre_execute, and check following 
> >> things:
> >>
> >> 1) Hot-plugble or not. the instance kernel memory you mentioned is also 
> >> checked
> >>when memory device remove;
> > 
> > Agreed.
> > 
> >> 2) Dependency check involved. For instance, if hot-add a memory device,
> >>processor should be added first, otherwise it's not valid to this 
> >> operation.
> > 
> > I think FW should be the one that assures such dependency.  That is,
> > when a memory device object is marked as present/enabled/functioning, it
> > should be ready for the OS to use.
> 
> Yes, BIOS should do something for the dependency, because BIOS knows the
> actual hardware topology. 

Right.

> The ACPI specification provides _EDL method to
> tell OS the eject device list, but still has no method to tell OS the add 
> device
> list now.

Yes, but I do not think the OS needs special handling for add...

> For some cases, OS should analyze the dependency in the validate phase. For 
> example,
> when hot remove a node (container device), OS should analyze the dependency 
> to get
> the remove order as following:
> 1) Host bridge;
> 2) Memory devices;
> 3) Processor devices;
> 4) Container device itself;
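
The removal order quoted above can be sketched as a sort over the devices
found under a container.  This is only an illustration: the hp_class
enum, struct hp_dev and hp_sort_for_removal() are hypothetical names, and
a real implementation would walk the ACPI namespace and not a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Device classes, numbered in the removal order listed in the thread. */
enum hp_class {
    HP_HOST_BRIDGE = 0,
    HP_MEMORY      = 1,
    HP_PROCESSOR   = 2,
    HP_CONTAINER   = 3
};

struct hp_dev {
    const char *name;
    enum hp_class cls;
};

/* qsort() comparator: devices with a lower class value are removed first. */
static int hp_cmp_removal(const void *a, const void *b)
{
    const struct hp_dev *da = a;
    const struct hp_dev *db = b;

    return (int)da->cls - (int)db->cls;
}

/* Sort in place so that walking the array front to back gives the
 * removal order: host bridge, memory, processors, then the container. */
void hp_sort_for_removal(struct hp_dev *devs, size_t n)
{
    assert(devs || n == 0);
    if (n > 1)
        qsort(devs, n, sizeof(*devs), hp_cmp_removal);
}
```

Computing the order during the validate phase keeps the execute phase
simple: it only has to walk the sorted list.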

This may be off-topic, but how do you plan to delete I/O devices under a
node?  Are you planning to delete all I/O devices along with the node?

On another OS, we made a separate step called I/O chassis delete, which
off-lines all I/O devices under the node, and is required before a node
hot-remove.  It basically triggers PCIe hot-remove to detach drivers
from all devices.  It does not eject the devices, so they do not
have to be on hot-plug slots.  This step runs user-space scripts to
verify that the devices can be off-lined without disrupting the user's
applications, and provides comprehensive reports if any of them are in
use.  Not sure if Linux's PCI hot-remove has such a check, but I thought
I'd mention it. :)
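
As a rough sketch of such a check, assuming each device carries a simple
in-use flag (the real step described above runs user-space scripts, which
this does not model), a validate-only pass could collect the busy devices
for a report without changing any state.  struct io_dev and
io_validate_offline() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct io_dev {
    const char *name;
    int in_use;         /* nonzero if something still depends on the device */
};

/*
 * Validate-only pass: inspects the devices but never modifies them.
 * Stores up to 'max' pointers to busy devices in 'busy' and returns the
 * total number of busy devices found; 0 means off-lining may proceed.
 */
size_t io_validate_offline(const struct io_dev *devs, size_t n,
                           const struct io_dev **busy, size_t max)
{
    size_t i, found = 0;

    assert(devs || n == 0);
    for (i = 0; i < n; i++) {
        if (!devs[i].in_use)
            continue;
        if (busy && found < max)
            busy[found] = &devs[i];
        found++;
    }
    return found;
}
```

Because the pass makes no changes, a nonzero result can simply fail the
request; there is nothing to roll back.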

> In this way, we can check that all the devices are hot-plugble or not under 
> 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-03 Thread Toshi Kani
On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
 On 2012/11/30 6:27, Toshi Kani wrote:
  On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
  On 2012/11/29 2:41, Toshi Kani wrote:
  On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
  On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
  As you may know, the ACPI based hotplug framework we are working on 
  already addressed
  this problem, and the way we slove this problem is a bit like yours.
 
  We introduce hp_ops in struct acpi_device_ops:
  struct acpi_device_ops {
   acpi_op_add add;
   acpi_op_remove remove;
   acpi_op_start start;
   acpi_op_bind bind;
   acpi_op_unbind unbind;
   acpi_op_notify notify;
  #ifdef   CONFIG_ACPI_HOTPLUG
   struct acpihp_dev_ops *hp_ops;
  #endif   /* CONFIG_ACPI_HOTPLUG */
  };
 
  in hp_ops, we divide the prepare_remove into six small steps, that is:
  1) pre_release(): optional step to mark device going to be removed/busy
  2) release(): reclaim device from running system
  3) post_release(): rollback if cancelled by user or error happened
  4) pre_unconfigure(): optional step to solve possible dependency issue
  5) unconfigure(): remove devices from running system
  6) post_unconfigure(): free resources used by devices
 
  In this way, we can easily rollback if error happens.
  How do you think of this solution, any suggestion ? I think we can 
  achieve
  a better way for sharing ideas. :)
 
  Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
  have not looked at all your changes yet..), but in my mind, a hot-plug
  operation should be composed with the following 3 phases.
 
  Good idea ! we also implement a hot-plug operation in 3 phases:
  1) acpihp_drv_pre_execute
  2) acpihp_drv_execute
  3) acpihp_drv_post_execute
  you may refer to :
  https://lkml.org/lkml/2012/11/4/79
  
  Great.  Yes, I will take a look.
 
 Thanks, any comments are welcomed :)

If I read the code right, the framework calls ACPI drivers differently
at boot-time and at hot-add, as follows.  That is, the new entry points
are called at hot-add only, but .add() is called in both cases.  This
requires .add() to work differently.

Boot: .add()
Hot-Add : .add(), .pre_configure(), configure(), etc.

I think the boot-time and hot-add initialization should be done
consistently.  While there is difficulty with the current boot sequence,
the framework should be designed to allow them to be consistent, not
make them diverge.

  1. Validate phase - Verify if the request is a supported operation.  All
  known restrictions are verified at this phase.  For instance, if a
  hot-remove request involves kernel memory, it is failed in this phase.
  Since this phase makes no change, no rollback is necessary to fail. 
 
  Yes, we have done this in acpihp_drv_pre_execute, and check following 
  things:
 
  1) Hot-plugble or not. the instance kernel memory you mentioned is also 
  checked
 when memory device remove;
  
  Agreed.
  
  2) Dependency check involved. For instance, if hot-add a memory device,
 processor should be added first, otherwise it's not valid to this 
  operation.
  
  I think FW should be the one that assures such dependency.  That is,
  when a memory device object is marked as present/enabled/functioning, it
  should be ready for the OS to use.
 
 Yes, BIOS should do something for the dependency, because BIOS knows the
 actual hardware topology. 

Right.

 The ACPI specification provides _EDL method to
 tell OS the eject device list, but still has no method to tell OS the add 
 device
 list now.

Yes, but I do not think the OS needs special handling for add...

 For some cases, OS should analyze the dependency in the validate phase. For 
 example,
 when hot remove a node (container device), OS should analyze the dependency 
 to get
 the remove order as following:
 1) Host bridge;
 2) Memory devices;
 3) Processor devices;
 4) Container device itself;

This may be off-topic, but how do you plan to delete I/O devices under a
node?  Are you planning to delete all I/O devices along with the node?

On another OS, we made a separate step called I/O chassis delete, which
off-lines all I/O devices under the node, and is required before a node
hot-remove.  It basically triggers PCIe hot-remove to detach drivers
from all devices.  It does not eject the devices, so they do not
have to be on hot-plug slots.  This step runs user-space scripts to
verify that the devices can be off-lined without disrupting the user's
applications, and provides comprehensive reports if any of them are in
use.  Not sure if Linux's PCI hot-remove has such a check, but I thought
I'd mention it. :)

 In this way, we can check that all the devices are hot-plugble or not under 
 the
 container device before execute phase, and further more, we can remove devices
 in order to avoid some crash problems.

Yes, we should check if all the resources under the node can be
off-lined at validate phase.  (note, all the devices do not have to have
_EJ0 if that's what 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-02 Thread Hanjun Guo
On 2012/11/30 6:27, Toshi Kani wrote:
> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>> On 2012/11/29 2:41, Toshi Kani wrote:
>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>>>> the driver core remove function needs to always succeed. This means we need
>>>>> to know that the device can be successfully removed before acpi_bus_trim /
>>>>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
>>>>> or SCI-initiated eject of memory devices fail e.g with:
>>>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>>>
>>>>> since the ACPI core goes ahead and ejects the device regardless of whether the
>>>>> the memory is still in use or not.
>>>>>
>>>>> For this reason a new acpi_device operation called prepare_remove is introduced.
>>>>> This operation should be registered for acpi devices whose removal (from kernel
>>>>> perspective) can fail.  Memory devices fall in this category.
>>>>>
>>>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>>>> - preparation for removal i.e. perform part of removal that can fail. Should
>>>>>   succeed for device and all its children.
>>>>> - if above step was successfull, proceed to actual device removal
>>>>
>>>> Hi Vasilis,
>>>> We met the same problem when we doing computer node hotplug, It is a good idea
>>>> to introduce prepare_remove before actual device removal.
>>>>
>>>> I think we could do more in prepare_remove, such as rollback. In most cases, we can
>>>> offline most of memory sections except kernel used pages now, should we rollback
>>>> and online the memory sections when prepare_remove failed ?
>>>
>>> I think hot-plug operation should have all-or-nothing semantics.  That
>>> is, an operation should either complete successfully, or rollback to the
>>> original state.
>>
>> Yes, we have the same point of view with you. We handle this problem in the 
>> ACPI
>> based hot-plug framework as following:
>> 1) hot add / hot remove complete successfully if no error happens;
>> 2) automatic rollback to the original state if meets some error ;
>> 3) rollback to the original if hot-plug operation cancelled by user ;
> 
> Cool!
>  
>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>>> this problem, and the way we slove this problem is a bit like yours.
>>>>
>>>> We introduce hp_ops in struct acpi_device_ops:
>>>> struct acpi_device_ops {
>>>>     acpi_op_add add;
>>>>     acpi_op_remove remove;
>>>>     acpi_op_start start;
>>>>     acpi_op_bind bind;
>>>>     acpi_op_unbind unbind;
>>>>     acpi_op_notify notify;
>>>> #ifdef CONFIG_ACPI_HOTPLUG
>>>>     struct acpihp_dev_ops *hp_ops;
>>>> #endif /* CONFIG_ACPI_HOTPLUG */
>>>> };
>>>>
>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>> 2) release(): reclaim device from running system
>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>> 5) unconfigure(): remove devices from running system
>>>> 6) post_unconfigure(): free resources used by devices
>>>>
>>>> In this way, we can easily rollback if error happens.
>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>> a better way for sharing ideas. :)
>>>
>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>> operation should be composed with the following 3 phases.
>>
>> Good idea ! we also implement a hot-plug operation in 3 phases:
>> 1) acpihp_drv_pre_execute
>> 2) acpihp_drv_execute
>> 3) acpihp_drv_post_execute
>> you may refer to :
>> https://lkml.org/lkml/2012/11/4/79
> 
> Great.  Yes, I will take a look.

Thanks, any comments are welcomed :)

>  
>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>> known restrictions are verified at this phase.  For instance, if a
>>> hot-remove request involves kernel memory, it is failed in this phase.
>>> Since this phase makes no change, no rollback is necessary to fail. 
>>
>> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
>>
>> 1) Hot-plugble or not. the instance kernel memory you mentioned is also 
>> checked
>>when memory device remove;
> 
> Agreed.
> 
>> 2) Dependency check involved. For instance, if hot-add a memory device,
>>processor should be added first, otherwise it's not valid to this 
>> operation.
> 
> I think FW should be the one that assures such dependency.  That is,
> when a memory device object is marked as present/enabled/functioning, it
> should be ready for the OS to use.

Yes, BIOS should do something for the dependency, 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-02 Thread Hanjun Guo
On 2012/11/30 6:27, Toshi Kani wrote:
 On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 On 2012/11/29 2:41, Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As discussed in https://patchwork.kernel.org/patch/1581581/
 the driver core remove function needs to always succeed. This means we 
 need
 to know that the device can be successfully removed before acpi_bus_trim 
 / 
 acpi_bus_hot_remove_device are called. This can cause panics when 
 OSPM-initiated
 or SCI-initiated eject of memory devices fail e.g with:
 echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

 since the ACPI core goes ahead and ejects the device regardless of 
 whether the
 the memory is still in use or not.

 For this reason a new acpi_device operation called prepare_remove is 
 introduced.
 This operation should be registered for acpi devices whose removal (from 
 kernel
 perspective) can fail.  Memory devices fall in this category.

 acpi_bus_remove() is changed to handle removal in 2 steps:
 - preparation for removal i.e. perform part of removal that can fail. 
 Should
   succeed for device and all its children.
 - if above step was successfull, proceed to actual device removal

 Hi Vasilis,
 We met the same problem when we doing computer node hotplug, It is a good 
 idea
 to introduce prepare_remove before actual device removal.

 I think we could do more in prepare_remove, such as rollback. In most 
 cases, we can
 offline most of memory sections except kernel used pages now, should we 
 rollback
 and online the memory sections when prepare_remove failed ?

 I think hot-plug operation should have all-or-nothing semantics.  That
 is, an operation should either complete successfully, or rollback to the
 original state.

 Yes, we have the same point of view with you. We handle this problem in the 
 ACPI
 based hot-plug framework as following:
 1) hot add / hot remove complete successfully if no error happens;
 2) automatic rollback to the original state if meets some error ;
 3) rollback to the original if hot-plug operation cancelled by user ;
 
 Cool!
  
 As you may know, the ACPI based hotplug framework we are working on 
 already addressed
 this problem, and the way we slove this problem is a bit like yours.

 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
     acpi_op_add add;
     acpi_op_remove remove;
     acpi_op_start start;
     acpi_op_bind bind;
     acpi_op_unbind unbind;
     acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
     struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };

 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices

 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can achieve
 a better way for sharing ideas. :)

 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.

 Good idea ! we also implement a hot-plug operation in 3 phases:
 1) acpihp_drv_pre_execute
 2) acpihp_drv_execute
 3) acpihp_drv_post_execute
 you may refer to :
 https://lkml.org/lkml/2012/11/4/79
 
 Great.  Yes, I will take a look.

Thanks, any comments are welcomed :)

  
 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail. 

 Yes, we have done this in acpihp_drv_pre_execute, and check following things:

 1) Hot-plugble or not. the instance kernel memory you mentioned is also 
 checked
when memory device remove;
 
 Agreed.
 
 2) Dependency check involved. For instance, if hot-add a memory device,
processor should be added first, otherwise it's not valid to this 
 operation.
 
 I think FW should be the one that assures such dependency.  That is,
 when a memory device object is marked as present/enabled/functioning, it
 should be ready for the OS to use.

Yes, BIOS should do something for the dependency, because BIOS knows the
actual hardware topology. The ACPI specification provides the _EDL method to
tell the OS the eject device list, but there is still no method to tell the OS
the add device list.

In some cases, the OS should analyze the dependency in the validate phase. For
example, when hot-removing a node (container device), the OS should analyze the
dependency to get the removal order as follows:
1) 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Toshi Kani
On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
> On 2012/11/29 2:41, Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >>> As discussed in https://patchwork.kernel.org/patch/1581581/
> >>> the driver core remove function needs to always succeed. This means we need
> >>> to know that the device can be successfully removed before acpi_bus_trim /
> >>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> >>> or SCI-initiated eject of memory devices fails, e.g. with:
> >>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> >>>
> >>> since the ACPI core goes ahead and ejects the device regardless of whether
> >>> the memory is still in use or not.
> >>>
> >>> For this reason a new acpi_device operation called prepare_remove is introduced.
> >>> This operation should be registered for acpi devices whose removal (from kernel
> >>> perspective) can fail.  Memory devices fall in this category.
> >>>
> >>> acpi_bus_remove() is changed to handle removal in 2 steps:
> >>> - preparation for removal i.e. perform part of removal that can fail. Should
> >>>   succeed for device and all its children.
> >>> - if above step was successful, proceed to actual device removal
> >>
> >> Hi Vasilis,
> >> We met the same problem when doing computer node hotplug. It is a good idea
> >> to introduce prepare_remove before actual device removal.
> >>
> >> I think we could do more in prepare_remove, such as rollback. In most cases, we
> >> can now offline most memory sections except kernel-used pages. Should we roll
> >> back and online the memory sections when prepare_remove fails?
> > 
> > I think hot-plug operation should have all-or-nothing semantics.  That
> > is, an operation should either complete successfully, or rollback to the
> > original state.
> 
> Yes, we share your point of view. We handle this problem in the ACPI
> based hot-plug framework as follows:
> 1) hot add / hot remove completes successfully if no error happens;
> 2) automatic rollback to the original state if an error occurs;
> 3) rollback to the original state if the hot-plug operation is cancelled by the user;

Cool!
 
> >> As you may know, the ACPI based hotplug framework we are working on already
> >> addressed this problem, and the way we solve this problem is a bit like yours.
> >>
> >> We introduce hp_ops in struct acpi_device_ops:
> >> struct acpi_device_ops {
> >>     acpi_op_add add;
> >>     acpi_op_remove remove;
> >>     acpi_op_start start;
> >>     acpi_op_bind bind;
> >>     acpi_op_unbind unbind;
> >>     acpi_op_notify notify;
> >> #ifdef CONFIG_ACPI_HOTPLUG
> >>     struct acpihp_dev_ops *hp_ops;
> >> #endif /* CONFIG_ACPI_HOTPLUG */
> >> };
> >>
> >> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >> 1) pre_release(): optional step to mark device going to be removed/busy
> >> 2) release(): reclaim device from running system
> >> 3) post_release(): rollback if cancelled by user or error happened
> >> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >> 5) unconfigure(): remove devices from running system
> >> 6) post_unconfigure(): free resources used by devices
> >>
> >> In this way, we can easily rollback if error happens.
> >> How do you think of this solution, any suggestion ? I think we can achieve
> >> a better way for sharing ideas. :)
> > 
> > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> 
> Good idea ! we also implement a hot-plug operation in 3 phases:
> 1) acpihp_drv_pre_execute
> 2) acpihp_drv_execute
> 3) acpihp_drv_post_execute
> you may refer to :
> https://lkml.org/lkml/2012/11/4/79

Great.  Yes, I will take a look.
 
> > 1. Validate phase - Verify if the request is a supported operation.  All
> > known restrictions are verified at this phase.  For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail. 
> 
> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
> 
> 1) Hot-pluggable or not. The kernel memory case you mentioned is also checked
>    when a memory device is removed;

Agreed.

> 2) Dependency check. For instance, when hot-adding a memory device, the
>    processor should be added first; otherwise the operation is not valid.

I think FW should be the one that assures such dependency.  That is,
when a memory device object is marked as present/enabled/functioning, it
should be ready for the OS to use.

> 3) Race condition check. If the device and its dependent devices are already in
>    a hot-plug process, another request will be denied.

I agree that hot-plug operation should be 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Rafael J. Wysocki
On Thursday, November 29, 2012 01:56:17 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 13:39 -0700, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > > > > 1. Validate phase - Verify if the request is a supported operation.  All
> > > > > > known restrictions are verified at this phase.  For instance, if a
> > > > > > hot-remove request involves kernel memory, it is failed in this phase.
> > > > > > Since this phase makes no change, no rollback is necessary to fail.
> > > > >
> > > > > Actually, we can't do it this way, because the conditions may change between
> > > > > the check and the execution.  So the first phase needs to involve execution
> > > > > to some extent, although only as far as it remains reversible.
> > > > 
> > > > For memory hot-remove, we can check if the target memory ranges are
> > > > within ZONE_MOVABLE.  We should not allow user to change this setup
> > > > during hot-remove operation.  Other things may be to check if a target
> > > > node contains cpu0 (until it is supported), the console UART (assuming
> > > > we cannot delete it), etc.  We should avoid doing rollback as much as we
> > > > can.
> > > 
> > > Yes, we can make some checks upfront as an optimization and fail early if
> > > the conditions are not met, but for correctness we need to repeat those
> > > checks later anyway.  Once we've decided to go for the eject, the conditions
> > > must hold whatever happens.
> > 
> > Agreed.
> 
> BTW, it is not an optimization I am after for this phase.  There are
> many error cases during hot-plug operations.  It is difficult to assure
> that rollback is successful for every error condition in terms of
> testing and maintaining the code.  So, it is easier to fail beforehand
> when possible.

OK, but as I said it is necessary to ensure that the conditions will be met
in the next phases as well if we don't fail.
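
A minimal user-space sketch of this point, assuming a hypothetical hp_dev
with a single atomic state word (hp_dev, hp_precheck, hp_begin_remove and
hp_try_use are illustrative names, not taken from the patch set or the
kernel).  The upfront check may fail early, but only the atomic
check-and-mark step guarantees that the condition still holds afterwards:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device states; not a kernel definition. */
enum hp_state { HP_IDLE, HP_IN_USE, HP_REMOVING };

struct hp_dev {
	_Atomic int state;
};

/* Upfront check: lets us fail early, but the answer can go stale. */
static bool hp_precheck(struct hp_dev *dev)
{
	return atomic_load(&dev->state) == HP_IDLE;
}

/*
 * Repeat the check and pin the result in one atomic step.  After this
 * succeeds, hp_try_use() can no longer claim the device.
 */
static bool hp_begin_remove(struct hp_dev *dev)
{
	int expected = HP_IDLE;

	return atomic_compare_exchange_strong(&dev->state, &expected,
					      HP_REMOVING);
}

/* A new user may claim the device only while it is idle. */
static bool hp_try_use(struct hp_dev *dev)
{
	int expected = HP_IDLE;

	return atomic_compare_exchange_strong(&dev->state, &expected,
					      HP_IN_USE);
}
```

If a user claims the device between hp_precheck() and hp_begin_remove(),
the second call fails and nothing has to be rolled back.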

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Toshi Kani
On Thu, 2012-11-29 at 13:39 -0700, Toshi Kani wrote:
> On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > > > 1. Validate phase - Verify if the request is a supported operation.  All
> > > > > known restrictions are verified at this phase.  For instance, if a
> > > > > hot-remove request involves kernel memory, it is failed in this phase.
> > > > > Since this phase makes no change, no rollback is necessary to fail.
> > > >
> > > > Actually, we can't do it this way, because the conditions may change between
> > > > the check and the execution.  So the first phase needs to involve execution
> > > > to some extent, although only as far as it remains reversible.
> > > 
> > > For memory hot-remove, we can check if the target memory ranges are
> > > within ZONE_MOVABLE.  We should not allow user to change this setup
> > > during hot-remove operation.  Other things may be to check if a target
> > > node contains cpu0 (until it is supported), the console UART (assuming
> > > we cannot delete it), etc.  We should avoid doing rollback as much as we
> > > can.
> > 
> > Yes, we can make some checks upfront as an optimization and fail early if
> > the conditions are not met, but for correctness we need to repeat those
> > checks later anyway.  Once we've decided to go for the eject, the conditions
> > must hold whatever happens.
> 
> Agreed.

BTW, it is not an optimization I am after for this phase.  There are
many error cases during hot-plug operations.  It is difficult to assure
that rollback is successful for every error condition in terms of
testing and maintaining the code.  So, it is easier to fail beforehand
when possible.

Thanks,
-Toshi




Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Toshi Kani
On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > > > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > > > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > > > > the driver core remove function needs to always succeed. This means we need
> > > > > > to know that the device can be successfully removed before acpi_bus_trim /
> > > > > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > > > > or SCI-initiated eject of memory devices fails, e.g. with:
> > > > > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > > >
> > > > > > since the ACPI core goes ahead and ejects the device regardless of whether
> > > > > > the memory is still in use or not.
> > > > > >
> > > > > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > > > > This operation should be registered for acpi devices whose removal (from kernel
> > > > > > perspective) can fail.  Memory devices fall in this category.
> > > > > >
> > > > > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > > > > - preparation for removal i.e. perform part of removal that can fail. Should
> > > > > >   succeed for device and all its children.
> > > > > > - if above step was successful, proceed to actual device removal
> > > > > 
> > > > > Hi Vasilis,
> > > > > We met the same problem when doing computer node hotplug. It is a good idea
> > > > > to introduce prepare_remove before actual device removal.
> > > > >
> > > > > I think we could do more in prepare_remove, such as rollback. In most cases, we
> > > > > can now offline most memory sections except kernel-used pages. Should we roll
> > > > > back and online the memory sections when prepare_remove fails?
> > > > 
> > > > I think hot-plug operation should have all-or-nothing semantics.  That
> > > > is, an operation should either complete successfully, or rollback to the
> > > > original state.
> > > 
> > > That's correct.
> > > 
> > > > > As you may know, the ACPI based hotplug framework we are working on already
> > > > > addressed this problem, and the way we solve this problem is a bit like yours.
> > > > > 
> > > > > We introduce hp_ops in struct acpi_device_ops:
> > > > > struct acpi_device_ops {
> > > > >   acpi_op_add add;
> > > > >   acpi_op_remove remove;
> > > > >   acpi_op_start start;
> > > > >   acpi_op_bind bind;
> > > > >   acpi_op_unbind unbind;
> > > > >   acpi_op_notify notify;
> > > > > #ifdef CONFIG_ACPI_HOTPLUG
> > > > >   struct acpihp_dev_ops *hp_ops;
> > > > > #endif /* CONFIG_ACPI_HOTPLUG */
> > > > > };
> > > > > 
> > > > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > > > 2) release(): reclaim device from running system
> > > > > 3) post_release(): rollback if cancelled by user or error happened
> > > > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > > > 5) unconfigure(): remove devices from running system
> > > > > 6) post_unconfigure(): free resources used by devices
> > > > > 
> > > > > In this way, we can easily rollback if error happens.
> > > > > How do you think of this solution, any suggestion ? I think we can achieve
> > > > > a better way for sharing ideas. :)
> > > > 
> > > > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > > > have not looked at all your changes yet..), but in my mind, a hot-plug
> > > > operation should be composed with the following 3 phases.
> > > > 
> > > > 1. Validate phase - Verify if the request is a supported operation.  All
> > > > known restrictions are verified at this phase.  For instance, if a
> > > > hot-remove request involves kernel memory, it is failed in this phase.
> > > > Since this phase makes no change, no rollback is necessary to fail.  
> > > 
> > > Actually, we can't do it this way, because the conditions may change between
> > > the check and the execution.  So the first phase needs to involve execution
> > > to some extent, although only as far as it remains reversible.
> > 
> > For memory hot-remove, we can check if the target memory ranges are
> > within ZONE_MOVABLE.  We should not allow user to change this setup
> > during hot-remove operation.  Other things may be to check if a target
> > node contains cpu0 (until it is supported), the console UART (assuming
> > we cannot delete it), etc.  We should avoid doing rollback as much as we
> > can.
> 
> Yes, we 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Rafael J. Wysocki
On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > > > the driver core remove function needs to always succeed. This means we need
> > > > > to know that the device can be successfully removed before acpi_bus_trim /
> > > > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > > > or SCI-initiated eject of memory devices fails, e.g. with:
> > > > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > >
> > > > > since the ACPI core goes ahead and ejects the device regardless of whether
> > > > > the memory is still in use or not.
> > > > >
> > > > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > > > This operation should be registered for acpi devices whose removal (from kernel
> > > > > perspective) can fail.  Memory devices fall in this category.
> > > > >
> > > > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > > > - preparation for removal i.e. perform part of removal that can fail. Should
> > > > >   succeed for device and all its children.
> > > > > - if above step was successful, proceed to actual device removal
> > > > 
> > > > Hi Vasilis,
> > > > We met the same problem when doing computer node hotplug. It is a good idea
> > > > to introduce prepare_remove before actual device removal.
> > > >
> > > > I think we could do more in prepare_remove, such as rollback. In most cases, we
> > > > can now offline most memory sections except kernel-used pages. Should we roll
> > > > back and online the memory sections when prepare_remove fails?
> > > 
> > > I think hot-plug operation should have all-or-nothing semantics.  That
> > > is, an operation should either complete successfully, or rollback to the
> > > original state.
> > 
> > That's correct.
> > 
> > > > As you may know, the ACPI based hotplug framework we are working on already
> > > > addressed this problem, and the way we solve this problem is a bit like yours.
> > > > 
> > > > We introduce hp_ops in struct acpi_device_ops:
> > > > struct acpi_device_ops {
> > > > acpi_op_add add;
> > > > acpi_op_remove remove;
> > > > acpi_op_start start;
> > > > acpi_op_bind bind;
> > > > acpi_op_unbind unbind;
> > > > acpi_op_notify notify;
> > > > #ifdef  CONFIG_ACPI_HOTPLUG
> > > > struct acpihp_dev_ops *hp_ops;
> > > > #endif  /* CONFIG_ACPI_HOTPLUG */
> > > > };
> > > > 
> > > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > > 2) release(): reclaim device from running system
> > > > 3) post_release(): rollback if cancelled by user or error happened
> > > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > > 5) unconfigure(): remove devices from running system
> > > > 6) post_unconfigure(): free resources used by devices
> > > > 
> > > > In this way, we can easily rollback if error happens.
> > > > How do you think of this solution, any suggestion ? I think we can achieve
> > > > a better way for sharing ideas. :)
> > > 
> > > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > > have not looked at all your changes yet..), but in my mind, a hot-plug
> > > operation should be composed with the following 3 phases.
> > > 
> > > 1. Validate phase - Verify if the request is a supported operation.  All
> > > known restrictions are verified at this phase.  For instance, if a
> > > hot-remove request involves kernel memory, it is failed in this phase.
> > > Since this phase makes no change, no rollback is necessary to fail.  
> > 
> > Actually, we can't do it this way, because the conditions may change between
> > the check and the execution.  So the first phase needs to involve execution
> > to some extent, although only as far as it remains reversible.
> 
> For memory hot-remove, we can check if the target memory ranges are
> within ZONE_MOVABLE.  We should not allow user to change this setup
> during hot-remove operation.  Other things may be to check if a target
> node contains cpu0 (until it is supported), the console UART (assuming
> we cannot delete it), etc.  We should avoid doing rollback as much as we
> can.

Yes, we can make some checks upfront as an optimization and fail early if
the conditions are not met, but for correctness we need to repeat those
checks later anyway.  Once we've decided to go for the eject, the conditions
must hold whatever happens.

Thanks,
Rafael


-- 
I speak only for 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Toshi Kani
On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > > the driver core remove function needs to always succeed. This means we need
> > > > to know that the device can be successfully removed before acpi_bus_trim /
> > > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > > or SCI-initiated eject of memory devices fails, e.g. with:
> > > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > >
> > > > since the ACPI core goes ahead and ejects the device regardless of whether
> > > > the memory is still in use or not.
> > > >
> > > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > > This operation should be registered for acpi devices whose removal (from kernel
> > > > perspective) can fail.  Memory devices fall in this category.
> > > >
> > > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > > - preparation for removal i.e. perform part of removal that can fail. Should
> > > >   succeed for device and all its children.
> > > > - if above step was successful, proceed to actual device removal
> > > 
> > > Hi Vasilis,
> > > We met the same problem when doing computer node hotplug. It is a good idea
> > > to introduce prepare_remove before actual device removal.
> > >
> > > I think we could do more in prepare_remove, such as rollback. In most cases, we
> > > can now offline most memory sections except kernel-used pages. Should we roll
> > > back and online the memory sections when prepare_remove fails?
> > 
> > I think hot-plug operation should have all-or-nothing semantics.  That
> > is, an operation should either complete successfully, or rollback to the
> > original state.
> 
> That's correct.
> 
> > > As you may know, the ACPI based hotplug framework we are working on already
> > > addressed this problem, and the way we solve this problem is a bit like yours.
> > > 
> > > We introduce hp_ops in struct acpi_device_ops:
> > > struct acpi_device_ops {
> > >   acpi_op_add add;
> > >   acpi_op_remove remove;
> > >   acpi_op_start start;
> > >   acpi_op_bind bind;
> > >   acpi_op_unbind unbind;
> > >   acpi_op_notify notify;
> > > #ifdef CONFIG_ACPI_HOTPLUG
> > >   struct acpihp_dev_ops *hp_ops;
> > > #endif /* CONFIG_ACPI_HOTPLUG */
> > > };
> > > 
> > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > 2) release(): reclaim device from running system
> > > 3) post_release(): rollback if cancelled by user or error happened
> > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > 5) unconfigure(): remove devices from running system
> > > 6) post_unconfigure(): free resources used by devices
> > > 
> > > In this way, we can easily rollback if error happens.
> > > How do you think of this solution, any suggestion ? I think we can achieve
> > > a better way for sharing ideas. :)
> > 
> > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> > 
> > 1. Validate phase - Verify if the request is a supported operation.  All
> > known restrictions are verified at this phase.  For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail.  
> 
> Actually, we can't do it this way, because the conditions may change between
> the check and the execution.  So the first phase needs to involve execution
> to some extent, although only as far as it remains reversible.

For memory hot-remove, we can check if the target memory ranges are
within ZONE_MOVABLE.  We should not allow user to change this setup
during hot-remove operation.  Other things may be to check if a target
node contains cpu0 (until it is supported), the console UART (assuming
we cannot delete it), etc.  We should avoid doing rollback as much as we
can.
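
The upfront checks listed above could be sketched roughly as follows.  This
is only an illustration under assumed data structures: hp_mem_range,
hp_target and hp_validate_remove are hypothetical names, and the boolean
fields stand in for whatever the kernel would actually query (zone
membership, cpu0 location, console device).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one memory range; not a kernel structure. */
struct hp_mem_range {
	bool movable;		/* range lies entirely within ZONE_MOVABLE */
};

/* Hypothetical description of a hot-remove target (e.g. a node). */
struct hp_target {
	const struct hp_mem_range *ranges;
	size_t nr_ranges;
	bool has_cpu0;		/* target contains cpu0 */
	bool has_console;	/* target hosts the console UART */
};

/*
 * Validate only: reads state and changes nothing, so a failure here
 * needs no rollback.
 */
static int hp_validate_remove(const struct hp_target *t)
{
	size_t i;

	if (t->has_cpu0 || t->has_console)
		return -EBUSY;

	for (i = 0; i < t->nr_ranges; i++)
		if (!t->ranges[i].movable)
			return -EBUSY;	/* kernel memory cannot be moved away */

	return 0;
}
```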

Thanks,
-Toshi


> > 2. Execute phase - Perform hot-add / hot-remove operation that can be
> > rolled-back in case of error or cancel.
> 
> I would just merge 1 and 2.
> 
> > 3. Commit phase - Perform the final hot-add / hot-remove operation that
> > cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> > instance, eject operation is performed at this phase.  
> 
> Yup.
> 
> Thanks,
> Rafael
> 
> 



Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Vasilis Liaskovitis
On Thu, Nov 29, 2012 at 11:15:31AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > We met the same problem when doing computer node hotplug. It is a good idea
> > > to introduce prepare_remove before actual device removal.
> > >
> > > I think we could do more in prepare_remove, such as rollback. In most cases, we
> > > can now offline most memory sections except kernel-used pages. Should we roll
> > > back and online the memory sections when prepare_remove fails?
> > 
> > I think hot-plug operation should have all-or-nothing semantics.  That
> > is, an operation should either complete successfully, or rollback to the
> > original state.
> 
> That's correct.
> 
> > > As you may know, the ACPI based hotplug framework we are working on already
> > > addressed this problem, and the way we solve this problem is a bit like yours.
> > > 
> > > We introduce hp_ops in struct acpi_device_ops:
> > > struct acpi_device_ops {
> > >   acpi_op_add add;
> > >   acpi_op_remove remove;
> > >   acpi_op_start start;
> > >   acpi_op_bind bind;
> > >   acpi_op_unbind unbind;
> > >   acpi_op_notify notify;
> > > #ifdef CONFIG_ACPI_HOTPLUG
> > >   struct acpihp_dev_ops *hp_ops;
> > > #endif /* CONFIG_ACPI_HOTPLUG */
> > > };
> > > 
> > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > 2) release(): reclaim device from running system
> > > 3) post_release(): rollback if cancelled by user or error happened
> > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > 5) unconfigure(): remove devices from running system
> > > 6) post_unconfigure(): free resources used by devices
> > > 
> > > In this way, we can easily rollback if error happens.
> > > How do you think of this solution, any suggestion ? I think we can achieve
> > > a better way for sharing ideas. :)
> > 
> > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> > 
> > 1. Validate phase - Verify if the request is a supported operation.  All
> > known restrictions are verified at this phase.  For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail.  
> 
> Actually, we can't do it this way, because the conditions may change between
> the check and the execution.  So the first phase needs to involve execution
> to some extent, although only as far as it remains reversible.
> 
> > 2. Execute phase - Perform hot-add / hot-remove operation that can be
> > rolled-back in case of error or cancel.
> 
> I would just merge 1 and 2.

I agree steps 1 and 2 can be merged, at least for the current ACPI framework.
E.g. for memory hotplug, the mm function we call for memory removal
(remove_memory) handles both these steps.

The new ACPI framework could perhaps expand the operations as Hanjun described,
if it makes sense.
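
For reference, the two-step removal from the cover letter (prepare first for
the device and all its children, then remove) can be modelled in user space
roughly like this.  It is a simplified sketch, not the code from the patch:
hp_device, hp_prepare_tree, hp_remove_tree, hp_bus_remove and
hp_prepare_busy are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device node; not struct acpi_device. */
struct hp_device {
	const char *name;
	struct hp_device *children;	/* first child */
	struct hp_device *sibling;	/* next sibling */
	int (*prepare_remove)(struct hp_device *dev);	/* optional, may fail */
	int removed;
};

/* Example callback: memory that is still in use refuses removal. */
static int hp_prepare_busy(struct hp_device *dev)
{
	(void)dev;
	return -EBUSY;
}

/* Step 1: the part that can fail, for all children and then the device. */
static int hp_prepare_tree(struct hp_device *dev)
{
	struct hp_device *c;
	int ret;

	for (c = dev->children; c; c = c->sibling) {
		ret = hp_prepare_tree(c);
		if (ret)
			return ret;
	}
	return dev->prepare_remove ? dev->prepare_remove(dev) : 0;
}

/* Step 2: actual removal, which must not fail. */
static void hp_remove_tree(struct hp_device *dev)
{
	struct hp_device *c;

	for (c = dev->children; c; c = c->sibling)
		hp_remove_tree(c);
	dev->removed = 1;
}

static int hp_bus_remove(struct hp_device *dev)
{
	int ret = hp_prepare_tree(dev);

	if (ret)
		return ret;	/* nothing has been removed yet */
	hp_remove_tree(dev);
	return 0;
}
```

Note that this sketch does not undo the preparation of children that were
already prepared when a later one fails; that is the rollback question
discussed elsewhere in this thread.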

thanks,

- Vasilis


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Rafael J. Wysocki
On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > the driver core remove function needs to always succeed. This means we need
> > > to know that the device can be successfully removed before acpi_bus_trim /
> > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > or SCI-initiated eject of memory devices fails, e.g. with:
> > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > >
> > > since the ACPI core goes ahead and ejects the device regardless of whether
> > > the memory is still in use or not.
> > >
> > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > This operation should be registered for acpi devices whose removal (from kernel
> > > perspective) can fail.  Memory devices fall in this category.
> > >
> > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > - preparation for removal i.e. perform part of removal that can fail. Should
> > >   succeed for device and all its children.
> > > - if above step was successful, proceed to actual device removal
> > 
> > Hi Vasilis,
> > We met the same problem when doing computer node hotplug. It is a good idea
> > to introduce prepare_remove before actual device removal.
> >
> > I think we could do more in prepare_remove, such as rollback. In most cases, we
> > can now offline most memory sections except kernel-used pages. Should we roll
> > back and online the memory sections when prepare_remove fails?
> 
> I think hot-plug operation should have all-or-nothing semantics.  That
> is, an operation should either complete successfully, or rollback to the
> original state.

That's correct.

> > As you may know, the ACPI based hotplug framework we are working on already
> > addressed this problem, and the way we solve this problem is a bit like yours.
> > 
> > We introduce hp_ops in struct acpi_device_ops:
> > struct acpi_device_ops {
> > acpi_op_add add;
> > acpi_op_remove remove;
> > acpi_op_start start;
> > acpi_op_bind bind;
> > acpi_op_unbind unbind;
> > acpi_op_notify notify;
> > #ifdef  CONFIG_ACPI_HOTPLUG
> > struct acpihp_dev_ops *hp_ops;
> > #endif  /* CONFIG_ACPI_HOTPLUG */
> > };
> > 
> > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > 1) pre_release(): optional step to mark device going to be removed/busy
> > 2) release(): reclaim device from running system
> > 3) post_release(): rollback if cancelled by user or error happened
> > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > 5) unconfigure(): remove devices from running system
> > 6) post_unconfigure(): free resources used by devices
> > 
> > In this way, we can easily rollback if error happens.
> > How do you think of this solution, any suggestion ? I think we can achieve
> > a better way for sharing ideas. :)
> 
> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> have not looked at all your changes yet..), but in my mind, a hot-plug
> operation should be composed with the following 3 phases.
> 
> 1. Validate phase - Verify if the request is a supported operation.  All
> known restrictions are verified at this phase.  For instance, if a
> hot-remove request involves kernel memory, it is failed in this phase.
> Since this phase makes no change, no rollback is necessary to fail.  

Actually, we can't do it this way, because the conditions may change between
the check and the execution.  So the first phase needs to involve execution
to some extent, although only as far as it remains reversible.

> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> rolled-back in case of error or cancel.

I would just merge 1 and 2.

> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> instance, eject operation is performed at this phase.  

Yup.
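
To make the phase split concrete, here is a minimal user-space sketch in C. It is not kernel code: `struct hp_dev` and the helpers `hp_prepare_remove()`, `hp_cancel_remove()` and `hp_commit_remove()` are hypothetical names that do not come from the patch set. It assumes the validate and execute phases are merged into one reversible step, and that the commit step returns void because it is not allowed to fail.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state, for illustration only. */
struct hp_dev {
	bool in_use;    /* still referenced by the running system */
	bool offlined;  /* the reversible step has been applied */
	bool ejected;   /* the irreversible step has been applied */
};

/*
 * Phases 1 and 2 merged: check and act in a single step, so the
 * condition cannot change between the check and the action (a real
 * implementation would hold a lock here).  This step is reversible.
 */
static int hp_prepare_remove(struct hp_dev *dev)
{
	if (dev->in_use)
		return -EBUSY;  /* nothing was changed, nothing to undo */
	dev->offlined = true;
	return 0;
}

/* Undo of hp_prepare_remove(), used on error or cancel. */
static void hp_cancel_remove(struct hp_dev *dev)
{
	dev->offlined = false;
}

/* Phase 3: commit.  No error or cancel is possible past this point. */
static void hp_commit_remove(struct hp_dev *dev)
{
	dev->ejected = true;
}

/* All-or-nothing hot-remove built from the steps above. */
static int hp_hot_remove(struct hp_dev *dev, bool cancel_requested)
{
	int ret = hp_prepare_remove(dev);

	if (ret)
		return ret;
	if (cancel_requested) {
		hp_cancel_remove(dev);
		return -ECANCELED;
	}
	hp_commit_remove(dev);
	return 0;
}
```

On every failure path the device is left exactly as it was found, which is the all-or-nothing property discussed earlier in this thread.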

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Rafael J. Wysocki
On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
 On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
  On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
   As discussed in https://patchwork.kernel.org/patch/1581581/
   the driver core remove function needs to always succeed. This means we 
   need
   to know that the device can be successfully removed before acpi_bus_trim 
   / 
   acpi_bus_hot_remove_device are called. This can cause panics when 
   OSPM-initiated
   or SCI-initiated eject of memory devices fail e.g with:
   echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
   
   since the ACPI core goes ahead and ejects the device regardless of whether
   the memory is still in use or not.
   
   For this reason a new acpi_device operation called prepare_remove is 
   introduced.
   This operation should be registered for acpi devices whose removal (from 
   kernel
   perspective) can fail.  Memory devices fall in this category.
   
   acpi_bus_remove() is changed to handle removal in 2 steps:
   - preparation for removal i.e. perform part of removal that can fail. 
   Should
 succeed for device and all its children.
   - if the above step was successful, proceed to actual device removal
  
  Hi Vasilis,
  We met the same problem when we doing computer node hotplug, It is a good 
  idea
  to introduce prepare_remove before actual device removal.
  
  I think we could do more in prepare_remove, such as rollback. In most 
  cases, we can
  offline most of memory sections except kernel used pages now, should we 
  rollback
  and online the memory sections when prepare_remove failed ?
 
 I think hot-plug operation should have all-or-nothing semantics.  That
 is, an operation should either complete successfully, or rollback to the
 original state.

That's correct.

  As you may know, the ACPI based hotplug framework we are working on already
  addressed this problem, and the way we solve this problem is a bit like yours.
  
  We introduce hp_ops in struct acpi_device_ops:
  struct acpi_device_ops {
  acpi_op_add add;
  acpi_op_remove remove;
  acpi_op_start start;
  acpi_op_bind bind;
  acpi_op_unbind unbind;
  acpi_op_notify notify;
  #ifdef  CONFIG_ACPI_HOTPLUG
  struct acpihp_dev_ops *hp_ops;
  #endif  /* CONFIG_ACPI_HOTPLUG */
  };
  
  in hp_ops, we divide the prepare_remove into six small steps, that is:
  1) pre_release(): optional step to mark device going to be removed/busy
  2) release(): reclaim device from running system
  3) post_release(): rollback if cancelled by user or error happened
  4) pre_unconfigure(): optional step to solve possible dependency issue
  5) unconfigure(): remove devices from running system
  6) post_unconfigure(): free resources used by devices
  
  In this way, we can easily rollback if error happens.
  How do you think of this solution, any suggestion ? I think we can achieve
  a better way for sharing ideas. :)
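
The six steps listed above could be driven roughly as in the following sketch. It is only an illustration under stated assumptions: `struct hp_remove_ops`, `hp_run_remove()` and the `demo_*` callbacks are hypothetical and are not the real `struct acpihp_dev_ops`. It reads post_release() as the rollback hook that runs only on failure, and assumes the three unconfigure steps cannot fail.

```c
#include <stddef.h>

/* Hypothetical device that records which steps ran, in order. */
struct hp_device {
	char trace[16];
	size_t len;
	int release_error;  /* non-zero makes release() fail */
};

static void hp_log(struct hp_device *dev, char step)
{
	if (dev->len + 1 < sizeof(dev->trace)) {
		dev->trace[dev->len++] = step;
		dev->trace[dev->len] = '\0';
	}
}

/* Hypothetical callback table mirroring the six steps. */
struct hp_remove_ops {
	int  (*pre_release)(struct hp_device *dev);      /* optional */
	int  (*release)(struct hp_device *dev);
	void (*post_release)(struct hp_device *dev);     /* rollback */
	void (*pre_unconfigure)(struct hp_device *dev);  /* optional */
	void (*unconfigure)(struct hp_device *dev);
	void (*post_unconfigure)(struct hp_device *dev);
};

static int hp_run_remove(const struct hp_remove_ops *ops,
			 struct hp_device *dev)
{
	int ret = 0;

	if (ops->pre_release)
		ret = ops->pre_release(dev);
	if (!ret)
		ret = ops->release(dev);
	if (ret) {
		/* Releasing failed: roll back and report the error. */
		if (ops->post_release)
			ops->post_release(dev);
		return ret;
	}
	/* Point of no return (assumption: these steps cannot fail). */
	if (ops->pre_unconfigure)
		ops->pre_unconfigure(dev);
	ops->unconfigure(dev);
	if (ops->post_unconfigure)
		ops->post_unconfigure(dev);
	return 0;
}

/* Demo callbacks: each one only logs its step number. */
static int demo_pre_release(struct hp_device *d) { hp_log(d, '1'); return 0; }
static int demo_release(struct hp_device *d) { hp_log(d, '2'); return d->release_error; }
static void demo_post_release(struct hp_device *d) { hp_log(d, '3'); }
static void demo_pre_unconfigure(struct hp_device *d) { hp_log(d, '4'); }
static void demo_unconfigure(struct hp_device *d) { hp_log(d, '5'); }
static void demo_post_unconfigure(struct hp_device *d) { hp_log(d, '6'); }

static const struct hp_remove_ops demo_ops = {
	.pre_release = demo_pre_release,
	.release = demo_release,
	.post_release = demo_post_release,
	.pre_unconfigure = demo_pre_unconfigure,
	.unconfigure = demo_unconfigure,
	.post_unconfigure = demo_post_unconfigure,
};
```

With `demo_ops`, a successful run records steps 1, 2, 4, 5, 6, while a failing release() records 1, 2, 3 and stops before anything is unconfigured.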
 
 Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
 have not looked at all your changes yet..), but in my mind, a hot-plug
 operation should be composed with the following 3 phases.
 
 1. Validate phase - Verify if the request is a supported operation.  All
 known restrictions are verified at this phase.  For instance, if a
 hot-remove request involves kernel memory, it is failed in this phase.
 Since this phase makes no change, no rollback is necessary to fail.  

Actually, we can't do it this way, because the conditions may change between
the check and the execution.  So the first phase needs to involve execution
to some extent, although only as far as it remains reversible.

 2. Execute phase - Perform hot-add / hot-remove operation that can be
 rolled-back in case of error or cancel.

I would just merge 1 and 2.

 3. Commit phase - Perform the final hot-add / hot-remove operation that
 cannot be rolled-back.  No error / cancel is allowed in this phase.  For
 instance, eject operation is performed at this phase.  

Yup.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Vasilis Liaskovitis
On Thu, Nov 29, 2012 at 11:15:31AM +0100, Rafael J. Wysocki wrote:
 On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
  On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
   We met the same problem when we doing computer node hotplug, It is a good 
   idea
   to introduce prepare_remove before actual device removal.
   
   I think we could do more in prepare_remove, such as rollback. In most 
   cases, we can
   offline most of memory sections except kernel used pages now, should we 
   rollback
   and online the memory sections when prepare_remove failed ?
  
  I think hot-plug operation should have all-or-nothing semantics.  That
  is, an operation should either complete successfully, or rollback to the
  original state.
 
 That's correct.
 
   As you may know, the ACPI based hotplug framework we are working on already
   addressed this problem, and the way we solve this problem is a bit like yours.
   
   We introduce hp_ops in struct acpi_device_ops:
   struct acpi_device_ops {
           acpi_op_add add;
           acpi_op_remove remove;
           acpi_op_start start;
           acpi_op_bind bind;
           acpi_op_unbind unbind;
           acpi_op_notify notify;
   #ifdef CONFIG_ACPI_HOTPLUG
           struct acpihp_dev_ops *hp_ops;
   #endif /* CONFIG_ACPI_HOTPLUG */
   };
   
   in hp_ops, we divide the prepare_remove into six small steps, that is:
   1) pre_release(): optional step to mark device going to be removed/busy
   2) release(): reclaim device from running system
   3) post_release(): rollback if cancelled by user or error happened
   4) pre_unconfigure(): optional step to solve possible dependency issue
   5) unconfigure(): remove devices from running system
   6) post_unconfigure(): free resources used by devices
   
   In this way, we can easily rollback if error happens.
   How do you think of this solution, any suggestion ? I think we can achieve
   a better way for sharing ideas. :)
  
  Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
  have not looked at all your changes yet..), but in my mind, a hot-plug
  operation should be composed with the following 3 phases.
  
  1. Validate phase - Verify if the request is a supported operation.  All
  known restrictions are verified at this phase.  For instance, if a
  hot-remove request involves kernel memory, it is failed in this phase.
  Since this phase makes no change, no rollback is necessary to fail.  
 
 Actually, we can't do it this way, because the conditions may change between
 the check and the execution.  So the first phase needs to involve execution
 to some extent, although only as far as it remains reversible.
 
  2. Execute phase - Perform hot-add / hot-remove operation that can be
  rolled-back in case of error or cancel.
 
 I would just merge 1 and 2.

I agree steps 1 and 2 can be merged, at least for the current ACPI framework.
E.g. for memory hotplug, the mm function we call for memory removal
(remove_memory) handles both these steps.

The new ACPI framework could perhaps expand the operations as Hanjun described,
if it makes sense.
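
For reference, the two-step removal in the patch description (prepare for the device and all its children, then remove) can be sketched as below. This is a simplified user-space illustration, not the actual acpi_bus_remove() change: `struct node` and the three functions are hypothetical names, and the prepare step here only checks a flag, whereas a real prepare step would also perform the reversible work (for memory, offlining it).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device node; not the kernel's struct acpi_device. */
struct node {
	struct node *child;    /* first child */
	struct node *sibling;  /* next sibling */
	int busy;              /* non-zero: removal would fail */
	int removed;
};

/* Step 1: may fail; must succeed for the device and all its children. */
static int prepare_remove_tree(const struct node *n)
{
	const struct node *c;
	int ret;

	if (n->busy)
		return -EBUSY;
	for (c = n->child; c; c = c->sibling) {
		ret = prepare_remove_tree(c);
		if (ret)
			return ret;
	}
	return 0;
}

/* Step 2: actual removal; cannot fail.  Children are removed first. */
static void remove_tree(struct node *n)
{
	struct node *c;

	for (c = n->child; c; c = c->sibling)
		remove_tree(c);
	n->removed = 1;
}

/* Nothing is removed unless the whole subtree passed step 1. */
static int hot_remove_tree(struct node *root)
{
	int ret = prepare_remove_tree(root);

	if (ret)
		return ret;
	remove_tree(root);
	return 0;
}
```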

thanks,

- Vasilis


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Toshi Kani
On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
 On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
  On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
   On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
As discussed in https://patchwork.kernel.org/patch/1581581/
the driver core remove function needs to always succeed. This means we 
need
to know that the device can be successfully removed before 
acpi_bus_trim / 
acpi_bus_hot_remove_device are called. This can cause panics when 
OSPM-initiated
or SCI-initiated eject of memory devices fail e.g with:
echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

since the ACPI core goes ahead and ejects the device regardless of whether
the memory is still in use or not.

For this reason a new acpi_device operation called prepare_remove is 
introduced.
This operation should be registered for acpi devices whose removal 
(from kernel
perspective) can fail.  Memory devices fall in this category.

acpi_bus_remove() is changed to handle removal in 2 steps:
- preparation for removal i.e. perform part of removal that can fail. 
Should
  succeed for device and all its children.
- if the above step was successful, proceed to actual device removal
   
   Hi Vasilis,
   We met the same problem when we doing computer node hotplug, It is a good 
   idea
   to introduce prepare_remove before actual device removal.
   
   I think we could do more in prepare_remove, such as rollback. In most 
   cases, we can
   offline most of memory sections except kernel used pages now, should we 
   rollback
   and online the memory sections when prepare_remove failed ?
  
  I think hot-plug operation should have all-or-nothing semantics.  That
  is, an operation should either complete successfully, or rollback to the
  original state.
 
 That's correct.
 
   As you may know, the ACPI based hotplug framework we are working on already
   addressed this problem, and the way we solve this problem is a bit like yours.
   
   We introduce hp_ops in struct acpi_device_ops:
   struct acpi_device_ops {
           acpi_op_add add;
           acpi_op_remove remove;
           acpi_op_start start;
           acpi_op_bind bind;
           acpi_op_unbind unbind;
           acpi_op_notify notify;
   #ifdef CONFIG_ACPI_HOTPLUG
           struct acpihp_dev_ops *hp_ops;
   #endif /* CONFIG_ACPI_HOTPLUG */
   };
   
   in hp_ops, we divide the prepare_remove into six small steps, that is:
   1) pre_release(): optional step to mark device going to be removed/busy
   2) release(): reclaim device from running system
   3) post_release(): rollback if cancelled by user or error happened
   4) pre_unconfigure(): optional step to solve possible dependency issue
   5) unconfigure(): remove devices from running system
   6) post_unconfigure(): free resources used by devices
   
   In this way, we can easily rollback if error happens.
   How do you think of this solution, any suggestion ? I think we can achieve
   a better way for sharing ideas. :)
  
  Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
  have not looked at all your changes yet..), but in my mind, a hot-plug
  operation should be composed with the following 3 phases.
  
  1. Validate phase - Verify if the request is a supported operation.  All
  known restrictions are verified at this phase.  For instance, if a
  hot-remove request involves kernel memory, it is failed in this phase.
  Since this phase makes no change, no rollback is necessary to fail.  
 
 Actually, we can't do it this way, because the conditions may change between
 the check and the execution.  So the first phase needs to involve execution
 to some extent, although only as far as it remains reversible.

For memory hot-remove, we can check if the target memory ranges are
within ZONE_MOVABLE.  We should not allow user to change this setup
during hot-remove operation.  Other things may be to check if a target
node contains cpu0 (until it is supported), the console UART (assuming
we cannot delete it), etc.  We should avoid doing rollback as much as we
can.
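
As a rough illustration of such up-front checks, a read-only validation step might look like the sketch below. `struct hp_target`, its fields and `hp_validate_remove()` are hypothetical; a real check would query the mm and platform code rather than read precomputed flags, and the error codes are my choice.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of a hot-remove target. */
struct hp_target {
	bool mem_all_movable;   /* every memory range is in ZONE_MOVABLE */
	bool has_cpu0;          /* the node contains cpu0 */
	bool has_console_uart;  /* the node hosts the console UART */
};

/* Read-only: changes nothing, so a failure here needs no rollback. */
static int hp_validate_remove(const struct hp_target *t)
{
	if (!t->mem_all_movable)
		return -EBUSY;        /* kernel memory is involved */
	if (t->has_cpu0)
		return -EOPNOTSUPP;   /* until cpu0 removal is supported */
	if (t->has_console_uart)
		return -EOPNOTSUPP;   /* assuming the console cannot go away */
	return 0;
}
```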

Thanks,
-Toshi


  2. Execute phase - Perform hot-add / hot-remove operation that can be
  rolled-back in case of error or cancel.
 
 I would just merge 1 and 2.
 
  3. Commit phase - Perform the final hot-add / hot-remove operation that
  cannot be rolled-back.  No error / cancel is allowed in this phase.  For
  instance, eject operation is performed at this phase.  
 
 Yup.
 
 Thanks,
 Rafael
 
 




Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Rafael J. Wysocki
On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
 On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
  On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
   On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
 As discussed in https://patchwork.kernel.org/patch/1581581/
 the driver core remove function needs to always succeed. This means 
 we need
 to know that the device can be successfully removed before 
 acpi_bus_trim / 
 acpi_bus_hot_remove_device are called. This can cause panics when 
 OSPM-initiated
 or SCI-initiated eject of memory devices fail e.g with:
 echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
 
 since the ACPI core goes ahead and ejects the device regardless of whether
 the memory is still in use or not.
 
 For this reason a new acpi_device operation called prepare_remove is 
 introduced.
 This operation should be registered for acpi devices whose removal 
 (from kernel
 perspective) can fail.  Memory devices fall in this category.
 
 acpi_bus_remove() is changed to handle removal in 2 steps:
 - preparation for removal i.e. perform part of removal that can fail. 
 Should
   succeed for device and all its children.
 - if the above step was successful, proceed to actual device removal

Hi Vasilis,
We met the same problem when we doing computer node hotplug, It is a 
good idea
to introduce prepare_remove before actual device removal.

I think we could do more in prepare_remove, such as rollback. In most 
cases, we can
offline most of memory sections except kernel used pages now, should we 
rollback
and online the memory sections when prepare_remove failed ?
   
   I think hot-plug operation should have all-or-nothing semantics.  That
   is, an operation should either complete successfully, or rollback to the
   original state.
  
  That's correct.
  
As you may know, the ACPI based hotplug framework we are working on already
addressed this problem, and the way we solve this problem is a bit like yours.

We introduce hp_ops in struct acpi_device_ops:
struct acpi_device_ops {
acpi_op_add add;
acpi_op_remove remove;
acpi_op_start start;
acpi_op_bind bind;
acpi_op_unbind unbind;
acpi_op_notify notify;
#ifdef  CONFIG_ACPI_HOTPLUG
struct acpihp_dev_ops *hp_ops;
#endif  /* CONFIG_ACPI_HOTPLUG */
};

in hp_ops, we divide the prepare_remove into six small steps, that is:
1) pre_release(): optional step to mark device going to be removed/busy
2) release(): reclaim device from running system
3) post_release(): rollback if cancelled by user or error happened
4) pre_unconfigure(): optional step to solve possible dependency issue
5) unconfigure(): remove devices from running system
6) post_unconfigure(): free resources used by devices

In this way, we can easily rollback if error happens.
How do you think of this solution, any suggestion ? I think we can 
achieve
a better way for sharing ideas. :)
   
   Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
   have not looked at all your changes yet..), but in my mind, a hot-plug
   operation should be composed with the following 3 phases.
   
   1. Validate phase - Verify if the request is a supported operation.  All
   known restrictions are verified at this phase.  For instance, if a
   hot-remove request involves kernel memory, it is failed in this phase.
   Since this phase makes no change, no rollback is necessary to fail.  
  
  Actually, we can't do it this way, because the conditions may change between
  the check and the execution.  So the first phase needs to involve execution
  to some extent, although only as far as it remains reversible.
 
 For memory hot-remove, we can check if the target memory ranges are
 within ZONE_MOVABLE.  We should not allow user to change this setup
 during hot-remove operation.  Other things may be to check if a target
 node contains cpu0 (until it is supported), the console UART (assuming
 we cannot delete it), etc.  We should avoid doing rollback as much as we
 can.

Yes, we can make some checks upfront as an optimization and fail early if
the conditions are not met, but for correctness we need to repeat those
checks later anyway.  Once we've decided to go for the eject, the conditions
must hold whatever happens.
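
A minimal user-space sketch of that pattern, using POSIX threads rather than kernel locking, is below. All names (`struct hp_mem`, `hp_mem_claim()`, `hp_mem_eject()`) are hypothetical. The early check is only a fast path; the repeated check under the lock is the one that counts, and it assumes every user of the memory takes the same lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory device; names are illustrative only. */
struct hp_mem {
	pthread_mutex_t lock;
	atomic_bool in_use;  /* read without the lock by the early check */
	bool ejected;        /* protected by lock */
};

static void hp_mem_init(struct hp_mem *m)
{
	pthread_mutex_init(&m->lock, NULL);
	atomic_init(&m->in_use, false);
	m->ejected = false;
}

/* A user of the memory must take the same lock as the eject path. */
static int hp_mem_claim(struct hp_mem *m)
{
	int ret = 0;

	pthread_mutex_lock(&m->lock);
	if (m->ejected)
		ret = -ENODEV;
	else
		atomic_store(&m->in_use, true);
	pthread_mutex_unlock(&m->lock);
	return ret;
}

static int hp_mem_eject(struct hp_mem *m)
{
	int ret = 0;

	/* Early check: cheap, but may be stale by the time we act. */
	if (atomic_load(&m->in_use))
		return -EBUSY;

	pthread_mutex_lock(&m->lock);
	/* Repeated check: while the lock is held, nobody can claim. */
	if (atomic_load(&m->in_use))
		ret = -EBUSY;
	else
		m->ejected = true;  /* commit */
	pthread_mutex_unlock(&m->lock);
	return ret;
}
```

Dropping the early check would not change correctness, only how soon a doomed request fails.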

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Toshi Kani
On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
 On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
  On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
   On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
  As discussed in https://patchwork.kernel.org/patch/1581581/
  the driver core remove function needs to always succeed. This means 
  we need
  to know that the device can be successfully removed before 
  acpi_bus_trim / 
  acpi_bus_hot_remove_device are called. This can cause panics when 
  OSPM-initiated
  or SCI-initiated eject of memory devices fail e.g with:
  echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
  
  since the ACPI core goes ahead and ejects the device regardless of whether
  the memory is still in use or not.
  
  For this reason a new acpi_device operation called prepare_remove 
  is introduced.
  This operation should be registered for acpi devices whose removal 
  (from kernel
  perspective) can fail.  Memory devices fall in this category.
  
  acpi_bus_remove() is changed to handle removal in 2 steps:
  - preparation for removal i.e. perform part of removal that can 
  fail. Should
succeed for device and all its children.
  - if the above step was successful, proceed to actual device removal
 
 Hi Vasilis,
 We met the same problem when we doing computer node hotplug, It is a 
 good idea
 to introduce prepare_remove before actual device removal.
 
 I think we could do more in prepare_remove, such as rollback. In most 
 cases, we can
 offline most of memory sections except kernel used pages now, should 
 we rollback
 and online the memory sections when prepare_remove failed ?

I think hot-plug operation should have all-or-nothing semantics.  That
is, an operation should either complete successfully, or rollback to the
original state.
   
   That's correct.
   
 As you may know, the ACPI based hotplug framework we are working on already
 addressed this problem, and the way we solve this problem is a bit like yours.
 
 We introduce hp_ops in struct acpi_device_ops:
 struct acpi_device_ops {
         acpi_op_add add;
         acpi_op_remove remove;
         acpi_op_start start;
         acpi_op_bind bind;
         acpi_op_unbind unbind;
         acpi_op_notify notify;
 #ifdef CONFIG_ACPI_HOTPLUG
         struct acpihp_dev_ops *hp_ops;
 #endif /* CONFIG_ACPI_HOTPLUG */
 };
 
 in hp_ops, we divide the prepare_remove into six small steps, that is:
 1) pre_release(): optional step to mark device going to be 
 removed/busy
 2) release(): reclaim device from running system
 3) post_release(): rollback if cancelled by user or error happened
 4) pre_unconfigure(): optional step to solve possible dependency issue
 5) unconfigure(): remove devices from running system
 6) post_unconfigure(): free resources used by devices
 
 In this way, we can easily rollback if error happens.
 How do you think of this solution, any suggestion ? I think we can 
 achieve
 a better way for sharing ideas. :)

Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
have not looked at all your changes yet..), but in my mind, a hot-plug
operation should be composed with the following 3 phases.

1. Validate phase - Verify if the request is a supported operation.  All
known restrictions are verified at this phase.  For instance, if a
hot-remove request involves kernel memory, it is failed in this phase.
Since this phase makes no change, no rollback is necessary to fail.  
   
   Actually, we can't do it this way, because the conditions may change 
   between
   the check and the execution.  So the first phase needs to involve 
   execution
   to some extent, although only as far as it remains reversible.
  
  For memory hot-remove, we can check if the target memory ranges are
  within ZONE_MOVABLE.  We should not allow user to change this setup
  during hot-remove operation.  Other things may be to check if a target
  node contains cpu0 (until it is supported), the console UART (assuming
  we cannot delete it), etc.  We should avoid doing rollback as much as we
  can.
 
 Yes, we can make some checks upfront as an optimization and fail early if
 the conditions are not met, but for correctness we need to repeat those
 checks later anyway.  Once we've decided to go for the eject, the conditions
 must hold whatever happens.

Agreed.

Thanks,
-Toshi


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Toshi Kani
On Thu, 2012-11-29 at 13:39 -0700, Toshi Kani wrote:
> On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > > > 1. Validate phase - Verify if the request is a supported operation.  All
> > > > > known restrictions are verified at this phase.  For instance, if a
> > > > > hot-remove request involves kernel memory, it is failed in this phase.
> > > > > Since this phase makes no change, no rollback is necessary to fail.
> > > >
> > > > Actually, we can't do it this way, because the conditions may change between
> > > > the check and the execution.  So the first phase needs to involve execution
> > > > to some extent, although only as far as it remains reversible.
> > >
> > > For memory hot-remove, we can check if the target memory ranges are
> > > within ZONE_MOVABLE.  We should not allow user to change this setup
> > > during hot-remove operation.  Other things may be to check if a target
> > > node contains cpu0 (until it is supported), the console UART (assuming
> > > we cannot delete it), etc.  We should avoid doing rollback as much as we
> > > can.
> >
> > Yes, we can make some checks upfront as an optimization and fail early if
> > the conditions are not met, but for correctness we need to repeat those
> > checks later anyway.  Once we've decided to go for the eject, the conditions
> > must hold whatever happens.
>
> Agreed.

BTW, it is not an optimization I am after for this phase.  There are
many error cases during hot-plug operations.  It is difficult to assure
that rollback is successful for every error condition in terms of
testing and maintaining the code.  So, it is easier to fail beforehand
when possible.

Thanks,
-Toshi




Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Rafael J. Wysocki
On Thursday, November 29, 2012 01:56:17 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 13:39 -0700, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > > > > 1. Validate phase - Verify if the request is a supported operation.  All
> > > > > > known restrictions are verified at this phase.  For instance, if a
> > > > > > hot-remove request involves kernel memory, it is failed in this phase.
> > > > > > Since this phase makes no change, no rollback is necessary to fail.
> > > > >
> > > > > Actually, we can't do it this way, because the conditions may change between
> > > > > the check and the execution.  So the first phase needs to involve execution
> > > > > to some extent, although only as far as it remains reversible.
> > > >
> > > > For memory hot-remove, we can check if the target memory ranges are
> > > > within ZONE_MOVABLE.  We should not allow user to change this setup
> > > > during hot-remove operation.  Other things may be to check if a target
> > > > node contains cpu0 (until it is supported), the console UART (assuming
> > > > we cannot delete it), etc.  We should avoid doing rollback as much as we
> > > > can.
> > >
> > > Yes, we can make some checks upfront as an optimization and fail early if
> > > the conditions are not met, but for correctness we need to repeat those
> > > checks later anyway.  Once we've decided to go for the eject, the conditions
> > > must hold whatever happens.
> >
> > Agreed.
>
> BTW, it is not an optimization I am after for this phase.  There are
> many error cases during hot-plug operations.  It is difficult to assure
> that rollback is successful for every error condition in terms of
> testing and maintaining the code.  So, it is easier to fail beforehand
> when possible.

OK, but as I said it is necessary to ensure that the conditions will be met
in the next phases as well if we don't fail.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-29 Thread Toshi Kani
On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 On 2012/11/29 2:41, Toshi Kani wrote:
  On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
  On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
  As discussed in https://patchwork.kernel.org/patch/1581581/
  the driver core remove function needs to always succeed. This means we 
  need
  to know that the device can be successfully removed before acpi_bus_trim 
  / 
  acpi_bus_hot_remove_device are called. This can cause panics when 
  OSPM-initiated
  or SCI-initiated eject of memory devices fail e.g with:
  echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
 
  since the ACPI core goes ahead and ejects the device regardless of whether
  the memory is still in use or not.
 
  For this reason a new acpi_device operation called prepare_remove is 
  introduced.
  This operation should be registered for acpi devices whose removal (from 
  kernel
  perspective) can fail.  Memory devices fall in this category.
 
  acpi_bus_remove() is changed to handle removal in 2 steps:
  - preparation for removal i.e. perform part of removal that can fail. 
  Should
succeed for device and all its children.
  - if the above step was successful, proceed to actual device removal
 
  Hi Vasilis,
  We met the same problem when we doing computer node hotplug, It is a good 
  idea
  to introduce prepare_remove before actual device removal.
 
  I think we could do more in prepare_remove, such as rollback. In most 
  cases, we can
  offline most of memory sections except kernel used pages now, should we 
  rollback
  and online the memory sections when prepare_remove failed ?
  
  I think hot-plug operation should have all-or-nothing semantics.  That
  is, an operation should either complete successfully, or rollback to the
  original state.
 
 Yes, we have the same point of view with you. We handle this problem in the 
 ACPI
 based hot-plug framework as following:
 1) hot add / hot remove complete successfully if no error happens;
 2) automatic rollback to the original state if meets some error ;
 3) rollback to the original if hot-plug operation cancelled by user ;

Cool!
 
  As you may know, the ACPI based hotplug framework we are working on already
  addressed this problem, and the way we solve this problem is a bit like yours.
 
  We introduce hp_ops in struct acpi_device_ops:
  struct acpi_device_ops {
 acpi_op_add add;
 acpi_op_remove remove;
 acpi_op_start start;
 acpi_op_bind bind;
 acpi_op_unbind unbind;
 acpi_op_notify notify;
  #ifdef CONFIG_ACPI_HOTPLUG
 struct acpihp_dev_ops *hp_ops;
  #endif /* CONFIG_ACPI_HOTPLUG */
  };
 
  in hp_ops, we divide the prepare_remove into six small steps, that is:
  1) pre_release(): optional step to mark device going to be removed/busy
  2) release(): reclaim device from running system
  3) post_release(): rollback if cancelled by user or error happened
  4) pre_unconfigure(): optional step to solve possible dependency issue
  5) unconfigure(): remove devices from running system
  6) post_unconfigure(): free resources used by devices
 
  In this way, we can easily rollback if error happens.
  How do you think of this solution, any suggestion ? I think we can achieve
  a better way for sharing ideas. :)
  
  Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
  have not looked at all your changes yet..), but in my mind, a hot-plug
  operation should be composed with the following 3 phases.
 
 Good idea ! we also implement a hot-plug operation in 3 phases:
 1) acpihp_drv_pre_execute
 2) acpihp_drv_execute
 3) acpihp_drv_post_execute
 you may refer to :
 https://lkml.org/lkml/2012/11/4/79

Great.  Yes, I will take a look.
 
  1. Validate phase - Verify if the request is a supported operation.  All
  known restrictions are verified at this phase.  For instance, if a
  hot-remove request involves kernel memory, it is failed in this phase.
  Since this phase makes no change, no rollback is necessary to fail. 
 
 Yes, we have done this in acpihp_drv_pre_execute, and check following things:
 
 1) Hot-pluggable or not. The kernel memory case you mentioned is also checked
    when a memory device is removed;

Agreed.

 2) Dependency check involved. For instance, if hot-add a memory device,
processor should be added first, otherwise it's not valid to this 
 operation.

I think FW should be the one that assures such dependency.  That is,
when a memory device object is marked as present/enabled/functioning, it
should be ready for the OS to use.

 3) Race condition check. if the device and its dependent device is in hot-plug
process, another request will be denied.

I agree that hot-plug operation should be serialized.  I think another
request should be either queued or denied based on the caller's intent
(i.e. wait-ok or no-wait). 
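
A minimal user-space sketch of the queued-or-denied choice, assuming a single lock that serializes every hot-plug operation. The names hp_lock, hp_begin() and hp_end() are hypothetical and not part of any patch in this thread; a kernel version would more likely use a kernel mutex than pthreads.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical: one lock that serializes all hot-plug operations. */
static pthread_mutex_t hp_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Begin a hot-plug operation.
 * wait_ok == true:  block until the operation in flight finishes (queued).
 * wait_ok == false: fail with -EBUSY if an operation is in flight (denied).
 * Returns 0 once the caller owns the hot-plug lock.
 */
static int hp_begin(bool wait_ok)
{
	if (wait_ok)
		return -pthread_mutex_lock(&hp_lock);

	return pthread_mutex_trylock(&hp_lock) == 0 ? 0 : -EBUSY;
}

/* End the operation and let the next request proceed. */
static void hp_end(void)
{
	pthread_mutex_unlock(&hp_lock);
}
```

With this shape the caller states its intent once, at the start of the request, and the validate step never has to guess whether blocking is acceptable.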

> No rollback is needed for the above checks.

Great.

> > 2. Execute phase - Perform hot-add / hot-remove operation that can be
> > rolled-back in case of 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-28 Thread Hanjun Guo
On 2012/11/29 2:41, Toshi Kani wrote:
> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>> the driver core remove function needs to always succeed. This means we need
>>> to know that the device can be successfully removed before acpi_bus_trim /
>>> acpi_bus_hot_remove_device are called. This can cause panics when
>>> OSPM-initiated or SCI-initiated ejects of memory devices fail, e.g. with:
>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>
>>> since the ACPI core goes ahead and ejects the device regardless of whether
>>> the memory is still in use or not.
>>>
>>> For this reason a new acpi_device operation called prepare_remove is
>>> introduced. This operation should be registered for acpi devices whose
>>> removal (from kernel perspective) can fail.  Memory devices fall in this
>>> category.
>>>
>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>> - preparation for removal i.e. perform part of removal that can fail. Should
>>>   succeed for device and all its children.
>>> - if above step was successful, proceed to actual device removal
>>
>> Hi Vasilis,
>> We met the same problem when we were doing computer node hotplug. It is a
>> good idea to introduce prepare_remove before actual device removal.
>>
>> I think we could do more in prepare_remove, such as rollback. In most cases
>> we can now offline most memory sections except kernel-used pages. Should we
>> roll back and online the memory sections when prepare_remove fails?
> 
> I think hot-plug operation should have all-or-nothing semantics.  That
> is, an operation should either complete successfully, or rollback to the
> original state.

Yes, we share your point of view. We handle this problem in the ACPI
based hot-plug framework as follows:
1) hot add / hot remove completes successfully if no error happens;
2) automatic rollback to the original state if an error occurs;
3) rollback to the original state if the hot-plug operation is cancelled by
   the user.

> 
>> As you may know, the ACPI based hotplug framework we are working on already
>> addressed this problem, and the way we solve it is a bit like yours.
>>
>> We introduce hp_ops in struct acpi_device_ops:
>> struct acpi_device_ops {
>> 	acpi_op_add add;
>> 	acpi_op_remove remove;
>> 	acpi_op_start start;
>> 	acpi_op_bind bind;
>> 	acpi_op_unbind unbind;
>> 	acpi_op_notify notify;
>> #ifdef CONFIG_ACPI_HOTPLUG
>> 	struct acpihp_dev_ops *hp_ops;
>> #endif /* CONFIG_ACPI_HOTPLUG */
>> };
>>
>> In hp_ops, we divide prepare_remove into six small steps:
>> 1) pre_release(): optional step to mark device going to be removed/busy
>> 2) release(): reclaim device from running system
>> 3) post_release(): rollback if cancelled by user or error happened
>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>> 5) unconfigure(): remove devices from running system
>> 6) post_unconfigure(): free resources used by devices
>>
>> In this way, we can easily roll back if an error happens.
>> What do you think of this solution? Any suggestions? I think we can reach
>> a better solution by sharing ideas. :)
> 
> Yes, sharing ideas is good. :)  I do not know if we need all 6 steps (I
> have not looked at all your changes yet..), but in my mind, a hot-plug
> operation should be composed of the following 3 phases.

Good idea! We also implement a hot-plug operation in 3 phases:
1) acpihp_drv_pre_execute
2) acpihp_drv_execute
3) acpihp_drv_post_execute
You may refer to:
https://lkml.org/lkml/2012/11/4/79

> 
> 1. Validate phase - Verify if the request is a supported operation.  All
> known restrictions are verified at this phase.  For instance, if a
> hot-remove request involves kernel memory, it is failed in this phase.
> Since this phase makes no change, no rollback is necessary to fail. 

Yes, we have done this in acpihp_drv_pre_execute, which checks the following:

1) Hot-pluggable or not. The kernel memory case you mentioned is also checked
   when a memory device is removed;

2) Dependency check. For instance, when hot-adding a memory device, the
   processor should be added first; otherwise the operation is not valid.

3) Race condition check. If the device or its dependent device is already in a
   hot-plug process, another request will be denied.

No rollback is needed for the above checks.

> 
> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> rolled-back in case of error or cancel.

In this phase, we introduce a state machine for the hot-pluggable device;
please refer to:
https://lkml.org/lkml/2012/11/4/79

I think we have the same idea for the overall framework, but the ACPI based
hot-plug framework implements it differently in detail, right?

Thanks
Hanjun

> 
> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> cannot be 

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-28 Thread Toshi Kani
On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > As discussed in https://patchwork.kernel.org/patch/1581581/
> > the driver core remove function needs to always succeed. This means we need
> > to know that the device can be successfully removed before acpi_bus_trim /
> > acpi_bus_hot_remove_device are called. This can cause panics when
> > OSPM-initiated or SCI-initiated ejects of memory devices fail, e.g. with:
> > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> >
> > since the ACPI core goes ahead and ejects the device regardless of whether
> > the memory is still in use or not.
> >
> > For this reason a new acpi_device operation called prepare_remove is
> > introduced. This operation should be registered for acpi devices whose
> > removal (from kernel perspective) can fail.  Memory devices fall in this
> > category.
> >
> > acpi_bus_remove() is changed to handle removal in 2 steps:
> > - preparation for removal i.e. perform part of removal that can fail. Should
> >   succeed for device and all its children.
> > - if above step was successful, proceed to actual device removal
>
> Hi Vasilis,
> We met the same problem when we were doing computer node hotplug. It is a
> good idea to introduce prepare_remove before actual device removal.
>
> I think we could do more in prepare_remove, such as rollback. In most cases
> we can now offline most memory sections except kernel-used pages. Should we
> roll back and online the memory sections when prepare_remove fails?

I think hot-plug operation should have all-or-nothing semantics.  That
is, an operation should either complete successfully, or rollback to the
original state.

> As you may know, the ACPI based hotplug framework we are working on already
> addressed this problem, and the way we solve it is a bit like yours.
>
> We introduce hp_ops in struct acpi_device_ops:
> struct acpi_device_ops {
> 	acpi_op_add add;
> 	acpi_op_remove remove;
> 	acpi_op_start start;
> 	acpi_op_bind bind;
> 	acpi_op_unbind unbind;
> 	acpi_op_notify notify;
> #ifdef CONFIG_ACPI_HOTPLUG
> 	struct acpihp_dev_ops *hp_ops;
> #endif /* CONFIG_ACPI_HOTPLUG */
> };
>
> In hp_ops, we divide prepare_remove into six small steps:
> 1) pre_release(): optional step to mark device going to be removed/busy
> 2) release(): reclaim device from running system
> 3) post_release(): rollback if cancelled by user or error happened
> 4) pre_unconfigure(): optional step to solve possible dependency issue
> 5) unconfigure(): remove devices from running system
> 6) post_unconfigure(): free resources used by devices
>
> In this way, we can easily roll back if an error happens.
> What do you think of this solution? Any suggestions? I think we can reach
> a better solution by sharing ideas. :)

Yes, sharing ideas is good. :)  I do not know if we need all 6 steps (I
have not looked at all your changes yet..), but in my mind, a hot-plug
operation should be composed of the following 3 phases.

1. Validate phase - Verify if the request is a supported operation.  All
known restrictions are verified at this phase.  For instance, if a
hot-remove request involves kernel memory, it is failed in this phase.
Since this phase makes no change, no rollback is necessary to fail.  

2. Execute phase - Perform hot-add / hot-remove operation that can be
rolled-back in case of error or cancel.

3. Commit phase - Perform the final hot-add / hot-remove operation that
cannot be rolled-back.  No error / cancel is allowed in this phase.  For
instance, eject operation is performed at this phase.  
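
The three phases can be sketched as a small driver loop. This is an illustration only, under stated assumptions: struct hp_phase_ops, hp_run() and the toy memory device below are hypothetical names made up for the example, not code from this patchset or from the acpihp framework.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-operation callbacks; none of these are kernel APIs. */
struct hp_phase_ops {
	int  (*validate)(void *ctx);	/* checks only, changes nothing */
	int  (*execute)(void *ctx);	/* reversible work, may fail */
	void (*rollback)(void *ctx);	/* undoes whatever execute() did */
	void (*commit)(void *ctx);	/* irreversible, must not fail */
};

/* Run one hot-plug operation with all-or-nothing semantics. */
static int hp_run(const struct hp_phase_ops *ops, void *ctx)
{
	int ret;

	ret = ops->validate(ctx);	/* phase 1: nothing to roll back */
	if (ret)
		return ret;

	ret = ops->execute(ctx);	/* phase 2: roll back on error */
	if (ret) {
		ops->rollback(ctx);
		return ret;
	}

	ops->commit(ctx);		/* phase 3: no error path by design */
	return 0;
}

/* Toy memory device used to exercise the three phases. */
struct mem_dev {
	bool has_kernel_pages;	/* removal is not supported at all */
	bool fail_offline;	/* offlining fails part way through */
	bool online;
	bool ejected;
};

static int mem_validate(void *ctx)
{
	struct mem_dev *dev = ctx;

	return dev->has_kernel_pages ? -EBUSY : 0;
}

static int mem_execute(void *ctx)
{
	struct mem_dev *dev = ctx;

	dev->online = false;
	return dev->fail_offline ? -EAGAIN : 0;
}

static void mem_rollback(void *ctx)
{
	struct mem_dev *dev = ctx;

	dev->online = true;
}

static void mem_commit(void *ctx)
{
	struct mem_dev *dev = ctx;

	dev->ejected = true;
}

static const struct hp_phase_ops mem_remove_ops = {
	.validate = mem_validate,
	.execute = mem_execute,
	.rollback = mem_rollback,
	.commit = mem_commit,
};
```

Giving commit() a void return type is deliberate in this sketch: it makes "no error / cancel is allowed" a property of the interface rather than a convention.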


Thanks,
-Toshi




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-28 Thread Hanjun Guo
On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> As discussed in https://patchwork.kernel.org/patch/1581581/
> the driver core remove function needs to always succeed. This means we need
> to know that the device can be successfully removed before acpi_bus_trim /
> acpi_bus_hot_remove_device are called. This can cause panics when
> OSPM-initiated or SCI-initiated ejects of memory devices fail, e.g. with:
> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>
> since the ACPI core goes ahead and ejects the device regardless of whether
> the memory is still in use or not.
>
> For this reason a new acpi_device operation called prepare_remove is
> introduced. This operation should be registered for acpi devices whose
> removal (from kernel perspective) can fail.  Memory devices fall in this
> category.
>
> acpi_bus_remove() is changed to handle removal in 2 steps:
> - preparation for removal i.e. perform part of removal that can fail. Should
>   succeed for device and all its children.
> - if above step was successful, proceed to actual device removal

Hi Vasilis,
We met the same problem when we were doing computer node hotplug. It is a
good idea to introduce prepare_remove before actual device removal.

I think we could do more in prepare_remove, such as rollback. In most cases
we can now offline most memory sections except kernel-used pages. Should we
roll back and online the memory sections when prepare_remove fails?

As you may know, the ACPI based hotplug framework we are working on already
addressed this problem, and the way we solve it is a bit like yours.

We introduce hp_ops in struct acpi_device_ops:
struct acpi_device_ops {
	acpi_op_add add;
	acpi_op_remove remove;
	acpi_op_start start;
	acpi_op_bind bind;
	acpi_op_unbind unbind;
	acpi_op_notify notify;
#ifdef CONFIG_ACPI_HOTPLUG
	struct acpihp_dev_ops *hp_ops;
#endif /* CONFIG_ACPI_HOTPLUG */
};

In hp_ops, we divide prepare_remove into six small steps:
1) pre_release(): optional step to mark device going to be removed/busy
2) release(): reclaim device from running system
3) post_release(): rollback if cancelled by user or error happened
4) pre_unconfigure(): optional step to solve possible dependency issue
5) unconfigure(): remove devices from running system
6) post_unconfigure(): free resources used by devices
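
One possible reading of how the six steps chain together is sketched below. Only the six callback names come from this mail; the signatures, struct hp_ops_sketch, hp_remove() and the toy device are assumptions made for illustration, and the real struct acpihp_dev_ops may look quite different. In particular, the sketch assumes post_release() is called only to undo the release steps after an error or cancel.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy device: records which steps ran so the ordering can be inspected. */
struct hp_dev {
	char trace[16];
	int release_error;	/* non-zero: release() fails with this value */
};

/* Hypothetical layout; only the six member names come from the mail. */
struct hp_ops_sketch {
	int  (*pre_release)(struct hp_dev *dev);	/* optional, may be NULL */
	int  (*release)(struct hp_dev *dev);
	void (*post_release)(struct hp_dev *dev);	/* rollback path */
	int  (*pre_unconfigure)(struct hp_dev *dev);	/* optional, may be NULL */
	void (*unconfigure)(struct hp_dev *dev);
	void (*post_unconfigure)(struct hp_dev *dev);
};

static int hp_remove(const struct hp_ops_sketch *ops, struct hp_dev *dev)
{
	int ret = 0;

	/* Steps that may fail; stop at the first error. */
	if (ops->pre_release)
		ret = ops->pre_release(dev);
	if (!ret)
		ret = ops->release(dev);
	if (!ret && ops->pre_unconfigure)
		ret = ops->pre_unconfigure(dev);

	if (ret) {
		/* Error or cancel: undo the release steps and give up. */
		ops->post_release(dev);
		return ret;
	}

	/* Past this point the removal is no longer rolled back. */
	ops->unconfigure(dev);
	ops->post_unconfigure(dev);
	return 0;
}

static void hp_mark(struct hp_dev *dev, const char *step)
{
	strncat(dev->trace, step, sizeof(dev->trace) - strlen(dev->trace) - 1);
}

static int toy_pre_release(struct hp_dev *dev)
{
	hp_mark(dev, "p");
	return 0;
}

static int toy_release(struct hp_dev *dev)
{
	hp_mark(dev, "r");
	return dev->release_error;
}

static void toy_post_release(struct hp_dev *dev)
{
	hp_mark(dev, "R");
}

static void toy_unconfigure(struct hp_dev *dev)
{
	hp_mark(dev, "u");
}

static void toy_post_unconfigure(struct hp_dev *dev)
{
	hp_mark(dev, "f");
}

static const struct hp_ops_sketch toy_ops = {
	.pre_release = toy_pre_release,
	.release = toy_release,
	.post_release = toy_post_release,
	/* .pre_unconfigure is left NULL because the step is optional */
	.unconfigure = toy_unconfigure,
	.post_unconfigure = toy_post_unconfigure,
};
```
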

In this way, we can easily roll back if an error happens.
What do you think of this solution? Any suggestions? I think we can reach
a better solution by sharing ideas. :)

Thanks
Hanjun Guo

> 
> With this patchset, only acpi memory devices use the new prepare_remove
> device operation. The actual memory removal (VM-related offline and other
> memory cleanups) is moved to prepare_remove. The old remove operation just
> cleans up the acpi structures. Directly ejecting PNP0C80 memory devices works
> safely. I haven't tested yet with an ACPI container which contains memory
> devices.
> 
> Note that unbinding the acpi driver from a memory device with:
> echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 
> will no longer try to remove the memory. This is in compliance with normal
> unbind driver core semantics, see the discussion in v2 of this patchset:
> https://lkml.org/lkml/2012/11/16/649
> 
> After a successful unbind of the driver:
> - OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
> return -ENODEV on missing driver.
> - SCI ejects of the memory device also cannot proceed, as they will also get
> a "driver data is NULL" error.
> So the memory can continue to be used safely after unbind.
> 
> Patchset based on Rafael's linux-pm/linux-next (commit 78c38651).
> Comments welcome.
> 
> v2->v3:
> - remove driver core changes. Only acpi core changes needed. Unbind semantics
> follow driver core rules. Unbind does not remove memory.
> - new patch to set enable bit in order to proceed with ejects on driver
> re-binding scenario.
> 
> v1->v2:
> - new patch to introduce bus_type prepare_remove callback. Needed to prepare
> removal on driver unbinding from device-driver core.
> - v1 patches 1 and 2 simplified and merged in one. acpi_bus_trim does not
> require argument changes.
> 
> Vasilis Liaskovitis (3):
>   acpi: Introduce prepare_remove operation in acpi_device_ops
>   acpi_memhotplug: Add prepare_remove operation
>   acpi_memhotplug: Allow eject to proceed on rebind scenario
> 
>  drivers/acpi/acpi_memhotplug.c |   21 +
>  drivers/acpi/scan.c            |    9 -
>  include/acpi/acpi_bus.h        |    2 ++
>  3 files changed, 27 insertions(+), 5 deletions(-)
> 




[RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-23 Thread Vasilis Liaskovitis
As discussed in https://patchwork.kernel.org/patch/1581581/
the driver core remove function needs to always succeed. This means we need
to know that the device can be successfully removed before acpi_bus_trim / 
acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
or SCI-initiated ejects of memory devices fail, e.g. with:
echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

since the ACPI core goes ahead and ejects the device regardless of whether
the memory is still in use or not.

For this reason a new acpi_device operation called prepare_remove is introduced.
This operation should be registered for acpi devices whose removal (from kernel
perspective) can fail.  Memory devices fall in this category.

acpi_bus_remove() is changed to handle removal in 2 steps:
- preparation for removal i.e. perform part of removal that can fail. Should
  succeed for device and all its children.
- if above step was successful, proceed to actual device removal
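
The two-step flow can be sketched in user space as two passes over a device tree. This is not the patch itself: struct dev, prepare_remove_tree(), remove_tree() and hot_remove() are hypothetical names, and the prepare pass here is a pure check, whereas the real prepare_remove performs the part of the removal that can fail.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device node; not the kernel's struct acpi_device. */
struct dev {
	struct dev *child;	/* first child */
	struct dev *sibling;	/* next sibling */
	bool in_use;		/* e.g. memory that cannot be offlined */
	bool removed;
};

/* Step 1: the part that can fail, for the device and all its children. */
static int prepare_remove_tree(struct dev *d)
{
	struct dev *c;
	int ret;

	if (d->in_use)
		return -EBUSY;

	for (c = d->child; c; c = c->sibling) {
		ret = prepare_remove_tree(c);
		if (ret)
			return ret;
	}
	return 0;
}

/* Step 2: the actual removal, which is not allowed to fail. */
static void remove_tree(struct dev *d)
{
	struct dev *c;

	for (c = d->child; c; c = c->sibling)
		remove_tree(c);
	d->removed = true;
}

/* Remove the whole subtree, or leave it untouched if any device objects. */
static int hot_remove(struct dev *root)
{
	int ret = prepare_remove_tree(root);

	if (ret)
		return ret;

	remove_tree(root);
	return 0;
}
```

Because every device is asked before any device is removed, a busy child leaves its parent and siblings in place.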

With this patchset, only acpi memory devices use the new prepare_remove
device operation. The actual memory removal (VM-related offline and other memory
cleanups) is moved to prepare_remove. The old remove operation just cleans up
the acpi structures. Directly ejecting PNP0C80 memory devices works safely. I
haven't tested yet with an ACPI container which contains memory devices.

Note that unbinding the acpi driver from a memory device with:
echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind

will no longer try to remove the memory. This is in compliance with normal
unbind driver core semantics, see the discussion in v2 of this patchset:
https://lkml.org/lkml/2012/11/16/649

After a successful unbind of the driver:
- OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
return -ENODEV on missing driver.
- SCI ejects of the memory device also cannot proceed, as they will also get
a "driver data is NULL" error.
So the memory can continue to be used safely after unbind.

Patchset based on Rafael's linux-pm/linux-next (commit 78c38651).
Comments welcome.

v2->v3:
- remove driver core changes. Only acpi core changes needed. Unbind semantics
follow driver core rules. Unbind does not remove memory.
- new patch to set enable bit in order to proceed with ejects on driver
re-binding scenario.

v1->v2:
- new patch to introduce bus_type prepare_remove callback. Needed to prepare
removal on driver unbinding from device-driver core.
- v1 patches 1 and 2 simplified and merged in one. acpi_bus_trim does not
require argument changes.

Vasilis Liaskovitis (3):
  acpi: Introduce prepare_remove operation in acpi_device_ops
  acpi_memhotplug: Add prepare_remove operation
  acpi_memhotplug: Allow eject to proceed on rebind scenario

 drivers/acpi/acpi_memhotplug.c |   21 +
 drivers/acpi/scan.c            |    9 -
 include/acpi/acpi_bus.h        |    2 ++
 3 files changed, 27 insertions(+), 5 deletions(-)

-- 
1.7.9



[RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-11-23 Thread Vasilis Liaskovitis
As discussed in https://patchwork.kernel.org/patch/1581581/
the driver core remove function needs to always succeed. This means we need
to know that the device can be successfully removed before acpi_bus_trim / 
acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
or SCI-initiated eject of memory devices fail e.g with:
echo 1 /sys/bus/pci/devices/PNP0C80:XX/eject

since the ACPI core goes ahead and ejects the device regardless of whether the
memory is still in use or not.

For this reason a new acpi_device operation called prepare_remove is introduced.
This operation should be registered for acpi devices whose removal (from kernel
perspective) can fail.  Memory devices fall in this category.

acpi_bus_remove() is changed to handle removal in 2 steps:
- preparation for removal i.e. perform part of removal that can fail. Should
  succeed for device and all its children.
- if the above step was successful, proceed to actual device removal

With this patchset, only acpi memory devices use the new prepare_remove
device operation. The actual memory removal (VM-related offline and other memory
cleanups) is moved to prepare_remove. The old remove operation just cleans up
the acpi structures. Directly ejecting PNP0C80 memory devices works safely. I
haven't tested yet with an ACPI container which contains memory devices.
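
To illustrate the two-step removal and the prepare_remove/remove split described
above, here is a minimal user-space sketch. It is not the patch code: every
model_* name, the fixed-size child array, and the choice of -EBUSY as the
failure code are assumptions made for illustration only.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_CHILDREN 4

struct model_device;

/* Hypothetical stand-in for the two callbacks discussed above. */
struct model_device_ops {
	int (*prepare_remove)(struct model_device *dev);	/* may fail */
	void (*remove)(struct model_device *dev);		/* must not fail */
};

struct model_device {
	const struct model_device_ops *ops;
	struct model_device *children[MODEL_MAX_CHILDREN];
	size_t nr_children;
	int in_use;	/* memory still in use: offlining fails */
	int offlined;	/* set by prepare_remove */
	int removed;	/* set by remove */
};

/* Step 1: run the fallible part for the children first, then the device. */
static int model_prepare_tree(struct model_device *dev)
{
	size_t i;
	int ret;

	for (i = 0; i < dev->nr_children; i++) {
		ret = model_prepare_tree(dev->children[i]);
		if (ret)
			return ret;
	}
	if (dev->ops && dev->ops->prepare_remove)
		return dev->ops->prepare_remove(dev);
	return 0;
}

/* Step 2: the part that cannot fail. */
static void model_remove_tree(struct model_device *dev)
{
	size_t i;

	for (i = 0; i < dev->nr_children; i++)
		model_remove_tree(dev->children[i]);
	if (dev->ops && dev->ops->remove)
		dev->ops->remove(dev);
}

/* Nothing is removed unless preparation succeeded for the whole subtree. */
int model_hot_remove(struct model_device *dev)
{
	int ret = model_prepare_tree(dev);

	if (ret)
		return ret;
	model_remove_tree(dev);
	return 0;
}

/* Memory device: offlining happens in prepare_remove, cleanup in remove. */
static int model_mem_prepare_remove(struct model_device *dev)
{
	if (dev->in_use)
		return -EBUSY;	/* assumed error code, not taken from the patch */
	dev->offlined = 1;
	return 0;
}

static void model_mem_remove(struct model_device *dev)
{
	dev->removed = 1;
}

const struct model_device_ops model_mem_ops = {
	.prepare_remove = model_mem_prepare_remove,
	.remove = model_mem_remove,
};
```

Calling model_hot_remove() on a parent whose memory child has in_use set
returns the error and leaves both devices untouched. The sketch does not model
rollback: if one child was already offlined and a later sibling fails, the
first child stays offlined.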

Note that unbinding the acpi driver from a memory device with:
echo PNP0C80:XX > /sys/bus/acpi/drivers/acpi_memhotplug/unbind

will no longer try to remove the memory. This is in compliance with normal
unbind driver core semantics, see the discussion in v2 of this patchset:
https://lkml.org/lkml/2012/11/16/649

After a successful unbind of the driver:
- OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
return -ENODEV on missing driver.
- SCI ejects of the memory device also cannot proceed, as they will also get
a "driver data is NULL" error.
So the memory can continue to be used safely after unbind.
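
The post-unbind behaviour listed above can be modelled in user space as
follows. This is a rough sketch, not kernel code: the model_* names and the
struct layout are hypothetical, only the -ENODEV return for the OSPM path comes
from the text above, and the SCI path's error code is a placeholder because the
message does not name it. Rebinding is modelled as simply restoring both
pointers; the enable bit mentioned in the changelog is not modelled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of a memory device. */
struct model_acpi_device {
	const void *driver;	/* NULL while no driver is bound */
	void *driver_data;	/* NULL while no driver is bound */
	int memory_online;
};

void model_bind(struct model_acpi_device *dev, const void *drv, void *data)
{
	dev->driver = drv;
	dev->driver_data = data;
}

/* Unbind drops the driver but deliberately leaves the memory online. */
void model_unbind(struct model_acpi_device *dev)
{
	dev->driver = NULL;
	dev->driver_data = NULL;
}

/* OSPM-initiated eject: refused with -ENODEV when no driver is bound. */
int model_ospm_eject(struct model_acpi_device *dev)
{
	if (!dev->driver)
		return -ENODEV;
	dev->memory_online = 0;
	return 0;
}

/* SCI-initiated eject: refused when the driver data is NULL. */
int model_sci_eject(struct model_acpi_device *dev)
{
	if (!dev->driver_data)
		return -EINVAL;	/* placeholder error code, an assumption */
	dev->memory_online = 0;
	return 0;
}
```

In this model both eject paths fail after model_unbind() and memory_online
stays set, which matches the claim that the memory remains usable after unbind.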

Patchset based on Rafael's linux-pm/linux-next (commit 78c38651).
Comments welcome.

v2->v3:
- remove driver core changes. Only acpi core changes needed. Unbind semantics
follow driver core rules. Unbind does not remove memory.
- new patch to set enable bit in order to proceed with ejects on driver
re-binding scenario.

v1->v2:
- new patch to introduce bus_type prepare_remove callback. Needed to prepare
removal on driver unbinding from device-driver core.
- v1 patches 1 and 2 simplified and merged in one. acpi_bus_trim does not
require argument changes.

Vasilis Liaskovitis (3):
  acpi: Introduce prepare_remove operation in acpi_device_ops
  acpi_memhotplug: Add prepare_remove operation
  acpi_memhotplug: Allow eject to proceed on rebind scenario

 drivers/acpi/acpi_memhotplug.c |   21 +
 drivers/acpi/scan.c            |    9 -
 include/acpi/acpi_bus.h        |    2 ++
 3 files changed, 27 insertions(+), 5 deletions(-)

-- 
1.7.9
