Re: Cannot hot remove a memory device

2013-08-13 Thread Toshi Kani
On Tue, 2013-08-13 at 14:02 +0200, Rafael J. Wysocki wrote:
> On Monday, August 12, 2013 07:02:44 PM Toshi Kani wrote:
> > On Tue, 2013-08-13 at 02:45 +0200, Rafael J. Wysocki wrote:
> > > On Monday, August 12, 2013 02:40:43 PM Toshi Kani wrote:
> > > > On Sun, 2013-08-11 at 23:13 +0200, Rafael J. Wysocki wrote:
> > > > > On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
> > > > > > On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
> > > > > > > On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
> > > > > > > > On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> > > > > > > > > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > > > > > > > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > > > > > > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki 
> > > > > > > > > > > wrote:
> > > > > > > >  :
> > > > > > > > > > > I think it fails with -EINVAL at the place with 
> > > > > > > > > > > dev_warn(dev, "ACPI
> > > > > > > > > > > handle is already set\n").  When two ACPI memory objects 
> > > > > > > > > > > associate with
> > > > > > > > > > > a same memory block, the bind procedure of the 2nd ACPI 
> > > > > > > > > > > memory object
> > > > > > > > > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI 
> > > > > > > > > > > memory object.
> > > > > > > > > > 
> > > > > > > > > > That sound's plausible, but I wonder how we can fix that?
> > > > > > > > > > 
> > > > > > > > > > There's no way for a single physical device to have two 
> > > > > > > > > > different ACPI
> > > > > > > > > > "companions".  It looks like the memory blocks should be 64 
> > > > > > > > > > M each in that
> > > > > > > > > > case.  Or we need to create two child devices for each 
> > > > > > > > > > memory block and
> > > > > > > > > > associate each of them with an ACPI object.  That would 
> > > > > > > > > > lead to complications
> > > > > > > > > > in the user space interface, though.
> > > > > > > > > 
> > > > > > > > > Right.  Even bigger issue is that I do not think 
> > > > > > > > > __add_pages() and
> > > > > > > > > __remove_pages() can add / delete a memory chunk that is less 
> > > > > > > > > than
> > > > > > > > > 128MB.  128MB is the granularity of them.  So, we may just 
> > > > > > > > > have to fail
> > > > > > > > > this case gracefully.
> > > > > > > > 
> > > > > > > > FYI: I have submitted the patch blow to close this part of the 
> > > > > > > > issue...
> > > > > > > > 
> > > > > > > > https://lkml.org/lkml/2013/8/8/396
> > > > > > > 
> > > > > > > That looks good to me, but we'd still need to make it possible to 
> > > > > > > have
> > > > > > > memory blocks smaller than 128 MB ...
> > > > > > 
> > > > > > Do you mean acpi_bind_one() needs to be able to handle such case?  
> > > > > > If
> > > > > > so, it should not be a problem since a memory block device won't be
> > > > > > created when add_memory() fails with the change above.  So, there 
> > > > > > is no
> > > > > > binding to be done.  If you mean add_memory() needs to be able to 
> > > > > > handle
> > > > > > a smaller range, that's quite a tough one unless we make the section
> > > > > > size smaller.
> > > > > > 
> > > > > > BTW, when add_memory() fails, the memory hot-add request still 
> > > > > > succeeds
> > > > > > with no driver attached.  This seems logical, but the added device 
> > > > > > is
> > > > > > useless when no handler is attached.  And it does not allow 
> > > > > > ejecting the
> > > > > > device with no handler.  I am not too worry about this since this 
> > > > > > is a
> > > > > > rare case, but it reminded me that the framework won't handle 
> > > > > > rollback.
> > > > > 
> > > > > I'm not sure which rollback you mean.  During removal?
> > > > 
> > > > I meant rollback during hot-add.  Ideally, a device should be either
> > > > added in usable state (success) or failed back to the original state
> > > > (rollback).  Added in un-usable state is not really a success for users,
> > > > and creates an odd state to deal with.  But it is still a LOT better
> > > > than crashing the system.  So, I think this outcome is reasonable on
> > > > this framework because adding rollback at this point will complicate the
> > > > things unnecessarily.
> > > > 
> > > > > There are two slight problems here in my view.  First, even if the 
> > > > > device
> > > > > cannot be ejected directly, it still will be removed when its parent 
> > > > > is
> > > > > ejected, so it may be more consistent to just allow everything to be 
> > > > > ejected
> > > > > regardless of whether or not it has a scan handler.  
> > > > 
> > > > Agreed.
> > > > 
> > > > > Second, I guess the
> > > > > removal is undesirable for memory devices for which the registration 
> > > > > of the
> > > > > scan handler failed, so it would be good to fail the "offline" of 
> > > > > such devices
> > > > > regardless of how we get there.  That's why I thought it would be 
> > > > > good to 

Re: Cannot hot remove a memory device

2013-08-13 Thread Rafael J. Wysocki
On Monday, August 12, 2013 07:02:44 PM Toshi Kani wrote:
> On Tue, 2013-08-13 at 02:45 +0200, Rafael J. Wysocki wrote:
> > On Monday, August 12, 2013 02:40:43 PM Toshi Kani wrote:
> > > On Sun, 2013-08-11 at 23:13 +0200, Rafael J. Wysocki wrote:
> > > > On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
> > > > > On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
> > > > > > On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
> > > > > > > On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> > > > > > > > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > > > > > > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > > > > > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > > > > > >  :
> > > > > > > > > > I think it fails with -EINVAL at the place with 
> > > > > > > > > > dev_warn(dev, "ACPI
> > > > > > > > > > handle is already set\n").  When two ACPI memory objects 
> > > > > > > > > > associate with
> > > > > > > > > > a same memory block, the bind procedure of the 2nd ACPI 
> > > > > > > > > > memory object
> > > > > > > > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI 
> > > > > > > > > > memory object.
> > > > > > > > > 
> > > > > > > > > That sound's plausible, but I wonder how we can fix that?
> > > > > > > > > 
> > > > > > > > > There's no way for a single physical device to have two 
> > > > > > > > > different ACPI
> > > > > > > > > "companions".  It looks like the memory blocks should be 64 M 
> > > > > > > > > each in that
> > > > > > > > > case.  Or we need to create two child devices for each memory 
> > > > > > > > > block and
> > > > > > > > > associate each of them with an ACPI object.  That would lead 
> > > > > > > > > to complications
> > > > > > > > > in the user space interface, though.
> > > > > > > > 
> > > > > > > > Right.  Even bigger issue is that I do not think __add_pages() 
> > > > > > > > and
> > > > > > > > __remove_pages() can add / delete a memory chunk that is less 
> > > > > > > > than
> > > > > > > > 128MB.  128MB is the granularity of them.  So, we may just have 
> > > > > > > > to fail
> > > > > > > > this case gracefully.
> > > > > > > 
> > > > > > > FYI: I have submitted the patch blow to close this part of the 
> > > > > > > issue...
> > > > > > > 
> > > > > > > https://lkml.org/lkml/2013/8/8/396
> > > > > > 
> > > > > > That looks good to me, but we'd still need to make it possible to 
> > > > > > have
> > > > > > memory blocks smaller than 128 MB ...
> > > > > 
> > > > > Do you mean acpi_bind_one() needs to be able to handle such case?  If
> > > > > so, it should not be a problem since a memory block device won't be
> > > > > created when add_memory() fails with the change above.  So, there is 
> > > > > no
> > > > > binding to be done.  If you mean add_memory() needs to be able to 
> > > > > handle
> > > > > a smaller range, that's quite a tough one unless we make the section
> > > > > size smaller.
> > > > > 
> > > > > BTW, when add_memory() fails, the memory hot-add request still 
> > > > > succeeds
> > > > > with no driver attached.  This seems logical, but the added device is
> > > > > useless when no handler is attached.  And it does not allow ejecting 
> > > > > the
> > > > > device with no handler.  I am not too worry about this since this is a
> > > > > rare case, but it reminded me that the framework won't handle 
> > > > > rollback.
> > > > 
> > > > I'm not sure which rollback you mean.  During removal?
> > > 
> > > I meant rollback during hot-add.  Ideally, a device should be either
> > > added in usable state (success) or failed back to the original state
> > > (rollback).  Added in un-usable state is not really a success for users,
> > > and creates an odd state to deal with.  But it is still a LOT better
> > > than crashing the system.  So, I think this outcome is reasonable on
> > > this framework because adding rollback at this point will complicate the
> > > things unnecessarily.
> > > 
> > > > There are two slight problems here in my view.  First, even if the 
> > > > device
> > > > cannot be ejected directly, it still will be removed when its parent is
> > > > ejected, so it may be more consistent to just allow everything to be 
> > > > ejected
> > > > regardless of whether or not it has a scan handler.  
> > > 
> > > Agreed.
> > > 
> > > > Second, I guess the
> > > > removal is undesirable for memory devices for which the registration of 
> > > > the
> > > > scan handler failed, so it would be good to fail the "offline" of such 
> > > > devices
> > > > regardless of how we get there.  That's why I thought it would be good 
> > > > to have
> > > > an "offline disabled" flag in struct acpi_device.
> > > 
> > > I see.  But when attach() failed, the memory device may not be used by
> > > the kernel.  So, I think it should be safe to remove it.
> > 
> > The failure of .attach() need not mean that the memory is not used by the
> > kernel, though, and the 

Re: Cannot hot remove a memory device

2013-08-13 Thread Rafael J. Wysocki
On Monday, August 12, 2013 07:02:44 PM Toshi Kani wrote:
 On Tue, 2013-08-13 at 02:45 +0200, Rafael J. Wysocki wrote:
  On Monday, August 12, 2013 02:40:43 PM Toshi Kani wrote:
   On Sun, 2013-08-11 at 23:13 +0200, Rafael J. Wysocki wrote:
On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
 On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
  On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
   On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
 On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
  On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
:
  I think it fails with -EINVAL at the place with 
  dev_warn(dev, ACPI
  handle is already set\n).  When two ACPI memory objects 
  associate with
  a same memory block, the bind procedure of the 2nd ACPI 
  memory object
  sees that ACPI_HANDLE(dev) is already set to the 1st ACPI 
  memory object.
 
 That sound's plausible, but I wonder how we can fix that?
 
 There's no way for a single physical device to have two 
 different ACPI
 companions.  It looks like the memory blocks should be 64 M 
 each in that
 case.  Or we need to create two child devices for each memory 
 block and
 associate each of them with an ACPI object.  That would lead 
 to complications
 in the user space interface, though.

Right.  Even bigger issue is that I do not think __add_pages() 
and
__remove_pages() can add / delete a memory chunk that is less 
than
128MB.  128MB is the granularity of them.  So, we may just have 
to fail
this case gracefully.
   
   FYI: I have submitted the patch blow to close this part of the 
   issue...
   
   https://lkml.org/lkml/2013/8/8/396
  
  That looks good to me, but we'd still need to make it possible to 
  have
  memory blocks smaller than 128 MB ...
 
 Do you mean acpi_bind_one() needs to be able to handle such case?  If
 so, it should not be a problem since a memory block device won't be
 created when add_memory() fails with the change above.  So, there is 
 no
 binding to be done.  If you mean add_memory() needs to be able to 
 handle
 a smaller range, that's quite a tough one unless we make the section
 size smaller.
 
 BTW, when add_memory() fails, the memory hot-add request still 
 succeeds
 with no driver attached.  This seems logical, but the added device is
 useless when no handler is attached.  And it does not allow ejecting 
 the
 device with no handler.  I am not too worry about this since this is a
 rare case, but it reminded me that the framework won't handle 
 rollback.

I'm not sure which rollback you mean.  During removal?
   
   I meant rollback during hot-add.  Ideally, a device should be either
   added in usable state (success) or failed back to the original state
   (rollback).  Added in un-usable state is not really a success for users,
   and creates an odd state to deal with.  But it is still a LOT better
   than crashing the system.  So, I think this outcome is reasonable on
   this framework because adding rollback at this point will complicate the
   things unnecessarily.
   
There are two slight problems here in my view.  First, even if the 
device
cannot be ejected directly, it still will be removed when its parent is
ejected, so it may be more consistent to just allow everything to be 
ejected
regardless of whether or not it has a scan handler.  
   
   Agreed.
   
Second, I guess the
removal is undesirable for memory devices for which the registration of 
the
scan handler failed, so it would be good to fail the offline of such 
devices
regardless of how we get there.  That's why I thought it would be good 
to have
an offline disabled flag in struct acpi_device.
   
   I see.  But when attach() failed, the memory device may not be used by
   the kernel.  So, I think it should be safe to remove it.
  
  The failure of .attach() need not mean that the memory is not used by the
  kernel, though, and the ACPI device object is still there and still may
  be involved in some removal scenarios (through its parent).
 
 As long as .attach() is failed cleanly, which I think is the case for
 the memory handler (if not, we need to fix it), the rest of the kernel
 code may not know what this device is.  That is, the ACPI memory handler
 is the only one that knows PNP0C80 is a memory device.  So, I do not
 think the memory can be used in such case...

Yes, it can, if the kernel discovers it during boot before .attach() is first
called.  So say this happens and the ACPI memory device object 

Re: Cannot hot remove a memory device

2013-08-13 Thread Toshi Kani
On Tue, 2013-08-13 at 14:02 +0200, Rafael J. Wysocki wrote:
 On Monday, August 12, 2013 07:02:44 PM Toshi Kani wrote:
  On Tue, 2013-08-13 at 02:45 +0200, Rafael J. Wysocki wrote:
   On Monday, August 12, 2013 02:40:43 PM Toshi Kani wrote:
On Sun, 2013-08-11 at 23:13 +0200, Rafael J. Wysocki wrote:
 On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
  On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
   On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
 On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
  On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
   On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki 
   wrote:
 :
   I think it fails with -EINVAL at the place with 
   dev_warn(dev, ACPI
   handle is already set\n).  When two ACPI memory objects 
   associate with
   a same memory block, the bind procedure of the 2nd ACPI 
   memory object
   sees that ACPI_HANDLE(dev) is already set to the 1st ACPI 
   memory object.
  
  That sound's plausible, but I wonder how we can fix that?
  
  There's no way for a single physical device to have two 
  different ACPI
  companions.  It looks like the memory blocks should be 64 
  M each in that
  case.  Or we need to create two child devices for each 
  memory block and
  associate each of them with an ACPI object.  That would 
  lead to complications
  in the user space interface, though.
 
 Right.  Even bigger issue is that I do not think 
 __add_pages() and
 __remove_pages() can add / delete a memory chunk that is less 
 than
 128MB.  128MB is the granularity of them.  So, we may just 
 have to fail
 this case gracefully.

FYI: I have submitted the patch blow to close this part of the 
issue...

https://lkml.org/lkml/2013/8/8/396
   
   That looks good to me, but we'd still need to make it possible to 
   have
   memory blocks smaller than 128 MB ...
  
  Do you mean acpi_bind_one() needs to be able to handle such case?  
  If
  so, it should not be a problem since a memory block device won't be
  created when add_memory() fails with the change above.  So, there 
  is no
  binding to be done.  If you mean add_memory() needs to be able to 
  handle
  a smaller range, that's quite a tough one unless we make the section
  size smaller.
  
  BTW, when add_memory() fails, the memory hot-add request still 
  succeeds
  with no driver attached.  This seems logical, but the added device 
  is
  useless when no handler is attached.  And it does not allow 
  ejecting the
  device with no handler.  I am not too worry about this since this 
  is a
  rare case, but it reminded me that the framework won't handle 
  rollback.
 
 I'm not sure which rollback you mean.  During removal?

I meant rollback during hot-add.  Ideally, a device should be either
added in usable state (success) or failed back to the original state
(rollback).  Added in un-usable state is not really a success for users,
and creates an odd state to deal with.  But it is still a LOT better
than crashing the system.  So, I think this outcome is reasonable on
this framework because adding rollback at this point will complicate the
things unnecessarily.

 There are two slight problems here in my view.  First, even if the 
 device
 cannot be ejected directly, it still will be removed when its parent 
 is
 ejected, so it may be more consistent to just allow everything to be 
 ejected
 regardless of whether or not it has a scan handler.  

Agreed.

 Second, I guess the
 removal is undesirable for memory devices for which the registration 
 of the
 scan handler failed, so it would be good to fail the offline of 
 such devices
 regardless of how we get there.  That's why I thought it would be 
 good to have
 an offline disabled flag in struct acpi_device.

I see.  But when attach() failed, the memory device may not be used by
the kernel.  So, I think it should be safe to remove it.
   
   The failure of .attach() need not mean that the memory is not used by the
   kernel, though, and the ACPI device object is still there and still may
   be involved in some removal scenarios (through its parent).
  
  As long as .attach() is failed cleanly, which I think is the case for
  the memory handler (if not, we need to fix it), the rest of the kernel
  code may not know what this device is.  That is, the ACPI memory handler
  is the only one that knows PNP0C80 is a memory 

Re: Cannot hot remove a memory device

2013-08-12 Thread Toshi Kani
On Tue, 2013-08-13 at 02:45 +0200, Rafael J. Wysocki wrote:
> On Monday, August 12, 2013 02:40:43 PM Toshi Kani wrote:
> > On Sun, 2013-08-11 at 23:13 +0200, Rafael J. Wysocki wrote:
> > > On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
> > > > On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
> > > > > On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
> > > > > > On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> > > > > > > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > > > > > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > > > > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > > > > >  :
> > > > > > > > > I think it fails with -EINVAL at the place with dev_warn(dev, 
> > > > > > > > > "ACPI
> > > > > > > > > handle is already set\n").  When two ACPI memory objects 
> > > > > > > > > associate with
> > > > > > > > > a same memory block, the bind procedure of the 2nd ACPI 
> > > > > > > > > memory object
> > > > > > > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI 
> > > > > > > > > memory object.
> > > > > > > > 
> > > > > > > > That sound's plausible, but I wonder how we can fix that?
> > > > > > > > 
> > > > > > > > There's no way for a single physical device to have two 
> > > > > > > > different ACPI
> > > > > > > > "companions".  It looks like the memory blocks should be 64 M 
> > > > > > > > each in that
> > > > > > > > case.  Or we need to create two child devices for each memory 
> > > > > > > > block and
> > > > > > > > associate each of them with an ACPI object.  That would lead to 
> > > > > > > > complications
> > > > > > > > in the user space interface, though.
> > > > > > > 
> > > > > > > Right.  Even bigger issue is that I do not think __add_pages() and
> > > > > > > __remove_pages() can add / delete a memory chunk that is less than
> > > > > > > 128MB.  128MB is the granularity of them.  So, we may just have 
> > > > > > > to fail
> > > > > > > this case gracefully.
> > > > > > 
> > > > > > FYI: I have submitted the patch blow to close this part of the 
> > > > > > issue...
> > > > > > 
> > > > > > https://lkml.org/lkml/2013/8/8/396
> > > > > 
> > > > > That looks good to me, but we'd still need to make it possible to have
> > > > > memory blocks smaller than 128 MB ...
> > > > 
> > > > Do you mean acpi_bind_one() needs to be able to handle such case?  If
> > > > so, it should not be a problem since a memory block device won't be
> > > > created when add_memory() fails with the change above.  So, there is no
> > > > binding to be done.  If you mean add_memory() needs to be able to handle
> > > > a smaller range, that's quite a tough one unless we make the section
> > > > size smaller.
> > > > 
> > > > BTW, when add_memory() fails, the memory hot-add request still succeeds
> > > > with no driver attached.  This seems logical, but the added device is
> > > > useless when no handler is attached.  And it does not allow ejecting the
> > > > device with no handler.  I am not too worry about this since this is a
> > > > rare case, but it reminded me that the framework won't handle rollback.
> > > 
> > > I'm not sure which rollback you mean.  During removal?
> > 
> > I meant rollback during hot-add.  Ideally, a device should be either
> > added in usable state (success) or failed back to the original state
> > (rollback).  Added in un-usable state is not really a success for users,
> > and creates an odd state to deal with.  But it is still a LOT better
> > than crashing the system.  So, I think this outcome is reasonable on
> > this framework because adding rollback at this point will complicate the
> > things unnecessarily.
> > 
> > > There are two slight problems here in my view.  First, even if the device
> > > cannot be ejected directly, it still will be removed when its parent is
> > > ejected, so it may be more consistent to just allow everything to be 
> > > ejected
> > > regardless of whether or not it has a scan handler.  
> > 
> > Agreed.
> > 
> > > Second, I guess the
> > > removal is undesirable for memory devices for which the registration of 
> > > the
> > > scan handler failed, so it would be good to fail the "offline" of such 
> > > devices
> > > regardless of how we get there.  That's why I thought it would be good to 
> > > have
> > > an "offline disabled" flag in struct acpi_device.
> > 
> > I see.  But when attach() failed, the memory device may not be used by
> > the kernel.  So, I think it should be safe to remove it.
> 
> The failure of .attach() need not mean that the memory is not used by the
> kernel, though, and the ACPI device object is still there and still may
> be involved in some removal scenarios (through its parent).

As long as .attach() is failed cleanly, which I think is the case for
the memory handler (if not, we need to fix it), the rest of the kernel
code may not know what this device is.  That is, the ACPI memory handler
is the only one that knows 

Re: Cannot hot remove a memory device

2013-08-12 Thread Rafael J. Wysocki
On Monday, August 12, 2013 02:40:43 PM Toshi Kani wrote:
> On Sun, 2013-08-11 at 23:13 +0200, Rafael J. Wysocki wrote:
> > On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
> > > On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
> > > > On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
> > > > > On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> > > > > > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > > > > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > > > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > > > >  :
> > > > > > > > I think it fails with -EINVAL at the place with dev_warn(dev, 
> > > > > > > > "ACPI
> > > > > > > > handle is already set\n").  When two ACPI memory objects 
> > > > > > > > associate with
> > > > > > > > a same memory block, the bind procedure of the 2nd ACPI memory 
> > > > > > > > object
> > > > > > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI 
> > > > > > > > memory object.
> > > > > > > 
> > > > > > > That sound's plausible, but I wonder how we can fix that?
> > > > > > > 
> > > > > > > There's no way for a single physical device to have two different 
> > > > > > > ACPI
> > > > > > > "companions".  It looks like the memory blocks should be 64 M 
> > > > > > > each in that
> > > > > > > case.  Or we need to create two child devices for each memory 
> > > > > > > block and
> > > > > > > associate each of them with an ACPI object.  That would lead to 
> > > > > > > complications
> > > > > > > in the user space interface, though.
> > > > > > 
> > > > > > Right.  Even bigger issue is that I do not think __add_pages() and
> > > > > > __remove_pages() can add / delete a memory chunk that is less than
> > > > > > 128MB.  128MB is the granularity of them.  So, we may just have to 
> > > > > > fail
> > > > > > this case gracefully.
> > > > > 
> > > > > FYI: I have submitted the patch blow to close this part of the 
> > > > > issue...
> > > > > 
> > > > > https://lkml.org/lkml/2013/8/8/396
> > > > 
> > > > That looks good to me, but we'd still need to make it possible to have
> > > > memory blocks smaller than 128 MB ...
> > > 
> > > Do you mean acpi_bind_one() needs to be able to handle such case?  If
> > > so, it should not be a problem since a memory block device won't be
> > > created when add_memory() fails with the change above.  So, there is no
> > > binding to be done.  If you mean add_memory() needs to be able to handle
> > > a smaller range, that's quite a tough one unless we make the section
> > > size smaller.
> > > 
> > > BTW, when add_memory() fails, the memory hot-add request still succeeds
> > > with no driver attached.  This seems logical, but the added device is
> > > useless when no handler is attached.  And it does not allow ejecting the
> > > device with no handler.  I am not too worry about this since this is a
> > > rare case, but it reminded me that the framework won't handle rollback.
> > 
> > I'm not sure which rollback you mean.  During removal?
> 
> I meant rollback during hot-add.  Ideally, a device should be either
> added in usable state (success) or failed back to the original state
> (rollback).  Added in un-usable state is not really a success for users,
> and creates an odd state to deal with.  But it is still a LOT better
> than crashing the system.  So, I think this outcome is reasonable on
> this framework because adding rollback at this point will complicate the
> things unnecessarily.
> 
> > There are two slight problems here in my view.  First, even if the device
> > cannot be ejected directly, it still will be removed when its parent is
> > ejected, so it may be more consistent to just allow everything to be ejected
> > regardless of whether or not it has a scan handler.  
> 
> Agreed.
> 
> > Second, I guess the
> > removal is undesirable for memory devices for which the registration of the
> > scan handler failed, so it would be good to fail the "offline" of such 
> > devices
> > regardless of how we get there.  That's why I thought it would be good to 
> > have
> > an "offline disabled" flag in struct acpi_device.
> 
> I see.  But when attach() failed, the memory device may not be used by
> the kernel.  So, I think it should be safe to remove it.

The failure of .attach() need not mean that the memory is not used by the
kernel, though, and the ACPI device object is still there and still may
be involved in some removal scenarios (through its parent).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-12 Thread Toshi Kani
On Sun, 2013-08-11 at 23:13 +0200, Rafael J. Wysocki wrote:
> On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
> > On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
> > > On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
> > > > On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> > > > > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > > > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > > >  :
> > > > > > > I think it fails with -EINVAL at the place with dev_warn(dev, 
> > > > > > > "ACPI
> > > > > > > handle is already set\n").  When two ACPI memory objects 
> > > > > > > associate with
> > > > > > > a same memory block, the bind procedure of the 2nd ACPI memory 
> > > > > > > object
> > > > > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
> > > > > > > object.
> > > > > > 
> > > > > > That sound's plausible, but I wonder how we can fix that?
> > > > > > 
> > > > > > There's no way for a single physical device to have two different 
> > > > > > ACPI
> > > > > > "companions".  It looks like the memory blocks should be 64 M each 
> > > > > > in that
> > > > > > case.  Or we need to create two child devices for each memory block 
> > > > > > and
> > > > > > associate each of them with an ACPI object.  That would lead to 
> > > > > > complications
> > > > > > in the user space interface, though.
> > > > > 
> > > > > Right.  Even bigger issue is that I do not think __add_pages() and
> > > > > __remove_pages() can add / delete a memory chunk that is less than
> > > > > 128MB.  128MB is the granularity of them.  So, we may just have to 
> > > > > fail
> > > > > this case gracefully.
> > > > 
> > > > FYI: I have submitted the patch blow to close this part of the issue...
> > > > 
> > > > https://lkml.org/lkml/2013/8/8/396
> > > 
> > > That looks good to me, but we'd still need to make it possible to have
> > > memory blocks smaller than 128 MB ...
> > 
> > Do you mean acpi_bind_one() needs to be able to handle such case?  If
> > so, it should not be a problem since a memory block device won't be
> > created when add_memory() fails with the change above.  So, there is no
> > binding to be done.  If you mean add_memory() needs to be able to handle
> > a smaller range, that's quite a tough one unless we make the section
> > size smaller.
> > 
> > BTW, when add_memory() fails, the memory hot-add request still succeeds
> > with no driver attached.  This seems logical, but the added device is
> > useless when no handler is attached.  And it does not allow ejecting the
> > device with no handler.  I am not too worry about this since this is a
> > rare case, but it reminded me that the framework won't handle rollback.
> 
> I'm not sure which rollback you mean.  During removal?

I meant rollback during hot-add.  Ideally, a device should be either
added in usable state (success) or failed back to the original state
(rollback).  Added in un-usable state is not really a success for users,
and creates an odd state to deal with.  But it is still a LOT better
than crashing the system.  So, I think this outcome is reasonable on
this framework because adding rollback at this point will complicate the
things unnecessarily.

> There are two slight problems here in my view.  First, even if the device
> cannot be ejected directly, it still will be removed when its parent is
> ejected, so it may be more consistent to just allow everything to be ejected
> regardless of whether or not it has a scan handler.  

Agreed.

> Second, I guess the
> removal is undesirable for memory devices for which the registration of the
> scan handler failed, so it would be good to fail the "offline" of such devices
> regardless of how we get there.  That's why I thought it would be good to have
> an "offline disabled" flag in struct acpi_device.

I see.  But when attach() failed, the memory device may not be used by
the kernel.  So, I think it should be safe to remove it.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-12 Thread Toshi Kani
On Sun, 2013-08-11 at 23:13 +0200, Rafael J. Wysocki wrote:
 On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
  On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
   On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
 On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
  On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
   On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
 :
   I think it fails with -EINVAL at the place with dev_warn(dev, 
   ACPI
   handle is already set\n).  When two ACPI memory objects 
   associate with
   a same memory block, the bind procedure of the 2nd ACPI memory 
   object
   sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
   object.
  
  That sound's plausible, but I wonder how we can fix that?
  
  There's no way for a single physical device to have two different 
  ACPI
  companions.  It looks like the memory blocks should be 64 M each 
  in that
  case.  Or we need to create two child devices for each memory block 
  and
  associate each of them with an ACPI object.  That would lead to 
  complications
  in the user space interface, though.
 
 Right.  Even bigger issue is that I do not think __add_pages() and
 __remove_pages() can add / delete a memory chunk that is less than
 128MB.  128MB is the granularity of them.  So, we may just have to 
 fail
 this case gracefully.

FYI: I have submitted the patch blow to close this part of the issue...

https://lkml.org/lkml/2013/8/8/396
   
   That looks good to me, but we'd still need to make it possible to have
   memory blocks smaller than 128 MB ...
  
  Do you mean acpi_bind_one() needs to be able to handle such case?  If
  so, it should not be a problem since a memory block device won't be
  created when add_memory() fails with the change above.  So, there is no
  binding to be done.  If you mean add_memory() needs to be able to handle
  a smaller range, that's quite a tough one unless we make the section
  size smaller.
  
  BTW, when add_memory() fails, the memory hot-add request still succeeds
  with no driver attached.  This seems logical, but the added device is
  useless when no handler is attached.  And it does not allow ejecting the
  device with no handler.  I am not too worry about this since this is a
  rare case, but it reminded me that the framework won't handle rollback.
 
 I'm not sure which rollback you mean.  During removal?

I meant rollback during hot-add.  Ideally, a device should be either
added in usable state (success) or failed back to the original state
(rollback).  Added in un-usable state is not really a success for users,
and creates an odd state to deal with.  But it is still a LOT better
than crashing the system.  So, I think this outcome is reasonable on
this framework because adding rollback at this point will complicate the
things unnecessarily.

 There are two slight problems here in my view.  First, even if the device
 cannot be ejected directly, it still will be removed when its parent is
 ejected, so it may be more consistent to just allow everything to be ejected
 regardless of whether or not it has a scan handler.  

Agreed.

 Second, I guess the
 removal is undesirable for memory devices for which the registration of the
 scan handler failed, so it would be good to fail the offline of such devices
 regardless of how we get there.  That's why I thought it would be good to have
 an offline disabled flag in struct acpi_device.

I see.  But when attach() failed, the memory device may not be used by
the kernel.  So, I think it should be safe to remove it.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-12 Thread Rafael J. Wysocki
On Monday, August 12, 2013 02:40:43 PM Toshi Kani wrote:
 On Sun, 2013-08-11 at 23:13 +0200, Rafael J. Wysocki wrote:
  On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
   On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
 On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
  On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
   On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
  :
I think it fails with -EINVAL at the place with dev_warn(dev, 
ACPI
handle is already set\n).  When two ACPI memory objects 
associate with
a same memory block, the bind procedure of the 2nd ACPI memory 
object
sees that ACPI_HANDLE(dev) is already set to the 1st ACPI 
memory object.
   
   That sound's plausible, but I wonder how we can fix that?
   
   There's no way for a single physical device to have two different 
   ACPI
   companions.  It looks like the memory blocks should be 64 M 
   each in that
   case.  Or we need to create two child devices for each memory 
   block and
   associate each of them with an ACPI object.  That would lead to 
   complications
   in the user space interface, though.
  
  Right.  Even bigger issue is that I do not think __add_pages() and
  __remove_pages() can add / delete a memory chunk that is less than
  128MB.  128MB is the granularity of them.  So, we may just have to 
  fail
  this case gracefully.
 
 FYI: I have submitted the patch blow to close this part of the 
 issue...
 
 https://lkml.org/lkml/2013/8/8/396

That looks good to me, but we'd still need to make it possible to have
memory blocks smaller than 128 MB ...
   
   Do you mean acpi_bind_one() needs to be able to handle such case?  If
   so, it should not be a problem since a memory block device won't be
   created when add_memory() fails with the change above.  So, there is no
   binding to be done.  If you mean add_memory() needs to be able to handle
   a smaller range, that's quite a tough one unless we make the section
   size smaller.
   
   BTW, when add_memory() fails, the memory hot-add request still succeeds
   with no driver attached.  This seems logical, but the added device is
   useless when no handler is attached.  And it does not allow ejecting the
   device with no handler.  I am not too worry about this since this is a
   rare case, but it reminded me that the framework won't handle rollback.
  
  I'm not sure which rollback you mean.  During removal?
 
 I meant rollback during hot-add.  Ideally, a device should be either
 added in usable state (success) or failed back to the original state
 (rollback).  Added in un-usable state is not really a success for users,
 and creates an odd state to deal with.  But it is still a LOT better
 than crashing the system.  So, I think this outcome is reasonable on
 this framework because adding rollback at this point will complicate the
 things unnecessarily.
 
  There are two slight problems here in my view.  First, even if the device
  cannot be ejected directly, it still will be removed when its parent is
  ejected, so it may be more consistent to just allow everything to be ejected
  regardless of whether or not it has a scan handler.  
 
 Agreed.
 
  Second, I guess the
  removal is undesirable for memory devices for which the registration of the
  scan handler failed, so it would be good to fail the offline of such 
  devices
  regardless of how we get there.  That's why I thought it would be good to 
  have
  an offline disabled flag in struct acpi_device.
 
 I see.  But when attach() failed, the memory device may not be used by
 the kernel.  So, I think it should be safe to remove it.

The failure of .attach() need not mean that the memory is not used by the
kernel, though, and the ACPI device object is still there and still may
be involved in some removal scenarios (through its parent).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-12 Thread Toshi Kani
On Tue, 2013-08-13 at 02:45 +0200, Rafael J. Wysocki wrote:
 On Monday, August 12, 2013 02:40:43 PM Toshi Kani wrote:
  On Sun, 2013-08-11 at 23:13 +0200, Rafael J. Wysocki wrote:
   On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
 On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
  On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
   On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
 On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
   :
 I think it fails with -EINVAL at the place with dev_warn(dev, 
 ACPI
 handle is already set\n).  When two ACPI memory objects 
 associate with
 a same memory block, the bind procedure of the 2nd ACPI 
 memory object
 sees that ACPI_HANDLE(dev) is already set to the 1st ACPI 
 memory object.

That sound's plausible, but I wonder how we can fix that?

There's no way for a single physical device to have two 
different ACPI
companions.  It looks like the memory blocks should be 64 M 
each in that
case.  Or we need to create two child devices for each memory 
block and
associate each of them with an ACPI object.  That would lead to 
complications
in the user space interface, though.
   
   Right.  Even bigger issue is that I do not think __add_pages() and
   __remove_pages() can add / delete a memory chunk that is less than
   128MB.  128MB is the granularity of them.  So, we may just have 
   to fail
   this case gracefully.
  
  FYI: I have submitted the patch blow to close this part of the 
  issue...
  
  https://lkml.org/lkml/2013/8/8/396
 
 That looks good to me, but we'd still need to make it possible to have
 memory blocks smaller than 128 MB ...

Do you mean acpi_bind_one() needs to be able to handle such case?  If
so, it should not be a problem since a memory block device won't be
created when add_memory() fails with the change above.  So, there is no
binding to be done.  If you mean add_memory() needs to be able to handle
a smaller range, that's quite a tough one unless we make the section
size smaller.

BTW, when add_memory() fails, the memory hot-add request still succeeds
with no driver attached.  This seems logical, but the added device is
useless when no handler is attached.  And it does not allow ejecting the
device with no handler.  I am not too worry about this since this is a
rare case, but it reminded me that the framework won't handle rollback.
   
   I'm not sure which rollback you mean.  During removal?
  
  I meant rollback during hot-add.  Ideally, a device should be either
  added in usable state (success) or failed back to the original state
  (rollback).  Added in un-usable state is not really a success for users,
  and creates an odd state to deal with.  But it is still a LOT better
  than crashing the system.  So, I think this outcome is reasonable on
  this framework because adding rollback at this point will complicate the
  things unnecessarily.
  
   There are two slight problems here in my view.  First, even if the device
   cannot be ejected directly, it still will be removed when its parent is
   ejected, so it may be more consistent to just allow everything to be 
   ejected
   regardless of whether or not it has a scan handler.  
  
  Agreed.
  
   Second, I guess the
   removal is undesirable for memory devices for which the registration of 
   the
   scan handler failed, so it would be good to fail the offline of such 
   devices
   regardless of how we get there.  That's why I thought it would be good to 
   have
   an offline disabled flag in struct acpi_device.
  
  I see.  But when attach() failed, the memory device may not be used by
  the kernel.  So, I think it should be safe to remove it.
 
 The failure of .attach() need not mean that the memory is not used by the
 kernel, though, and the ACPI device object is still there and still may
 be involved in some removal scenarios (through its parent).

As long as .attach() is failed cleanly, which I think is the case for
the memory handler (if not, we need to fix it), the rest of the kernel
code may not know what this device is.  That is, the ACPI memory handler
is the only one that knows PNP0C80 is a memory device.  So, I do not
think the memory can be used in such case...

Thanks,
-Toshi


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-11 Thread Rafael J. Wysocki
On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
> On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
> > On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
> > > On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> > > > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > >  :
> > > > > > I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> > > > > > handle is already set\n").  When two ACPI memory objects associate 
> > > > > > with
> > > > > > a same memory block, the bind procedure of the 2nd ACPI memory 
> > > > > > object
> > > > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
> > > > > > object.
> > > > > 
> > > > > That sound's plausible, but I wonder how we can fix that?
> > > > > 
> > > > > There's no way for a single physical device to have two different ACPI
> > > > > "companions".  It looks like the memory blocks should be 64 M each in 
> > > > > that
> > > > > case.  Or we need to create two child devices for each memory block 
> > > > > and
> > > > > associate each of them with an ACPI object.  That would lead to 
> > > > > complications
> > > > > in the user space interface, though.
> > > > 
> > > > Right.  Even bigger issue is that I do not think __add_pages() and
> > > > __remove_pages() can add / delete a memory chunk that is less than
> > > > 128MB.  128MB is the granularity of them.  So, we may just have to fail
> > > > this case gracefully.
> > > 
> > > FYI: I have submitted the patch blow to close this part of the issue...
> > > 
> > > https://lkml.org/lkml/2013/8/8/396
> > 
> > That looks good to me, but we'd still need to make it possible to have
> > memory blocks smaller than 128 MB ...
> 
> Do you mean acpi_bind_one() needs to be able to handle such case?  If
> so, it should not be a problem since a memory block device won't be
> created when add_memory() fails with the change above.  So, there is no
> binding to be done.  If you mean add_memory() needs to be able to handle
> a smaller range, that's quite a tough one unless we make the section
> size smaller.
> 
> BTW, when add_memory() fails, the memory hot-add request still succeeds
> with no driver attached.  This seems logical, but the added device is
> useless when no handler is attached.  And it does not allow ejecting the
> device with no handler.  I am not too worry about this since this is a
> rare case, but it reminded me that the framework won't handle rollback.

I'm not sure which rollback you mean.  During removal?

There are two slight problems here in my view.  First, even if the device
cannot be ejected directly, it still will be removed when its parent is
ejected, so it may be more consistent to just allow everything to be ejected
regardless of whether or not it has a scan handler.  Second, I guess the
removal is undesirable for memory devices for which the registration of the
scan handler failed, so it would be good to fail the "offline" of such devices
regardless of how we get there.  That's why I thought it would be good to have
an "offline disabled" flag in struct acpi_device.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-11 Thread Rafael J. Wysocki
On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
 On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
  On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
   On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
 On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
  On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
:
  I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
  handle is already set\n).  When two ACPI memory objects associate 
  with
  a same memory block, the bind procedure of the 2nd ACPI memory 
  object
  sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
  object.
 
 That sound's plausible, but I wonder how we can fix that?
 
 There's no way for a single physical device to have two different ACPI
 companions.  It looks like the memory blocks should be 64 M each in 
 that
 case.  Or we need to create two child devices for each memory block 
 and
 associate each of them with an ACPI object.  That would lead to 
 complications
 in the user space interface, though.

Right.  Even bigger issue is that I do not think __add_pages() and
__remove_pages() can add / delete a memory chunk that is less than
128MB.  128MB is the granularity of them.  So, we may just have to fail
this case gracefully.
   
   FYI: I have submitted the patch blow to close this part of the issue...
   
   https://lkml.org/lkml/2013/8/8/396
  
  That looks good to me, but we'd still need to make it possible to have
  memory blocks smaller than 128 MB ...
 
 Do you mean acpi_bind_one() needs to be able to handle such case?  If
 so, it should not be a problem since a memory block device won't be
 created when add_memory() fails with the change above.  So, there is no
 binding to be done.  If you mean add_memory() needs to be able to handle
 a smaller range, that's quite a tough one unless we make the section
 size smaller.
 
 BTW, when add_memory() fails, the memory hot-add request still succeeds
 with no driver attached.  This seems logical, but the added device is
 useless when no handler is attached.  And it does not allow ejecting the
 device with no handler.  I am not too worry about this since this is a
 rare case, but it reminded me that the framework won't handle rollback.

I'm not sure which rollback you mean.  During removal?

There are two slight problems here in my view.  First, even if the device
cannot be ejected directly, it still will be removed when its parent is
ejected, so it may be more consistent to just allow everything to be ejected
regardless of whether or not it has a scan handler.  Second, I guess the
removal is undesirable for memory devices for which the registration of the
scan handler failed, so it would be good to fail the offline of such devices
regardless of how we get there.  That's why I thought it would be good to have
an offline disabled flag in struct acpi_device.

Thanks,
Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-08 Thread Toshi Kani
On Fri, 2013-08-09 at 01:14 +0200, Rafael J. Wysocki wrote:
> On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
> > On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
> > > On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
> > > > On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> > > > > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > > > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > > >  :
> > > > > > > I think it fails with -EINVAL at the place with dev_warn(dev, 
> > > > > > > "ACPI
> > > > > > > handle is already set\n").  When two ACPI memory objects 
> > > > > > > associate with
> > > > > > > a same memory block, the bind procedure of the 2nd ACPI memory 
> > > > > > > object
> > > > > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
> > > > > > > object.
> > > > > > 
> > > > > > That sound's plausible, but I wonder how we can fix that?
> > > > > > 
> > > > > > There's no way for a single physical device to have two different 
> > > > > > ACPI
> > > > > > "companions".  It looks like the memory blocks should be 64 M each 
> > > > > > in that
> > > > > > case.  Or we need to create two child devices for each memory block 
> > > > > > and
> > > > > > associate each of them with an ACPI object.  That would lead to 
> > > > > > complications
> > > > > > in the user space interface, though.
> > > > > 
> > > > > Right.  Even bigger issue is that I do not think __add_pages() and
> > > > > __remove_pages() can add / delete a memory chunk that is less than
> > > > > 128MB.  128MB is the granularity of them.  So, we may just have to 
> > > > > fail
> > > > > this case gracefully.
> > > > 
> > > > FYI: I have submitted the patch blow to close this part of the issue...
> > > > 
> > > > https://lkml.org/lkml/2013/8/8/396
> > > 
> > > That looks good to me, but we'd still need to make it possible to have
> > > memory blocks smaller than 128 MB ...
> > 
> > Do you mean acpi_bind_one() needs to be able to handle such case?  If
> > so, it should not be a problem since a memory block device won't be
> > created when add_memory() fails with the change above.  So, there is no
> > binding to be done.  If you mean add_memory() needs to be able to handle
> > a smaller range, that's quite a tough one unless we make the section
> > size smaller.
> > 
> > BTW, when add_memory() fails, the memory hot-add request still succeeds
> > with no driver attached.  This seems logical, but the added device is
> > useless when no handler is attached.  And it does not allow ejecting the
> > device with no handler.  I am not too worry about this since this is a
> > rare case, but it reminded me that the framework won't handle rollback.
> 
> That is a valid observation.
> 
> Perhaps we should add a flag that will cause acpi_bus_offline_companions()
> to fail immediately if that flag is set for the given device?
> 
> Then, acpi_memory_enable_device() could set that flag for mem_device->device
> when either add_memory() or acpi_bind_memory_blocks() fails?

I am not sure if we need a new flag, but I agree that we could allow
ejecting a device with no handler attached.  We can probably skip any
device specific part (such as offlining) when no handler is attached.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-08 Thread Rafael J. Wysocki
On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
> On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
> > On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
> > > On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> > > > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > >  :
> > > > > > I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> > > > > > handle is already set\n").  When two ACPI memory objects associate 
> > > > > > with
> > > > > > a same memory block, the bind procedure of the 2nd ACPI memory 
> > > > > > object
> > > > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
> > > > > > object.
> > > > > 
> > > > > That sound's plausible, but I wonder how we can fix that?
> > > > > 
> > > > > There's no way for a single physical device to have two different ACPI
> > > > > "companions".  It looks like the memory blocks should be 64 M each in 
> > > > > that
> > > > > case.  Or we need to create two child devices for each memory block 
> > > > > and
> > > > > associate each of them with an ACPI object.  That would lead to 
> > > > > complications
> > > > > in the user space interface, though.
> > > > 
> > > > Right.  Even bigger issue is that I do not think __add_pages() and
> > > > __remove_pages() can add / delete a memory chunk that is less than
> > > > 128MB.  128MB is the granularity of them.  So, we may just have to fail
> > > > this case gracefully.
> > > 
> > > FYI: I have submitted the patch blow to close this part of the issue...
> > > 
> > > https://lkml.org/lkml/2013/8/8/396
> > 
> > That looks good to me, but we'd still need to make it possible to have
> > memory blocks smaller than 128 MB ...
> 
> Do you mean acpi_bind_one() needs to be able to handle such case?  If
> so, it should not be a problem since a memory block device won't be
> created when add_memory() fails with the change above.  So, there is no
> binding to be done.  If you mean add_memory() needs to be able to handle
> a smaller range, that's quite a tough one unless we make the section
> size smaller.
> 
> BTW, when add_memory() fails, the memory hot-add request still succeeds
> with no driver attached.  This seems logical, but the added device is
> useless when no handler is attached.  And it does not allow ejecting the
> device with no handler.  I am not too worry about this since this is a
> rare case, but it reminded me that the framework won't handle rollback.

That is a valid observation.

Perhaps we should add a flag that will cause acpi_bus_offline_companions()
to fail immediately if that flag is set for the given device?

Then, acpi_memory_enable_device() could set that flag for mem_device->device
when either add_memory() or acpi_bind_memory_blocks() fails?

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-08 Thread Toshi Kani
On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
> On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
> > On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> > > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> >  :
> > > > > I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> > > > > handle is already set\n").  When two ACPI memory objects associate 
> > > > > with
> > > > > a same memory block, the bind procedure of the 2nd ACPI memory object
> > > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
> > > > > object.
> > > > 
> > > > That sound's plausible, but I wonder how we can fix that?
> > > > 
> > > > There's no way for a single physical device to have two different ACPI
> > > > "companions".  It looks like the memory blocks should be 64 M each in 
> > > > that
> > > > case.  Or we need to create two child devices for each memory block and
> > > > associate each of them with an ACPI object.  That would lead to 
> > > > complications
> > > > in the user space interface, though.
> > > 
> > > Right.  Even bigger issue is that I do not think __add_pages() and
> > > __remove_pages() can add / delete a memory chunk that is less than
> > > 128MB.  128MB is the granularity of them.  So, we may just have to fail
> > > this case gracefully.
> > 
> > FYI: I have submitted the patch blow to close this part of the issue...
> > 
> > https://lkml.org/lkml/2013/8/8/396
> 
> That looks good to me, but we'd still need to make it possible to have
> memory blocks smaller than 128 MB ...

Do you mean acpi_bind_one() needs to be able to handle such case?  If
so, it should not be a problem since a memory block device won't be
created when add_memory() fails with the change above.  So, there is no
binding to be done.  If you mean add_memory() needs to be able to handle
a smaller range, that's quite a tough one unless we make the section
size smaller.

BTW, when add_memory() fails, the memory hot-add request still succeeds
with no driver attached.  This seems logical, but the added device is
useless when no handler is attached.  And it does not allow ejecting the
device with no handler.  I am not too worry about this since this is a
rare case, but it reminded me that the framework won't handle rollback.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-08 Thread Rafael J. Wysocki
On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
> On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
>  :
> > > > I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> > > > handle is already set\n").  When two ACPI memory objects associate with
> > > > a same memory block, the bind procedure of the 2nd ACPI memory object
> > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.
> > > 
> > > That sound's plausible, but I wonder how we can fix that?
> > > 
> > > There's no way for a single physical device to have two different ACPI
> > > "companions".  It looks like the memory blocks should be 64 M each in that
> > > case.  Or we need to create two child devices for each memory block and
> > > associate each of them with an ACPI object.  That would lead to 
> > > complications
> > > in the user space interface, though.
> > 
> > Right.  Even bigger issue is that I do not think __add_pages() and
> > __remove_pages() can add / delete a memory chunk that is less than
> > 128MB.  128MB is the granularity of them.  So, we may just have to fail
> > this case gracefully.
> 
> FYI: I have submitted the patch blow to close this part of the issue...
> 
> https://lkml.org/lkml/2013/8/8/396

That looks good to me, but we'd still need to make it possible to have
memory blocks smaller than 128 MB ...

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-08 Thread Toshi Kani
On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
> On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
 :
> > > I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> > > handle is already set\n").  When two ACPI memory objects associate with
> > > a same memory block, the bind procedure of the 2nd ACPI memory object
> > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.
> > 
> > That sound's plausible, but I wonder how we can fix that?
> > 
> > There's no way for a single physical device to have two different ACPI
> > "companions".  It looks like the memory blocks should be 64 M each in that
> > case.  Or we need to create two child devices for each memory block and
> > associate each of them with an ACPI object.  That would lead to 
> > complications
> > in the user space interface, though.
> 
> Right.  Even bigger issue is that I do not think __add_pages() and
> __remove_pages() can add / delete a memory chunk that is less than
> 128MB.  128MB is the granularity of them.  So, we may just have to fail
> this case gracefully.

FYI: I have submitted the patch blow to close this part of the issue...

https://lkml.org/lkml/2013/8/8/396

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-08 Thread Toshi Kani
On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
 On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
  On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
   On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
 :
   I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
   handle is already set\n).  When two ACPI memory objects associate with
   a same memory block, the bind procedure of the 2nd ACPI memory object
   sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.
  
  That sound's plausible, but I wonder how we can fix that?
  
  There's no way for a single physical device to have two different ACPI
  companions.  It looks like the memory blocks should be 64 M each in that
  case.  Or we need to create two child devices for each memory block and
  associate each of them with an ACPI object.  That would lead to 
  complications
  in the user space interface, though.
 
 Right.  Even bigger issue is that I do not think __add_pages() and
 __remove_pages() can add / delete a memory chunk that is less than
 128MB.  128MB is the granularity of them.  So, we may just have to fail
 this case gracefully.

FYI: I have submitted the patch blow to close this part of the issue...

https://lkml.org/lkml/2013/8/8/396

Thanks,
-Toshi


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-08 Thread Rafael J. Wysocki
On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
 On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
  On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
   On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
  :
I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
handle is already set\n).  When two ACPI memory objects associate with
a same memory block, the bind procedure of the 2nd ACPI memory object
sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.
   
   That sound's plausible, but I wonder how we can fix that?
   
   There's no way for a single physical device to have two different ACPI
   companions.  It looks like the memory blocks should be 64 M each in that
   case.  Or we need to create two child devices for each memory block and
   associate each of them with an ACPI object.  That would lead to 
   complications
   in the user space interface, though.
  
  Right.  Even bigger issue is that I do not think __add_pages() and
  __remove_pages() can add / delete a memory chunk that is less than
  128MB.  128MB is the granularity of them.  So, we may just have to fail
  this case gracefully.
 
 FYI: I have submitted the patch blow to close this part of the issue...
 
 https://lkml.org/lkml/2013/8/8/396

That looks good to me, but we'd still need to make it possible to have
memory blocks smaller than 128 MB ...

Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-08 Thread Toshi Kani
On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
 On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
  On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
   On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
 On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
   :
 I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
 handle is already set\n).  When two ACPI memory objects associate 
 with
 a same memory block, the bind procedure of the 2nd ACPI memory object
 sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
 object.

That sound's plausible, but I wonder how we can fix that?

There's no way for a single physical device to have two different ACPI
companions.  It looks like the memory blocks should be 64 M each in 
that
case.  Or we need to create two child devices for each memory block and
associate each of them with an ACPI object.  That would lead to 
complications
in the user space interface, though.
   
   Right.  Even bigger issue is that I do not think __add_pages() and
   __remove_pages() can add / delete a memory chunk that is less than
   128MB.  128MB is the granularity of them.  So, we may just have to fail
   this case gracefully.
  
  FYI: I have submitted the patch blow to close this part of the issue...
  
  https://lkml.org/lkml/2013/8/8/396
 
 That looks good to me, but we'd still need to make it possible to have
 memory blocks smaller than 128 MB ...

Do you mean acpi_bind_one() needs to be able to handle such case?  If
so, it should not be a problem since a memory block device won't be
created when add_memory() fails with the change above.  So, there is no
binding to be done.  If you mean add_memory() needs to be able to handle
a smaller range, that's quite a tough one unless we make the section
size smaller.

BTW, when add_memory() fails, the memory hot-add request still succeeds
with no driver attached.  This seems logical, but the added device is
useless when no handler is attached.  And it does not allow ejecting the
device with no handler.  I am not too worry about this since this is a
rare case, but it reminded me that the framework won't handle rollback.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-08 Thread Rafael J. Wysocki
On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
 On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
  On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
   On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
 On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
  On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
:
  I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
  handle is already set\n).  When two ACPI memory objects associate 
  with
  a same memory block, the bind procedure of the 2nd ACPI memory 
  object
  sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
  object.
 
 That sound's plausible, but I wonder how we can fix that?
 
 There's no way for a single physical device to have two different ACPI
 companions.  It looks like the memory blocks should be 64 M each in 
 that
 case.  Or we need to create two child devices for each memory block 
 and
 associate each of them with an ACPI object.  That would lead to 
 complications
 in the user space interface, though.

Right.  Even bigger issue is that I do not think __add_pages() and
__remove_pages() can add / delete a memory chunk that is less than
128MB.  128MB is the granularity of them.  So, we may just have to fail
this case gracefully.
   
   FYI: I have submitted the patch blow to close this part of the issue...
   
   https://lkml.org/lkml/2013/8/8/396
  
  That looks good to me, but we'd still need to make it possible to have
  memory blocks smaller than 128 MB ...
 
 Do you mean acpi_bind_one() needs to be able to handle such case?  If
 so, it should not be a problem since a memory block device won't be
 created when add_memory() fails with the change above.  So, there is no
 binding to be done.  If you mean add_memory() needs to be able to handle
 a smaller range, that's quite a tough one unless we make the section
 size smaller.
 
 BTW, when add_memory() fails, the memory hot-add request still succeeds
 with no driver attached.  This seems logical, but the added device is
 useless when no handler is attached.  And it does not allow ejecting the
 device with no handler.  I am not too worry about this since this is a
 rare case, but it reminded me that the framework won't handle rollback.

That is a valid observation.

Perhaps we should add a flag that will cause acpi_bus_offline_companions()
to fail immediately if that flag is set for the given device?

Then, acpi_memory_enable_device() could set that flag for mem_device-device
when either add_memory() or acpi_bind_memory_blocks() fails?

Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-08 Thread Toshi Kani
On Fri, 2013-08-09 at 01:14 +0200, Rafael J. Wysocki wrote:
 On Thursday, August 08, 2013 04:50:42 PM Toshi Kani wrote:
  On Fri, 2013-08-09 at 00:12 +0200, Rafael J. Wysocki wrote:
   On Thursday, August 08, 2013 11:15:20 AM Toshi Kani wrote:
On Fri, 2013-08-02 at 18:04 -0600, Toshi Kani wrote:
 On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
  On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
   On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
 :
   I think it fails with -EINVAL at the place with dev_warn(dev, 
   ACPI
   handle is already set\n).  When two ACPI memory objects 
   associate with
   a same memory block, the bind procedure of the 2nd ACPI memory 
   object
   sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
   object.
  
  That sound's plausible, but I wonder how we can fix that?
  
  There's no way for a single physical device to have two different 
  ACPI
  companions.  It looks like the memory blocks should be 64 M each 
  in that
  case.  Or we need to create two child devices for each memory block 
  and
  associate each of them with an ACPI object.  That would lead to 
  complications
  in the user space interface, though.
 
 Right.  Even bigger issue is that I do not think __add_pages() and
 __remove_pages() can add / delete a memory chunk that is less than
 128MB.  128MB is the granularity of them.  So, we may just have to 
 fail
 this case gracefully.

FYI: I have submitted the patch blow to close this part of the issue...

https://lkml.org/lkml/2013/8/8/396
   
   That looks good to me, but we'd still need to make it possible to have
   memory blocks smaller than 128 MB ...
  
  Do you mean acpi_bind_one() needs to be able to handle such case?  If
  so, it should not be a problem since a memory block device won't be
  created when add_memory() fails with the change above.  So, there is no
  binding to be done.  If you mean add_memory() needs to be able to handle
  a smaller range, that's quite a tough one unless we make the section
  size smaller.
  
  BTW, when add_memory() fails, the memory hot-add request still succeeds
  with no driver attached.  This seems logical, but the added device is
  useless when no handler is attached.  And it does not allow ejecting the
  device with no handler.  I am not too worry about this since this is a
  rare case, but it reminded me that the framework won't handle rollback.
 
 That is a valid observation.
 
 Perhaps we should add a flag that will cause acpi_bus_offline_companions()
 to fail immediately if that flag is set for the given device?
 
 Then, acpi_memory_enable_device() could set that flag for mem_device-device
 when either add_memory() or acpi_bind_memory_blocks() fails?

I am not sure if we need a new flag, but I agree that we could allow
ejecting a device with no handler attached.  We can probably skip any
device specific part (such as offlining) when no handler is attached.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device (patch, updated)

2013-08-06 Thread Toshi Kani
On Tue, 2013-08-06 at 02:15 +0200, Rafael J. Wysocki wrote:
> On Monday, August 05, 2013 05:19:56 PM Toshi Kani wrote:
> > On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
> >   :
> > > Can you please test the appended patch?  I tested it somewhat, but since 
> > > the
> > > greatest number of physical nodes per ACPI device object I can get on my 
> > > test
> > > machines is 2 (and even that after hacking the kernel somewhat), that was 
> > > kind
> > > of unconclusive.
> > > 
> > > Thanks,
> > > Rafael
> > > 
> > > 
> > > ---
> > > From: Rafael J. Wysocki 
> > > Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
> > > 
> > > The physical_node_id_bitmap in struct acpi_device is only used for
> > > looking up the first currently unused phyiscal dependent node ID
> > > by acpi_bind_one().  It is not really necessary, however, because
> > > acpi_bind_one() walks the entire physical_node_list of the given
> > > device object for sanity checking anyway and if that list is always
> > > sorted by node_id, it is straightforward to find the first gap
> > > between the currently used node IDs and use that number as the ID
> > > of the new list node.
> > > 
> > > This also removes the artificial limit of the maximum number of
> > > dependent physical devices per ACPI device object, which now depends
> > > only on the capacity of unsigend int.
> > > 
> > > Signed-off-by: Rafael J. Wysocki 
> > 
> > I like the change. Much better :-)
> > 
> > Acked-by: Toshi Kani 
> 
> However, it introduces a bug in acpi_unbind_one(), because the size of the 
> name
> array in there has to be increased too.  Updated patch follows.

Right.  Sorry, I overlooked this one.  Please apply my ask to this new
version. 

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device (patch, updated)

2013-08-06 Thread Rafael J. Wysocki
On Tuesday, August 06, 2013 11:12:51 AM Yasuaki Ishimatsu wrote:
> (2013/08/06 9:15), Rafael J. Wysocki wrote:
> > On Monday, August 05, 2013 05:19:56 PM Toshi Kani wrote:
> >> On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
> >>:
> >>> Can you please test the appended patch?  I tested it somewhat, but since 
> >>> the
> >>> greatest number of physical nodes per ACPI device object I can get on my 
> >>> test
> >>> machines is 2 (and even that after hacking the kernel somewhat), that was 
> >>> kind
> >>> of unconclusive.
> >>>
> >>> Thanks,
> >>> Rafael
> >>>
> >>>
> >>> ---
> >>> From: Rafael J. Wysocki 
> >>> Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
> >>>
> >>> The physical_node_id_bitmap in struct acpi_device is only used for
> >>> looking up the first currently unused phyiscal dependent node ID
> >>> by acpi_bind_one().  It is not really necessary, however, because
> >>> acpi_bind_one() walks the entire physical_node_list of the given
> >>> device object for sanity checking anyway and if that list is always
> >>> sorted by node_id, it is straightforward to find the first gap
> >>> between the currently used node IDs and use that number as the ID
> >>> of the new list node.
> >>>
> >>> This also removes the artificial limit of the maximum number of
> >>> dependent physical devices per ACPI device object, which now depends
> >>> only on the capacity of unsigend int.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki 
> >>
> >> I like the change. Much better :-)
> >>
> >> Acked-by: Toshi Kani 
> >
> > However, it introduces a bug in acpi_unbind_one(), because the size of the 
> > name
> > array in there has to be increased too.  Updated patch follows.
> >
> > Thanks,
> > Rafael
> >
> >
> > ---
> > From: Rafael J. Wysocki 
> > Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
> >
> > The physical_node_id_bitmap in struct acpi_device is only used for
> > looking up the first currently unused dependent phyiscal node ID
> > by acpi_bind_one().  It is not really necessary, however, because
> > acpi_bind_one() walks the entire physical_node_list of the given
> > device object for sanity checking anyway and if that list is always
> > sorted by node_id, it is straightforward to find the first gap
> > between the currently used node IDs and use that number as the ID
> > of the new list node.
> >
> > This also removes the artificial limit of the maximum number of
> > dependent physical devices per ACPI device object, which now depends
> > only on the capacity of unsigend int.
> >
> > Signed-off-by: Rafael J. Wysocki 
> 
> Reviewed-by: Yasuaki Ishimatsu 
> Tested-by: Yasuaki Ishimatsu 
> 
> I confirmed that I can add and remove a memory device correctly.

Great, thanks for the confirmation!

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device (patch, updated)

2013-08-06 Thread Rafael J. Wysocki
On Tuesday, August 06, 2013 11:12:51 AM Yasuaki Ishimatsu wrote:
 (2013/08/06 9:15), Rafael J. Wysocki wrote:
  On Monday, August 05, 2013 05:19:56 PM Toshi Kani wrote:
  On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
 :
  Can you please test the appended patch?  I tested it somewhat, but since 
  the
  greatest number of physical nodes per ACPI device object I can get on my 
  test
  machines is 2 (and even that after hacking the kernel somewhat), that was 
  kind
  of unconclusive.
 
  Thanks,
  Rafael
 
 
  ---
  From: Rafael J. Wysocki rafael.j.wyso...@intel.com
  Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
 
  The physical_node_id_bitmap in struct acpi_device is only used for
  looking up the first currently unused phyiscal dependent node ID
  by acpi_bind_one().  It is not really necessary, however, because
  acpi_bind_one() walks the entire physical_node_list of the given
  device object for sanity checking anyway and if that list is always
  sorted by node_id, it is straightforward to find the first gap
  between the currently used node IDs and use that number as the ID
  of the new list node.
 
  This also removes the artificial limit of the maximum number of
  dependent physical devices per ACPI device object, which now depends
  only on the capacity of unsigend int.
 
  Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
  I like the change. Much better :-)
 
  Acked-by: Toshi Kani toshi.k...@hp.com
 
  However, it introduces a bug in acpi_unbind_one(), because the size of the 
  name
  array in there has to be increased too.  Updated patch follows.
 
  Thanks,
  Rafael
 
 
  ---
  From: Rafael J. Wysocki rafael.j.wyso...@intel.com
  Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
 
  The physical_node_id_bitmap in struct acpi_device is only used for
  looking up the first currently unused dependent phyiscal node ID
  by acpi_bind_one().  It is not really necessary, however, because
  acpi_bind_one() walks the entire physical_node_list of the given
  device object for sanity checking anyway and if that list is always
  sorted by node_id, it is straightforward to find the first gap
  between the currently used node IDs and use that number as the ID
  of the new list node.
 
  This also removes the artificial limit of the maximum number of
  dependent physical devices per ACPI device object, which now depends
  only on the capacity of unsigend int.
 
  Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
 Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 Tested-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 
 I confirmed that I can add and remove a memory device correctly.

Great, thanks for the confirmation!

Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device (patch, updated)

2013-08-06 Thread Toshi Kani
On Tue, 2013-08-06 at 02:15 +0200, Rafael J. Wysocki wrote:
 On Monday, August 05, 2013 05:19:56 PM Toshi Kani wrote:
  On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
:
   Can you please test the appended patch?  I tested it somewhat, but since 
   the
   greatest number of physical nodes per ACPI device object I can get on my 
   test
   machines is 2 (and even that after hacking the kernel somewhat), that was 
   kind
   of unconclusive.
   
   Thanks,
   Rafael
   
   
   ---
   From: Rafael J. Wysocki rafael.j.wyso...@intel.com
   Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
   
   The physical_node_id_bitmap in struct acpi_device is only used for
   looking up the first currently unused phyiscal dependent node ID
   by acpi_bind_one().  It is not really necessary, however, because
   acpi_bind_one() walks the entire physical_node_list of the given
   device object for sanity checking anyway and if that list is always
   sorted by node_id, it is straightforward to find the first gap
   between the currently used node IDs and use that number as the ID
   of the new list node.
   
   This also removes the artificial limit of the maximum number of
   dependent physical devices per ACPI device object, which now depends
   only on the capacity of unsigend int.
   
   Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
  
  I like the change. Much better :-)
  
  Acked-by: Toshi Kani toshi.k...@hp.com
 
 However, it introduces a bug in acpi_unbind_one(), because the size of the 
 name
 array in there has to be increased too.  Updated patch follows.

Right.  Sorry, I overlooked this one.  Please apply my ask to this new
version. 

Thanks,
-Toshi

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device (patch, updated)

2013-08-05 Thread Yasuaki Ishimatsu

(2013/08/06 9:15), Rafael J. Wysocki wrote:

On Monday, August 05, 2013 05:19:56 PM Toshi Kani wrote:

On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
   :

Can you please test the appended patch?  I tested it somewhat, but since the
greatest number of physical nodes per ACPI device object I can get on my test
machines is 2 (and even that after hacking the kernel somewhat), that was kind
of unconclusive.

Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device

The physical_node_id_bitmap in struct acpi_device is only used for
looking up the first currently unused phyiscal dependent node ID
by acpi_bind_one().  It is not really necessary, however, because
acpi_bind_one() walks the entire physical_node_list of the given
device object for sanity checking anyway and if that list is always
sorted by node_id, it is straightforward to find the first gap
between the currently used node IDs and use that number as the ID
of the new list node.

This also removes the artificial limit of the maximum number of
dependent physical devices per ACPI device object, which now depends
only on the capacity of unsigend int.

Signed-off-by: Rafael J. Wysocki 


I like the change. Much better :-)

Acked-by: Toshi Kani 


However, it introduces a bug in acpi_unbind_one(), because the size of the name
array in there has to be increased too.  Updated patch follows.

Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device

The physical_node_id_bitmap in struct acpi_device is only used for
looking up the first currently unused dependent phyiscal node ID
by acpi_bind_one().  It is not really necessary, however, because
acpi_bind_one() walks the entire physical_node_list of the given
device object for sanity checking anyway and if that list is always
sorted by node_id, it is straightforward to find the first gap
between the currently used node IDs and use that number as the ID
of the new list node.

This also removes the artificial limit of the maximum number of
dependent physical devices per ACPI device object, which now depends
only on the capacity of unsigend int.

Signed-off-by: Rafael J. Wysocki 


Reviewed-by: Yasuaki Ishimatsu 
Tested-by: Yasuaki Ishimatsu 

I confirmed that I can add and remove a memory device correctly.

Thanks,
Yasuaki Ishimatsu


---
  drivers/acpi/glue.c |   34 +++---
  include/acpi/acpi_bus.h |8 ++--
  2 files changed, 21 insertions(+), 21 deletions(-)

Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -31,6 +31,7 @@ static LIST_HEAD(bus_type_list);
  static DECLARE_RWSEM(bus_type_sem);

  #define PHYSICAL_NODE_STRING "physical_node"
+#define PHYSICAL_NODE_NAME_SIZE (sizeof(PHYSICAL_NODE_STRING) + 10)

  int register_acpi_bus_type(struct acpi_bus_type *type)
  {
@@ -112,7 +113,9 @@ int acpi_bind_one(struct device *dev, ac
struct acpi_device *acpi_dev;
acpi_status status;
struct acpi_device_physical_node *physical_node, *pn;
-   char physical_node_name[sizeof(PHYSICAL_NODE_STRING) + 2];
+   char physical_node_name[PHYSICAL_NODE_NAME_SIZE];
+   struct list_head *physnode_list;
+   unsigned int node_id;
int retval = -EINVAL;

if (ACPI_HANDLE(dev)) {
@@ -139,8 +142,14 @@ int acpi_bind_one(struct device *dev, ac

mutex_lock(_dev->physical_node_lock);

-   /* Sanity check. */
-   list_for_each_entry(pn, _dev->physical_node_list, node)
+   /*
+* Keep the list sorted by node_id so that the IDs of removed nodes can
+* be recycled.
+*/
+   physnode_list = _dev->physical_node_list;
+   node_id = 0;
+   list_for_each_entry(pn, _dev->physical_node_list, node) {
+   /* Sanity check. */
if (pn->dev == dev) {
dev_warn(dev, "Already associated with ACPI node\n");
if (ACPI_HANDLE(dev) == handle)
@@ -148,19 +157,15 @@ int acpi_bind_one(struct device *dev, ac

goto out_free;
}
-
-   /* allocate physical node id according to physical_node_id_bitmap */
-   physical_node->node_id =
-   find_first_zero_bit(acpi_dev->physical_node_id_bitmap,
-   ACPI_MAX_PHYSICAL_NODE);
-   if (physical_node->node_id >= ACPI_MAX_PHYSICAL_NODE) {
-   retval = -ENOSPC;
-   goto out_free;
+   if (pn->node_id == node_id) {
+   physnode_list = >node;
+   node_id++;
+   }
}

-   set_bit(physical_node->node_id, acpi_dev->physical_node_id_bitmap);
+   physical_node->node_id = node_id;
physical_node->dev = dev;
-   list_add_tail(_node->node, _dev->physical_node_list);
+   

Re: Cannot hot remove a memory device (patch, updated)

2013-08-05 Thread Rafael J. Wysocki
On Monday, August 05, 2013 05:19:56 PM Toshi Kani wrote:
> On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
>   :
> > Can you please test the appended patch?  I tested it somewhat, but since the
> > greatest number of physical nodes per ACPI device object I can get on my 
> > test
> > machines is 2 (and even that after hacking the kernel somewhat), that was 
> > kind
> > of unconclusive.
> > 
> > Thanks,
> > Rafael
> > 
> > 
> > ---
> > From: Rafael J. Wysocki 
> > Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
> > 
> > The physical_node_id_bitmap in struct acpi_device is only used for
> > looking up the first currently unused phyiscal dependent node ID
> > by acpi_bind_one().  It is not really necessary, however, because
> > acpi_bind_one() walks the entire physical_node_list of the given
> > device object for sanity checking anyway and if that list is always
> > sorted by node_id, it is straightforward to find the first gap
> > between the currently used node IDs and use that number as the ID
> > of the new list node.
> > 
> > This also removes the artificial limit of the maximum number of
> > dependent physical devices per ACPI device object, which now depends
> > only on the capacity of unsigend int.
> > 
> > Signed-off-by: Rafael J. Wysocki 
> 
> I like the change. Much better :-)
> 
> Acked-by: Toshi Kani 

However, it introduces a bug in acpi_unbind_one(), because the size of the name
array in there has to be increased too.  Updated patch follows.

Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device

The physical_node_id_bitmap in struct acpi_device is only used for
looking up the first currently unused dependent phyiscal node ID
by acpi_bind_one().  It is not really necessary, however, because
acpi_bind_one() walks the entire physical_node_list of the given
device object for sanity checking anyway and if that list is always
sorted by node_id, it is straightforward to find the first gap
between the currently used node IDs and use that number as the ID
of the new list node.

This also removes the artificial limit of the maximum number of
dependent physical devices per ACPI device object, which now depends
only on the capacity of unsigend int.

Signed-off-by: Rafael J. Wysocki 
---
 drivers/acpi/glue.c |   34 +++---
 include/acpi/acpi_bus.h |8 ++--
 2 files changed, 21 insertions(+), 21 deletions(-)

Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -31,6 +31,7 @@ static LIST_HEAD(bus_type_list);
 static DECLARE_RWSEM(bus_type_sem);
 
 #define PHYSICAL_NODE_STRING "physical_node"
+#define PHYSICAL_NODE_NAME_SIZE (sizeof(PHYSICAL_NODE_STRING) + 10)
 
 int register_acpi_bus_type(struct acpi_bus_type *type)
 {
@@ -112,7 +113,9 @@ int acpi_bind_one(struct device *dev, ac
struct acpi_device *acpi_dev;
acpi_status status;
struct acpi_device_physical_node *physical_node, *pn;
-   char physical_node_name[sizeof(PHYSICAL_NODE_STRING) + 2];
+   char physical_node_name[PHYSICAL_NODE_NAME_SIZE];
+   struct list_head *physnode_list;
+   unsigned int node_id;
int retval = -EINVAL;
 
if (ACPI_HANDLE(dev)) {
@@ -139,8 +142,14 @@ int acpi_bind_one(struct device *dev, ac
 
mutex_lock(_dev->physical_node_lock);
 
-   /* Sanity check. */
-   list_for_each_entry(pn, _dev->physical_node_list, node)
+   /*
+* Keep the list sorted by node_id so that the IDs of removed nodes can
+* be recycled.
+*/
+   physnode_list = _dev->physical_node_list;
+   node_id = 0;
+   list_for_each_entry(pn, _dev->physical_node_list, node) {
+   /* Sanity check. */
if (pn->dev == dev) {
dev_warn(dev, "Already associated with ACPI node\n");
if (ACPI_HANDLE(dev) == handle)
@@ -148,19 +157,15 @@ int acpi_bind_one(struct device *dev, ac
 
goto out_free;
}
-
-   /* allocate physical node id according to physical_node_id_bitmap */
-   physical_node->node_id =
-   find_first_zero_bit(acpi_dev->physical_node_id_bitmap,
-   ACPI_MAX_PHYSICAL_NODE);
-   if (physical_node->node_id >= ACPI_MAX_PHYSICAL_NODE) {
-   retval = -ENOSPC;
-   goto out_free;
+   if (pn->node_id == node_id) {
+   physnode_list = >node;
+   node_id++;
+   }
}
 
-   set_bit(physical_node->node_id, acpi_dev->physical_node_id_bitmap);
+   physical_node->node_id = node_id;
physical_node->dev = dev;
-   list_add_tail(_node->node, _dev->physical_node_list);
+   list_add(_node->node, physnode_list);
acpi_dev->physical_node_count++;
 

Re: Cannot hot remove a memory device (patch)

2013-08-05 Thread Toshi Kani
On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
  :
> Can you please test the appended patch?  I tested it somewhat, but since the
> greatest number of physical nodes per ACPI device object I can get on my test
> machines is 2 (and even that after hacking the kernel somewhat), that was kind
> of unconclusive.
> 
> Thanks,
> Rafael
> 
> 
> ---
> From: Rafael J. Wysocki 
> Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
> 
> The physical_node_id_bitmap in struct acpi_device is only used for
> looking up the first currently unused phyiscal dependent node ID
> by acpi_bind_one().  It is not really necessary, however, because
> acpi_bind_one() walks the entire physical_node_list of the given
> device object for sanity checking anyway and if that list is always
> sorted by node_id, it is straightforward to find the first gap
> between the currently used node IDs and use that number as the ID
> of the new list node.
> 
> This also removes the artificial limit of the maximum number of
> dependent physical devices per ACPI device object, which now depends
> only on the capacity of unsigend int.
> 
> Signed-off-by: Rafael J. Wysocki 

I like the change. Much better :-)

Acked-by: Toshi Kani 

Thanks,
-Toshi


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device (patch)

2013-08-05 Thread Rafael J. Wysocki
On Monday, August 05, 2013 04:59:20 PM Yasuaki Ishimatsu wrote:
> (2013/08/05 13:00), Yasuaki Ishimatsu wrote:
> > (2013/08/04 9:37), Toshi Kani wrote:
> >> On Sat, 2013-08-03 at 03:01 +0200, Rafael J. Wysocki wrote:
> >>> On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:
>  On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> >> On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> >>> Hi,
> >>>
> >>> Thanks for your report.
> >>>
> >>> On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
>  By following commit, I cannot hot remove a memory device.
> 
>  ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
>  commit e2ff39400d81233374e780b133496a2296643d7d
> 
>  Details are follows:
>  When I add a memory device, acpi_memory_enable_device() always fails
>  as follows:
> 
>  ...
>  [ 1271.114116]  [ea121c40-ea121c7f] PMD -> 
>  [880813c0-880813ff] on node 3
>  [ 1271.128682]  [ea121c80-ea121cbf] PMD -> 
>  [88081380-880813bf] on node 3
>  [ 1271.143298]  [ea121cc0-ea121cff] PMD -> 
>  [88081300-8808133f] on node 3
>  [ 1271.157799]  [ea121d00-ea121d3f] PMD -> 
>  [880812c0-880812ff] on node 3
>  [ 1271.172341]  [ea121d40-ea121d7f] PMD -> 
>  [88081280-880812bf] on node 3
>  [ 1271.186872]  [ea121d80-ea121dbf] PMD -> 
>  [88081240-8808127f] on node 3
>  [ 1271.201481]  [ea121dc0-ea121dff] PMD -> 
>  [88081200-8808123f] on node 3
>  [ 1271.216041]  [ea121e00-ea121e3f] PMD -> 
>  [880811c0-880811ff] on node 3
>  [ 1271.230623]  [ea121e40-ea121e7f] PMD -> 
>  [88081180-880811bf] on node 3
>  [ 1271.245148]  [ea121e80-ea121ebf] PMD -> 
>  [88081140-8808117f] on node 3
>  [ 1271.259683]  [ea121ec0-ea121eff] PMD -> 
>  [88081100-8808113f] on node 3
>  [ 1271.274194]  [ea121f00-ea121f3f] PMD -> 
>  [880810c0-880810ff] on node 3
>  [ 1271.288764]  [ea121f40-ea121f7f] PMD -> 
>  [88081080-880810bf] on node 3
> >>
> >> It appears that each memory object only has 64MB of memory.  This is
> >> less than the memory block size, which is 128MB.  This means that a
> >> single memory block associates with two ACPI memory device objects.
> >
> > That'd be bad.
> >
> > How did that work before if that indeed is the case?
> 
>  Well, it looks to me that it has never worked before...
> 
>  ...
>  [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
> >>>
> >>> Well, the only new way acpi_memory_enable_device() can fail after 
> >>> that commit
> >>> is a failure in acpi_bind_memory_blocks().
> >>
> >> Agreed.
> >>
> >>> This means that either handle is NULL, which I think we can exclude, 
> >>> because
> >>> acpi_memory_enable_device() wouldn't be called at all if that were 
> >>> the case, or
> >>> there's a more subtle error in acpi_bind_one().
> >>>
> >>> One that comes to mind is that we may be calling acpi_bind_one() 
> >>> twice for the
> >>> same memory region, in which it will trigger -EINVAL from the sanity 
> >>> check in
> >>> there.
> >>
> >> I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> >> handle is already set\n").  When two ACPI memory objects associate with
> >> a same memory block, the bind procedure of the 2nd ACPI memory object
> >> sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
> >> object.
> >
> > That sound's plausible, but I wonder how we can fix that?
> >
> > There's no way for a single physical device to have two different ACPI
> > "companions".  It looks like the memory blocks should be 64 M each in 
> > that
> > case.  Or we need to create two child devices for each memory block and
> > associate each of them with an ACPI object.  That would lead to 
> > complications
> > in the user space interface, though.
> 
>  Right.  Even bigger issue is that I do not think __add_pages() and
>  __remove_pages() can add / delete a memory chunk that is less than
>  128MB.  128MB is the granularity of them.  So, we may just have to fail
>  this case gracefully.
> >>>
> >>> Sigh.
> >>>
> >>> BTW, why do you think 

Re: Cannot hot remove a memory device

2013-08-05 Thread Yasuaki Ishimatsu

(2013/08/05 13:00), Yasuaki Ishimatsu wrote:

(2013/08/04 9:37), Toshi Kani wrote:

On Sat, 2013-08-03 at 03:01 +0200, Rafael J. Wysocki wrote:

On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:

On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:

On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:

On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:

Hi,

Thanks for your report.

On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:

By following commit, I cannot hot remove a memory device.

ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
commit e2ff39400d81233374e780b133496a2296643d7d

Details are follows:
When I add a memory device, acpi_memory_enable_device() always fails
as follows:

...
[ 1271.114116]  [ea121c40-ea121c7f] PMD -> 
[880813c0-880813ff] on node 3
[ 1271.128682]  [ea121c80-ea121cbf] PMD -> 
[88081380-880813bf] on node 3
[ 1271.143298]  [ea121cc0-ea121cff] PMD -> 
[88081300-8808133f] on node 3
[ 1271.157799]  [ea121d00-ea121d3f] PMD -> 
[880812c0-880812ff] on node 3
[ 1271.172341]  [ea121d40-ea121d7f] PMD -> 
[88081280-880812bf] on node 3
[ 1271.186872]  [ea121d80-ea121dbf] PMD -> 
[88081240-8808127f] on node 3
[ 1271.201481]  [ea121dc0-ea121dff] PMD -> 
[88081200-8808123f] on node 3
[ 1271.216041]  [ea121e00-ea121e3f] PMD -> 
[880811c0-880811ff] on node 3
[ 1271.230623]  [ea121e40-ea121e7f] PMD -> 
[88081180-880811bf] on node 3
[ 1271.245148]  [ea121e80-ea121ebf] PMD -> 
[88081140-8808117f] on node 3
[ 1271.259683]  [ea121ec0-ea121eff] PMD -> 
[88081100-8808113f] on node 3
[ 1271.274194]  [ea121f00-ea121f3f] PMD -> 
[880810c0-880810ff] on node 3
[ 1271.288764]  [ea121f40-ea121f7f] PMD -> 
[88081080-880810bf] on node 3


It appears that each memory object only has 64MB of memory.  This is
less than the memory block size, which is 128MB.  This means that a
single memory block associates with two ACPI memory device objects.


That'd be bad.

How did that work before if that indeed is the case?


Well, it looks to me that it has never worked before...


...
[ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error


Well, the only new way acpi_memory_enable_device() can fail after that commit
is a failure in acpi_bind_memory_blocks().


Agreed.


This means that either handle is NULL, which I think we can exclude, because
acpi_memory_enable_device() wouldn't be called at all if that were the case, or
there's a more subtle error in acpi_bind_one().

One that comes to mind is that we may be calling acpi_bind_one() twice for the
same memory region, in which it will trigger -EINVAL from the sanity check in
there.


I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
handle is already set\n").  When two ACPI memory objects associate with
a same memory block, the bind procedure of the 2nd ACPI memory object
sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.


That sound's plausible, but I wonder how we can fix that?

There's no way for a single physical device to have two different ACPI
"companions".  It looks like the memory blocks should be 64 M each in that
case.  Or we need to create two child devices for each memory block and
associate each of them with an ACPI object.  That would lead to complications
in the user space interface, though.


Right.  Even bigger issue is that I do not think __add_pages() and
__remove_pages() can add / delete a memory chunk that is less than
128MB.  128MB is the granularity of them.  So, we may just have to fail
this case gracefully.


Sigh.

BTW, why do you think they are 64 M each (it's late and I'm obviously tired)?


Oops!  Sorry, I had confused the above messages with the one in
init_memory_mapping(), which shows a memory range being added, i.e. the
size of an ACPI memory device object.  But the above messages actually
came from vmemmap_populate_hugepages(), which was called during boot-up.
So, these messages have nothing to do with ACPI memory device objects.
And even worse, I do not seem to be able to count a number of zeros...
In the above messages, each memory range is 4MB (0x40), not 64MB
(0x400)...  My bad. :-(

So, while we may still need to do something for the less-than-128MB
issue, Yasuaki may be hitting a different one.  Let's wait for Yasuaki
to give us more info.


acpi_bind_memory_blocks() failed with -ENOSPC.

int acpi_bind_one(struct device *dev, acpi_handle handle)
{
...
 /* allocate physical node id according to physical_node_id_bitmap */
 physical_node->node_id =
 

Re: Cannot hot remove a memory device

2013-08-05 Thread Yasuaki Ishimatsu

(2013/08/05 13:00), Yasuaki Ishimatsu wrote:

(2013/08/04 9:37), Toshi Kani wrote:

On Sat, 2013-08-03 at 03:01 +0200, Rafael J. Wysocki wrote:

On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:

On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:

On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:

On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:

Hi,

Thanks for your report.

On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:

By following commit, I cannot hot remove a memory device.

ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
commit e2ff39400d81233374e780b133496a2296643d7d

Details are follows:
When I add a memory device, acpi_memory_enable_device() always fails
as follows:

...
[ 1271.114116]  [ea121c40-ea121c7f] PMD - 
[880813c0-880813ff] on node 3
[ 1271.128682]  [ea121c80-ea121cbf] PMD - 
[88081380-880813bf] on node 3
[ 1271.143298]  [ea121cc0-ea121cff] PMD - 
[88081300-8808133f] on node 3
[ 1271.157799]  [ea121d00-ea121d3f] PMD - 
[880812c0-880812ff] on node 3
[ 1271.172341]  [ea121d40-ea121d7f] PMD - 
[88081280-880812bf] on node 3
[ 1271.186872]  [ea121d80-ea121dbf] PMD - 
[88081240-8808127f] on node 3
[ 1271.201481]  [ea121dc0-ea121dff] PMD - 
[88081200-8808123f] on node 3
[ 1271.216041]  [ea121e00-ea121e3f] PMD - 
[880811c0-880811ff] on node 3
[ 1271.230623]  [ea121e40-ea121e7f] PMD - 
[88081180-880811bf] on node 3
[ 1271.245148]  [ea121e80-ea121ebf] PMD - 
[88081140-8808117f] on node 3
[ 1271.259683]  [ea121ec0-ea121eff] PMD - 
[88081100-8808113f] on node 3
[ 1271.274194]  [ea121f00-ea121f3f] PMD - 
[880810c0-880810ff] on node 3
[ 1271.288764]  [ea121f40-ea121f7f] PMD - 
[88081080-880810bf] on node 3


It appears that each memory object only has 64MB of memory.  This is
less than the memory block size, which is 128MB.  This means that a
single memory block associates with two ACPI memory device objects.


That'd be bad.

How did that work before if that indeed is the case?


Well, it looks to me that it has never worked before...


...
[ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error


Well, the only new way acpi_memory_enable_device() can fail after that commit
is a failure in acpi_bind_memory_blocks().


Agreed.


This means that either handle is NULL, which I think we can exclude, because
acpi_memory_enable_device() wouldn't be called at all if that were the case, or
there's a more subtle error in acpi_bind_one().

One that comes to mind is that we may be calling acpi_bind_one() twice for the
same memory region, in which it will trigger -EINVAL from the sanity check in
there.


I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
handle is already set\n).  When two ACPI memory objects associate with
a same memory block, the bind procedure of the 2nd ACPI memory object
sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.


That sound's plausible, but I wonder how we can fix that?

There's no way for a single physical device to have two different ACPI
companions.  It looks like the memory blocks should be 64 M each in that
case.  Or we need to create two child devices for each memory block and
associate each of them with an ACPI object.  That would lead to complications
in the user space interface, though.


Right.  Even bigger issue is that I do not think __add_pages() and
__remove_pages() can add / delete a memory chunk that is less than
128MB.  128MB is the granularity of them.  So, we may just have to fail
this case gracefully.


Sigh.

BTW, why do you think they are 64 M each (it's late and I'm obviously tired)?


Oops!  Sorry, I had confused the above messages with the one in
init_memory_mapping(), which shows a memory range being added, i.e. the
size of an ACPI memory device object.  But the above messages actually
came from vmemmap_populate_hugepages(), which was called during boot-up.
So, these messages have nothing to do with ACPI memory device objects.
And even worse, I do not seem to be able to count a number of zeros...
In the above messages, each memory range is 4MB (0x40), not 64MB
(0x400)...  My bad. :-(

So, while we may still need to do something for the less-than-128MB
issue, Yasuaki may be hitting a different one.  Let's wait for Yasuaki
to give us more info.


acpi_bind_memory_blocks() failed with -ENOSPC.

int acpi_bind_one(struct device *dev, acpi_handle handle)
{
...
 /* allocate physical node id according to physical_node_id_bitmap */
 physical_node-node_id =
 find_first_zero_bit(acpi_dev-physical_node_id_bitmap,
 

Re: Cannot hot remove a memory device (patch)

2013-08-05 Thread Rafael J. Wysocki
On Monday, August 05, 2013 04:59:20 PM Yasuaki Ishimatsu wrote:
 (2013/08/05 13:00), Yasuaki Ishimatsu wrote:
  (2013/08/04 9:37), Toshi Kani wrote:
  On Sat, 2013-08-03 at 03:01 +0200, Rafael J. Wysocki wrote:
  On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:
  On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
  On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
  On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
  Hi,
 
  Thanks for your report.
 
  On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
  By following commit, I cannot hot remove a memory device.
 
  ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
  commit e2ff39400d81233374e780b133496a2296643d7d
 
  Details are follows:
  When I add a memory device, acpi_memory_enable_device() always fails
  as follows:
 
  ...
  [ 1271.114116]  [ea121c40-ea121c7f] PMD - 
  [880813c0-880813ff] on node 3
  [ 1271.128682]  [ea121c80-ea121cbf] PMD - 
  [88081380-880813bf] on node 3
  [ 1271.143298]  [ea121cc0-ea121cff] PMD - 
  [88081300-8808133f] on node 3
  [ 1271.157799]  [ea121d00-ea121d3f] PMD - 
  [880812c0-880812ff] on node 3
  [ 1271.172341]  [ea121d40-ea121d7f] PMD - 
  [88081280-880812bf] on node 3
  [ 1271.186872]  [ea121d80-ea121dbf] PMD - 
  [88081240-8808127f] on node 3
  [ 1271.201481]  [ea121dc0-ea121dff] PMD - 
  [88081200-8808123f] on node 3
  [ 1271.216041]  [ea121e00-ea121e3f] PMD - 
  [880811c0-880811ff] on node 3
  [ 1271.230623]  [ea121e40-ea121e7f] PMD - 
  [88081180-880811bf] on node 3
  [ 1271.245148]  [ea121e80-ea121ebf] PMD - 
  [88081140-8808117f] on node 3
  [ 1271.259683]  [ea121ec0-ea121eff] PMD - 
  [88081100-8808113f] on node 3
  [ 1271.274194]  [ea121f00-ea121f3f] PMD - 
  [880810c0-880810ff] on node 3
  [ 1271.288764]  [ea121f40-ea121f7f] PMD - 
  [88081080-880810bf] on node 3
 
  It appears that each memory object only has 64MB of memory.  This is
  less than the memory block size, which is 128MB.  This means that a
  single memory block associates with two ACPI memory device objects.
 
  That'd be bad.
 
  How did that work before if that indeed is the case?
 
  Well, it looks to me that it has never worked before...
 
  ...
  [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
 
  Well, the only new way acpi_memory_enable_device() can fail after 
  that commit
  is a failure in acpi_bind_memory_blocks().
 
  Agreed.
 
  This means that either handle is NULL, which I think we can exclude, 
  because
  acpi_memory_enable_device() wouldn't be called at all if that were 
  the case, or
  there's a more subtle error in acpi_bind_one().
 
  One that comes to mind is that we may be calling acpi_bind_one() 
  twice for the
  same memory region, in which it will trigger -EINVAL from the sanity 
  check in
  there.
 
  I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
  handle is already set\n).  When two ACPI memory objects associate with
  a same memory block, the bind procedure of the 2nd ACPI memory object
  sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
  object.
 
  That sound's plausible, but I wonder how we can fix that?
 
  There's no way for a single physical device to have two different ACPI
  companions.  It looks like the memory blocks should be 64 M each in 
  that
  case.  Or we need to create two child devices for each memory block and
  associate each of them with an ACPI object.  That would lead to 
  complications
  in the user space interface, though.
 
  Right.  Even bigger issue is that I do not think __add_pages() and
  __remove_pages() can add / delete a memory chunk that is less than
  128MB.  128MB is the granularity of them.  So, we may just have to fail
  this case gracefully.
 
  Sigh.
 
  BTW, why do you think they are 64 M each (it's late and I'm obviously 
  tired)?
 
  Oops!  Sorry, I had confused the above messages with the one in
  init_memory_mapping(), which shows a memory range being added, i.e. the
  size of an ACPI memory device object.  But the above messages actually
  came from vmemmap_populate_hugepages(), which was called during boot-up.
  So, these messages have nothing to do with ACPI memory device objects.
  And even worse, I do not seem to be able to count a number of zeros...
  In the above messages, each memory range is 4MB (0x40), not 64MB
  (0x400)...  My bad. :-(
 
  So, while we may still need to do something for the less-than-128MB
  issue, Yasuaki may be hitting a different one.  Let's wait for Yasuaki
  to give us more info.
 
  

Re: Cannot hot remove a memory device (patch)

2013-08-05 Thread Toshi Kani
On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
  :
 Can you please test the appended patch?  I tested it somewhat, but since the
 greatest number of physical nodes per ACPI device object I can get on my test
 machines is 2 (and even that after hacking the kernel somewhat), that was kind
 of unconclusive.
 
 Thanks,
 Rafael
 
 
 ---
 From: Rafael J. Wysocki rafael.j.wyso...@intel.com
 Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
 
 The physical_node_id_bitmap in struct acpi_device is only used for
 looking up the first currently unused phyiscal dependent node ID
 by acpi_bind_one().  It is not really necessary, however, because
 acpi_bind_one() walks the entire physical_node_list of the given
 device object for sanity checking anyway and if that list is always
 sorted by node_id, it is straightforward to find the first gap
 between the currently used node IDs and use that number as the ID
 of the new list node.
 
 This also removes the artificial limit of the maximum number of
 dependent physical devices per ACPI device object, which now depends
 only on the capacity of unsigend int.
 
 Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com

I like the change. Much better :-)

Acked-by: Toshi Kani toshi.k...@hp.com

Thanks,
-Toshi


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device (patch, updated)

2013-08-05 Thread Rafael J. Wysocki
On Monday, August 05, 2013 05:19:56 PM Toshi Kani wrote:
 On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
   :
  Can you please test the appended patch?  I tested it somewhat, but since the
  greatest number of physical nodes per ACPI device object I can get on my 
  test
  machines is 2 (and even that after hacking the kernel somewhat), that was 
  kind
  of unconclusive.
  
  Thanks,
  Rafael
  
  
  ---
  From: Rafael J. Wysocki rafael.j.wyso...@intel.com
  Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
  
  The physical_node_id_bitmap in struct acpi_device is only used for
  looking up the first currently unused phyiscal dependent node ID
  by acpi_bind_one().  It is not really necessary, however, because
  acpi_bind_one() walks the entire physical_node_list of the given
  device object for sanity checking anyway and if that list is always
  sorted by node_id, it is straightforward to find the first gap
  between the currently used node IDs and use that number as the ID
  of the new list node.
  
  This also removes the artificial limit of the maximum number of
  dependent physical devices per ACPI device object, which now depends
  only on the capacity of unsigend int.
  
  Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
 I like the change. Much better :-)
 
 Acked-by: Toshi Kani toshi.k...@hp.com

However, it introduces a bug in acpi_unbind_one(), because the size of the name
array in there has to be increased too.  Updated patch follows.

Thanks,
Rafael


---
From: Rafael J. Wysocki rafael.j.wyso...@intel.com
Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device

The physical_node_id_bitmap in struct acpi_device is only used for
looking up the first currently unused dependent phyiscal node ID
by acpi_bind_one().  It is not really necessary, however, because
acpi_bind_one() walks the entire physical_node_list of the given
device object for sanity checking anyway and if that list is always
sorted by node_id, it is straightforward to find the first gap
between the currently used node IDs and use that number as the ID
of the new list node.

This also removes the artificial limit of the maximum number of
dependent physical devices per ACPI device object, which now depends
only on the capacity of unsigend int.

Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
---
 drivers/acpi/glue.c |   34 +++---
 include/acpi/acpi_bus.h |8 ++--
 2 files changed, 21 insertions(+), 21 deletions(-)

Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -31,6 +31,7 @@ static LIST_HEAD(bus_type_list);
 static DECLARE_RWSEM(bus_type_sem);
 
 #define PHYSICAL_NODE_STRING physical_node
+#define PHYSICAL_NODE_NAME_SIZE (sizeof(PHYSICAL_NODE_STRING) + 10)
 
 int register_acpi_bus_type(struct acpi_bus_type *type)
 {
@@ -112,7 +113,9 @@ int acpi_bind_one(struct device *dev, ac
struct acpi_device *acpi_dev;
acpi_status status;
struct acpi_device_physical_node *physical_node, *pn;
-   char physical_node_name[sizeof(PHYSICAL_NODE_STRING) + 2];
+   char physical_node_name[PHYSICAL_NODE_NAME_SIZE];
+   struct list_head *physnode_list;
+   unsigned int node_id;
int retval = -EINVAL;
 
if (ACPI_HANDLE(dev)) {
@@ -139,8 +142,14 @@ int acpi_bind_one(struct device *dev, ac
 
mutex_lock(acpi_dev-physical_node_lock);
 
-   /* Sanity check. */
-   list_for_each_entry(pn, acpi_dev-physical_node_list, node)
+   /*
+* Keep the list sorted by node_id so that the IDs of removed nodes can
+* be recycled.
+*/
+   physnode_list = acpi_dev-physical_node_list;
+   node_id = 0;
+   list_for_each_entry(pn, acpi_dev-physical_node_list, node) {
+   /* Sanity check. */
if (pn-dev == dev) {
dev_warn(dev, Already associated with ACPI node\n);
if (ACPI_HANDLE(dev) == handle)
@@ -148,19 +157,15 @@ int acpi_bind_one(struct device *dev, ac
 
goto out_free;
}
-
-   /* allocate physical node id according to physical_node_id_bitmap */
-   physical_node-node_id =
-   find_first_zero_bit(acpi_dev-physical_node_id_bitmap,
-   ACPI_MAX_PHYSICAL_NODE);
-   if (physical_node-node_id = ACPI_MAX_PHYSICAL_NODE) {
-   retval = -ENOSPC;
-   goto out_free;
+   if (pn-node_id == node_id) {
+   physnode_list = pn-node;
+   node_id++;
+   }
}
 
-   set_bit(physical_node-node_id, acpi_dev-physical_node_id_bitmap);
+   physical_node-node_id = node_id;
physical_node-dev = dev;
-   list_add_tail(physical_node-node, acpi_dev-physical_node_list);
+   list_add(physical_node-node, 

Re: Cannot hot remove a memory device (patch, updated)

2013-08-05 Thread Yasuaki Ishimatsu

(2013/08/06 9:15), Rafael J. Wysocki wrote:

On Monday, August 05, 2013 05:19:56 PM Toshi Kani wrote:

On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
   :

Can you please test the appended patch?  I tested it somewhat, but since the
greatest number of physical nodes per ACPI device object I can get on my test
machines is 2 (and even that after hacking the kernel somewhat), that was kind
of unconclusive.

Thanks,
Rafael


---
From: Rafael J. Wysocki rafael.j.wyso...@intel.com
Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device

The physical_node_id_bitmap in struct acpi_device is only used for
looking up the first currently unused phyiscal dependent node ID
by acpi_bind_one().  It is not really necessary, however, because
acpi_bind_one() walks the entire physical_node_list of the given
device object for sanity checking anyway and if that list is always
sorted by node_id, it is straightforward to find the first gap
between the currently used node IDs and use that number as the ID
of the new list node.

This also removes the artificial limit of the maximum number of
dependent physical devices per ACPI device object, which now depends
only on the capacity of unsigend int.

Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com


I like the change. Much better :-)

Acked-by: Toshi Kani toshi.k...@hp.com


However, it introduces a bug in acpi_unbind_one(), because the size of the name
array in there has to be increased too.  Updated patch follows.

Thanks,
Rafael


---
From: Rafael J. Wysocki rafael.j.wyso...@intel.com
Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device

The physical_node_id_bitmap in struct acpi_device is only used for
looking up the first currently unused dependent phyiscal node ID
by acpi_bind_one().  It is not really necessary, however, because
acpi_bind_one() walks the entire physical_node_list of the given
device object for sanity checking anyway and if that list is always
sorted by node_id, it is straightforward to find the first gap
between the currently used node IDs and use that number as the ID
of the new list node.

This also removes the artificial limit of the maximum number of
dependent physical devices per ACPI device object, which now depends
only on the capacity of unsigend int.

Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com


Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Tested-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

I confirmed that I can add and remove a memory device correctly.

Thanks,
Yasuaki Ishimatsu


---
  drivers/acpi/glue.c |   34 +++---
  include/acpi/acpi_bus.h |8 ++--
  2 files changed, 21 insertions(+), 21 deletions(-)

Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -31,6 +31,7 @@ static LIST_HEAD(bus_type_list);
  static DECLARE_RWSEM(bus_type_sem);

  #define PHYSICAL_NODE_STRING physical_node
+#define PHYSICAL_NODE_NAME_SIZE (sizeof(PHYSICAL_NODE_STRING) + 10)

  int register_acpi_bus_type(struct acpi_bus_type *type)
  {
@@ -112,7 +113,9 @@ int acpi_bind_one(struct device *dev, ac
struct acpi_device *acpi_dev;
acpi_status status;
struct acpi_device_physical_node *physical_node, *pn;
-   char physical_node_name[sizeof(PHYSICAL_NODE_STRING) + 2];
+   char physical_node_name[PHYSICAL_NODE_NAME_SIZE];
+   struct list_head *physnode_list;
+   unsigned int node_id;
int retval = -EINVAL;

if (ACPI_HANDLE(dev)) {
@@ -139,8 +142,14 @@ int acpi_bind_one(struct device *dev, ac

mutex_lock(acpi_dev-physical_node_lock);

-   /* Sanity check. */
-   list_for_each_entry(pn, acpi_dev-physical_node_list, node)
+   /*
+* Keep the list sorted by node_id so that the IDs of removed nodes can
+* be recycled.
+*/
+   physnode_list = acpi_dev-physical_node_list;
+   node_id = 0;
+   list_for_each_entry(pn, acpi_dev-physical_node_list, node) {
+   /* Sanity check. */
if (pn-dev == dev) {
dev_warn(dev, Already associated with ACPI node\n);
if (ACPI_HANDLE(dev) == handle)
@@ -148,19 +157,15 @@ int acpi_bind_one(struct device *dev, ac

goto out_free;
}
-
-   /* allocate physical node id according to physical_node_id_bitmap */
-   physical_node-node_id =
-   find_first_zero_bit(acpi_dev-physical_node_id_bitmap,
-   ACPI_MAX_PHYSICAL_NODE);
-   if (physical_node-node_id = ACPI_MAX_PHYSICAL_NODE) {
-   retval = -ENOSPC;
-   goto out_free;
+   if (pn-node_id == node_id) {
+   physnode_list = pn-node;
+   node_id++;
+   }
}

-   set_bit(physical_node-node_id, 

Re: Cannot hot remove a memory device

2013-08-04 Thread Yasuaki Ishimatsu

(2013/08/04 9:37), Toshi Kani wrote:

On Sat, 2013-08-03 at 03:01 +0200, Rafael J. Wysocki wrote:

On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:

On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:

On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:

On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:

Hi,

Thanks for your report.

On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:

By following commit, I cannot hot remove a memory device.

ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
commit e2ff39400d81233374e780b133496a2296643d7d

Details are follows:
When I add a memory device, acpi_memory_enable_device() always fails
as follows:

...
[ 1271.114116]  [ea121c40-ea121c7f] PMD -> 
[880813c0-880813ff] on node 3
[ 1271.128682]  [ea121c80-ea121cbf] PMD -> 
[88081380-880813bf] on node 3
[ 1271.143298]  [ea121cc0-ea121cff] PMD -> 
[88081300-8808133f] on node 3
[ 1271.157799]  [ea121d00-ea121d3f] PMD -> 
[880812c0-880812ff] on node 3
[ 1271.172341]  [ea121d40-ea121d7f] PMD -> 
[88081280-880812bf] on node 3
[ 1271.186872]  [ea121d80-ea121dbf] PMD -> 
[88081240-8808127f] on node 3
[ 1271.201481]  [ea121dc0-ea121dff] PMD -> 
[88081200-8808123f] on node 3
[ 1271.216041]  [ea121e00-ea121e3f] PMD -> 
[880811c0-880811ff] on node 3
[ 1271.230623]  [ea121e40-ea121e7f] PMD -> 
[88081180-880811bf] on node 3
[ 1271.245148]  [ea121e80-ea121ebf] PMD -> 
[88081140-8808117f] on node 3
[ 1271.259683]  [ea121ec0-ea121eff] PMD -> 
[88081100-8808113f] on node 3
[ 1271.274194]  [ea121f00-ea121f3f] PMD -> 
[880810c0-880810ff] on node 3
[ 1271.288764]  [ea121f40-ea121f7f] PMD -> 
[88081080-880810bf] on node 3


It appears that each memory object only has 64MB of memory.  This is
less than the memory block size, which is 128MB.  This means that a
single memory block associates with two ACPI memory device objects.


That'd be bad.

How did that work before if that indeed is the case?


Well, it looks to me that it has never worked before...


... 
[ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error


Well, the only new way acpi_memory_enable_device() can fail after that commit
is a failure in acpi_bind_memory_blocks().


Agreed.


This means that either handle is NULL, which I think we can exclude, because
acpi_memory_enable_device() wouldn't be called at all if that were the case, or
there's a more subtle error in acpi_bind_one().

One that comes to mind is that we may be calling acpi_bind_one() twice for the
same memory region, in which it will trigger -EINVAL from the sanity check in
there.


I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
handle is already set\n").  When two ACPI memory objects associate with
a same memory block, the bind procedure of the 2nd ACPI memory object
sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.


That sound's plausible, but I wonder how we can fix that?

There's no way for a single physical device to have two different ACPI
"companions".  It looks like the memory blocks should be 64 M each in that
case.  Or we need to create two child devices for each memory block and
associate each of them with an ACPI object.  That would lead to complications
in the user space interface, though.


Right.  Even bigger issue is that I do not think __add_pages() and
__remove_pages() can add / delete a memory chunk that is less than
128MB.  128MB is the granularity of them.  So, we may just have to fail
this case gracefully.


Sigh.

BTW, why do you think they are 64 M each (it's late and I'm obviously tired)?


Oops!  Sorry, I had confused the above messages with the one in
init_memory_mapping(), which shows a memory range being added, i.e. the
size of an ACPI memory device object.  But the above messages actually
came from vmemmap_populate_hugepages(), which was called during boot-up.
So, these messages have nothing to do with ACPI memory device objects.
And even worse, I do not seem to be able to count a number of zeros...
In the above messages, each memory range is 4MB (0x40), not 64MB
(0x400)...  My bad. :-(

So, while we may still need to do something for the less-than-128MB
issue, Yasuaki may be hitting a different one.  Let's wait for Yasuaki
to give us more info.


acpi_bind_memory_blocks() failed with -ENOSPC.

int acpi_bind_one(struct device *dev, acpi_handle handle)
{
...
/* allocate physical node id according to physical_node_id_bitmap */
physical_node->node_id =
find_first_zero_bit(acpi_dev->physical_node_id_bitmap,
  

Re: Cannot hot remove a memory device

2013-08-04 Thread Rafael J. Wysocki
On Saturday, August 03, 2013 06:37:26 PM Toshi Kani wrote:
> On Sat, 2013-08-03 at 03:01 +0200, Rafael J. Wysocki wrote:
> > On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:
> > > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > Thanks for your report.
> > > > > > 
> > > > > > On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
> > > > > > > By following commit, I cannot hot remove a memory device.
> > > > > > > 
> > > > > > > ACPI / memhotplug: Bind removable memory blocks to ACPI device 
> > > > > > > nodes
> > > > > > > commit e2ff39400d81233374e780b133496a2296643d7d
> > > > > > > 
> > > > > > > Details are follows:
> > > > > > > When I add a memory device, acpi_memory_enable_device() always 
> > > > > > > fails
> > > > > > > as follows:
> > > > > > > 
> > > > > > > ...
> > > > > > > [ 1271.114116]  [ea121c40-ea121c7f] PMD -> 
> > > > > > > [880813c0-880813ff] on node 3
> > > > > > > [ 1271.128682]  [ea121c80-ea121cbf] PMD -> 
> > > > > > > [88081380-880813bf] on node 3
> > > > > > > [ 1271.143298]  [ea121cc0-ea121cff] PMD -> 
> > > > > > > [88081300-8808133f] on node 3
> > > > > > > [ 1271.157799]  [ea121d00-ea121d3f] PMD -> 
> > > > > > > [880812c0-880812ff] on node 3
> > > > > > > [ 1271.172341]  [ea121d40-ea121d7f] PMD -> 
> > > > > > > [88081280-880812bf] on node 3
> > > > > > > [ 1271.186872]  [ea121d80-ea121dbf] PMD -> 
> > > > > > > [88081240-8808127f] on node 3
> > > > > > > [ 1271.201481]  [ea121dc0-ea121dff] PMD -> 
> > > > > > > [88081200-8808123f] on node 3
> > > > > > > [ 1271.216041]  [ea121e00-ea121e3f] PMD -> 
> > > > > > > [880811c0-880811ff] on node 3
> > > > > > > [ 1271.230623]  [ea121e40-ea121e7f] PMD -> 
> > > > > > > [88081180-880811bf] on node 3
> > > > > > > [ 1271.245148]  [ea121e80-ea121ebf] PMD -> 
> > > > > > > [88081140-8808117f] on node 3
> > > > > > > [ 1271.259683]  [ea121ec0-ea121eff] PMD -> 
> > > > > > > [88081100-8808113f] on node 3
> > > > > > > [ 1271.274194]  [ea121f00-ea121f3f] PMD -> 
> > > > > > > [880810c0-880810ff] on node 3
> > > > > > > [ 1271.288764]  [ea121f40-ea121f7f] PMD -> 
> > > > > > > [88081080-880810bf] on node 3
> > > > > 
> > > > > It appears that each memory object only has 64MB of memory.  This is
> > > > > less than the memory block size, which is 128MB.  This means that a
> > > > > single memory block associates with two ACPI memory device objects.
> > > > 
> > > > That'd be bad.
> > > > 
> > > > How did that work before if that indeed is the case?
> > > 
> > > Well, it looks to me that it has never worked before...
> > > 
> > > > > > > ...   
> > > > > > > [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
> > > > > > 
> > > > > > Well, the only new way acpi_memory_enable_device() can fail after 
> > > > > > that commit
> > > > > > is a failure in acpi_bind_memory_blocks().
> > > > > 
> > > > > Agreed.
> > > > > 
> > > > > > This means that either handle is NULL, which I think we can 
> > > > > > exclude, because
> > > > > > acpi_memory_enable_device() wouldn't be called at all if that were 
> > > > > > the case, or
> > > > > > there's a more subtle error in acpi_bind_one().
> > > > > > 
> > > > > > One that comes to mind is that we may be calling acpi_bind_one() 
> > > > > > twice for the
> > > > > > same memory region, in which it will trigger -EINVAL from the 
> > > > > > sanity check in
> > > > > > there.
> > > > > 
> > > > > I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> > > > > handle is already set\n").  When two ACPI memory objects associate 
> > > > > with
> > > > > a same memory block, the bind procedure of the 2nd ACPI memory object
> > > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
> > > > > object.
> > > > 
> > > > That sound's plausible, but I wonder how we can fix that?
> > > > 
> > > > There's no way for a single physical device to have two different ACPI
> > > > "companions".  It looks like the memory blocks should be 64 M each in 
> > > > that
> > > > case.  Or we need to create two child devices for each memory block and
> > > > associate each of them with an ACPI object.  That would lead to 
> > > > complications
> > > > in the user space interface, though.
> > > 
> > > Right.  Even bigger issue is that I do not think __add_pages() and
> > > __remove_pages() can add / delete a memory chunk that is less than
> > > 128MB.  128MB is the 

Re: Cannot hot remove a memory device

2013-08-04 Thread Rafael J. Wysocki
On Saturday, August 03, 2013 06:37:26 PM Toshi Kani wrote:
 On Sat, 2013-08-03 at 03:01 +0200, Rafael J. Wysocki wrote:
  On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:
   On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
 On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
  Hi,
  
  Thanks for your report.
  
  On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
   By following commit, I cannot hot remove a memory device.
   
   ACPI / memhotplug: Bind removable memory blocks to ACPI device 
   nodes
   commit e2ff39400d81233374e780b133496a2296643d7d
   
   Details are follows:
   When I add a memory device, acpi_memory_enable_device() always 
   fails
   as follows:
   
   ...
   [ 1271.114116]  [ea121c40-ea121c7f] PMD - 
   [880813c0-880813ff] on node 3
   [ 1271.128682]  [ea121c80-ea121cbf] PMD - 
   [88081380-880813bf] on node 3
   [ 1271.143298]  [ea121cc0-ea121cff] PMD - 
   [88081300-8808133f] on node 3
   [ 1271.157799]  [ea121d00-ea121d3f] PMD - 
   [880812c0-880812ff] on node 3
   [ 1271.172341]  [ea121d40-ea121d7f] PMD - 
   [88081280-880812bf] on node 3
   [ 1271.186872]  [ea121d80-ea121dbf] PMD - 
   [88081240-8808127f] on node 3
   [ 1271.201481]  [ea121dc0-ea121dff] PMD - 
   [88081200-8808123f] on node 3
   [ 1271.216041]  [ea121e00-ea121e3f] PMD - 
   [880811c0-880811ff] on node 3
   [ 1271.230623]  [ea121e40-ea121e7f] PMD - 
   [88081180-880811bf] on node 3
   [ 1271.245148]  [ea121e80-ea121ebf] PMD - 
   [88081140-8808117f] on node 3
   [ 1271.259683]  [ea121ec0-ea121eff] PMD - 
   [88081100-8808113f] on node 3
   [ 1271.274194]  [ea121f00-ea121f3f] PMD - 
   [880810c0-880810ff] on node 3
   [ 1271.288764]  [ea121f40-ea121f7f] PMD - 
   [88081080-880810bf] on node 3
 
 It appears that each memory object only has 64MB of memory.  This is
 less than the memory block size, which is 128MB.  This means that a
 single memory block associates with two ACPI memory device objects.

That'd be bad.

How did that work before if that indeed is the case?
   
   Well, it looks to me that it has never worked before...
   
   ...   
   [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
  
  Well, the only new way acpi_memory_enable_device() can fail after 
  that commit
  is a failure in acpi_bind_memory_blocks().
 
 Agreed.
 
  This means that either handle is NULL, which I think we can 
  exclude, because
  acpi_memory_enable_device() wouldn't be called at all if that were 
  the case, or
  there's a more subtle error in acpi_bind_one().
  
  One that comes to mind is that we may be calling acpi_bind_one() 
  twice for the
  same memory region, in which it will trigger -EINVAL from the 
  sanity check in
  there.
 
 I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
 handle is already set\n).  When two ACPI memory objects associate 
 with
 a same memory block, the bind procedure of the 2nd ACPI memory object
 sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory 
 object.

That sound's plausible, but I wonder how we can fix that?

There's no way for a single physical device to have two different ACPI
companions.  It looks like the memory blocks should be 64 M each in 
that
case.  Or we need to create two child devices for each memory block and
associate each of them with an ACPI object.  That would lead to 
complications
in the user space interface, though.
   
   Right.  Even bigger issue is that I do not think __add_pages() and
   __remove_pages() can add / delete a memory chunk that is less than
   128MB.  128MB is the granularity of them.  So, we may just have to fail
   this case gracefully.
  
  Sigh.
  
  BTW, why do you think they are 64 M each (it's late and I'm obviously 
  tired)?
 
 Oops!  Sorry, I had confused the above messages with the one in
 init_memory_mapping(), which shows a memory range being added, i.e. the
 size of an ACPI memory device object.  But the above messages actually
 came from vmemmap_populate_hugepages(), which was called during boot-up.
 So, these messages have nothing to do with ACPI memory device objects.
 And even worse, I do not seem to be able to count a 

Re: Cannot hot remove a memory device

2013-08-04 Thread Yasuaki Ishimatsu

(2013/08/04 9:37), Toshi Kani wrote:

On Sat, 2013-08-03 at 03:01 +0200, Rafael J. Wysocki wrote:

On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:

On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:

On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:

On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:

Hi,

Thanks for your report.

On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:

By following commit, I cannot hot remove a memory device.

ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
commit e2ff39400d81233374e780b133496a2296643d7d

Details are follows:
When I add a memory device, acpi_memory_enable_device() always fails
as follows:

...
[ 1271.114116]  [ea121c40-ea121c7f] PMD - 
[880813c0-880813ff] on node 3
[ 1271.128682]  [ea121c80-ea121cbf] PMD - 
[88081380-880813bf] on node 3
[ 1271.143298]  [ea121cc0-ea121cff] PMD - 
[88081300-8808133f] on node 3
[ 1271.157799]  [ea121d00-ea121d3f] PMD - 
[880812c0-880812ff] on node 3
[ 1271.172341]  [ea121d40-ea121d7f] PMD - 
[88081280-880812bf] on node 3
[ 1271.186872]  [ea121d80-ea121dbf] PMD - 
[88081240-8808127f] on node 3
[ 1271.201481]  [ea121dc0-ea121dff] PMD - 
[88081200-8808123f] on node 3
[ 1271.216041]  [ea121e00-ea121e3f] PMD - 
[880811c0-880811ff] on node 3
[ 1271.230623]  [ea121e40-ea121e7f] PMD - 
[88081180-880811bf] on node 3
[ 1271.245148]  [ea121e80-ea121ebf] PMD - 
[88081140-8808117f] on node 3
[ 1271.259683]  [ea121ec0-ea121eff] PMD - 
[88081100-8808113f] on node 3
[ 1271.274194]  [ea121f00-ea121f3f] PMD - 
[880810c0-880810ff] on node 3
[ 1271.288764]  [ea121f40-ea121f7f] PMD - 
[88081080-880810bf] on node 3


It appears that each memory object only has 64MB of memory.  This is
less than the memory block size, which is 128MB.  This means that a
single memory block associates with two ACPI memory device objects.


That'd be bad.

How did that work before if that indeed is the case?


Well, it looks to me that it has never worked before...


... 
[ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error


Well, the only new way acpi_memory_enable_device() can fail after that commit
is a failure in acpi_bind_memory_blocks().


Agreed.


This means that either handle is NULL, which I think we can exclude, because
acpi_memory_enable_device() wouldn't be called at all if that were the case, or
there's a more subtle error in acpi_bind_one().

One that comes to mind is that we may be calling acpi_bind_one() twice for the
same memory region, in which it will trigger -EINVAL from the sanity check in
there.


I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
handle is already set\n).  When two ACPI memory objects associate with
a same memory block, the bind procedure of the 2nd ACPI memory object
sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.


That sound's plausible, but I wonder how we can fix that?

There's no way for a single physical device to have two different ACPI
companions.  It looks like the memory blocks should be 64 M each in that
case.  Or we need to create two child devices for each memory block and
associate each of them with an ACPI object.  That would lead to complications
in the user space interface, though.


Right.  Even bigger issue is that I do not think __add_pages() and
__remove_pages() can add / delete a memory chunk that is less than
128MB.  128MB is the granularity of them.  So, we may just have to fail
this case gracefully.


Sigh.

BTW, why do you think they are 64 M each (it's late and I'm obviously tired)?


Oops!  Sorry, I had confused the above messages with the one in
init_memory_mapping(), which shows a memory range being added, i.e. the
size of an ACPI memory device object.  But the above messages actually
came from vmemmap_populate_hugepages(), which was called during boot-up.
So, these messages have nothing to do with ACPI memory device objects.
And even worse, I do not seem to be able to count a number of zeros...
In the above messages, each memory range is 4MB (0x40), not 64MB
(0x400)...  My bad. :-(

So, while we may still need to do something for the less-than-128MB
issue, Yasuaki may be hitting a different one.  Let's wait for Yasuaki
to give us more info.


acpi_bind_memory_blocks() failed with -ENOSPC.

int acpi_bind_one(struct device *dev, acpi_handle handle)
{
...
/* allocate physical node id according to physical_node_id_bitmap */
physical_node-node_id =
find_first_zero_bit(acpi_dev-physical_node_id_bitmap,

Re: Cannot hot remove a memory device

2013-08-03 Thread Toshi Kani
On Sat, 2013-08-03 at 03:01 +0200, Rafael J. Wysocki wrote:
> On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:
> > On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > > > > Hi,
> > > > > 
> > > > > Thanks for your report.
> > > > > 
> > > > > On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
> > > > > > By following commit, I cannot hot remove a memory device.
> > > > > > 
> > > > > > ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
> > > > > > commit e2ff39400d81233374e780b133496a2296643d7d
> > > > > > 
> > > > > > Details are follows:
> > > > > > When I add a memory device, acpi_memory_enable_device() always fails
> > > > > > as follows:
> > > > > > 
> > > > > > ...
> > > > > > [ 1271.114116]  [ea121c40-ea121c7f] PMD -> 
> > > > > > [880813c0-880813ff] on node 3
> > > > > > [ 1271.128682]  [ea121c80-ea121cbf] PMD -> 
> > > > > > [88081380-880813bf] on node 3
> > > > > > [ 1271.143298]  [ea121cc0-ea121cff] PMD -> 
> > > > > > [88081300-8808133f] on node 3
> > > > > > [ 1271.157799]  [ea121d00-ea121d3f] PMD -> 
> > > > > > [880812c0-880812ff] on node 3
> > > > > > [ 1271.172341]  [ea121d40-ea121d7f] PMD -> 
> > > > > > [88081280-880812bf] on node 3
> > > > > > [ 1271.186872]  [ea121d80-ea121dbf] PMD -> 
> > > > > > [88081240-8808127f] on node 3
> > > > > > [ 1271.201481]  [ea121dc0-ea121dff] PMD -> 
> > > > > > [88081200-8808123f] on node 3
> > > > > > [ 1271.216041]  [ea121e00-ea121e3f] PMD -> 
> > > > > > [880811c0-880811ff] on node 3
> > > > > > [ 1271.230623]  [ea121e40-ea121e7f] PMD -> 
> > > > > > [88081180-880811bf] on node 3
> > > > > > [ 1271.245148]  [ea121e80-ea121ebf] PMD -> 
> > > > > > [88081140-8808117f] on node 3
> > > > > > [ 1271.259683]  [ea121ec0-ea121eff] PMD -> 
> > > > > > [88081100-8808113f] on node 3
> > > > > > [ 1271.274194]  [ea121f00-ea121f3f] PMD -> 
> > > > > > [880810c0-880810ff] on node 3
> > > > > > [ 1271.288764]  [ea121f40-ea121f7f] PMD -> 
> > > > > > [88081080-880810bf] on node 3
> > > > 
> > > > It appears that each memory object only has 64MB of memory.  This is
> > > > less than the memory block size, which is 128MB.  This means that a
> > > > single memory block associates with two ACPI memory device objects.
> > > 
> > > That'd be bad.
> > > 
> > > How did that work before if that indeed is the case?
> > 
> > Well, it looks to me that it has never worked before...
> > 
> > > > > > ... 
> > > > > > [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
> > > > > 
> > > > > Well, the only new way acpi_memory_enable_device() can fail after 
> > > > > that commit
> > > > > is a failure in acpi_bind_memory_blocks().
> > > > 
> > > > Agreed.
> > > > 
> > > > > This means that either handle is NULL, which I think we can exclude, 
> > > > > because
> > > > > acpi_memory_enable_device() wouldn't be called at all if that were 
> > > > > the case, or
> > > > > there's a more subtle error in acpi_bind_one().
> > > > > 
> > > > > One that comes to mind is that we may be calling acpi_bind_one() 
> > > > > twice for the
> > > > > same memory region, in which it will trigger -EINVAL from the sanity 
> > > > > check in
> > > > > there.
> > > > 
> > > > I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> > > > handle is already set\n").  When two ACPI memory objects associate with
> > > > a same memory block, the bind procedure of the 2nd ACPI memory object
> > > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.
> > > 
> > > That sound's plausible, but I wonder how we can fix that?
> > > 
> > > There's no way for a single physical device to have two different ACPI
> > > "companions".  It looks like the memory blocks should be 64 M each in that
> > > case.  Or we need to create two child devices for each memory block and
> > > associate each of them with an ACPI object.  That would lead to 
> > > complications
> > > in the user space interface, though.
> > 
> > Right.  Even bigger issue is that I do not think __add_pages() and
> > __remove_pages() can add / delete a memory chunk that is less than
> > 128MB.  128MB is the granularity of them.  So, we may just have to fail
> > this case gracefully.
> 
> Sigh.
> 
> BTW, why do you think they are 64 M each (it's late and I'm obviously tired)?

Oops!  Sorry, I had confused the above messages with the one in
init_memory_mapping(), which shows a memory range being added, i.e. the
size of an 

Re: Cannot hot remove a memory device

2013-08-03 Thread Toshi Kani
On Sat, 2013-08-03 at 03:01 +0200, Rafael J. Wysocki wrote:
 On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:
  On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
   On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
 Hi,
 
 Thanks for your report.
 
 On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
  By following commit, I cannot hot remove a memory device.
  
  ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
  commit e2ff39400d81233374e780b133496a2296643d7d
  
  Details are follows:
  When I add a memory device, acpi_memory_enable_device() always fails
  as follows:
  
  ...
  [ 1271.114116]  [ea121c40-ea121c7f] PMD - 
  [880813c0-880813ff] on node 3
  [ 1271.128682]  [ea121c80-ea121cbf] PMD - 
  [88081380-880813bf] on node 3
  [ 1271.143298]  [ea121cc0-ea121cff] PMD - 
  [88081300-8808133f] on node 3
  [ 1271.157799]  [ea121d00-ea121d3f] PMD - 
  [880812c0-880812ff] on node 3
  [ 1271.172341]  [ea121d40-ea121d7f] PMD - 
  [88081280-880812bf] on node 3
  [ 1271.186872]  [ea121d80-ea121dbf] PMD - 
  [88081240-8808127f] on node 3
  [ 1271.201481]  [ea121dc0-ea121dff] PMD - 
  [88081200-8808123f] on node 3
  [ 1271.216041]  [ea121e00-ea121e3f] PMD - 
  [880811c0-880811ff] on node 3
  [ 1271.230623]  [ea121e40-ea121e7f] PMD - 
  [88081180-880811bf] on node 3
  [ 1271.245148]  [ea121e80-ea121ebf] PMD - 
  [88081140-8808117f] on node 3
  [ 1271.259683]  [ea121ec0-ea121eff] PMD - 
  [88081100-8808113f] on node 3
  [ 1271.274194]  [ea121f00-ea121f3f] PMD - 
  [880810c0-880810ff] on node 3
  [ 1271.288764]  [ea121f40-ea121f7f] PMD - 
  [88081080-880810bf] on node 3

It appears that each memory object only has 64MB of memory.  This is
less than the memory block size, which is 128MB.  This means that a
single memory block associates with two ACPI memory device objects.
   
   That'd be bad.
   
   How did that work before if that indeed is the case?
  
  Well, it looks to me that it has never worked before...
  
  ... 
  [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
 
 Well, the only new way acpi_memory_enable_device() can fail after 
 that commit
 is a failure in acpi_bind_memory_blocks().

Agreed.

 This means that either handle is NULL, which I think we can exclude, 
 because
 acpi_memory_enable_device() wouldn't be called at all if that were 
 the case, or
 there's a more subtle error in acpi_bind_one().
 
 One that comes to mind is that we may be calling acpi_bind_one() 
 twice for the
 same memory region, in which it will trigger -EINVAL from the sanity 
 check in
 there.

I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
handle is already set\n).  When two ACPI memory objects associate with
a same memory block, the bind procedure of the 2nd ACPI memory object
sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.
   
   That sound's plausible, but I wonder how we can fix that?
   
   There's no way for a single physical device to have two different ACPI
   companions.  It looks like the memory blocks should be 64 M each in that
   case.  Or we need to create two child devices for each memory block and
   associate each of them with an ACPI object.  That would lead to 
   complications
   in the user space interface, though.
  
  Right.  Even bigger issue is that I do not think __add_pages() and
  __remove_pages() can add / delete a memory chunk that is less than
  128MB.  128MB is the granularity of them.  So, we may just have to fail
  this case gracefully.
 
 Sigh.
 
 BTW, why do you think they are 64 M each (it's late and I'm obviously tired)?

Oops!  Sorry, I had confused the above messages with the one in
init_memory_mapping(), which shows a memory range being added, i.e. the
size of an ACPI memory device object.  But the above messages actually
came from vmemmap_populate_hugepages(), which was called during boot-up.
So, these messages have nothing to do with ACPI memory device objects.
And even worse, I do not seem to be able to count a number of zeros...
In the above messages, each memory range is 4MB (0x40), not 64MB
(0x400)...  My bad. :-(

So, while we may still need to do something for the less-than-128MB
issue, Yasuaki may be 

Re: Cannot hot remove a memory device

2013-08-02 Thread Rafael J. Wysocki
On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:
> On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> > On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > > > Hi,
> > > > 
> > > > Thanks for your report.
> > > > 
> > > > On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
> > > > > By following commit, I cannot hot remove a memory device.
> > > > > 
> > > > > ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
> > > > > commit e2ff39400d81233374e780b133496a2296643d7d
> > > > > 
> > > > > Details are follows:
> > > > > When I add a memory device, acpi_memory_enable_device() always fails
> > > > > as follows:
> > > > > 
> > > > > ...
> > > > > [ 1271.114116]  [ea121c40-ea121c7f] PMD -> 
> > > > > [880813c0-880813ff] on node 3
> > > > > [ 1271.128682]  [ea121c80-ea121cbf] PMD -> 
> > > > > [88081380-880813bf] on node 3
> > > > > [ 1271.143298]  [ea121cc0-ea121cff] PMD -> 
> > > > > [88081300-8808133f] on node 3
> > > > > [ 1271.157799]  [ea121d00-ea121d3f] PMD -> 
> > > > > [880812c0-880812ff] on node 3
> > > > > [ 1271.172341]  [ea121d40-ea121d7f] PMD -> 
> > > > > [88081280-880812bf] on node 3
> > > > > [ 1271.186872]  [ea121d80-ea121dbf] PMD -> 
> > > > > [88081240-8808127f] on node 3
> > > > > [ 1271.201481]  [ea121dc0-ea121dff] PMD -> 
> > > > > [88081200-8808123f] on node 3
> > > > > [ 1271.216041]  [ea121e00-ea121e3f] PMD -> 
> > > > > [880811c0-880811ff] on node 3
> > > > > [ 1271.230623]  [ea121e40-ea121e7f] PMD -> 
> > > > > [88081180-880811bf] on node 3
> > > > > [ 1271.245148]  [ea121e80-ea121ebf] PMD -> 
> > > > > [88081140-8808117f] on node 3
> > > > > [ 1271.259683]  [ea121ec0-ea121eff] PMD -> 
> > > > > [88081100-8808113f] on node 3
> > > > > [ 1271.274194]  [ea121f00-ea121f3f] PMD -> 
> > > > > [880810c0-880810ff] on node 3
> > > > > [ 1271.288764]  [ea121f40-ea121f7f] PMD -> 
> > > > > [88081080-880810bf] on node 3
> > > 
> > > It appears that each memory object only has 64MB of memory.  This is
> > > less than the memory block size, which is 128MB.  This means that a
> > > single memory block associates with two ACPI memory device objects.
> > 
> > That'd be bad.
> > 
> > How did that work before if that indeed is the case?
> 
> Well, it looks to me that it has never worked before...
> 
> > > > > ...   
> > > > > [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
> > > > 
> > > > Well, the only new way acpi_memory_enable_device() can fail after that 
> > > > commit
> > > > is a failure in acpi_bind_memory_blocks().
> > > 
> > > Agreed.
> > > 
> > > > This means that either handle is NULL, which I think we can exclude, 
> > > > because
> > > > acpi_memory_enable_device() wouldn't be called at all if that were the 
> > > > case, or
> > > > there's a more subtle error in acpi_bind_one().
> > > > 
> > > > One that comes to mind is that we may be calling acpi_bind_one() twice 
> > > > for the
> > > > same memory region, in which it will trigger -EINVAL from the sanity 
> > > > check in
> > > > there.
> > > 
> > > I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> > > handle is already set\n").  When two ACPI memory objects associate with
> > > a same memory block, the bind procedure of the 2nd ACPI memory object
> > > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.
> > 
> > That sound's plausible, but I wonder how we can fix that?
> > 
> > There's no way for a single physical device to have two different ACPI
> > "companions".  It looks like the memory blocks should be 64 M each in that
> > case.  Or we need to create two child devices for each memory block and
> > associate each of them with an ACPI object.  That would lead to 
> > complications
> > in the user space interface, though.
> 
> Right.  Even bigger issue is that I do not think __add_pages() and
> __remove_pages() can add / delete a memory chunk that is less than
> 128MB.  128MB is the granularity of them.  So, we may just have to fail
> this case gracefully.

Sigh.

BTW, why do you think they are 64 M each (it's late and I'm obviously tired)?

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-02 Thread Toshi Kani
On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
> On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> > On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > > Hi,
> > > 
> > > Thanks for your report.
> > > 
> > > On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
> > > > By following commit, I cannot hot remove a memory device.
> > > > 
> > > > ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
> > > > commit e2ff39400d81233374e780b133496a2296643d7d
> > > > 
> > > > Details are follows:
> > > > When I add a memory device, acpi_memory_enable_device() always fails
> > > > as follows:
> > > > 
> > > > ...
> > > > [ 1271.114116]  [ea121c40-ea121c7f] PMD -> 
> > > > [880813c0-880813ff] on node 3
> > > > [ 1271.128682]  [ea121c80-ea121cbf] PMD -> 
> > > > [88081380-880813bf] on node 3
> > > > [ 1271.143298]  [ea121cc0-ea121cff] PMD -> 
> > > > [88081300-8808133f] on node 3
> > > > [ 1271.157799]  [ea121d00-ea121d3f] PMD -> 
> > > > [880812c0-880812ff] on node 3
> > > > [ 1271.172341]  [ea121d40-ea121d7f] PMD -> 
> > > > [88081280-880812bf] on node 3
> > > > [ 1271.186872]  [ea121d80-ea121dbf] PMD -> 
> > > > [88081240-8808127f] on node 3
> > > > [ 1271.201481]  [ea121dc0-ea121dff] PMD -> 
> > > > [88081200-8808123f] on node 3
> > > > [ 1271.216041]  [ea121e00-ea121e3f] PMD -> 
> > > > [880811c0-880811ff] on node 3
> > > > [ 1271.230623]  [ea121e40-ea121e7f] PMD -> 
> > > > [88081180-880811bf] on node 3
> > > > [ 1271.245148]  [ea121e80-ea121ebf] PMD -> 
> > > > [88081140-8808117f] on node 3
> > > > [ 1271.259683]  [ea121ec0-ea121eff] PMD -> 
> > > > [88081100-8808113f] on node 3
> > > > [ 1271.274194]  [ea121f00-ea121f3f] PMD -> 
> > > > [880810c0-880810ff] on node 3
> > > > [ 1271.288764]  [ea121f40-ea121f7f] PMD -> 
> > > > [88081080-880810bf] on node 3
> > 
> > It appears that each memory object only has 64MB of memory.  This is
> > less than the memory block size, which is 128MB.  This means that a
> > single memory block associates with two ACPI memory device objects.
> 
> That'd be bad.
> 
> How did that work before if that indeed is the case?

Well, it looks to me that it has never worked before...

> > > > ... 
> > > > [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
> > > 
> > > Well, the only new way acpi_memory_enable_device() can fail after that 
> > > commit
> > > is a failure in acpi_bind_memory_blocks().
> > 
> > Agreed.
> > 
> > > This means that either handle is NULL, which I think we can exclude, 
> > > because
> > > acpi_memory_enable_device() wouldn't be called at all if that were the 
> > > case, or
> > > there's a more subtle error in acpi_bind_one().
> > > 
> > > One that comes to mind is that we may be calling acpi_bind_one() twice 
> > > for the
> > > same memory region, in which it will trigger -EINVAL from the sanity 
> > > check in
> > > there.
> > 
> > I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> > handle is already set\n").  When two ACPI memory objects associate with
> > a same memory block, the bind procedure of the 2nd ACPI memory object
> > sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.
> 
> That sound's plausible, but I wonder how we can fix that?
> 
> There's no way for a single physical device to have two different ACPI
> "companions".  It looks like the memory blocks should be 64 M each in that
> case.  Or we need to create two child devices for each memory block and
> associate each of them with an ACPI object.  That would lead to complications
> in the user space interface, though.

Right.  Even bigger issue is that I do not think __add_pages() and
__remove_pages() can add / delete a memory chunk that is less than
128MB.  128MB is the granularity of them.  So, we may just have to fail
this case gracefully.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-02 Thread Rafael J. Wysocki
On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > Hi,
> > 
> > Thanks for your report.
> > 
> > On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
> > > By following commit, I cannot hot remove a memory device.
> > > 
> > > ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
> > > commit e2ff39400d81233374e780b133496a2296643d7d
> > > 
> > > Details are follows:
> > > When I add a memory device, acpi_memory_enable_device() always fails
> > > as follows:
> > > 
> > > ...
> > > [ 1271.114116]  [ea121c40-ea121c7f] PMD -> 
> > > [880813c0-880813ff] on node 3
> > > [ 1271.128682]  [ea121c80-ea121cbf] PMD -> 
> > > [88081380-880813bf] on node 3
> > > [ 1271.143298]  [ea121cc0-ea121cff] PMD -> 
> > > [88081300-8808133f] on node 3
> > > [ 1271.157799]  [ea121d00-ea121d3f] PMD -> 
> > > [880812c0-880812ff] on node 3
> > > [ 1271.172341]  [ea121d40-ea121d7f] PMD -> 
> > > [88081280-880812bf] on node 3
> > > [ 1271.186872]  [ea121d80-ea121dbf] PMD -> 
> > > [88081240-8808127f] on node 3
> > > [ 1271.201481]  [ea121dc0-ea121dff] PMD -> 
> > > [88081200-8808123f] on node 3
> > > [ 1271.216041]  [ea121e00-ea121e3f] PMD -> 
> > > [880811c0-880811ff] on node 3
> > > [ 1271.230623]  [ea121e40-ea121e7f] PMD -> 
> > > [88081180-880811bf] on node 3
> > > [ 1271.245148]  [ea121e80-ea121ebf] PMD -> 
> > > [88081140-8808117f] on node 3
> > > [ 1271.259683]  [ea121ec0-ea121eff] PMD -> 
> > > [88081100-8808113f] on node 3
> > > [ 1271.274194]  [ea121f00-ea121f3f] PMD -> 
> > > [880810c0-880810ff] on node 3
> > > [ 1271.288764]  [ea121f40-ea121f7f] PMD -> 
> > > [88081080-880810bf] on node 3
> 
> It appears that each memory object only has 64MB of memory.  This is
> less than the memory block size, which is 128MB.  This means that a
> single memory block associates with two ACPI memory device objects.

That'd be bad.

How did that work before if that indeed is the case?

> > > ...   
> > > [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
> > 
> > Well, the only new way acpi_memory_enable_device() can fail after that 
> > commit
> > is a failure in acpi_bind_memory_blocks().
> 
> Agreed.
> 
> > This means that either handle is NULL, which I think we can exclude, because
> > acpi_memory_enable_device() wouldn't be called at all if that were the 
> > case, or
> > there's a more subtle error in acpi_bind_one().
> > 
> > One that comes to mind is that we may be calling acpi_bind_one() twice for 
> > the
> > same memory region, in which it will trigger -EINVAL from the sanity check 
> > in
> > there.
> 
> I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> handle is already set\n").  When two ACPI memory objects associate with
> a same memory block, the bind procedure of the 2nd ACPI memory object
> sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.

That sound's plausible, but I wonder how we can fix that?

There's no way for a single physical device to have two different ACPI
"companions".  It looks like the memory blocks should be 64 M each in that
case.  Or we need to create two child devices for each memory block and
associate each of them with an ACPI object.  That would lead to complications
in the user space interface, though.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-02 Thread Toshi Kani
On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> Hi,
> 
> Thanks for your report.
> 
> On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
> > By following commit, I cannot hot remove a memory device.
> > 
> > ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
> > commit e2ff39400d81233374e780b133496a2296643d7d
> > 
> > Details are follows:
> > When I add a memory device, acpi_memory_enable_device() always fails
> > as follows:
> > 
> > ...
> > [ 1271.114116]  [ea121c40-ea121c7f] PMD -> 
> > [880813c0-880813ff] on node 3
> > [ 1271.128682]  [ea121c80-ea121cbf] PMD -> 
> > [88081380-880813bf] on node 3
> > [ 1271.143298]  [ea121cc0-ea121cff] PMD -> 
> > [88081300-8808133f] on node 3
> > [ 1271.157799]  [ea121d00-ea121d3f] PMD -> 
> > [880812c0-880812ff] on node 3
> > [ 1271.172341]  [ea121d40-ea121d7f] PMD -> 
> > [88081280-880812bf] on node 3
> > [ 1271.186872]  [ea121d80-ea121dbf] PMD -> 
> > [88081240-8808127f] on node 3
> > [ 1271.201481]  [ea121dc0-ea121dff] PMD -> 
> > [88081200-8808123f] on node 3
> > [ 1271.216041]  [ea121e00-ea121e3f] PMD -> 
> > [880811c0-880811ff] on node 3
> > [ 1271.230623]  [ea121e40-ea121e7f] PMD -> 
> > [88081180-880811bf] on node 3
> > [ 1271.245148]  [ea121e80-ea121ebf] PMD -> 
> > [88081140-8808117f] on node 3
> > [ 1271.259683]  [ea121ec0-ea121eff] PMD -> 
> > [88081100-8808113f] on node 3
> > [ 1271.274194]  [ea121f00-ea121f3f] PMD -> 
> > [880810c0-880810ff] on node 3
> > [ 1271.288764]  [ea121f40-ea121f7f] PMD -> 
> > [88081080-880810bf] on node 3

It appears that each memory object only has 64MB of memory.  This is
less than the memory block size, which is 128MB.  This means that a
single memory block associates with two ACPI memory device objects.

> > ... 
> > [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
> 
> Well, the only new way acpi_memory_enable_device() can fail after that commit
> is a failure in acpi_bind_memory_blocks().

Agreed.

> This means that either handle is NULL, which I think we can exclude, because
> acpi_memory_enable_device() wouldn't be called at all if that were the case, 
> or
> there's a more subtle error in acpi_bind_one().
> 
> One that comes to mind is that we may be calling acpi_bind_one() twice for the
> same memory region, in which it will trigger -EINVAL from the sanity check in
> there.

I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
handle is already set\n").  When two ACPI memory objects associate with
a same memory block, the bind procedure of the 2nd ACPI memory object
sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-02 Thread Toshi Kani
On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
 Hi,
 
 Thanks for your report.
 
 On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
  By following commit, I cannot hot remove a memory device.
  
  ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
  commit e2ff39400d81233374e780b133496a2296643d7d
  
  Details are follows:
  When I add a memory device, acpi_memory_enable_device() always fails
  as follows:
  
  ...
  [ 1271.114116]  [ea121c40-ea121c7f] PMD - 
  [880813c0-880813ff] on node 3
  [ 1271.128682]  [ea121c80-ea121cbf] PMD - 
  [88081380-880813bf] on node 3
  [ 1271.143298]  [ea121cc0-ea121cff] PMD - 
  [88081300-8808133f] on node 3
  [ 1271.157799]  [ea121d00-ea121d3f] PMD - 
  [880812c0-880812ff] on node 3
  [ 1271.172341]  [ea121d40-ea121d7f] PMD - 
  [88081280-880812bf] on node 3
  [ 1271.186872]  [ea121d80-ea121dbf] PMD - 
  [88081240-8808127f] on node 3
  [ 1271.201481]  [ea121dc0-ea121dff] PMD - 
  [88081200-8808123f] on node 3
  [ 1271.216041]  [ea121e00-ea121e3f] PMD - 
  [880811c0-880811ff] on node 3
  [ 1271.230623]  [ea121e40-ea121e7f] PMD - 
  [88081180-880811bf] on node 3
  [ 1271.245148]  [ea121e80-ea121ebf] PMD - 
  [88081140-8808117f] on node 3
  [ 1271.259683]  [ea121ec0-ea121eff] PMD - 
  [88081100-8808113f] on node 3
  [ 1271.274194]  [ea121f00-ea121f3f] PMD - 
  [880810c0-880810ff] on node 3
  [ 1271.288764]  [ea121f40-ea121f7f] PMD - 
  [88081080-880810bf] on node 3

It appears that each memory object only has 64MB of memory.  This is
less than the memory block size, which is 128MB.  This means that a
single memory block associates with two ACPI memory device objects.

  ... 
  [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
 
 Well, the only new way acpi_memory_enable_device() can fail after that commit
 is a failure in acpi_bind_memory_blocks().

Agreed.

 This means that either handle is NULL, which I think we can exclude, because
 acpi_memory_enable_device() wouldn't be called at all if that were the case, 
 or
 there's a more subtle error in acpi_bind_one().
 
 One that comes to mind is that we may be calling acpi_bind_one() twice for the
 same memory region, in which it will trigger -EINVAL from the sanity check in
 there.

I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
handle is already set\n).  When two ACPI memory objects associate with
a same memory block, the bind procedure of the 2nd ACPI memory object
sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-02 Thread Rafael J. Wysocki
On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
 On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
  Hi,
  
  Thanks for your report.
  
  On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
   By following commit, I cannot hot remove a memory device.
   
   ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
   commit e2ff39400d81233374e780b133496a2296643d7d
   
   Details are follows:
   When I add a memory device, acpi_memory_enable_device() always fails
   as follows:
   
   ...
   [ 1271.114116]  [ea121c40-ea121c7f] PMD - 
   [880813c0-880813ff] on node 3
   [ 1271.128682]  [ea121c80-ea121cbf] PMD - 
   [88081380-880813bf] on node 3
   [ 1271.143298]  [ea121cc0-ea121cff] PMD - 
   [88081300-8808133f] on node 3
   [ 1271.157799]  [ea121d00-ea121d3f] PMD - 
   [880812c0-880812ff] on node 3
   [ 1271.172341]  [ea121d40-ea121d7f] PMD - 
   [88081280-880812bf] on node 3
   [ 1271.186872]  [ea121d80-ea121dbf] PMD - 
   [88081240-8808127f] on node 3
   [ 1271.201481]  [ea121dc0-ea121dff] PMD - 
   [88081200-8808123f] on node 3
   [ 1271.216041]  [ea121e00-ea121e3f] PMD - 
   [880811c0-880811ff] on node 3
   [ 1271.230623]  [ea121e40-ea121e7f] PMD - 
   [88081180-880811bf] on node 3
   [ 1271.245148]  [ea121e80-ea121ebf] PMD - 
   [88081140-8808117f] on node 3
   [ 1271.259683]  [ea121ec0-ea121eff] PMD - 
   [88081100-8808113f] on node 3
   [ 1271.274194]  [ea121f00-ea121f3f] PMD - 
   [880810c0-880810ff] on node 3
   [ 1271.288764]  [ea121f40-ea121f7f] PMD - 
   [88081080-880810bf] on node 3
 
 It appears that each memory object only has 64MB of memory.  This is
 less than the memory block size, which is 128MB.  This means that a
 single memory block associates with two ACPI memory device objects.

That'd be bad.

How did that work before if that indeed is the case?

   ...   
   [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
  
  Well, the only new way acpi_memory_enable_device() can fail after that 
  commit
  is a failure in acpi_bind_memory_blocks().
 
 Agreed.
 
  This means that either handle is NULL, which I think we can exclude, because
  acpi_memory_enable_device() wouldn't be called at all if that were the 
  case, or
  there's a more subtle error in acpi_bind_one().
  
  One that comes to mind is that we may be calling acpi_bind_one() twice for 
  the
  same memory region, in which it will trigger -EINVAL from the sanity check 
  in
  there.
 
 I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
 handle is already set\n).  When two ACPI memory objects associate with
 a same memory block, the bind procedure of the 2nd ACPI memory object
 sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.

That sound's plausible, but I wonder how we can fix that?

There's no way for a single physical device to have two different ACPI
companions.  It looks like the memory blocks should be 64 M each in that
case.  Or we need to create two child devices for each memory block and
associate each of them with an ACPI object.  That would lead to complications
in the user space interface, though.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-02 Thread Toshi Kani
On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
 On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
  On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
   Hi,
   
   Thanks for your report.
   
   On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
By following commit, I cannot hot remove a memory device.

ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
commit e2ff39400d81233374e780b133496a2296643d7d

Details are follows:
When I add a memory device, acpi_memory_enable_device() always fails
as follows:

...
[ 1271.114116]  [ea121c40-ea121c7f] PMD - 
[880813c0-880813ff] on node 3
[ 1271.128682]  [ea121c80-ea121cbf] PMD - 
[88081380-880813bf] on node 3
[ 1271.143298]  [ea121cc0-ea121cff] PMD - 
[88081300-8808133f] on node 3
[ 1271.157799]  [ea121d00-ea121d3f] PMD - 
[880812c0-880812ff] on node 3
[ 1271.172341]  [ea121d40-ea121d7f] PMD - 
[88081280-880812bf] on node 3
[ 1271.186872]  [ea121d80-ea121dbf] PMD - 
[88081240-8808127f] on node 3
[ 1271.201481]  [ea121dc0-ea121dff] PMD - 
[88081200-8808123f] on node 3
[ 1271.216041]  [ea121e00-ea121e3f] PMD - 
[880811c0-880811ff] on node 3
[ 1271.230623]  [ea121e40-ea121e7f] PMD - 
[88081180-880811bf] on node 3
[ 1271.245148]  [ea121e80-ea121ebf] PMD - 
[88081140-8808117f] on node 3
[ 1271.259683]  [ea121ec0-ea121eff] PMD - 
[88081100-8808113f] on node 3
[ 1271.274194]  [ea121f00-ea121f3f] PMD - 
[880810c0-880810ff] on node 3
[ 1271.288764]  [ea121f40-ea121f7f] PMD - 
[88081080-880810bf] on node 3
  
  It appears that each memory object only has 64MB of memory.  This is
  less than the memory block size, which is 128MB.  This means that a
  single memory block associates with two ACPI memory device objects.
 
 That'd be bad.
 
 How did that work before if that indeed is the case?

Well, it looks to me that it has never worked before...

... 
[ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
   
   Well, the only new way acpi_memory_enable_device() can fail after that 
   commit
   is a failure in acpi_bind_memory_blocks().
  
  Agreed.
  
   This means that either handle is NULL, which I think we can exclude, 
   because
   acpi_memory_enable_device() wouldn't be called at all if that were the 
   case, or
   there's a more subtle error in acpi_bind_one().
   
   One that comes to mind is that we may be calling acpi_bind_one() twice 
   for the
   same memory region, in which it will trigger -EINVAL from the sanity 
   check in
   there.
  
  I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
  handle is already set\n).  When two ACPI memory objects associate with
  a same memory block, the bind procedure of the 2nd ACPI memory object
  sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.
 
 That sound's plausible, but I wonder how we can fix that?
 
 There's no way for a single physical device to have two different ACPI
 companions.  It looks like the memory blocks should be 64 M each in that
 case.  Or we need to create two child devices for each memory block and
 associate each of them with an ACPI object.  That would lead to complications
 in the user space interface, though.

Right.  Even bigger issue is that I do not think __add_pages() and
__remove_pages() can add / delete a memory chunk that is less than
128MB.  128MB is the granularity of them.  So, we may just have to fail
this case gracefully.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-02 Thread Rafael J. Wysocki
On Friday, August 02, 2013 06:04:40 PM Toshi Kani wrote:
 On Sat, 2013-08-03 at 01:43 +0200, Rafael J. Wysocki wrote:
  On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
   On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
Hi,

Thanks for your report.

On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
 By following commit, I cannot hot remove a memory device.
 
 ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
 commit e2ff39400d81233374e780b133496a2296643d7d
 
 Details are follows:
 When I add a memory device, acpi_memory_enable_device() always fails
 as follows:
 
 ...
 [ 1271.114116]  [ea121c40-ea121c7f] PMD - 
 [880813c0-880813ff] on node 3
 [ 1271.128682]  [ea121c80-ea121cbf] PMD - 
 [88081380-880813bf] on node 3
 [ 1271.143298]  [ea121cc0-ea121cff] PMD - 
 [88081300-8808133f] on node 3
 [ 1271.157799]  [ea121d00-ea121d3f] PMD - 
 [880812c0-880812ff] on node 3
 [ 1271.172341]  [ea121d40-ea121d7f] PMD - 
 [88081280-880812bf] on node 3
 [ 1271.186872]  [ea121d80-ea121dbf] PMD - 
 [88081240-8808127f] on node 3
 [ 1271.201481]  [ea121dc0-ea121dff] PMD - 
 [88081200-8808123f] on node 3
 [ 1271.216041]  [ea121e00-ea121e3f] PMD - 
 [880811c0-880811ff] on node 3
 [ 1271.230623]  [ea121e40-ea121e7f] PMD - 
 [88081180-880811bf] on node 3
 [ 1271.245148]  [ea121e80-ea121ebf] PMD - 
 [88081140-8808117f] on node 3
 [ 1271.259683]  [ea121ec0-ea121eff] PMD - 
 [88081100-8808113f] on node 3
 [ 1271.274194]  [ea121f00-ea121f3f] PMD - 
 [880810c0-880810ff] on node 3
 [ 1271.288764]  [ea121f40-ea121f7f] PMD - 
 [88081080-880810bf] on node 3
   
   It appears that each memory object only has 64MB of memory.  This is
   less than the memory block size, which is 128MB.  This means that a
   single memory block associates with two ACPI memory device objects.
  
  That'd be bad.
  
  How did that work before if that indeed is the case?
 
 Well, it looks to me that it has never worked before...
 
 ...   
 [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error

Well, the only new way acpi_memory_enable_device() can fail after that 
commit
is a failure in acpi_bind_memory_blocks().
   
   Agreed.
   
This means that either handle is NULL, which I think we can exclude, 
because
acpi_memory_enable_device() wouldn't be called at all if that were the 
case, or
there's a more subtle error in acpi_bind_one().

One that comes to mind is that we may be calling acpi_bind_one() twice 
for the
same memory region, in which it will trigger -EINVAL from the sanity 
check in
there.
   
   I think it fails with -EINVAL at the place with dev_warn(dev, ACPI
   handle is already set\n).  When two ACPI memory objects associate with
   a same memory block, the bind procedure of the 2nd ACPI memory object
   sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.
  
  That sound's plausible, but I wonder how we can fix that?
  
  There's no way for a single physical device to have two different ACPI
  companions.  It looks like the memory blocks should be 64 M each in that
  case.  Or we need to create two child devices for each memory block and
  associate each of them with an ACPI object.  That would lead to 
  complications
  in the user space interface, though.
 
 Right.  Even bigger issue is that I do not think __add_pages() and
 __remove_pages() can add / delete a memory chunk that is less than
 128MB.  128MB is the granularity of them.  So, we may just have to fail
 this case gracefully.

Sigh.

BTW, why do you think they are 64 M each (it's late and I'm obviously tired)?

Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-01 Thread Rafael J. Wysocki
Hi,

Thanks for your report.

On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
> By following commit, I cannot hot remove a memory device.
> 
> ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
> commit e2ff39400d81233374e780b133496a2296643d7d
> 
> Details are follows:
> When I add a memory device, acpi_memory_enable_device() always fails
> as follows:
> 
> ...
> [ 1271.114116]  [ea121c40-ea121c7f] PMD -> 
> [880813c0-880813ff] on node 3
> [ 1271.128682]  [ea121c80-ea121cbf] PMD -> 
> [88081380-880813bf] on node 3
> [ 1271.143298]  [ea121cc0-ea121cff] PMD -> 
> [88081300-8808133f] on node 3
> [ 1271.157799]  [ea121d00-ea121d3f] PMD -> 
> [880812c0-880812ff] on node 3
> [ 1271.172341]  [ea121d40-ea121d7f] PMD -> 
> [88081280-880812bf] on node 3
> [ 1271.186872]  [ea121d80-ea121dbf] PMD -> 
> [88081240-8808127f] on node 3
> [ 1271.201481]  [ea121dc0-ea121dff] PMD -> 
> [88081200-8808123f] on node 3
> [ 1271.216041]  [ea121e00-ea121e3f] PMD -> 
> [880811c0-880811ff] on node 3
> [ 1271.230623]  [ea121e40-ea121e7f] PMD -> 
> [88081180-880811bf] on node 3
> [ 1271.245148]  [ea121e80-ea121ebf] PMD -> 
> [88081140-8808117f] on node 3
> [ 1271.259683]  [ea121ec0-ea121eff] PMD -> 
> [88081100-8808113f] on node 3
> [ 1271.274194]  [ea121f00-ea121f3f] PMD -> 
> [880810c0-880810ff] on node 3
> [ 1271.288764]  [ea121f40-ea121f7f] PMD -> 
> [88081080-880810bf] on node 3
> ...   
> [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error

Well, the only new way acpi_memory_enable_device() can fail after that commit
is a failure in acpi_bind_memory_blocks().

This means that either handle is NULL, which I think we can exclude, because
acpi_memory_enable_device() wouldn't be called at all if that were the case, or
there's a more subtle error in acpi_bind_one().

One that comes to mind is that we may be calling acpi_bind_one() twice for the
same memory region, in which it will trigger -EINVAL from the sanity check in
there.

Also you may try to increase ACPI_MAX_PHYSICAL_NODE (in acpi_bus.h) and see if
that helps.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Cannot hot remove a memory device

2013-08-01 Thread Rafael J. Wysocki
Hi,

Thanks for your report.

On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
 By following commit, I cannot hot remove a memory device.
 
 ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
 commit e2ff39400d81233374e780b133496a2296643d7d
 
 Details are follows:
 When I add a memory device, acpi_memory_enable_device() always fails
 as follows:
 
 ...
 [ 1271.114116]  [ea121c40-ea121c7f] PMD - 
 [880813c0-880813ff] on node 3
 [ 1271.128682]  [ea121c80-ea121cbf] PMD - 
 [88081380-880813bf] on node 3
 [ 1271.143298]  [ea121cc0-ea121cff] PMD - 
 [88081300-8808133f] on node 3
 [ 1271.157799]  [ea121d00-ea121d3f] PMD - 
 [880812c0-880812ff] on node 3
 [ 1271.172341]  [ea121d40-ea121d7f] PMD - 
 [88081280-880812bf] on node 3
 [ 1271.186872]  [ea121d80-ea121dbf] PMD - 
 [88081240-8808127f] on node 3
 [ 1271.201481]  [ea121dc0-ea121dff] PMD - 
 [88081200-8808123f] on node 3
 [ 1271.216041]  [ea121e00-ea121e3f] PMD - 
 [880811c0-880811ff] on node 3
 [ 1271.230623]  [ea121e40-ea121e7f] PMD - 
 [88081180-880811bf] on node 3
 [ 1271.245148]  [ea121e80-ea121ebf] PMD - 
 [88081140-8808117f] on node 3
 [ 1271.259683]  [ea121ec0-ea121eff] PMD - 
 [88081100-8808113f] on node 3
 [ 1271.274194]  [ea121f00-ea121f3f] PMD - 
 [880810c0-880810ff] on node 3
 [ 1271.288764]  [ea121f40-ea121f7f] PMD - 
 [88081080-880810bf] on node 3
 ...   
 [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error

Well, the only new way acpi_memory_enable_device() can fail after that commit
is a failure in acpi_bind_memory_blocks().

This means that either handle is NULL, which I think we can exclude, because
acpi_memory_enable_device() wouldn't be called at all if that were the case, or
there's a more subtle error in acpi_bind_one().

One that comes to mind is that we may be calling acpi_bind_one() twice for the
same memory region, in which it will trigger -EINVAL from the sanity check in
there.

Also you may try to increase ACPI_MAX_PHYSICAL_NODE (in acpi_bus.h) and see if
that helps.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/