Re: [openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?
This patch https://review.openstack.org/#/c/116093/3/ironic/nova/virt/ironic/driver.py seems to have the right parameters to enable Ironic to DTRT (with associated internal changes) - that's when Nova learnt to soft shutdown machines.

-Rob

On 23 August 2014 05:48, Clint Byrum wrote:
> It seems to me that Ironic should at least try to use the graceful
> shutdown. There can be a timeout, but it would need to be something a user
> can disable so if graceful never works we never just dump the power on the
> box. Even a journaled filesystem will take quite a bit to do a full fsck.

--
Robert Collins
Distinguished Technologist
HP Converged Cloud

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?
I also feel that "graceful power off" is one of the features that we want in Ironic. Until then, you can see if the following works for you:

You can set the following property to false, which will prevent Ironic from enforcing the power state when it syncs. It will instead update the node with the latest power status, which you can poll.

https://github.com/openstack/ironic/blob/master/etc/ironic/ironic.conf.sample#L527-L531

On Sat, Aug 23, 2014 at 12:04 AM, Clint Byrum wrote:
> Excerpts from Jay Pipes's message of 2014-08-22 11:16:05 -0700:
> > What about placing a runlevel script in /etc/init.d/ and symlinking it
> > to run on shutdown -- i.e. /etc/rc0.d/? You could run fsync or unmount
> > the state partition in that script which would ensure disk state was
> > quiesced, no?
>
> That's already what OS's do in their rc0.d.
>
> My point is, I don't have any way to know that process happened, without
> the box turning itself off after it succeeded.
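As a hedged illustration of the setting described above: assuming the linked lines of ironic.conf.sample refer to the `force_power_state_during_sync` option in the `[conductor]` section (the option name is not stated in the message itself), the change would look roughly like this:

```ini
[conductor]
# Assumption: this is the option the linked sample lines describe.
# True  = during power sync, force the hardware to the state recorded
#         in the database (a box that powered itself off gets turned
#         back on).
# False = during power sync, update the database from the hardware
#         state, so a node that shut itself down is reported as off.
force_power_state_during_sync=false
```

With this set to false, the "ssh box.ip poweroff, then poll Ironic until the power state is off" sequence should no longer race against the conductor re-asserting power. Note that it applies to the whole conductor, not to a single node.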
Re: [openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?
Excerpts from Jay Pipes's message of 2014-08-22 11:16:05 -0700:
> What about placing a runlevel script in /etc/init.d/ and symlinking it
> to run on shutdown -- i.e. /etc/rc0.d/? You could run fsync or unmount
> the state partition in that script which would ensure disk state was
> quiesced, no?

That's already what OSes do in their rc0.d.

My point is, I don't have any way to know that process happened, without the box turning itself off after it succeeded.
Re: [openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?
On 08/22/2014 01:48 PM, Clint Byrum wrote:
> Perhaps I missed something. If so, please do educate me on how I can
> achieve this without hacking around it. Currently my workaround is to
> manually unmount the state partition, which is something system shutdown
> is supposed to do and may become problematic if system processes are
> holding it open.

What about placing a runlevel script in /etc/init.d/ and symlinking it to run on shutdown -- i.e. /etc/rc0.d/? You could run fsync or unmount the state partition in that script which would ensure disk state was quiesced, no?

Best,
-jay
[openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?
It has been brought to my attention that Ironic uses the biggest hammer in the IPMI toolbox to control chassis power:

https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142

Which is

    ret = ipmicmd.set_power('off', wait)

This is the most abrupt form, where the system power should be flipped off at a hardware level. The "short press" on the power button would be 'shutdown' instead of 'off'.

I also understand that this has been brought up before, and that the answer given was "SSH in and shut it down yourself." I can respect that position, but I have run into a bit of a pickle using it. Observe:

- ssh box.ip "poweroff"
- poll ironic until power state is off.
  - This is a race. Ironic is asserting the power. As soon as it sees that the power is off, it will turn it back on.

- ssh box.ip "halt"
  - NO way to know that this has worked. Once SSH is off and the network stack is gone, I cannot actually verify that the disks were unmounted properly, which is the primary area of concern that I have.

This is particularly important if I'm issuing a rebuild + preserve ephemeral, as it is likely I will have lots of I/O going on, and I want to make sure that it is all quiesced before I reboot to replace the software.

Perhaps I missed something. If so, please do educate me on how I can achieve this without hacking around it. Currently my workaround is to manually unmount the state partition, which is something system shutdown is supposed to do and may become problematic if system processes are holding it open.

It seems to me that Ironic should at least try to use the graceful shutdown. There can be a timeout, but it would need to be something a user can disable so if graceful never works we never just dump the power on the box. Even a journaled filesystem will take quite a bit to do a full fsck.

The inability to gracefully shut down in a reasonable amount of time is an error state really, and I need to go to the box and inspect it, which is precisely the reason we have ERROR states.

Thanks for your time. :)
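The behaviour proposed above (try a graceful shutdown first, wait with a timeout, and make the hard power-off fallback something the user can disable) can be sketched roughly as follows. This is a minimal illustration, not Ironic's implementation: `graceful_power_off`, `GracefulShutdownTimeout`, and all parameter names are hypothetical, and the three callables stand in for whatever driver calls actually talk to the BMC.

```python
import time
from typing import Callable, Optional


class GracefulShutdownTimeout(Exception):
    """Raised when the node never reports 'off' and no fallback is allowed."""


def graceful_power_off(
    request_soft_off: Callable[[], None],
    get_power_state: Callable[[], str],
    hard_power_off: Optional[Callable[[], None]] = None,
    timeout: float = 600.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Request a soft shutdown, then poll until the chassis reports 'off'.

    Returns 'soft' if the OS powered itself off within the timeout, or
    'hard' if the fallback was used. Passing hard_power_off=None disables
    the fallback, so a timeout surfaces as an error instead of a power cut.
    """
    request_soft_off()
    deadline = clock() + timeout
    while True:
        if get_power_state() == "off":
            return "soft"
        if clock() >= deadline:
            break
        sleep(poll_interval)

    if hard_power_off is None:
        # Caller opted out of the big hammer: report the failure so the
        # node can be put into an error state and inspected by hand.
        raise GracefulShutdownTimeout(
            "node did not power off within %.0f seconds" % timeout
        )
    hard_power_off()
    return "hard"
```

In a real driver, `request_soft_off` might wrap something like `ipmicmd.set_power('shutdown', wait)` and `hard_power_off` the existing `ipmicmd.set_power('off', wait)`; that wiring is an assumption here. The point of the design is that the box turning itself off becomes the only signal that the shutdown scripts (including unmounting the state partition) ran to completion.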