Re: [openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?

2014-08-25 Thread Robert Collins
This patch
https://review.openstack.org/#/c/116093/3/ironic/nova/virt/ironic/driver.py
seems to have the right parameters to enable Ironic to DTRT (with
associated internal changes) - that's when Nova learnt to soft shutdown
machines.

-Rob

On 23 August 2014 05:48, Clint Byrum  wrote:
> It has been brought to my attention that Ironic uses the biggest hammer
> in the IPMI toolbox to control chassis power:
>
> https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142
>
> Which is
>
> ret = ipmicmd.set_power('off', wait)
>
> This is the most abrupt form, where the system power should be flipped
> off at a hardware level. The "short press" on the power button would be
> 'shutdown' instead of 'off'.
>
> I also understand that this has been brought up before, and that the
> answer given was "SSH in and shut it down yourself." I can respect that
> position, but I have run into a bit of a pickle using it. Observe:
>
> - ssh box.ip "poweroff"
> - poll ironic until power state is off.
>   - This is a race. Ironic is asserting the power. As soon as it sees
>     that the power is off, it will turn it back on.
>
> - ssh box.ip "halt"
>   - NO way to know that this has worked. Once SSH is off and the network
>     stack is gone, I cannot actually verify that the disks were
>     unmounted properly, which is the primary area of concern that I
>     have.
>
> This is particularly important if I'm issuing a rebuild + preserve
> ephemeral, as it is likely I will have lots of I/O going on, and I want
> to make sure that it is all quiesced before I replace the software and
> reboot.
>
> Perhaps I missed something. If so, please do educate me on how I can
> achieve this without hacking around it. Currently my workaround is to
> manually unmount the state partition, which is something system shutdown
> is supposed to do and may become problematic if system processes are
> holding it open.
>
> It seems to me that Ironic should at least try to use the graceful
> shutdown. There can be a timeout, but it would need to be something a user
> can disable so if graceful never works we never just dump the power on the
> box. Even a journaled filesystem will take quite a bit to do a full fsck.
>
> The inability to gracefully shut down in a reasonable amount of time
> is an error state really, and I need to go to the box and inspect it,
> which is precisely the reason we have ERROR states.
>
> Thanks for your time. :)
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?

2014-08-24 Thread Ramakrishnan G
I also feel that "graceful power off" is one of the features that we want in
Ironic.  But until then, you can see if the following works for you:

You can set the following property to false, which will prevent Ironic from
enforcing the power state when it syncs.  It will instead update the node
with the latest power status, which you can poll.
https://github.com/openstack/ironic/blob/master/etc/ironic/ironic.conf.sample#L527-L531
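
With that setting off, the "ssh poweroff, then poll" sequence should no
longer race with the conductor. Below is a minimal sketch of the polling
half. It assumes the option meant here is `force_power_state_during_sync`
in the `[conductor]` section of ironic.conf, and that the node reports its
state as the string "power off". `wait_for_power_off` and the
`get_power_state` callable are hypothetical names; the callable would wrap
whatever client you use to read the node's power state.

```python
import time


def wait_for_power_off(get_power_state, timeout=600.0, interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the node reports 'power off' or the timeout expires.

    get_power_state is a hypothetical zero-argument callable that returns
    the node's current power state as a string.
    Returns True if power off was observed, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_power_state() == "power off":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A False return is the case worth alerting on: the box was asked to shut
down and never reported that it did.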




On Sat, Aug 23, 2014 at 12:04 AM, Clint Byrum  wrote:

> Excerpts from Jay Pipes's message of 2014-08-22 11:16:05 -0700:
> > On 08/22/2014 01:48 PM, Clint Byrum wrote:
> > > It has been brought to my attention that Ironic uses the biggest hammer
> > > in the IPMI toolbox to control chassis power:
> > >
> > >
> > > https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142
> > >
> > > Which is
> > >
> > >  ret = ipmicmd.set_power('off', wait)
> > >
> > > This is the most abrupt form, where the system power should be flipped
> > > off at a hardware level. The "short press" on the power button would be
> > > 'shutdown' instead of 'off'.
> > >
> > > I also understand that this has been brought up before, and that the
> > > answer given was "SSH in and shut it down yourself." I can respect that
> > > position, but I have run into a bit of a pickle using it. Observe:
> > >
> > > - ssh box.ip "poweroff"
> > > - poll ironic until power state is off.
> > >   - This is a race. Ironic is asserting the power. As soon as it sees
> > >     that the power is off, it will turn it back on.
> > >
> > > - ssh box.ip "halt"
> > >   - NO way to know that this has worked. Once SSH is off and the
> > >     network stack is gone, I cannot actually verify that the disks
> > >     were unmounted properly, which is the primary area of concern
> > >     that I have.
> > >
> > > This is particularly important if I'm issuing a rebuild + preserve
> > > ephemeral, as it is likely I will have lots of I/O going on, and I want
> > > to make sure that it is all quiesced before I replace the software and
> > > reboot.
> > >
> > > Perhaps I missed something. If so, please do educate me on how I can
> > > achieve this without hacking around it. Currently my workaround is to
> > > manually unmount the state partition, which is something system
> > > shutdown is supposed to do and may become problematic if system
> > > processes are holding it open.
> > >
> > > It seems to me that Ironic should at least try to use the graceful
> > > shutdown. There can be a timeout, but it would need to be something a
> > > user can disable so if graceful never works we never just dump the
> > > power on the box. Even a journaled filesystem will take quite a bit to
> > > do a full fsck.
> > >
> > > The inability to gracefully shut down in a reasonable amount of time
> > > is an error state really, and I need to go to the box and inspect it,
> > > which is precisely the reason we have ERROR states.
> >
> > What about placing a runlevel script in /etc/init.d/ and symlinking it
> > to run on shutdown -- i.e. /etc/rc0.d/? You could run fsync or unmount
> > the state partition in that script which would ensure disk state was
> > quiesced, no?
>
> That's already what OSes do in their rc0.d.
>
> My point is, I don't have any way to know that process happened, without
> the box turning itself off after it succeeded.


Re: [openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?

2014-08-22 Thread Clint Byrum
Excerpts from Jay Pipes's message of 2014-08-22 11:16:05 -0700:
> On 08/22/2014 01:48 PM, Clint Byrum wrote:
> > It has been brought to my attention that Ironic uses the biggest hammer
> > in the IPMI toolbox to control chassis power:
> >
> > https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142
> >
> > Which is
> >
> >  ret = ipmicmd.set_power('off', wait)
> >
> > This is the most abrupt form, where the system power should be flipped
> > off at a hardware level. The "short press" on the power button would be
> > 'shutdown' instead of 'off'.
> >
> > I also understand that this has been brought up before, and that the
> > answer given was "SSH in and shut it down yourself." I can respect that
> > position, but I have run into a bit of a pickle using it. Observe:
> >
> > - ssh box.ip "poweroff"
> > - poll ironic until power state is off.
> >   - This is a race. Ironic is asserting the power. As soon as it sees
> >     that the power is off, it will turn it back on.
> >
> > - ssh box.ip "halt"
> >   - NO way to know that this has worked. Once SSH is off and the network
> >     stack is gone, I cannot actually verify that the disks were
> >     unmounted properly, which is the primary area of concern that I
> >     have.
> >
> > This is particularly important if I'm issuing a rebuild + preserve
> > ephemeral, as it is likely I will have lots of I/O going on, and I want
> > to make sure that it is all quiesced before I replace the software and
> > reboot.
> >
> > Perhaps I missed something. If so, please do educate me on how I can
> > achieve this without hacking around it. Currently my workaround is to
> > manually unmount the state partition, which is something system shutdown
> > is supposed to do and may become problematic if system processes are
> > holding it open.
> >
> > It seems to me that Ironic should at least try to use the graceful
> > shutdown. There can be a timeout, but it would need to be something a user
> > can disable so if graceful never works we never just dump the power on the
> > box. Even a journaled filesystem will take quite a bit to do a full fsck.
> >
> > The inability to gracefully shut down in a reasonable amount of time
> > is an error state really, and I need to go to the box and inspect it,
> > which is precisely the reason we have ERROR states.
> 
> What about placing a runlevel script in /etc/init.d/ and symlinking it 
> to run on shutdown -- i.e. /etc/rc0.d/? You could run fsync or unmount 
> the state partition in that script which would ensure disk state was 
> quiesced, no?

That's already what OSes do in their rc0.d.

My point is, I don't have any way to know that process happened, without
the box turning itself off after it succeeded.



Re: [openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?

2014-08-22 Thread Jay Pipes

On 08/22/2014 01:48 PM, Clint Byrum wrote:

> It has been brought to my attention that Ironic uses the biggest hammer
> in the IPMI toolbox to control chassis power:
>
> https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142
>
> Which is
>
>  ret = ipmicmd.set_power('off', wait)
>
> This is the most abrupt form, where the system power should be flipped
> off at a hardware level. The "short press" on the power button would be
> 'shutdown' instead of 'off'.
>
> I also understand that this has been brought up before, and that the
> answer given was "SSH in and shut it down yourself." I can respect that
> position, but I have run into a bit of a pickle using it. Observe:
>
> - ssh box.ip "poweroff"
> - poll ironic until power state is off.
>   - This is a race. Ironic is asserting the power. As soon as it sees
>     that the power is off, it will turn it back on.
>
> - ssh box.ip "halt"
>   - NO way to know that this has worked. Once SSH is off and the network
>     stack is gone, I cannot actually verify that the disks were
>     unmounted properly, which is the primary area of concern that I
>     have.
>
> This is particularly important if I'm issuing a rebuild + preserve
> ephemeral, as it is likely I will have lots of I/O going on, and I want
> to make sure that it is all quiesced before I replace the software and
> reboot.
>
> Perhaps I missed something. If so, please do educate me on how I can
> achieve this without hacking around it. Currently my workaround is to
> manually unmount the state partition, which is something system shutdown
> is supposed to do and may become problematic if system processes are
> holding it open.
>
> It seems to me that Ironic should at least try to use the graceful
> shutdown. There can be a timeout, but it would need to be something a user
> can disable so if graceful never works we never just dump the power on the
> box. Even a journaled filesystem will take quite a bit to do a full fsck.
>
> The inability to gracefully shut down in a reasonable amount of time
> is an error state really, and I need to go to the box and inspect it,
> which is precisely the reason we have ERROR states.


What about placing a runlevel script in /etc/init.d/ and symlinking it 
to run on shutdown -- i.e. /etc/rc0.d/? You could run fsync or unmount 
the state partition in that script which would ensure disk state was 
quiesced, no?


Best,
-jay



[openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?

2014-08-22 Thread Clint Byrum
It has been brought to my attention that Ironic uses the biggest hammer
in the IPMI toolbox to control chassis power:

https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142

Which is

ret = ipmicmd.set_power('off', wait)

This is the most abrupt form, where the system power should be flipped
off at a hardware level. The "short press" on the power button would be
'shutdown' instead of 'off'.

I also understand that this has been brought up before, and that the
answer given was "SSH in and shut it down yourself." I can respect that
position, but I have run into a bit of a pickle using it. Observe:

- ssh box.ip "poweroff"
- poll ironic until power state is off.
  - This is a race. Ironic is asserting the power. As soon as it sees
    that the power is off, it will turn it back on.

- ssh box.ip "halt"
  - NO way to know that this has worked. Once SSH is off and the network
    stack is gone, I cannot actually verify that the disks were
    unmounted properly, which is the primary area of concern that I
    have.

This is particularly important if I'm issuing a rebuild + preserve
ephemeral, as it is likely I will have lots of I/O going on, and I want
to make sure that it is all quiesced before I replace the software and
reboot.

Perhaps I missed something. If so, please do educate me on how I can
achieve this without hacking around it. Currently my workaround is to
manually unmount the state partition, which is something system shutdown
is supposed to do and may become problematic if system processes are
holding it open.

It seems to me that Ironic should at least try to use the graceful
shutdown. There can be a timeout, but it would need to be something a user
can disable so if graceful never works we never just dump the power on the
box. Even a journaled filesystem will take quite a bit to do a full fsck.

The inability to gracefully shut down in a reasonable amount of time
is an error state really, and I need to go to the box and inspect it,
which is precisely the reason we have ERROR states.
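
The behaviour proposed above (try the soft shutdown first, wait up to a
timeout, and treat a timeout as an error rather than silently cutting the
power) could look roughly like the sketch below. It is not Ironic's
implementation. It assumes a BMC handle shaped like the `ipmicmd` object in
the line quoted earlier, with `set_power(state, wait)` and a `get_power()`
that returns a dict with a `powerstate` key. `soft_power_off`,
`GracefulShutdownTimeout` and `hard_fallback` are hypothetical names used
only for illustration.

```python
import time


class GracefulShutdownTimeout(Exception):
    """The node did not power itself off within the allowed time."""


def soft_power_off(ipmicmd, timeout=300.0, interval=5.0, hard_fallback=False,
                   sleep=time.sleep, clock=time.monotonic):
    """Request a soft shutdown and wait for the chassis to report 'off'.

    ipmicmd is assumed to expose set_power(state, wait) and get_power(),
    where get_power() returns a dict such as {'powerstate': 'off'}.
    Returns True once the node is off (or was forced off on request).
    """
    # 'shutdown' is the "short press" of the power button, not a power cut.
    ipmicmd.set_power('shutdown', False)

    deadline = clock() + timeout
    while clock() < deadline:
        if ipmicmd.get_power().get('powerstate') == 'off':
            return True
        sleep(interval)

    if hard_fallback:
        # Opt-in only: the caller accepts the risk of a long fsck.
        ipmicmd.set_power('off', True)
        return True

    # Default: surface the failure so the node can be put into an error
    # state and inspected, instead of dumping the power.
    raise GracefulShutdownTimeout(
        'no power-off observed after %.0f seconds' % timeout)
```

Leaving `hard_fallback` off by default matches the request that a user be
able to disable the forced power-off entirely; a deployment that prefers
the current behaviour can opt in.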

Thanks for your time. :)
