Other use cases for maintenance mode are:

*         HW component replacement, e.g. a NIC or a disk

*         FW upgrade/downgrade - we should be able to use the ironic firmware 
management API/CLI for it.

*         HW configuration change, such as re-provisioning a server or changing 
the RAID configuration. Here we should be able to use the ironic RAID 
configuration API/CLI.
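Both of those use cases map onto ironic's manual cleaning API. As a sketch (not the official client; step and interface names follow the ironic docs, but exact arguments depend on the driver in use), the clean-step payloads might look like:

```python
# Illustrative clean-step payloads for the two use cases above.
# "update_firmware" is the management-interface step (e.g. redfish);
# the RAID steps take their target layout from the node's
# target_raid_config. Argument shapes are driver-dependent.

def firmware_update_steps(url, checksum):
    """Clean steps for a firmware upgrade/downgrade."""
    return [{
        "interface": "management",
        "step": "update_firmware",
        "args": {"firmware_images": [{"url": url, "checksum": checksum}]},
    }]

def raid_rebuild_steps():
    """Clean steps to tear down and recreate a RAID configuration."""
    return [
        {"interface": "raid", "step": "delete_configuration"},
        {"interface": "raid", "step": "create_configuration"},
    ]

steps = firmware_update_steps("http://example.com/bmc.bin", "abc123")
print(steps[0]["step"])  # update_firmware
```

These lists would be passed as the clean_steps body when moving the node to the clean state.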

Thanks,
Arkady

-----Original Message-----
From: Jim Rollenhagen [mailto:j...@jimrollenhagen.com]
Sent: Tuesday, November 24, 2015 9:39 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [ironic]Ironic operations on nodes in maintenance 
mode

On Mon, Nov 23, 2015 at 03:35:58PM -0800, Shraddha Pandhe wrote:
> Hi,
>
> I would like to know how everyone is using maintenance mode and what
> is expected from admins about nodes in maintenance. The reason I am
> bringing up this topic is because, most of the ironic operations,
> including manual cleaning are not allowed for nodes in maintenance. That's a 
> problem for us.
>
> The way we use it is as follows:
>
> We allow users to put nodes in maintenance mode (indirectly) if they
> find something wrong with the node. They also provide a maintenance
> reason along with it, which gets stored as "user_reason" under
> maintenance_reason. So basically we tag it as user specified reason.
>
> To debug what happened to the node our operators use manual cleaning
> to re-image the node. By doing this, they can find out all the issues
> related to re-imaging (dhcp, ipmi, image transfer, etc). This
> debugging process applies to all the nodes that were put in
> maintenance either by user, or by system (due to power cycle failure or due 
> to cleaning failure).
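The tagging scheme described in the quoted message can be sketched with a pair of hypothetical helpers (the "user_reason"/"system_reason" prefixes follow the convention in the message; none of this is ironic API):

```python
# Hypothetical helpers illustrating the tagging scheme above: the
# free-text reason is prefixed so operators can later tell whether a
# node was put in maintenance by a user or by the system.

def tag_maintenance_reason(source, text):
    """Build a maintenance_reason string, e.g. 'user_reason: bad NIC'."""
    assert source in ("user", "system")
    return f"{source}_reason: {text}"

def reason_source(maintenance_reason):
    """Recover who set maintenance from the stored reason string."""
    prefix, sep, _ = maintenance_reason.partition("_reason: ")
    return prefix if sep and prefix in ("user", "system") else "unknown"

reason = tag_maintenance_reason("user", "bad NIC")
print(reason_source(reason))  # user
```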

Interesting; do you let the node go through cleaning between the user nuking 
the instance and doing this manual cleaning stuff?

At Rackspace, we leverage the fact that maintenance mode will not allow the 
node to proceed through the state machine. If a user reports a hardware issue, 
we maintenance the node on their behalf, and when they delete it, it boots the 
agent for cleaning and begins heartbeating.
Heartbeats are ignored in maintenance mode, which gives us time to investigate 
the hardware, fix things, etc. When the issue is resolved, we remove 
maintenance mode, it goes through cleaning, then back in the pool.
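The workflow above hinges on one behavior: the conductor ignores agent heartbeats while maintenance is set. A toy model of that gate (hypothetical names, not ironic internals):

```python
# Toy model of the behavior relied on above: heartbeats from the
# cleaning agent are ignored while the node is in maintenance mode,
# and processed again once maintenance is removed.

class Node:
    def __init__(self):
        self.maintenance = False
        self.heartbeats_processed = 0

    def heartbeat(self):
        if self.maintenance:
            return False            # ignored: operator is investigating
        self.heartbeats_processed += 1
        return True                 # processed: cleaning proceeds

node = Node()
node.maintenance = True
node.heartbeat()                    # ignored while in maintenance
node.maintenance = False            # issue resolved, maintenance removed
node.heartbeat()                    # cleaning resumes
print(node.heartbeats_processed)  # 1
```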

We used to enroll nodes in maintenance mode, back when the API put them in the 
available state immediately, to avoid them being scheduled until we knew they 
were good to go. The enroll state solved this for us.

Last, we use maintenance mode on available nodes if we want to temporarily pull 
them from the pool for a manual process or some testing. This can also be 
solved by the manageable state.

// jim

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev