Thanks for getting to this before me, Deva. Saved me some typing. :) A little more color inline.
On Mon, Jun 06, 2016 at 05:01:04PM -0700, Devananda van der Veen wrote:
> On 06/06/2016 01:44 PM, Kris G. Lindgren wrote:
> > Hi ironic folks,
> >
> > As I'm trying to explore how GoDaddy can use ironic, I've created the
> > following in an attempt to document some of my concerns, and I'm wondering
> > if you folks could help me identify ongoing work to solve these (or
> > alternatives?).
> >
> > List of concerns with ironic:
>
> Hi Kris,
>
> There is a lot of ongoing work in and around the Ironic project. Thanks for
> diving in and for sharing your concerns; you're not alone.
>
> I'll respond to each group of concerns, as some of these appear quite similar
> to each other and align with stuff we're already doing. Hopefully I can
> provide some helpful background on where the project is at today.
>
> > 1.) Nova <-> ironic interactions generally seem terrible?
>
> These two projects are coming at the task of managing "compute" from
> significantly different situations, and we've been working, for the last ~2
> years, to build a framework that can provide both virtual and physical
> resources through one API. It's not a simple task, and we have a lot more to
> do.
>
> > - How to accept raid config and partitioning(?) from end users? There seems
> > to be no agreed-upon method between nova/ironic yet.
>
> Nova expresses partitioning in a very limited way on the flavor. You get
> root, swap, and ephemeral partitions -- and that's it. Ironic honors those
> today, but they're pinned on the flavor definition, eg. by the cloud admin
> (or whoever can define the flavor).
>
> If your users need more complex partitioning, they could create additional
> partitions after the instance is created. This limitation within Ironic
> exists, in part, because the project's goal is to provide hardware through
> the OpenStack Compute API -- which doesn't express arbitrary
> partitionability. (If you're interested, there is a lengthier and more
> political discussion about whether the cloud should support "pets" and
> whether arbitrary partitioning is needed for "cattle".)
>
> RAID configuration isn't something that Nova allows its users to choose today
> - it doesn't fit in the Nova model of "compute", and there is, to my
> knowledge, nothing in the Nova API to allow its input. We've discussed this a
> little bit, but so far settled on leaving it up to the cloud admin to set
> this in Ironic.
>
> There has been discussion with the Cinder community over ways to express
> volume spanning and mirroring and apply it to a machine's local disks, but
> these discussions didn't result in any traction.
>
> There's also been discussion of ways we could do ad-hoc changes in RAID
> level, based on flavor metadata, during the provisioning process (rather than
> ahead of time), but no code has been done for this yet, AFAIK.
>
> So, where does that leave us? With the "explosion of flavors" that you
> described. It may not be ideal, but that is the common ground we've reached.
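To make the "cloud admin sets this in Ironic" part a bit more concrete: for
drivers that implement the RAID interface, it's roughly the sketch below with
python-ironicclient. Treat it as illustrative only -- the endpoint and
credentials are placeholders, the node identifier is made up, and which RAID
properties are honored depends entirely on your driver and hardware:

    # Sketch: an operator pre-configures a node's RAID layout out-of-band.
    # Requires a driver with RAID support; the target config is applied
    # during cleaning, not at instance-build time.
    from ironicclient import client

    ironic = client.get_client(
        1,
        os_auth_url='http://keystone.example.com:5000/v2.0',  # placeholder
        os_username='admin',
        os_password='secret',
        os_tenant_name='admin',
    )

    # Ironic's RAID config format: a list of logical disks, each with a
    # size, a RAID level, and optionally a root-volume marker.
    target_raid_config = {
        'logical_disks': [
            {'size_gb': 100, 'raid_level': '1', 'is_root_volume': True},
            {'size_gb': 'MAX', 'raid_level': '5'},
        ]
    }

    ironic.node.set_target_raid_config('<node-uuid>', target_raid_config)

The end user still only sees whichever flavor the admin mapped to that pool of
nodes, which is exactly the flavor explosion Kris describes below.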
> > - How to run multiple conductors/nova-computes? Right now, as far as I can
> > tell, all of ironic is fronted by a single nova-compute, which I will have
> > to manage via a cluster technology between two or more nodes. Because of
> > this and the way host-aggregates work, I am unable to expose fault domains
> > for ironic instances (all of ironic can only be under a single AZ -- the AZ
> > that is assigned to the nova-compute node) unless I create multiple
> > nova-compute servers and manage multiple independent ironic setups. This
> > makes on-boarding/query of hardware capacity painful.
>
> Yep. It's not ideal, and the community is very well aware of, and actively
> working on, this limitation. It also may not be as bad as you may think. The
> nova-compute process doesn't do very much, and tests show it handling some
> thousands of ironic nodes fairly well in parallel. Standard active-passive
> management of that process should suffice.
>
> A lot of design work has been done to come up with a joint solution by folks
> on both the Ironic and Nova teams.
> http://specs.openstack.org/openstack/nova-specs/specs/newton/approved/ironic-multiple-compute-hosts.html

It's important to point out here that we're re-working how this works, but
it's still one of our highest priorities:
https://review.openstack.org/#/c/320016/

> As a side note, it's possible (though not tested, recommended, or well
> documented) to run more than one nova-compute. See
> https://github.com/openstack/ironic/blob/master/ironic/nova/compute/manager.py
>
> > - Nova appears to be forcing a "we are compute, as long as compute means
> > VMs" view, which means that we will have a baremetal flavor explosion (ie
> > the mismatch between baremetal and VMs).
> > - This is a feeling I got from the ironic-nova cross project meeting in
> > Austin. A general example goes back to the raid config above. I can
> > configure a single piece of hardware many different ways, but to fit into
> > nova's world view I need to have many different flavors exposed to the
> > end-user. In this way many flavors can map back to a single piece of
> > hardware with just a slightly different configuration applied. So how am I
> > supposed to offer a single server with 6 drives as either: Raid 1 + Raid 5,
> > Raid 5, Raid 10, Raid 6, or JBOD? Seems like I would need to pre-mark out
> > servers that were going to be a specific raid level. Which means that I
> > need to start managing additional sub-pools of hardware just to deal with
> > how the end user wants the raid configured; this is pretty much a
> > non-starter for us. I have not really heard of what's being done on this
> > specific front.
>
> You're correct. Again, Nova has no concept of RAID in its API, so yea, today
> you're left with a 'flavor explosion', as you put it.
>
> There's been discussion of methods we could use to apply the RAID level
> during provisioning, but generally those discussions have landed on the side
> of "it's the operator's responsibility to maintain pools of resources
> available that match their customers' demand".
>
> > 2.) Inspector:
> > - IPA service doesn't gather port/switching information
>
> Folks are working on this, but it's been blocked for a while on the
> ironic-neutron integration:
> https://review.openstack.org/#/c/241242/
>
> > - Inspection service doesn't process port/switching information, which
> > means that it won't add it to ironic. Which makes managing network swinging
> > of the host a non-starter. As I would inspect the host -- then modify the
> > ironic record to add the details about what port/switch the server is
> > connected to from a different source. At that point why wouldn't I just
> > onboard everything through the API?
>
> This is desired, but not done yet, AFAIK.
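Onboarding it through the API is a perfectly reasonable stopgap, FWIW. Roughly
something like the sketch below, stashing the switch details in the port's
free-form "extra" field. The key names under "extra" are ones I made up for
illustration, and the credentials are placeholders; this data only becomes a
first-class field (local_link_connection) once the ironic-neutron work lands:

    # Sketch: record which switch/port a node's NIC is cabled to, using the
    # port's free-form "extra" field. The keys under "extra" are arbitrary
    # (made up here); a proper local_link_connection field arrives with the
    # ironic-neutron integration.
    from ironicclient import client

    ironic = client.get_client(
        1,
        os_auth_url='http://keystone.example.com:5000/v2.0',  # placeholder
        os_username='admin',
        os_password='secret',
        os_tenant_name='admin',
    )

    port = ironic.port.create(
        node_uuid='<node-uuid>',
        address='52:54:00:12:34:56',
        extra={'switch_id': 'tor-1a.example.com', 'switch_port': 'Ethernet1/12'},
    )

    # ...and correct it later with a JSON patch if the cabling changes:
    ironic.port.update(port.uuid, [
        {'op': 'replace', 'path': '/extra/switch_port', 'value': 'Ethernet1/14'},
    ])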
> > - Doesn't grab hardware disk configurations. If the server has multiple
> > raids (r1 + r5), it only reports the boot raid disk capacity.
>
> This falls out from a limitation in Nova (discussed above), though I would
> encourage inspector to collect all the data (even if ironic/nova can't use
> it, today).
>
> > - Inspection is geared towards using a different network and dnsmasq
> > infrastructure than what is in use for ironic/neutron. Which also means
> > that in order to not conflict with dhcp requests for servers in ironic I
> > need to use different networks. Which also means I now need to handle
> > swinging server ports between different networks.
>
> Inspector is designed to respond only to requests for nodes in the inspection
> phase, so that it *doesn't* conflict with provisioning of nodes by Ironic.
> I've been using the same network for inspection and provisioning without
> issue -- so I'm not sure what problem you're encountering here.
>
> > 3.) IPA image:
> > - Default build stuff is pinned to extremely old versions due to gate
> > failure issues. So I cannot onboard servers without a fork, due to the fact
> > that IPMI modules aren't built for the kernel, so inspection can never
> > match the node against ironic. Seems like the current functionality here is
> > the MVP for the gate to work and to deploy images. But if you need to do
> > firmware, bios-config, or any other hardware-specific features, you are
> > pretty much going to need to roll your own IPA image and IPA modules to do
> > standard provisioning tasks.
>
> That's correct. We assume that operators and downstream distributors will
> build and customize the IPA image as needed for their environment. Ironic
> only provides the base image and the tools to modify it; if we were to
> attempt to build an image that could handle every piece of hardware out
> there, it would be huge, unwieldy, and contain a lot of proprietary tools
> that we simply don't have access / license to use.
>
> > 4.) Conductor:
> > - Serial-over-LAN consoles require a unique port on the conductor server (I
> > have seen proposals to try and fix this?); this is painful to manage with
> > large numbers of servers.
> > - SOL consoles aren't restarted when the conductor is restarted (I think
> > this might be fixed in newer versions of ironic?). Again, if end users
> > aren't supposed to consume ironic APIs directly, this is painful to handle.
> > - As far as I can tell shell-in-a-box / SOL consoles aren't supported via
> > nova -- so how are end users supposed to consume the shell-in-a-box
> > console?
>
> You are, unfortunately, correct. Ironic once supported SOL console
> connectivity through Nova, but it has not been working for a while now. We
> discussed this at length at the Austin summit and plan to fix it soon:
> https://review.openstack.org/#/c/319505/
>
> > - It's very easy to get a node to fall off the state machine rails (reboot
> > a server while an image is being deployed to it); the only way I have seen
> > to be able to fix this is to update the DB directly.
>
> Yea, that's a well known pain point, and there is ongoing work to improve the
> recovery process for nodes that get "stuck" in various ways, with the premise
> that the operator should never have to munge the DB directly. One approach
> we've discussed is adding a management CLI tool to make this cleaner.
>
> > - I have BMCs that need specific configuration (some require SOL on com2,
> > others on com1); this makes it pretty much impossible without per-box
> > overrides against the conductor's hardcoded templates.
>
> Ironic allows certain aspects of the Node's management to be overridden
> individually, but it sounds like you need some knobs that we haven't
> implemented. Could you file a bug for this? I think we'd be keen to add it.

Yeah, we've talked about this before but nobody has really pushed on it.
Essentially an optional `pxe_append_params` per node. Shouldn't be too hard to
implement.

> > - Additionally it would be nice to default to having a provisioning
> > kernel/image that was set as a single config option with per-server
> > overrides -- rather than on each server. If we ever change the IPA image,
> > that means at scale we would need to update thousands of ironic nodes.
>
> This request has surfaced in the past; however, it wouldn't make sense in a
> heterogeneous environment (eg, a mix of ia64 and x86_64 hardware in one
> region) and so past discussions have landed on the side of not implementing
> it (either as a system-level default image or as a driver-level default
> image).
>
> If there were a consensus that it helped enough deployments, without
> increasing the complexity of complex multi-arch deployments, I think folks
> would be willing to accept a feature like this.

This wouldn't be too complex, right? A config for the default deploy image,
overridden per-node in driver_info (like we do today). Multi-arch deployments,
in the worst case, would behave just like today, so I don't see a problem
here. I suspect most single-arch deployments use the same ramdisk everywhere,
so this would be a huge help. The only downside I see right now is needing to
restart the conductor to roll a new ramdisk out, but that's a pretty
uneventful thing if you're running multiple conductors.
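For context, the per-node override we have today is just the deploy_kernel /
deploy_ramdisk entries in driver_info, so "rolling out a new ramdisk" at scale
currently means a patch against every node -- roughly the sketch below (the
credentials and image UUIDs are placeholders):

    # Sketch: what updating the deploy kernel/ramdisk looks like today --
    # a driver_info patch on every node. Image UUIDs are placeholders.
    from ironicclient import client

    ironic = client.get_client(
        1,
        os_auth_url='http://keystone.example.com:5000/v2.0',  # placeholder
        os_username='admin',
        os_password='secret',
        os_tenant_name='admin',
    )

    NEW_KERNEL = '<glance-uuid-of-new-deploy-kernel>'
    NEW_RAMDISK = '<glance-uuid-of-new-deploy-ramdisk>'

    # limit=0 asks the client to page through the whole node collection.
    for node in ironic.node.list(limit=0):
        ironic.node.update(node.uuid, [
            {'op': 'replace', 'path': '/driver_info/deploy_kernel',
             'value': NEW_KERNEL},
            {'op': 'replace', 'path': '/driver_info/deploy_ramdisk',
             'value': NEW_RAMDISK},
        ])

A conductor-level default, still overridable per-node like the above, would
turn that loop into a one-line config change for most single-arch deployments.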
> > What is ironic doing to monitor the hardware for failures? I assume the
> > answer here is nothing, and that we will need to make sure the images that
> > we deploy are correctly configuring the tools to monitor disk/health/PSU/RAM
> > errors, etc., etc.
>
> Today, nothing, but this is something we want to do, and there is an
> agreed-upon design, here:
> http://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/notifications.html
>
> This is only the initial design for the notification system. The goal of this
> would be to enable drivers to capture hardware alerts (or perform more
> proactive gathering of hardware status) and propagate those alerts up to the
> cloud operator.
>
> In summary, you're not alone, nor are your ideas/thoughts/requests
> unreasonable. We're all facing similar concerns -- and you're welcome to come
> hang out in #openstack-ironic and participate in shaping Ironic so that it
> meets your needs, too :)

++

// jim

> Regards,
> Devananda