Thanks for getting to this before me, Deva. Saved me some typing. :) A little more color inline.
On Mon, Jun 06, 2016 at 05:01:04PM -0700, Devananda van der Veen wrote:
> On 06/06/2016 01:44 PM, Kris G. Lindgren wrote:
> > Hi ironic folks,
> >
> > As I'm trying to explore how GoDaddy can use ironic, I've created the
> > following in an attempt to document some of my concerns, and I'm wondering
> > if you folks could help me identify ongoing work to solve these (or
> > alternatives?).
> >
> > List of concerns with ironic:
>
> Hi Kris,
>
> There is a lot of ongoing work in and around the Ironic project. Thanks for
> diving in and for sharing your concerns; you're not alone.
>
> I'll respond to each group of concerns, as some of these appear quite similar
> to each other and align with stuff we're already doing. Hopefully I can
> provide some helpful background on where the project is at today.
>
> > 1.) Nova <-> ironic interactions generally seem terrible?
>
> These two projects are coming at the task of managing "compute" from
> significantly different situations, and we've been working, for the last ~2
> years, to build a framework that can provide both virtual and physical
> resources through one API. It's not a simple task, and we have a lot more to
> do.
>
> > - How to accept raid config and partitioning(?) from end users? There seems
> > to be no agreed-upon method between nova/ironic yet.
>
> Nova expresses partitioning in a very limited way on the flavor. You get
> root, swap, and ephemeral partitions -- and that's it. Ironic honors those
> today, but they're pinned on the flavor definition, eg. by the cloud admin
> (or whoever can define the flavor).
>
> If your users need more complex partitioning, they could create additional
> partitions after the instance is created. This limitation within Ironic
> exists, in part, because the project's goal is to provide hardware through
> the OpenStack Compute API -- which doesn't express arbitrary
> partitionability. (If you're interested, there is a lengthier and more
> political discussion about whether the cloud should support "pets" and
> whether arbitrary partitioning is needed for "cattle".)
>
> RAID configuration isn't something that Nova allows its users to choose today
> - it doesn't fit in the Nova model of "compute", and there is, to my
> knowledge, nothing in the Nova API to allow its input. We've discussed this a
> little bit, but so far settled on leaving it up to the cloud admin to set
> this in Ironic.
>
> There has been discussion with the Cinder community over ways to express
> volume spanning and mirroring and apply it to a machine's local disks, but
> these discussions didn't result in any traction.
>
> There's also been discussion of ways we could do ad-hoc changes in RAID
> level, based on flavor metadata, during the provisioning process (rather than
> ahead of time), but no code has been done for this yet, AFAIK.
>
> So, where does that leave us? With the "explosion of flavors" that you
> described. It may not be ideal, but that is the common ground we've reached.
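To make the "cloud admin sets this in Ironic" part a bit more concrete: for
drivers that implement the RAID interface, it's roughly the sketch below with
python-ironicclient. Treat it as illustrative only -- the endpoint and
credentials are placeholders, the node identifier is made up, and which RAID
properties are honored depends entirely on your driver and hardware:

    # Sketch: an operator pre-configures a node's RAID layout out-of-band.
    # Requires a driver with RAID support; the target config is applied
    # during cleaning, not at instance-build time.
    from ironicclient import client

    ironic = client.get_client(
        1,
        os_auth_url='http://keystone.example.com:5000/v2.0',  # placeholder
        os_username='admin',
        os_password='secret',
        os_tenant_name='admin',
    )

    # Ironic's RAID config format: a list of logical disks, each with a
    # size, a RAID level, and optionally a root-volume marker.
    target_raid_config = {
        'logical_disks': [
            {'size_gb': 100, 'raid_level': '1', 'is_root_volume': True},
            {'size_gb': 'MAX', 'raid_level': '5'},
        ]
    }

    ironic.node.set_target_raid_config('<node-uuid>', target_raid_config)

The end user still only sees whichever flavor the admin mapped to that pool of
nodes, which is exactly the flavor explosion Kris describes below.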
> > - How to run multiple conductors/nova-computes? Right now, as far as I can
> > tell, all of ironic is fronted by a single nova-compute, which I will have
> > to manage via a cluster technology between two or more nodes. Because of
> > this and the way host-aggregates work, I am unable to expose fault domains
> > for ironic instances (all of ironic can only be under a single AZ -- the AZ
> > that is assigned to the nova-compute node) unless I create multiple
> > nova-compute servers and manage multiple independent ironic setups. This
> > makes on-boarding/query of hardware capacity painful.
>
> Yep. It's not ideal, and the community is very well aware of, and actively
> working on, this limitation. It also may not be as bad as you may think. The
> nova-compute process doesn't do very much, and tests show it handling some
> thousands of ironic nodes fairly well in parallel. Standard active-passive
> management of that process should suffice.
>
> A lot of design work has been done to come up with a joint solution by folks
> on both the Ironic and Nova teams.
> http://specs.openstack.org/openstack/nova-specs/specs/newton/approved/ironic-multiple-compute-hosts.html

It's important to point out here that we're re-working how this works, but
it's still one of our highest priorities:
https://review.openstack.org/#/c/320016/

> As a side note, it's possible (though not tested, recommended, or well
> documented) to run more than one nova-compute. See
> https://github.com/openstack/ironic/blob/master/ironic/nova/compute/manager.py
>
> > - Nova appears to be forcing a "we are compute, as long as compute means
> > VMs" view, which means that we will have a baremetal flavor explosion (ie
> > the mismatch between baremetal and VMs).
> > - This is a feeling I got from the ironic-nova cross project meeting in
> > Austin. A general example goes back to the raid config above. I can
> > configure a single piece of hardware many different ways, but to fit into
> > nova's world view I need to have many different flavors exposed to the
> > end-user. In this way many flavors can map back to a single piece of
> > hardware with just a slightly different configuration applied. So how am I
> > supposed to offer a single server with 6 drives as either: Raid 1 + Raid 5,
> > Raid 5, Raid 10, Raid 6, or JBOD? Seems like I would need to pre-mark out
> > servers that were going to be a specific raid level. Which means that I
> > need to start managing additional sub-pools of hardware just to deal with
> > how the end user wants the raid configured; this is pretty much a
> > non-starter for us. I have not really heard of what's being done on this
> > specific front.
>
> You're correct. Again, Nova has no concept of RAID in its API, so yea, today
> you're left with a 'flavor explosion', as you put it.
>
> There's been discussion of methods we could use to apply the RAID level
> during provisioning, but generally those discussions have landed on the side
> of "it's the operator's responsibility to maintain pools of resources
> available that match their customers' demand".
>
> > 2.) Inspector:
> > - IPA service doesn't gather port/switching information
>
> Folks are working on this, but it's been blocked for a while on the
> ironic-neutron integration:
> https://review.openstack.org/#/c/241242/
>
> > - Inspection service doesn't process port/switching information, which
> > means that it won't add it to ironic. Which makes managing network swinging
> > of the host a non-starter. As I would inspect the host -- then modify the
> > ironic record to add the details about what port/switch the server is
> > connected to from a different source. At that point why wouldn't I just
> > onboard everything through the API?
>
> This is desired, but not done yet, AFAIK.
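Onboarding it through the API is a perfectly reasonable stopgap, FWIW. Roughly
something like the sketch below, stashing the switch details in the port's
free-form "extra" field. The key names under "extra" are ones I made up for
illustration, and the credentials are placeholders; this data only becomes a
first-class field (local_link_connection) once the ironic-neutron work lands:

    # Sketch: record which switch/port a node's NIC is cabled to, using the
    # port's free-form "extra" field. The keys under "extra" are arbitrary
    # (made up here); a proper local_link_connection field arrives with the
    # ironic-neutron integration.
    from ironicclient import client

    ironic = client.get_client(
        1,
        os_auth_url='http://keystone.example.com:5000/v2.0',  # placeholder
        os_username='admin',
        os_password='secret',
        os_tenant_name='admin',
    )

    port = ironic.port.create(
        node_uuid='<node-uuid>',
        address='52:54:00:12:34:56',
        extra={'switch_id': 'tor-1a.example.com', 'switch_port': 'Ethernet1/12'},
    )

    # ...and correct it later with a JSON patch if the cabling changes:
    ironic.port.update(port.uuid, [
        {'op': 'replace', 'path': '/extra/switch_port', 'value': 'Ethernet1/14'},
    ])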
> > - Doesn't grab hardware disk configurations. If the server has multiple
> > raids (r1 + r5), it only reports the boot raid disk capacity.
>
> This falls out from a limitation in Nova (discussed above), though I would
> encourage inspector to collect all the data (even if ironic/nova can't use
> it, today).
>
> > - Inspection is geared towards using a different network and dnsmasq
> > infrastructure than what is in use for ironic/neutron. Which also means
> > that in order to not conflict with dhcp requests for servers in ironic I
> > need to use different networks. Which also means I now need to handle
> > swinging server ports between different networks.
>
> Inspector is designed to respond only to requests for nodes in the inspection
> phase, so that it *doesn't* conflict with provisioning of nodes by Ironic.
> I've been using the same network for inspection and provisioning without
> issue -- so I'm not sure what problem you're encountering here.
>
> > 3.) IPA image:
> > - Default build stuff is pinned to extremely old versions due to gate
> > failure issues. So I cannot onboard servers without a fork, due to the fact
> > that IPMI modules aren't built for the kernel, so inspection can never
> > match the node against ironic. Seems like the current functionality here is
> > the MVP for the gate to work and to deploy images. But if you need to do
> > firmware, bios-config, or any other hardware-specific features, you are
> > pretty much going to need to roll your own IPA image and IPA modules to do
> > standard provisioning tasks.
>
> That's correct. We assume that operators and downstream distributors will
> build and customize the IPA image as needed for their environment. Ironic
> only provides the base image and the tools to modify it; if we were to
> attempt to build an image that could handle every piece of hardware out
> there, it would be huge, unwieldy, and contain a lot of proprietary tools
> that we simply don't have access / license to use.
>
> > 4.) Conductor:
> > - Serial-over-LAN consoles require a unique port on the conductor server (I
> > have seen proposals to try and fix this?); this is painful to manage with
> > large numbers of servers.
> > - SOL consoles aren't restarted when the conductor is restarted (I think
> > this might be fixed in newer versions of ironic?). Again, if end users
> > aren't supposed to consume ironic APIs directly, this is painful to handle.
> > - As far as I can tell shell-in-a-box / SOL consoles aren't supported via
> > nova -- so how are end users supposed to consume the shell-in-a-box
> > console?
>
> You are, unfortunately, correct. Ironic once supported SOL console
> connectivity through Nova, but it has not been working for a while now. We
> discussed this at length at the Austin summit and plan to fix it soon:
> https://review.openstack.org/#/c/319505/
>
> > - It's very easy to get a node to fall off the state machine rails (reboot
> > a server while an image is being deployed to it); the only way I have seen
> > to be able to fix this is to update the DB directly.
>
> Yea, that's a well known pain point, and there is ongoing work to improve the
> recovery process for nodes that get "stuck" in various ways, with the premise
> that the operator should never have to munge the DB directly. One approach
> we've discussed is adding a management CLI tool to make this cleaner.
>
> > - I have BMCs that need specific configuration (some require SOL on com2,
> > others on com1); this makes it pretty much impossible without per-box
> > overrides against the conductor's hardcoded templates.
>
> Ironic allows certain aspects of the Node's management to be overridden
> individually, but it sounds like you need some knobs that we haven't
> implemented. Could you file a bug for this? I think we'd be keen to add it.

Yeah, we've talked about this before but nobody has really pushed on it.
Essentially an optional `pxe_append_params` per node. Shouldn't be too hard to
implement.

> > - Additionally it would be nice to default to having a provisioning
> > kernel/image that was set as a single config option with per-server
> > overrides -- rather than on each server. If we ever change the IPA image,
> > that means at scale we would need to update thousands of ironic nodes.
>
> This request has surfaced in the past; however, it wouldn't make sense in a
> heterogeneous environment (eg, a mix of ia64 and x86_64 hardware in one
> region) and so past discussions have landed on the side of not implementing
> it (either as a system-level default image or as a driver-level default
> image).
>
> If there were a consensus that it helped enough deployments, without
> increasing the complexity of complex multi-arch deployments, I think folks
> would be willing to accept a feature like this.

This wouldn't be too complex, right? A config for the default deploy image,
overridden per-node in driver_info (like we do today). Multi-arch deployments,
in the worst case, would behave just like today, so I don't see a problem
here. I suspect most single-arch deployments use the same ramdisk everywhere,
so this would be a huge help. The only downside I see right now is needing to
restart the conductor to roll a new ramdisk out, but that's a pretty
uneventful thing if you're running multiple conductors.
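For context, the per-node override we have today is just the deploy_kernel /
deploy_ramdisk entries in driver_info, so "rolling out a new ramdisk" at scale
currently means a patch against every node -- roughly the sketch below (the
credentials and image UUIDs are placeholders):

    # Sketch: what updating the deploy kernel/ramdisk looks like today --
    # a driver_info patch on every node. Image UUIDs are placeholders.
    from ironicclient import client

    ironic = client.get_client(
        1,
        os_auth_url='http://keystone.example.com:5000/v2.0',  # placeholder
        os_username='admin',
        os_password='secret',
        os_tenant_name='admin',
    )

    NEW_KERNEL = '<glance-uuid-of-new-deploy-kernel>'
    NEW_RAMDISK = '<glance-uuid-of-new-deploy-ramdisk>'

    # limit=0 asks the client to page through the whole node collection.
    for node in ironic.node.list(limit=0):
        ironic.node.update(node.uuid, [
            {'op': 'replace', 'path': '/driver_info/deploy_kernel',
             'value': NEW_KERNEL},
            {'op': 'replace', 'path': '/driver_info/deploy_ramdisk',
             'value': NEW_RAMDISK},
        ])

A conductor-level default, still overridable per-node like the above, would
turn that loop into a one-line config change for most single-arch deployments.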
> > What is ironic doing to monitor the hardware for failures? I assume the
> > answer here is nothing, and that we will need to make sure the images that
> > we deploy are correctly configuring the tools to monitor disk/health/PSU/RAM
> > errors, etc., etc.
>
> Today, nothing, but this is something we want to do, and there is an
> agreed-upon design, here:
> http://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/notifications.html
>
> This is only the initial design for the notification system. The goal of this
> would be to enable drivers to capture hardware alerts (or perform more
> proactive gathering of hardware status) and propagate those alerts up to the
> cloud operator.
>
> In summary, you're not alone, nor are your ideas/thoughts/requests
> unreasonable. We're all facing similar concerns -- and you're welcome to come
> hang out in #openstack-ironic and participate in shaping Ironic so that it
> meets your needs, too :)

++

// jim

> Regards,
> Devananda