Greetings, ironic folk! Like many other teams, we had very few ironic contributors make it to Sydney. As such, I wanted to write up a summary covering the takeaways, questions, and obvious action items raised by the operators and users present during the sessions, so that we can use them as feedback to help guide our next steps and feature planning.
Much of this is from my memory combined with notes on the various etherpads. I would like to explicitly thank NobodyCam for reading through this in advance to see if I was missing anything at a high level, since he was present in the vast majority of these sessions, and dtantsur for sanity checking the content and asking for elaboration in some cases. -Julia

Ironic Project Update
=====================

Questions largely arose around the use of boot from volume (BFV), including some scenarios we anticipated would arise, as well as new scenarios that we had not considered.

Boot nodes booting from the same volume
---------------------------------------

From a technical standpoint, when BFV is used with iPXE chain loading, the chain loader reads the boot loader and related data from the cinder volume (or, realistically, any iSCSI volume). This means that a skilled operator is able to craft a specific volume that simply unpacks a ramdisk and operates the machine solely from RAM, or one that utilizes an NFS root. This sort of configuration is not something an average user would make use of, but there are actual use cases where some large scale deployment operators would make use of it and derive value from it. Additionally, this topic and the desire for this capability also came up during the "Building a bare metal cloud is hard" talk Q&A.

Action Item: Check the data model to see if we prohibit using the same volume across nodes, and if so, consider removing that prohibition.

Cinder-less BFV support
-----------------------

Some operators are curious about booting ironic-managed nodes without cinder in a BFV context. This is something we anticipated, and we built the API and CLI interfaces to support it. Realistically, we just need to offer the ability for the data to be read and utilized.

Action Item: Review the code and ensure that we have some sort of no-op driver or method that allows cinder-less node booting. For existing drivers, this would be the shipment of the information to the BMC or the write-out of iPXE templates as necessary.

Boot IPA from a cinder volume
-----------------------------

With larger IPA images, specifically in cases where the image contains a substantial amount of utilities or tooling to perform cleaning, providing a mechanism to point the deployment ramdisk to a cinder volume would allow more efficient IO access.

Action Item: Discuss further, specifically how we could support this, as we would need to better understand how some operators might use such functionality.

Dedicated Storage Fabric support
--------------------------------

A question of dedicated storage fabric/networking support arose. Users of FibreChannel generally have a dedicated storage fabric by the very nature of the separate infrastructure. However, with ethernet networking where iSCSI software initiators are used, or even possibly converged network adapters, things get a little more complex. Presently, with the iPXE boot from volume support, we boot using the same interface details as the neutron VIF that the node is attached with. Moving forward with BFV, the concept was to support the use of explicitly defined interfaces as storage interfaces, which can be denoted as "volume connectors" in ironic with a type of "mac". In theory, we begin to get functionality along these lines once https://review.openstack.org/#/c/468353/ lands, as the user could define two networks, and the storage network would then fall to the explicit volume connector interface(s).
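For illustration only: the records involved here are ordinary volume connector and volume target objects, which is also what makes the cinder-less case above plausible, since an operator can register them directly against the bare metal API. The sketch below does so with plain REST calls; the endpoint, token handling, microversion, and the exact property names for the target are assumptions to double-check against the API reference rather than a tested recipe.

    # Rough sketch: register a "mac" volume connector and a hand-built iSCSI
    # volume target for a node, with no cinder involvement. Endpoint, token,
    # microversion and property names are assumptions for illustration.
    import requests

    IRONIC = "http://ironic.example.com:6385"   # hypothetical API endpoint
    HEADERS = {
        "X-Auth-Token": "<token>",
        "X-OpenStack-Ironic-API-Version": "1.32",
    }
    NODE = "<node-uuid>"

    # A connector of type "mac" pins storage traffic to an explicit interface.
    requests.post(IRONIC + "/v1/volume/connectors", headers=HEADERS, json={
        "node_uuid": NODE,
        "type": "mac",
        "connector_id": "52:54:00:ab:cd:ef",
    }).raise_for_status()

    # The target describes the volume to boot from; without cinder, the
    # operator supplies the portal/IQN/LUN details by hand.
    requests.post(IRONIC + "/v1/volume/targets", headers=HEADERS, json={
        "node_uuid": NODE,
        "volume_type": "iscsi",
        "boot_index": 0,
        "volume_id": "pre-existing-boot-volume",
        "properties": {
            "target_portal": "192.0.2.10:3260",
            "target_iqn": "iqn.2017-11.org.example:boot-vol",
            "target_lun": 1,
        },
    }).raise_for_status()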
The operator would just need to ensure that the settings used on that storage network are such that the node can boot and reach the iSCSI endpoint, and that a default route is not provided. The question then may be: does ironic do this quietly for the user requesting the instance or not, and how do we document the usage such that operators can conceptualize it? How do we make this work at a larger scale? How could this fit, or not fit, into multi-site deployments? In order to determine if there is more to do, we need to have more discussions with operators.

Action items:
* Determine the overall needs of operators, since this is implementation architecture centric.
* Plan a forward path from there, if it makes sense.

Note: This may require more information to be stored or leveraged in terms of structural or location based data.

Migration questions from classic drivers to hardware types
-----------------------------------------------------------

One explicit question from the operator community was whether we intended to perform a migration from classic drivers to hardware types. In a sense, there are two issues here: the first is the perceived amount of work involved, and the second is whether there is a good way to cleanly identify and transform classic drivers during upgrade.

Action items:
* For whatever reason, the ironic community felt it was unnecessary to facilitate a migration for users from classic drivers to hardware types, even though we have direct analogs. The ironic community should re-evaluate and consider implementing migration logic to ease user migration.
* In order to proceed, ironic does need to understand whether operators would be okay with the upgrade process failing if the pre-upgrade checks detected that the configuration was incompatible for a migration to be successful. This would allow an operator to correct their configuration file and re-execute the upgrade attempt.

Ironic User Feedback Session
============================

https://etherpad.openstack.org/p/SYD-forum-ironic-feedback

The feedback session felt particularly productive because developers were far outnumbered by operators.

Current Troubles/Not Working for Operators
------------------------------------------

* The current RAID deployment process, where we generally apply RAID configuration during the cleaning step, prior to deploy.
** One of the proposed solutions was the marriage of traits, deploy templates, and the application of deployment templates upon deployment.
** The concern is that this will lead to an explosion of flavors, and some operators' environments are already extremely flavor-full: "I presently run `nova flavor-list`, and go get a coffee."
** The mitigating factor will be the ability for the user initiating the deployment to define additional traits at deploy time on the command line. This was mentioned by Sam Betts, and one of the nova cores present indicated that it was part of their plan.
* UEFI iPXE boot. Specifically, some operators are encountering issues with some vendors' hardware that "should" be compatible but does not actually work except in specific scenarios.
** This is not an ironic bug.
** In the specific case that an operator reported, they were forced into using a vendor driver and specific settings, which they would have preferred to avoid.
** The community members, along with the users and operators present, agreed that a good solution would be to propose documentation updates to our repository that detail when drivers _do not_ work, or when there are weird compatibility issues that are not quite visible.
** It may be worth considering some sort of matrix to raise visibility of driver compatibility/interoperability moving forward. The ironic team would not push back if an operator wishes to begin updating our admin documentation with such information.

Action Items:
* The community should encourage operators to submit documentation changes when they become aware of such issues.
* The community should also encourage vendor driver maintainers to explicitly document their known-good/tested scenarios, as some hardware within the same family can vary.

What Operators are indicating that they need
---------------------------------------------

Firmware Updates
~~~~~~~~~~~~~~~~

Our present firmware update model is dependent upon a hardware manager driving the process during cleaning, which presently requires the hardware manager to be built inside the ramdisk image. This is problematic, as it requires operators to craft and build hardware managers that fit their needs, and then ensure those are running on the specific hosts whose firmware is to be upgraded. While this may seem easy and reasonable for a small deployment, in many organizations there is an operational disconnect between who blesses new firmware versions and who controls the hardware. In some cases, one team may be in charge of certifying and testing new firmware, while another team entirely operates the cloud. These process and operational constraints also prevent hardware managers from being shared in the open, because they could potentially reveal the security state of a deployment. Simply put, operators need something easier, especially when they may receive twenty different chassis in a single year.

As we discussed this as a group, we did begin to reach an understanding of what would be useful. Several operators made it clear that they feel ironic is in a position to help drive standardization across vendors.

What operators are looking for:
* A framework or scaffolding to facilitate centrally managed firmware updates, where the current state information is published upward and the system replies with the firmware to be applied.
** Depending on the deployment, an operator may choose to assert firmware upon every cleaning, but they need to be able to identify the hardware, the current firmware, and the necessary versions by some sort of policy.
** Any version policy may vary across the infrastructure, based on either resource class or hardware ownership concepts.
** This may, in itself, just be a hardware manager that calls out to an external service and executes based upon what the service tells it to do (a rough sketch of this idea follows the action items below).
* Ironic to work with vendors to encourage some sort of standardized packaging and/or installation process, such that the firmware updating code can be as generic as possible.

One other note worth mentioning: some operators spoke of stress testing their hardware during cleaning. It seems like a logical thing to do; however, it would be best for a few operators to explicitly propose what they wish to test and how they do it presently, so we as a community can gain a better understanding.

Action Items:
* Poll hardware vendors during the next weekly meeting and attempt to build an understanding and consensus.
* With that feedback, the next step will be to determine how to fit such a service into ironic, along with what ironic's expectations are for drivers regarding firmware upgrades.
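To make the "hardware manager that calls out to an external service" idea a bit more concrete, here is a rough sketch. The clean step plumbing follows the documented ironic-python-agent hardware manager interface, but the policy service, its URL, the reply format, and the flashing helper are purely hypothetical placeholders, not a proposal for an actual API.

    # Rough sketch: an IPA hardware manager whose clean step asks an external,
    # hypothetical firmware policy service what to apply, instead of baking
    # the policy into the ramdisk itself.
    import requests

    from ironic_python_agent import hardware

    POLICY_URL = "http://firmware-policy.example.com/v1/policy"  # hypothetical


    class PolicyDrivenFirmwareManager(hardware.HardwareManager):

        def evaluate_hardware_support(self):
            # Generic support: defer to more specific managers where present.
            return hardware.HardwareSupport.GENERIC

        def get_clean_steps(self, node, ports):
            return [{
                'step': 'apply_firmware_policy',
                'priority': 0,            # only runs when explicitly requested
                'interface': 'deploy',
                'reboot_requested': True,
            }]

        def apply_firmware_policy(self, node, ports):
            # Publish identifying/current-state information upward; a real
            # implementation would include a properly serialized hardware and
            # firmware inventory here.
            resp = requests.post(POLICY_URL, json={'node_uuid': node['uuid']})
            resp.raise_for_status()
            # The (hypothetical) service replies with what should be applied.
            for update in resp.json().get('updates', []):
                # _flash() would wrap vendor-specific tooling; this is the
                # part the group hopes vendors can help standardize.
                self._flash(update['component'], update['image_url'])

        def _flash(self, component, image_url):
            raise NotImplementedError("vendor/tooling specific")

All of the interesting questions live in the policy service's contract, which is exactly what the vendor discussion in the action items above needs to flesh out.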
TPM Key Assertion
~~~~~~~~~~~~~~~~~

Some operators utilize their TPMs for secure key storage and need a mechanism to update/assert a new key to overwrite existing key data. The key data in the TPM is used by the running system, and we have no operational need to store or utilize the data. Today, some operators perform this manually, and replacing keys on systems running elsewhere in the world is a human-intensive process.

The consensus in the room was that this might be a good out-of-band feature that could live in the management interfaces of the vendor drivers; we presently make only minimal use of the management interface. From a security standpoint, this is also data we should not store locally: ironic should only be a clean pass-through conduit, which makes explicitly out-of-band usage with vendor drivers even more appealing.

Action Item: Poll hardware vendors during the next weekly meeting or two in order to begin discussing the viability/capability to support this. It could be passthru functionality in a driver, but if two or more drivers can eventually support such functionality, we should standardize it upfront.

Reversing the communications flow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One of the often discussed items in security conscious environments is the need to have the conductor initiate communication to the agent. While this model will not work for all deployments, we've had consensus in the past that this is something we should consider offering as an optional setting. In other words, IPA could have a mode of operation where it no longer heartbeats to the API; instead, the conductor would look up the address from neutron and proceed to poll it until the node came online, and would then continue to poll that address on a regular basis, much like heart-beating works today. We should keep in mind that this polling operation will have an increased impact on conductor load. (A rough sketch of such a polling loop is included below, after the documentation notes.)

Several operators present in the session expressed interest, while others indicated this would be a breaking change for their environment's security model; as such, any movement in this direction must be optional.

Action Item: Someone writes a specification, and we poll the larger operators that we know in the community for thoughts, in order to see if it meets their needs.

Documentation on known issues and fixes or incompatibilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Operators would like to see more information on known driver issues or incompatibilities explicitly stated in the documentation. Additionally, operators would like a single location for "how to fix this issue" information that is not a bug tracking system. There seemed to be consensus that these are good things to document, and the community does not disagree. That being said, the operators are the ones who will tend to become aware of such issues before we do. The best way for the operator community to help the developers is to propose documentation patches to the ironic repository to raise awareness and visibility within our own documentation. We must keep in mind that this information needs to be curated as well, since some of these things are not necessarily "bugs", much like the UEFI boot issues noted earlier.
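As promised above, here is a purely conceptual sketch of the reversed communications flow: the conductor side resolves the node's provisioning address and polls the agent, rather than waiting on heartbeats. The neutron lookup helper, the agent port, and the status path are all assumptions for illustration, not ironic's actual implementation.

    # Conceptual sketch only: conductor-initiated polling in place of agent
    # heartbeats.
    import time

    import requests

    AGENT_PORT = 9999        # assumed ironic-python-agent listen port
    POLL_INTERVAL = 30       # seconds, roughly analogous to heartbeats today


    def lookup_agent_address(node_uuid):
        """Hypothetical helper: resolve the provisioning port's IP via neutron."""
        raise NotImplementedError


    def wait_for_agent(node_uuid, timeout=1800):
        """Poll the agent until it answers, then return its reported status."""
        address = lookup_agent_address(node_uuid)
        url = "http://%s:%d/v1/status" % (address, AGENT_PORT)  # assumed path
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                resp = requests.get(url, timeout=10)
                if resp.ok:
                    return resp.json()
            except requests.RequestException:
                pass  # agent not up yet, or temporarily unreachable
            time.sleep(POLL_INTERVAL)
        raise RuntimeError("agent for node %s never came online" % node_uuid)

Each node being deployed or cleaned would carry one such loop, which is the conductor load concern noted above.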
Automatic BIOS Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

tl;dr: We are working on it.

Use of System UUID in addition to MAC addresses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some operators would like to see changes in order to allow booting based upon a recorded system UUID. In these cases, the operator may be using the "noop" network interface driver, or another custom driver that does not involve neutron. The reasons to support UUID-based booting are extensive:

* iPXE can attempt the system UUID first, so support for utilizing the UUID could remove several possible transactions.
* The first ethernet device to iPXE may not be the one known to ironic, and may still obtain an IP address based upon the environment configuration/operation. This will largely be the case in an environment where DHCP is not managed by neutron. Presently, operators have to wait for the unknown interfaces to fail completely before the boot eventually reaches a known network interface.
** Possibly worse, the order may not be consistent depending on the hardware boot order / switch configuration / cabling.
** Operators indicated that swapped cabling with link aggregates is a semi-common problem they encounter.
* The MAC addresses of nodes may simply not be known, and evolving to support hard-coded UUIDs provides us greater flexibility in terms of being able to boot a node where we do not know nor control the IP addressing assigned.

In addition to UUIDs, an operator expressed interest in having the same boot behavior, however keyed on the allocated IP address rather than the UUID. This also deserves consideration, as it may be useful for some operators.

Action Items:
* We should determine a method of storing the UUID, whether discovered or already known; some operators may already know it.
** Suggestion from an operator: Maybe just allow setting of the UUID when a node is created, like we do with ports, so that an operator or inspector could set the node UUID to be the same as the system's UUID, thus eliminating the need for another field.
** Ironic contributor: Alternatively, we should just add a boolean that writes it out and offers it as an initial step, and then falls back to the MAC address based attempt.
* Update template generation to support writing a symlink with the UUID and/or MAC addresses.
* Explore the possibility of doing the same with IP addresses.

Diskless boot
~~~~~~~~~~~~~

This is a theme that has arisen before, and in many cases could be solved via the BFV iPXE functionality; however, other operators have expressed a need in the past for more generic boot options in order to boot their nodes. There have been some prior specifications on making generic PXE interfaces available for things such as network switches. As such, we should likely re-evaluate those specifications.

Action Item: Ironic should re-evaluate its current position and review the related specifications.

Physical Location/Hardware Ownership
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This one didn't quite make the notes, but a couple of attendees seem to remember it, and it is worth mentioning. Presently, there is no ability to have a geographically diverse ironic installation where a single pane of glass is provided by the API. To add further complexity, ironic may be in a situation where it is managing hardware that the operator does not explicitly own and that needs to be in a delineated pool. We presently have no way to represent or control scheduling in these cases.
This quickly leads to tenant awareness, as in an operator may have a tenant that owns hardware in the datacenter. Naturally, this can get complex quite quickly, but it seems logical to many users: they may have trusted users who wish to manually deploy hardware, and in many of those cases it is desired that the hardware be used by no other tenant. This may also be extended to a concept of "authorized users" who have temporary implied rights to interact with ironic based upon permissions and current ownership.

To keep this short, the impacts of this are _massive_, as they are intertwined, fundamental changes to how ironic represents data at the API level as well as to how ironic executes requests. The end goal would be to provide the granularity to say "these two conductors are for x environment", or "you're only allowed to see x nodes". As a result of all of this, it would be a huge API change; the current concept is to build upon the existing 1.0 API as a 2.0 API.

Action Item: TheJulia and NobodyCam volunteer as tributes... to start a spec.

Ironic On-boarding Session
==========================

The Sydney summit was the first attempt by the ironic team to run an on-boarding session for the community. As such, the intent was to take a free-form approach and answer whatever questions attendees had for community members, which would also provide feedback into what new contributors might be interested in moving forward. By and large, the questions that were asked boiled down to the following:

Where do I find the code?
This was largely a question of which repository contains which pieces.

How do I set up a test environment?
This was very much a question of getting started, which led into the next logical question.

How do I test without real physical servers?
The answer became devstack or Bifrost, depending on your use case and desire to perform full-stack development or lightweight work with or alongside ironic.

Can I test using remote VMs?
Overall the answer was yes, but you need to handle the networking yourself to bridge it through, and you need some mechanism to control power. Ironic-staging-drivers was brought up as a repository that might have useful drivers in these cases. Ironic should look at improving some of our docs to highlight the possibilities.

What alternatives to devstack are there?
Bifrost was raised as an example. We failed to mention kolla as an option. :(

How do we see community priorities?
This was very easy for us, but for a new contributor coming into the community it is not as clear. Ironic should consider improving documentation to make it very clear where to find this information.

Action Items:
* Some of ironic's documentation for new contributors may need revision to provide some of these contextual details upfront, or we might need to consider a Q&A portion of the documentation.
* The ironic community should ensure that the above questions are largely answered in whatever material is presented as part of the next on-boarding session at the Vancouver summit.

Mogan/Nova/Ironic Session
=========================

https://etherpad.openstack.org/p/SYD-forum-baremetal-ironic-mogan-nova

The purpose of this session was to help compare, contrast, and provide background to the community as to the purpose behind the Mogan project, which is to create a baremetal-centric, user-facing compute API allowing non-administrator users to provision and directly control baremetal.
The baremetal in Mogan's context could be baremetal that already exists, or baremetal that is created from some sort of composable infrastructure. The Mogan PTL started with an overview to provide context to the community, and then the community shifted to asking questions to gain context, including polling operators for interest and concerns. The primary operator concern was over creating divergence and user confusion in the community.

Once we had some operator input, we attempted to identify the differences, and the shortcomings in ironic and nova, that primarily drove the effort. What we were able to identify from a work-in-progress comparison document largely amounted to additional insight into aggregates, which was partly due to affinity/anti-affinity needs. Additional functionality exists in Mogan to list servers available for management and then directly act upon them, although the extent of what additional actions can be taken upon a baremetal node had not been identified.

As the discussions went on, the ironic team members present were able to express our concerns over communication. It largely seemed to be a surprise that some of our hardware teams were working in the direction of composable hardware, and that the use model Mogan sought could fit into our scope and workflow for composable hardware. Largely, for composable hardware, we would need some way to represent a node that a user wishes to perform an action upon; in some cases now, that is done with placeholder records representing possible capacity. Naturally, for ironic, making it user facing would be a very large change to ironic's API; however, based on other sessions, these are changes that ironic may wish to explore given stated operator needs.

The discussion for both ironic and nova was more of a "how do we best navigate" question instead of an "if we should navigate" question, which in itself is positive. Some of the items included improving the view of available physical baremetal, regional/availability zoning, tenant utilization of the API, and possibly hardware ownership concepts. Many of these items, as touched on in the feedback session, are intertwined.

Overall, the session was good in that we were able to gain consensus that the core issues which spurred the creation of Mogan are addressable by the present ironic and nova contributors. A complete gap/feature comparison remains an outstanding item, which may still influence the discussion going forward.

Baremetal Scheduling Session
============================

https://etherpad.openstack.org/p/SYD-forum-baremetal-scheduling

We were originally hoping to cancel this session and redirect everyone into the nova placement status update, but we soon found out that there were some lingering questions, as well as operator concerns, that needed to be discussed. We started out in discussion and came to the realization that there could very well be a trait explosion in the attempt to support affinity and anti-affinity efforts. While for baremetal it could be a side effect, it does not line up with the nova model.
Conceptually, we could quickly end up with trait lists that look something like:

CUSTOM_AC_GRID_C
CUSTOM_ROOM1_POWER_GRID_C
CUSTOM_CABINET_4
CUSTOM_Charlotte_DC3
CUSTOM_Charlotte_DC3_ROW2_CAB4
CUSTOM_CUSTOMER_TAG
CUSTOM_OWNED_ENV
NET1GB
NET2GB
NET10GB
NET10GB_DUAL
CUSTOM_STORAGE_FABRIC_A
CUSTOM_FC_FABRIC_B
CUSTOM_REDUNDANT_COOLING
CUSTOM_IS_A_BIKE_SHED_ON_THE_MOON
CUSTOM_IS_NOT_LORD_VADERS_BIKESHED

At some point, someone remarked, "It seems like there is just no solution that is going to work for everyone by default." The remark covered not just resource class determination and trait identification, but also scheduling affinity and anti-affinity, which repeatedly came up in discussions over the week. This quickly raised an operator desire for the ironic community to solve for what would fit 80% of use cases, and then iterate moving forward. The example brought up in discussion was to give operators an explicit configuration parameter that they could use to assert a resource_class, or possibly even static trait lists, until they can populate the node with what should be there for their deployment, or for that individual hardware installation in their environment.

While the ironic community's answer is "write introspection rules", it seems operators just want something simpler that serves as a standing default, like an explicit default in the configuration file. Some operators pointed out that, with their processes, they would largely know, or be able to reconcile, the differences in their environment and make those changes in ironic as needed.

Eventually, the discussion shifted to affinity/anti-affinity, which could partially make use of tags, although, as previously detailed, that would quickly result in a tag explosion depending on how an operator implements and chooses to manage their environment.

Action items:
* Ironic needs to discuss as a group what the impact of this discussion means. Many of the themes, beyond providing configurable defaults to meet the 80% of users, have repeatedly come up and really drive towards some of the things detailed as part of the feedback session.
* For "resource_class" defaults, Dmitry was kind enough to create https://bugs.launchpad.net/ironic/+bug/1732190, as this does seem like a quick and easy thing for us to address.
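For reference, the "write introspection rules" path mentioned above might look roughly like the sketch below: a rule posted to ironic-inspector that stamps a default resource_class onto any inspected node that lacks one. The endpoint, token handling, and exact condition/action syntax are assumptions to verify against the inspector documentation; the bug above tracks the simpler configuration-file default that operators actually asked for.

    # Rough sketch: an ironic-inspector rule that sets a default resource_class
    # on inspected nodes that do not already have one. Endpoint, token and the
    # exact rule syntax are assumptions for illustration.
    import requests

    INSPECTOR = "http://inspector.example.com:5050"   # hypothetical endpoint
    HEADERS = {"X-Auth-Token": "<token>"}

    rule = {
        "description": "Default resource_class for nodes that lack one",
        "conditions": [
            {"op": "is-empty", "field": "node://resource_class"},
        ],
        "actions": [
            {"action": "set-attribute",
             "path": "/resource_class",
             "value": "baremetal-general"},
        ],
    }

    requests.post(INSPECTOR + "/v1/rules", headers=HEADERS,
                  json=rule).raise_for_status()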
