Hi heat and magnum. Apart from the scalability issues that have been observed, I'd like to add a few more subjects to discuss during the summit.
1. One nested stack per node and the linear scaling of cluster creation time

1.1 For large stacks, the creation time of all the nested stacks scales linearly with the number of nodes. We haven't run any tests using the convergence engine.

1.2 For large stacks (1000 nodes), the final call to heat to fetch the IPs of all the nodes takes 3 to 4 minutes. In heat the stack already has status CREATE_COMPLETE, but magnum's state is updated only when this long final call is done. Can we do better? Maybe fetch only the master IPs, or fetch the IPs in chunks (a rough sketch, A, is at the end of this mail).

1.3 After the stack-create API call to heat, magnum's conductor busy-waits on heat with one thread per cluster. (In case of a magnum conductor restart, we lose that thread and we can't update the status in magnum.) We should investigate better ways to sync the status between magnum and heat (sketch B at the end of this mail).

2. Next generation magnum clusters

A need that comes up frequently in magnum is heterogeneous clusters:

* We want to be able to create clusters on different hardware, e.g. spawn VMs on nodes with SSDs and nodes without SSDs, or on special hardware (FPGA, GPU) available only in some nodes of the cluster.
* We want to spawn clusters across different AZs.

I'll describe our plan briefly here; for further information we have a detailed spec under review. [1]

To address this we introduce the node-group concept in magnum. Each node-group will correspond to a different heat stack: the master nodes can be organized in one or more stacks, as can the worker nodes. We are investigating how to implement this feature, and we are considering the following. At the moment we have three template files (cluster, master and node), and all three together create one stack. The new generation of clusters will have a cluster stack containing the resources of the cluster template, specifically networks, lbaas, floating IPs, etc. The outputs of this stack would then be passed as inputs to create the master node stack(s) and the worker node stack(s) (sketch C at the end of this mail).

3. Use of the heat-agent

A missing feature in magnum is lifecycle operations. For restarting services and for COE upgrades (upgrading docker, kubernetes and mesos) we are considering using the heat-agent. Another option is to create a magnum agent, or a daemon like trove's.

3.1 For restarts, a few systemctl restart or service restart commands will be issued. [2]

3.2 For upgrades there are four scenarios:

1. Upgrading a service which runs in a container. In this case a small script that runs in each node is sufficient. No VM reboot required.
2. For an ubuntu-based image or similar that requires a package upgrade, a similar small script is sufficient too. No VM reboot required.
3. For our fedora atomic images, we need to perform a rebase on the rpm-ostree filesystem, which requires a reboot.
4. Finally, a thought under investigation: replace the nodes one by one using a different image, e.g. upgrade from fedora 24 to 25 with new versions of all packages in a new qcow2 image. How could we update the stack for this? (Sketch D at the end of this mail.)

Scenarios 1 and 2 can be done by upgrading all worker nodes at once or one by one. Scenarios 3 and 4 should be done one by one.

I'm drafting a spec about upgrades; it should be ready by Wednesday.

Cheers,
Spyros

[1] https://review.openstack.org/#/c/352734/
[2] https://review.openstack.org/#/c/368981/
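
Sketch A (for 1.2). A minimal, untested sketch of resolving only the outputs magnum actually needs, one call at a time, instead of a single stack GET that resolves everything. It assumes python-heatclient's single-output API; the output keys 'api_address' and 'kube_masters' are made up for illustration:

    # Resolve selected outputs one by one, so heat does not have to walk
    # every nested stack to answer one huge call.
    def fetch_master_outputs(heat, stack_id):
        wanted = ['api_address', 'kube_masters']   # illustrative keys; skip the minions
        results = {}
        for key in wanted:
            out = heat.stacks.output_show(stack_id, key)  # resolves a single output
            results[key] = out['output']['output_value']
        return results

The minion IPs could then be fetched lazily, or in chunks, if the templates expose them as separate outputs.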
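
Sketch B (for 1.3). A minimal sketch of replacing the in-memory thread per cluster with a single loop that reconciles every in-progress cluster straight from the database, so a conductor restart loses nothing. list_in_progress_clusters() and save_cluster_status() are hypothetical stand-ins for magnum's DB layer:

    import time

    def sync_status_loop(heat, interval=10):
        while True:
            for cluster in list_in_progress_clusters():    # hypothetical DB call
                stack = heat.stacks.get(cluster.stack_id)
                if stack.stack_status != cluster.status:
                    # hypothetical helper; persists the new status in magnum's DB
                    save_cluster_status(cluster, stack.stack_status)
            time.sleep(interval)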
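
Sketch C (for 2). A rough illustration of the stack chaining proposed in [1]: the infra (cluster) stack is created first, and its outputs become parameters of each node-group stack. Template file names, parameter names and output keys are all made up here:

    def create_cluster(heat, name, node_groups):
        # 1. Create the shared infra stack (networks, lbaas, floating IPs).
        infra = heat.stacks.create(stack_name=name + '-infra',
                                   template=open('cluster.yaml').read())
        infra_id = infra['stack']['id']
        # ... wait until the infra stack is CREATE_COMPLETE ...
        outputs = {o['output_key']: o['output_value']
                   for o in heat.stacks.get(infra_id).outputs}
        # 2. One heat stack per node-group, fed with the infra stack's outputs.
        for ng in node_groups:
            heat.stacks.create(
                stack_name='%s-%s' % (name, ng['name']),
                template=open(ng['template']).read(),
                parameters={'fixed_network': outputs['fixed_network'],
                            'api_address': outputs['api_address'],
                            'availability_zone': ng['az'],
                            'flavor': ng['flavor']})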
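
Sketch D (for 3.2, scenario 4). One possible shape for the image-replacement upgrade: a patch stack-update that changes only the image parameter and lets heat replace the servers. Serializing the replacement one node at a time would have to come from the template side (e.g. something like ResourceGroup's rolling_update policy). The 'server_image' parameter name is illustrative:

    heat.stacks.update(stack_id,
                       existing=True,   # PATCH update: keep all other parameters
                       parameters={'server_image': 'fedora-atomic-25'})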