On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote:
>
> ----- Original Message -----
> > From: "Clint Byrum" <[email protected]>
> > To: "openstack-dev" <[email protected]>
> > Sent: Wednesday, January 22, 2014 12:45:45 PM
> > Subject: Re: [openstack-dev] [TripleO] our update story: can people live
> > with it?
> >
> > Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800:
> > > I've been thinking a bit more about how TripleO updates are developing
> > > specifically with regards to compute nodes. What is commonly called the
> > > "update story" I think.
> > >
> > > As I understand it we expect people to actually have to reboot a compute
> > > node in the cluster in order to deploy an update. This really worries me
> > > because it seems like way overkill for such a simple operation. Let's say
> > > all I need to deploy is a simple change to Nova's libvirt driver. And
> > > I need to deploy it to *all* my compute instances. Do we really expect
> > > people to actually have to reboot every single compute node in their
> > > cluster for such a thing? And then do this again and again for each
> > > update they deploy?
> > >
> >
> > Agreed, if we make everybody reboot to push out a patch to libvirt, we
> > have failed. And thus far, we are failing to do that, but with good
> > reason.
> >
> > Right at this very moment, we are leaning on 'rebuild' in Nova, which
> > reboots the instance. But this is so that we handle the hardest thing
> > well first (rebooting to have a new kernel).
> >
> > For small updates we need to decouple things a bit more. There is a
> > notion of the image ID in Nova, versus the image ID that is actually
> > running. Right now we update it with a nova rebuild command only.
> >
> > But ideally we would give operators a tool to optimize and avoid the
> > reboot when it is appropriate. The heuristic should be as simple as
> > comparing kernels.
>
> When we get to implementing such a thing I might prefer it not to be
> auto-magic.
> I can see a case where I want the new image but maybe not the new
> kernel. Perhaps this should be addressed when building the image (by using
> the older kernel)... but still. I could see a case for explicitly not wanting
> to reboot here as well.

++

> > Once we have determined that a new image does not
> > need a reboot, we can just change the ID in Metadata, and an
> > os-refresh-config script will do something like this:
> >
> > if [ "$(cat /etc/image_id)" != "$(os-apply-config --key image_id)" ] ; then
> >   download_new_image
> >   mount_image /tmp/new_image
> >   mount / -o remount,rw # Assuming we've achieved ro root
> >   rsync --one-file-system -a /tmp/new_image/ /
> >   mount / -o remount,ro # ditto
> > fi
> >
> > No reboot required. This would run early in configure.d, so that any
> > pre-configure.d scripts will have run to quiesce services that can't
> > handle having their binaries removed out from under them (read:
> > non-Unix services). Then configure.d runs as usual, configures things,
> > restarts services, and we are now running the new image.
>
> Cool. I like this a good bit better as it avoids the reboot. Still, this is a
> rather large amount of data to copy around if I'm only changing a single file
> in Nova.

Right.

> >
> > > I understand the whole read only images thing plays into this too... but
> > > I'm wondering if there is a middle ground where things might work
> > > better. Perhaps we have a mechanism where we can tar up individual venvs
> > > from /opt/stack/ or perhaps also this is an area where real OpenStack
> > > packages could shine. It seems like we could certainly come up with some
> > > simple mechanisms to deploy these sorts of changes with Heat such that
> > > compute host reboot can be avoided for each new deploy.
> >
> > Given the scenario above, that would be a further optimization. I don't
> > think it makes sense to specialize for venvs or openstack services
> > though, so just "ensure the root filesystems match" seems like a
> > workable, highly efficient system. Note that we've talked about having
> > highly efficient ways to widely distribute the new images as well.
>
> Yes. Optimization!
> In the big scheme of things I could see 3 approaches being
> useful:
>
> 1) Deploy a full image and reboot if you have a kernel update. (entire image
> is copied)
>
> 2) Deploy a full image if you change a bunch of things and/or you prefer to
> do that. (entire image is copied)
>
> 3) Deploy specific application level updates via packages or tarballs. (only
> selected applications/packages get deployed)

++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD
environments, so this level of optimization will be frequently used. And,
as I've said before, optimizing for frequently-used scenarios is worth
spending the time on. Optimizing for infrequently-occurring things... not
so much. :)

Best,
-jay

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
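
For illustration only, here is a rough sketch of how a choice between the three approaches listed in this thread might look in shell. Nothing here is an existing TripleO tool: the function name choose_update_strategy, its arguments, the "no-reboot" override and the strategy labels it prints are all hypothetical. It combines the "compare kernels" heuristic with an explicit operator override, so that skipping the reboot is a deliberate choice rather than auto-magic.

```shell
#!/bin/sh
# Hypothetical helper (not part of TripleO): pick one of the three update
# approaches discussed in the thread.
#
# Arguments (all assumed for illustration):
#   $1  kernel version currently running (e.g. the output of `uname -r`)
#   $2  kernel version shipped in the new image
#   $3  scope of the change: "app" for a single application or package,
#       "image" for broader changes
#   $4  optional operator override: "no-reboot" skips the reboot explicitly
choose_update_strategy() {
    running_kernel=$1
    image_kernel=$2
    scope=$3
    override=${4:-}

    if [ "$scope" = "app" ]; then
        # Approach 3: only the selected packages or tarballs are deployed.
        echo "app-update"
    elif [ "$running_kernel" != "$image_kernel" ] && [ "$override" != "no-reboot" ]; then
        # Approach 1: the kernel changed, so copy the full image and reboot.
        echo "full-image-reboot"
    else
        # Approach 2: copy the full image and sync the root filesystem in
        # place, either because the kernel is unchanged or because the
        # operator explicitly declined the reboot.
        echo "full-image-sync"
    fi
}
```

A caller could run something like `choose_update_strategy "$(uname -r)" "$new_kernel" image`, where new_kernel is assumed to have been read from the new image's metadata by some other means. Keeping the override as an explicit argument covers the case of wanting the new image but not the new kernel.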
