On Fri, Jul 13, 2018 at 1:54 AM, Bogdan Dobrelya <bdobr...@redhat.com> wrote:
> [Added tripleo]
>
> It would be nice to have this situation verified/improved for containerized
> libvirt for compute nodes deployed with TripleO as well.
>
> On 7/12/18 11:02 PM, Clint Byrum wrote:
>>
>> Greetings! We've been deploying with Kolla on CentOS 7 now for a while, and
>> we've recently noticed a rather troubling behavior when we shut down
>> hypervisors.
>>
>> Somewhere between systemd and libvirt's systemd-machined integration,
>> we see that guests get killed aggressively by SIGTERM'ing all of the
>> qemu-kvm processes. This seems to happen because they are scoped into
>> machine.slice, but systemd-machined is killed, which drops those scopes
>> and thus results in killing off the machines.
>
>
> So far we have observed something similar [0] happening, but between systemd
> and containers managed by the Docker daemon (dockerd).
>
> [0] https://bugs.launchpad.net/tripleo/+bug/1778913
>
>
>>
>> In the past, we've used the libvirt-guests service when our libvirt was
>> running outside of containers. This worked splendidly, as we could
>> have it wait 5 minutes for VMs to attempt a graceful shutdown, avoiding
>> interrupting any running processes. But this service isn't available on
>> the host OS, as it won't be able to talk to libvirt inside the container.
>>
>> The solution I've come up with for now is this:
>>
>> [Unit]
>> Description=Manage libvirt guests in kolla safely
>> After=docker.service systemd-machined.service
>> Requires=docker.service
>>
>> [Install]
>> WantedBy=sysinit.target
>>
>> [Service]
>> Type=oneshot
>> RemainAfterExit=yes
>> TimeoutStopSec=400
>> ExecStart=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh start
>> ExecStart=/usr/bin/docker start nova_compute
>> ExecStop=/usr/bin/docker stop nova_compute
>> ExecStop=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh shutdown
>>
>> This doesn't seem to work, though I'm still trying to work out
>> the ordering and such. It should ensure that before we stop
>> systemd-machined and destroy all of its scopes (thus killing all the
>> VMs), we run the libvirt-guests.sh script to try to shut them down. The
>> TimeoutStopSec=400 is because the script itself waits 300 seconds for any
>> VM that refuses to shut down cleanly, so this gives it a chance to wait
>> for at least one of those. This is an imperfect solution, but it allows us
>> to move forward after having made a reasonable attempt at clean shutdowns.
>>
>> Anyway, just wondering if anybody else using kolla-ansible or kolla
>> containers in general has run into this problem, and whether or not
>> there are better/known solutions.
>
>
> As I noted above, I think the issue may be valid for TripleO as well.
>
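The ordering and timeout reasoning behind the unit quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not anyone's deployment code: the unit name kolla-libvirt-guests.service is hypothetical (the email never names the unit), the 10-second figure is docker stop's default grace period, and the 300-second per-guest wait is taken from the email. It conservatively treats the whole stop sequence as sharing one TimeoutStopSec budget; recent systemd documentation applies that timeout to each ExecStop= command separately, which would only widen the margin. systemd stops ordered units in the inverse of their start-up order, so After=docker.service systemd-machined.service is what is meant to run the guest shutdown before systemd-machined goes away. The sketch models only that intended ordering; it does not explain why the unit did not work in practice.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical name for the unit quoted above; the email does not name it.
GUESTS_UNIT = "kolla-libvirt-guests.service"

# Start-up ordering: each unit maps to the units it is ordered After=.
AFTER = {
    GUESTS_UNIT: {"docker.service", "systemd-machined.service"},
    "docker.service": set(),
    "systemd-machined.service": set(),
}


def stop_order(after):
    """Return a valid stop order: the inverse of a valid start-up order."""
    start = list(TopologicalSorter(after).static_order())
    return list(reversed(start))


def stubborn_guests_covered(timeout_stop_sec, docker_stop_grace, per_guest_timeout):
    """Count guests that can each use the full per-guest wait.

    Conservative assumption: docker stop's grace period and the
    libvirt-guests.sh wait share a single TimeoutStopSec budget.
    """
    remaining = timeout_stop_sec - docker_stop_grace
    return max(remaining // per_guest_timeout, 0)


if __name__ == "__main__":
    print("stop order:", stop_order(AFTER))
    # TimeoutStopSec=400, docker stop default grace of 10 s, 300 s per guest
    print("stubborn guests covered:", stubborn_guests_covered(400, 10, 300))
```

With these numbers (400 - 10) // 300 is 1, which matches the "at least one of those" estimate in the email; any additional guest that ignores the shutdown request would still be cut off when the timeout expires.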
I think https://review.openstack.org/#/c/580351/ is trying to address this.

Thanks,
-Alex

>>
>> Thanks!
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando