On Fri, Jul 13, 2018 at 1:54 AM, Bogdan Dobrelya <bdobr...@redhat.com> wrote:
> [Added tripleo]
>
> It would be nice to have this situation verified/improved for containerized
> libvirt for compute nodes deployed with TripleO as well.
>
> On 7/12/18 11:02 PM, Clint Byrum wrote:
>>
>> Greetings! We've been deploying with Kolla on CentOS 7 now for a while, and
>> we've recently noticed a rather troubling behavior when we shut down
>> hypervisors.
>>
>> Somewhere between systemd and libvirt's systemd-machined integration,
>> we see that guests get killed aggressively by SIGTERM'ing all of the
>> qemu-kvm processes. This seems to happen because they are scoped into
>> machine.slice, but systemd-machined is killed which drops those scopes
>> and thus results in killing off the machines.
>
>
> So far we have observed something similar [0], but between systemd and
> containers managed by the Docker daemon (dockerd).
>
> [0] https://bugs.launchpad.net/tripleo/+bug/1778913
>
>
>>
>> In the past, we've used the libvirt-guests service when our libvirt was
>> running outside of containers. This worked splendidly, as we could
>> have it wait 5 minutes for VMs to attempt a graceful shutdown, avoiding
>> interrupting any running processes. But this service isn't available on
>> the host OS, as it won't be able to talk to libvirt inside the container.
>>
>> The solution I've come up with for now is this:
>>
>> [Unit]
>> Description=Manage libvirt guests in kolla safely
>> After=docker.service systemd-machined.service
>> Requires=docker.service
>>
>> [Install]
>> WantedBy=sysinit.target
>>
>> [Service]
>> Type=oneshot
>> RemainAfterExit=yes
>> TimeoutStopSec=400
>> ExecStart=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh start
>> ExecStart=/usr/bin/docker start nova_compute
>> ExecStop=/usr/bin/docker stop nova_compute
>> ExecStop=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh shutdown
>>
>> This seems to work, though I'm still trying to work out
>> the ordering and such. It should ensure that before we stop
>> systemd-machined and destroy all of its scopes (thus killing all the
>> VMs), we run the libvirt-guests.sh script to try to shut them down. The
>> TimeoutStopSec=400 is because the script itself waits 300 seconds for any
>> VM that refuses to shut down cleanly, so this gives it a chance to wait
>> for at least one of those. This is an imperfect solution, but it allows us
>> to move forward after having made a reasonable attempt at clean shutdowns.
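
To make the timeout reasoning above concrete, here is a rough sketch of the
stop-time budget. Only the 300-second wait and TimeoutStopSec=400 come from
the unit in this thread. The 10-second figure for stopping nova_compute
(docker stop's default grace period), the idea that stuck guests are waited
on one after another, and the choice to count both ExecStop commands against
a single budget are assumptions made for illustration; the function names
are hypothetical.

```python
# Rough stop-time budget for the unit quoted above.
# From the thread: the 300 s per-guest wait and TimeoutStopSec=400.
# Assumptions (not from the thread): the 10 s nova_compute stop time,
# serial waiting on stuck guests, and one shared budget for both
# ExecStop commands (a deliberately conservative simplification).

GUEST_WAIT_SEC = 300     # wait per guest that refuses to shut down
TIMEOUT_STOP_SEC = 400   # TimeoutStopSec from the unit
COMPUTE_STOP_SEC = 10    # assumed time for "docker stop nova_compute"


def worst_case_stop_sec(stuck_guests: int) -> int:
    """Upper bound on stop time if `stuck_guests` guests never shut down."""
    return COMPUTE_STOP_SEC + stuck_guests * GUEST_WAIT_SEC


def fits_in_timeout(stuck_guests: int) -> bool:
    """True if the worst case still fits inside TimeoutStopSec."""
    return worst_case_stop_sec(stuck_guests) <= TIMEOUT_STOP_SEC


if __name__ == "__main__":
    for n in range(3):
        print(n, worst_case_stop_sec(n), fits_in_timeout(n))
```

Under these assumptions the budget covers one stuck guest but not two, which
matches the "at least one of those" wording above.
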
>>
>> Anyway, just wondering if anybody else using kolla-ansible or kolla
>> containers in general has run into this problem, and whether or not
>> there are better/known solutions.
>
>
> As I noted above, I think the issue may be valid for TripleO as well.
>

I think https://review.openstack.org/#/c/580351/ is trying to address this.

Thanks,
-Alex

>>
>> Thanks!
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>
>
