On 03/28/2017 08:57 AM, Clark Boylan wrote:
> 1. Libvirt crashes: http://status.openstack.org/elastic-recheck/#1643911 and http://status.openstack.org/elastic-recheck/#1646779
> Libvirt is randomly crashing during the job which causes things to fail (for obvious reasons). To address this will likely require someone with experience debugging libvirt since it's most likely a bug isolated to libvirt. We're looking for someone familiar with libvirt internals to drive the effort to fix this issue, [...]
OK, from the bug [1] we're seeing malloc() corruption.

While I agree that a coredump is not that likely to help, I would also like to come to that conclusion after inspecting one :) I've found things in the heap before that gave clues to the real problem. To this end, I've proposed [2] to keep coredumps. It's a little hackish, but I think it gets the job done. [3] enables this and saves any dumps to the logs in d-g (devstack-gate).

As suggested, running under valgrind would be great, but it's probably impractical until we narrow things down a little. Another tool I've had some success with is Electric Fence [4], which puts guard boundaries around allocations so that an out-of-bounds access faults at the moment it happens, rather than silently corrupting the heap. I've proposed [5] to try this out, but unfortunately it's not looking particularly promising.

I'm open to suggestions; for example, an alternative allocator such as tcmalloc might give us a different failure, which could be another clue. If we get something vaguely reliable here, our best bet might be to run a parallel non-voting job on all changes to see what we can pick up.

-i

[1] https://bugs.launchpad.net/nova/+bug/1643911
[2] https://review.openstack.org/451128
[3] https://review.openstack.org/451219
[4] http://elinux.org/Electric_Fence
[5] https://review.openstack.org/451136
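The guard-page idea behind Electric Fence can be sketched in a few lines of Python. This is not Electric Fence's code and not what [5] does: it is a minimal, Linux-only illustration that assumes `mprotect` is reachable through `ctypes`, and `guarded_alloc` and `overrun_in_child` are hypothetical helper names. Each buffer is placed so that it ends exactly at a `PROT_NONE` page, so a one-byte overrun faults on the spot instead of quietly damaging neighboring heap data. The write happens in a forked child so the resulting SIGSEGV can be observed without killing the caller.

```python
import ctypes
import mmap
import os
import signal

PAGE = mmap.PAGESIZE
PROT_NONE = 0  # value of PROT_NONE in <sys/mman.h> on Linux (assumption: Linux only)

# dlopen(NULL): the running interpreter's symbols, which include libc's mprotect.
libc = ctypes.CDLL(None, use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int


def guarded_alloc(size):
    """Hypothetical helper: map enough read/write pages for `size` bytes plus
    one trailing PROT_NONE guard page. Returns (mapping, start_address) where
    the buffer [start, start + size) ends exactly at the guard page."""
    data_pages = -(-size // PAGE)  # ceil(size / PAGE)
    m = mmap.mmap(-1, (data_pages + 1) * PAGE)  # anonymous read/write mapping
    base = ctypes.addressof(ctypes.c_char.from_buffer(m))
    guard = base + data_pages * PAGE  # page-aligned, since base is
    if libc.mprotect(guard, PAGE, PROT_NONE) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return m, guard - size


def overrun_in_child(size, overrun):
    """Hypothetical helper: fork a child that fills `size + overrun` bytes of
    a guarded buffer. Returns the signal number that killed the child, or
    None if the child exited normally."""
    pid = os.fork()
    if pid == 0:  # child process
        code = 0
        try:
            _m, start = guarded_alloc(size)  # keep _m alive while writing
            ctypes.memset(start, 0x41, size + overrun)
        except BaseException:
            code = 1
        os._exit(code)  # never return into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


if __name__ == "__main__":
    for extra in (0, 1):
        sig = overrun_in_child(100, extra)
        print(extra, signal.Signals(sig).name if sig else "no fault")
```

The cost of this layout is at least two pages per allocation (data plus guard), which is one reason tools of this kind are normally reserved for diagnosis. This particular layout also only catches writes past the end of a buffer, not before its start.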
