On 2015-02-19 17:03:49 +0100 (+0100), Deepak Shetty wrote:
[...]
> For some reason we are seeing the centos7 glusterfs CI job getting
> aborted/ killed either by Java exception or the build getting
> aborted due to timeout.
[...]
> Hoping to root cause this soon and get the cinder-glusterfs CI job
> back online soon.

I manually reran the same commands this job runs on an identical
virtual machine and was able to reproduce some substantial
weirdness.

I temporarily lost remote access to the VM around 108 minutes into
running the job (~17:50 in the logs), and the out-of-band console
also became unresponsive to carriage returns. The machine's IP
address still responded to ICMP ping, but attempts to open new TCP
sockets to the SSH service never got a protocol version banner back.
After about 10 minutes of that I went out to lunch but left
everything untouched. To my excitement it was up and responding
again when I returned.

It appears from the logs that the job runs well past the 120-minute
mark, where devstack-gate tries to kill the gate hook for reaching
its configured timeout. Somewhere around 165 minutes in (18:47),
the syslog shows the kernel out-of-memory killer starting to kick in
and kill httpd and mysqld processes.

Hopefully this is enough additional detail to give you a start on
finding the root cause so that we can reenable your job. Let me know
if there's anything else you need for this.

    http://fungi.yuggoth.org/tmp/logs.tar

-- 
Jeremy Stanley

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev