On Feb 21, 2015 12:26 AM, "Joe Gordon" <joe.gord...@gmail.com> wrote:
>
> On Fri, Feb 20, 2015 at 7:29 AM, Deepak Shetty <dpkshe...@gmail.com> wrote:
>>
>> Hi Jeremy,
>> Couldn't find anything strong in the logs to back a reason for the OOM.
>> At the time the OOM happens, the mysqld and java processes hold the most RAM, hence the OOM killer selects mysqld (4.7G) to be killed.
>>
>> From a glusterfs backend perspective I haven't found anything suspicious, and we don't have the glusterfs logs (typically in /var/log/glusterfs), so we can't delve into glusterfs too much :(
>>
>> BharatK (in CC) also tried to re-create the issue in a local VM setup, but hasn't managed to yet!
>>
>> Having said that, we do know that we started seeing this issue after we enabled the nova-assisted-snapshot tests (by changing nova's policy.json to allow non-admin users to create hypervisor-assisted snapshots). We think that enabling online snapshots may have added to the number of tests and the memory load; that's the only clue we have as of now!
>>
>
> It looks like the OOM killer hit while qemu was busy, during a ServerRescueTest. Maybe the libvirt logs would be useful as well?

Thanks for the data point, I will look at this test to understand more about what's happening.
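(As an aside, for anyone else digging through the log tarball: the OOM activity is easy to spot in the syslog. Something along these lines should list the kills and the per-process memory table the kernel dumps just before each one -- paths are for our CentOS 7 slaves, and the exact message text varies a bit by kernel version:

    # list OOM killer invocations and their victims
    grep -iE 'invoked oom-killer|out of memory|killed process' /var/log/messages

    # include the process table (RSS, oom_score_adj) dumped before each kill
    grep -i -A 40 'invoked oom-killer' /var/log/messages

That is how we spotted mysqld at ~4.7G above.)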
> And I don't see any tempest tests calling assisted-volume-snapshots

Maybe the run just hasn't reached them yet.

Thanks,
Deepak
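(For reference, the policy.json change that 157707 reverts was along these lines -- this is a sketch from memory, and the exact rule names in nova's policy.json may differ:

    "compute_extension:os-assisted-volume-snapshots:create": "rule:admin_api",
    "compute_extension:os-assisted-volume-snapshots:delete": "rule:admin_api",

relaxed to an empty rule, which lets any authenticated user, i.e. tempest's non-admin user, create hypervisor-assisted snapshots:

    "compute_extension:os-assisted-volume-snapshots:create": "",
    "compute_extension:os-assisted-volume-snapshots:delete": "",

That is what enabled the online snapshot tests in the first place.)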
> Also, this looks odd: Feb 19 18:47:16 devstack-centos7-rax-iad-916633.slave.openstack.org libvirtd: missing __com.redhat_reason in disk io error event
>
>>
>> So:
>>
>> 1) BharatK has merged the patch ( https://review.openstack.org/#/c/157707/ ) to revert the policy.json change in the glusterfs job, so no more nova-assisted-snapshot tests.
>>
>> 2) We are also increasing the timeout of our job in patch ( https://review.openstack.org/#/c/157835/1 ) so that we can get a full run without timeouts and do a proper analysis of the logs (logs are not posted if the job times out).
>>
>> Can you please re-enable our job, so that we can confirm that disabling the online snapshot test cases helps? If it does, that will help us narrow down the issue.
>>
>> We also plan to monitor & debug over the weekend, so having the job enabled would help us a lot.
>>
>> thanx,
>> deepak
>>
>>
>> On Thu, Feb 19, 2015 at 10:37 PM, Jeremy Stanley <fu...@yuggoth.org> wrote:
>>>
>>> On 2015-02-19 17:03:49 +0100 (+0100), Deepak Shetty wrote:
>>> [...]
>>> > For some reason we are seeing the centos7 glusterfs CI job getting
>>> > aborted/killed, either by a Java exception or by the build getting
>>> > aborted due to timeout.
>>> [...]
>>> > Hoping to root cause this soon and get the cinder-glusterfs CI job
>>> > back online soon.
>>>
>>> I manually reran the same commands this job runs on an identical
>>> virtual machine and was able to reproduce some substantial
>>> weirdness.
>>>
>>> I temporarily lost remote access to the VM around 108 minutes into
>>> running the job (~17:50 in the logs), and the out-of-band console
>>> also became unresponsive to carriage returns. The machine's IP
>>> address still responded to ICMP ping, but attempts to open new TCP
>>> sockets to the SSH service never got a protocol version banner back.
>>> After about 10 minutes of that I went out to lunch but left
>>> everything untouched. To my excitement it was up and responding
>>> again when I returned.
>>>
>>> It appears from the logs that it runs well past the 120-minute mark
>>> where devstack-gate tries to kill the gate hook for its configured
>>> timeout. Somewhere around 165 minutes in (18:47) you can see the
>>> kernel out-of-memory killer start to kick in and kill httpd and
>>> mysqld processes, according to the syslog. Hopefully this is enough
>>> additional detail to get you a start at finding the root cause so
>>> that we can re-enable your job. Let me know if there's anything else
>>> you need for this.
>>>
>>> http://fungi.yuggoth.org/tmp/logs.tar
>>> --
>>> Jeremy Stanley
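P.S. On the timeout bump in 157835: that is just the devstack-gate knob, i.e. something like the below in the job definition (180 here is illustrative; the actual value is in the review):

    # devstack-gate reads its timeout (in minutes) from this variable
    export DEVSTACK_GATE_TIMEOUT=180

which should give the run enough headroom to finish and upload logs.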
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev