On Fri, Jan 24, 2014 at 6:57 PM, Salvatore Orlando <sorla...@nicira.com> wrote:
> I've found out that several jobs are exhibiting failures like bug 1254890
> [1] and bug 1253896 [2] because openvswitch seems to be crashing the
> kernel. The kernel trace usually reports either neutron-ns-metadata-proxy
> or dnsmasq as the offending process, but [3] seems to point clearly to
> ovs-vsctl.
> 254 events observed in the previous 6 days show a similar trace in the
> logs [4]. While this alone won't explain all the failures observed, it is
> potentially one of the prominent root causes.
>
> From the logs I have few hints about the kernel that is running. It seems
> there has been no update in the past 7 days, but I can't be sure.
> Openvswitch builds are updated periodically. The last build I found not to
> trigger failures was the one generated on 2014/01/16 at 01:58:18.
> Unfortunately, version-wise I always see only 1.4.0, with no build number.
>
> I don't know if this will require getting in touch with Ubuntu, or if we
> can just prep a different image with an OVS build known to work without
> problems.
>
> Salvatore
>
> [1] https://bugs.launchpad.net/neutron/+bug/1254890
> [2] https://bugs.launchpad.net/neutron/+bug/1253896
> [3] http://paste.openstack.org/show/61869/
> [4] "kernel BUG at /build/buildd/linux-3.2.0/fs/buffer.c:2917" and
> filename:syslog.txt

Do you want to track this as a separate bug and e-r fingerprint? It will
overlap with the other two bugs, but it will give us good numbers on
status.openstack.org/elastic-recheck/. (A rough sketch of what such a
fingerprint query could look like is at the bottom of this mail, after the
quoted thread.)

> On 24 January 2014 21:13, Clay Gerrard <clay.gerr...@gmail.com> wrote:
>
>> OH yeah that's much better. I had found those eventually but had to dig
>> through all that other stuff :'(
>>
>> Moving forward I think we can keep an eye on that page, open bugs for
>> the tests causing issues, and dig in.
>>
>> Thanks again!
>>
>> -Clay
>>
>> On Fri, Jan 24, 2014 at 11:37 AM, Sean Dague <s...@dague.net> wrote:
>>
>>> On 01/24/2014 02:02 PM, Peter Portante wrote:
>>> > Hi Sean,
>>> >
>>> > In the last 7 days I see only 6 python27 based test failures:
>>> > http://logstash.openstack.org/#eyJzZWFyY2giOiJwcm9qZWN0Olwib3BlbnN0YWNrL3N3aWZ0XCIgQU5EIGJ1aWxkX3F1ZXVlOmdhdGUgQU5EIGJ1aWxkX25hbWU6Z2F0ZS1zd2lmdC1weXRob24qIEFORCBtZXNzYWdlOlwiRVJST1I6ICAgcHkyNzogY29tbWFuZHMgZmFpbGVkXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTA1ODk2Mjk0MDR9
>>> >
>>> > And 4 python26 based test failures:
>>> > http://logstash.openstack.org/#eyJzZWFyY2giOiJwcm9qZWN0Olwib3BlbnN0YWNrL3N3aWZ0XCIgQU5EIGJ1aWxkX3F1ZXVlOmdhdGUgQU5EIGJ1aWxkX25hbWU6Z2F0ZS1zd2lmdC1weXRob24qIEFORCBtZXNzYWdlOlwiRVJST1I6ICAgcHkyNjogY29tbWFuZHMgZmFpbGVkXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTA1ODk1MzAzNTd9
>>> >
>>> > Maybe the query you posted captures failures where the job did not
>>> > even run?
>>> >
>>> > And only 15 hits (well, 18, but three are within the same job, and
>>> > some of the tests are run twice, so it is a combined 10 hits):
>>> > http://logstash.openstack.org/#eyJzZWFyY2giOiJwcm9qZWN0Olwib3BlbnN0YWNrL3N3aWZ0XCIgQU5EIGJ1aWxkX3F1ZXVlOmdhdGUgQU5EIGJ1aWxkX25hbWU6Z2F0ZS1zd2lmdC1weXRob24qIEFORCBtZXNzYWdlOlwiRkFJTDpcIiBhbmQgbWVzc2FnZTpcInRlc3RcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM5MDU4OTg1NTAzMX0=
>>> >
>>> > Thanks,
>>>
>>> So it is true that the Interrupted exceptions (which occur when a job
>>> is killed because of a reset) are sometimes turned into Fail events by
>>> the system. That is one of the reasons the graphite data for failures
>>> is incorrect, and if you use just the graphite sourcing for fails, your
>>> numbers will be overly pessimistic.
>>>
>>> The following are probably better lists:
>>> - http://status.openstack.org/elastic-recheck/data/uncategorized.html#gate-swift-python26
>>>   (7 uncategorized fails)
>>> - http://status.openstack.org/elastic-recheck/data/uncategorized.html#gate-swift-python27
>>>   (5 uncategorized fails)
>>>
>>> -Sean
>>>
>>> --
>>> Sean Dague
>>> Samsung Research America
>>> s...@dague.net / sean.da...@samsung.com
>>> http://dague.net
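For what it's worth, here is a minimal sketch of the kind of fingerprint
query I have in mind, keyed on the syslog signature from Salvatore's [4].
It is only illustrative (the exact wording would be settled when the bug is
filed against elastic-recheck); the snippet just rebuilds a
logstash.openstack.org URL of the same general shape as the ones Peter
linked above, where the URL fragment is a base64-encoded JSON search
document.

    import base64
    import json

    # Signature from [4]; query wording is a sketch, not a merged fingerprint.
    search = ('message:"kernel BUG at '
              '/build/buildd/linux-3.2.0/fs/buffer.c:2917" '
              'AND filename:"syslog.txt"')

    # Core fields of the search document encoded in the URLs quoted above.
    payload = {
        "search": search,
        "fields": [],
        "offset": 0,
        "timeframe": "604800",  # 7 days, in seconds
        "graphmode": "count",
    }

    fragment = base64.b64encode(json.dumps(payload).encode()).decode()
    print("http://logstash.openstack.org/#" + fragment)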
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev