Corey,

Thanks for investigating the gate issues and summarizing them. It looks like
there are multiple problems to solve, and a ticket was created for each one.


1. https://bugs.launchpad.net/magnum/+bug/1542384

2. https://bugs.launchpad.net/magnum/+bug/1541964

3. https://bugs.launchpad.net/magnum/+bug/1542386

4. https://bugs.launchpad.net/magnum/+bug/1536739

I gave #3 the highest priority because, without this issue being resolved, the
gate takes several hours to run a single job. That makes testing patches and
troubleshooting other issues tedious in such an environment. Any help with
this issue is greatly appreciated.

Egor, thanks for the advice. A ticket was created to track the missing-logs
issue you mentioned: https://bugs.launchpad.net/magnum/+bug/1542390

Best regards,
Hongbin

From: Guz Egor [mailto:guz_e...@yahoo.com]
Sent: February-05-16 2:44 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Magnum] gate issues

Corey,

I think we should do more investigation before applying any "hot" patches. For
example, I looked at several failures today and honestly there is no way to
find out the reasons.
I believe we are not copying logs
(https://github.com/openstack/magnum/blob/master/magnum/tests/functional/python_client_base.py#L163)
during a test failure:
we register the handler in setUp
(https://github.com/openstack/magnum/blob/master/magnum/tests/functional/python_client_base.py#L244),
but the Swarm tests create the
bay in setUpClass
(https://github.com/openstack/magnum/blob/master/magnum/tests/functional/swarm/test_swarm_python_client.py#L48),
which is called before setUp.
So there is no way to see any logs from the VM.

Sorry, I cannot submit a patch or debug this myself because I will only get my
laptop back on Tuesday ):

---
 Egor

________________________________
From: Corey O'Brien <coreypobr...@gmail.com>
To: OpenStack Development Mailing List (not for usage questions)
<openstack-dev@lists.openstack.org>
Sent: Thursday, February 4, 2016 9:03 PM
Subject: [openstack-dev] [Magnum] gate issues

So as we're all aware, the gate is a mess right now. I wanted to sum up some of 
the issues so we can figure out solutions.

1. The functional-api job sometimes fails because bays timeout building after 1 
hour. The logs look something like this:
magnum.tests.functional.api.v1.test_bay.BayTest.test_create_list_and_delete_bays
 [3733.626171s] ... FAILED
I can reproduce this hang on my devstack with etcdctl 2.0.10 as described in
this bug (https://bugs.launchpad.net/magnum/+bug/1541105), but apparently
either my fix using 2.2.5 (https://review.openstack.org/#/c/275994/) is
incomplete or there is another intermittent problem, because it happened again
even with that fix:
(http://logs.openstack.org/94/275994/1/check/gate-functional-dsvm-magnum-api/32aacb1/console.html)

2. The k8s job has some sort of intermittent hang as well that causes a
symptom similar to the swarm one. https://bugs.launchpad.net/magnum/+bug/1541964

3. When the functional-api job runs, it frequently destroys the VM, causing the
jenkins slave agent to die. Example:
http://logs.openstack.org/03/275003/6/check/gate-functional-dsvm-magnum-api/a9a0eb9/console.html
When this happens, zuul re-queues a new build from the start on a new VM. This
can happen many times in a row before the job completes.
I chatted with openstack-infra about this, and after taking a look at one of
the VMs, it looks like memory over-consumption leading to thrashing was a
possible culprit. The sshd daemon was also dead, and the console showed things
like "INFO: task kswapd0:77 blocked for more than 120 seconds". A cursory
glance and following some of the jobs seem to indicate that this doesn't
happen on RAX VMs, which, unlike the OVH VMs, have swap devices.
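As a side note, one quick way to tell whether a node is in the no-swap situation described above is to read SwapTotal from /proc/meminfo. A small Python sketch (the helper name is mine, and this is only a diagnostic, not the infra fix):

```python
# Diagnostic sketch: parse SwapTotal out of /proc/meminfo-style text to
# detect a VM with no swap device, which can thrash under memory pressure
# instead of swapping.
def swap_total_kb(meminfo_text):
    """Return the SwapTotal value in kB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return 0

# Stand-in sample; on a live node, pass open("/proc/meminfo").read().
sample = "MemTotal: 8174452 kB\nSwapTotal: 0 kB\n"
if swap_total_kb(sample) == 0:
    print("no swap device: memory pressure leads to thrashing, not swapping")
```

Running this (or just `grep SwapTotal /proc/meminfo`) on an OVH versus a RAX node would confirm the theory.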

4. In general, even when things work, the gate is really slow. The sequential
master-then-node build process, in combination with underpowered VMs, makes bay
builds take 25-30 minutes when they do succeed. Since we're already close to
tipping over a VM, we run functional tests with concurrency=1, so 2 bay builds
consume almost the entire allotted devstack testing time (generally 75 minutes
of actual test time available, it seems).
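To make that time budget concrete, here is the arithmetic from the figures above as a quick check (numbers are the rough estimates already quoted, not measurements):

```python
# Back-of-the-envelope gate time budget using the figures above.
bay_build_min = 25      # low end of the 25-30 minute bay build time
builds_per_run = 2      # two bay builds per functional job (concurrency=1)
budget_min = 75         # rough devstack test time actually available

used = bay_build_min * builds_per_run
print(f"{used} of {budget_min} minutes spent just building bays")
# -> 50 of 75 minutes spent just building bays
```

Even at the optimistic 25-minute figure, two-thirds of the budget goes to bay builds before any test logic runs, which is why the jobs sit so close to the timeout.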

Corey

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

