Per request, moving this thread to the openstack-dev list. I have not been able to reproduce the issue so far, either on the VM you pointed me to or on any of my own VMs.

Several things I observed on `your` machine:
1. The installed kernel is newer than the one actually running. (No known related issue.)
2. On the first tempest run (the one whose logs are collected) lp#1353939 was triggered, but it is not related.
3. After trying to reproduce the issue many, many times I hit lp#1411525; the patch which introduced it has already been reverted.
4. Once (in ~100 runs) I saw 'Returning 400 to user: No nw_info cache associated with instance', which I have not seen with nova-network for a long time.
5. I see a lot of annoying iSCSI-related logging; it is also not related to the connection issue. IMHO tgtadm can be considered DEPRECATED, and we should switch to lioadm.

So far I have found no log entry related to the connection issue that would be worth searching for on logstash.

The nova-network log is not sufficient to figure out the actual netfilter state at any given moment. According to the log it should have updated the chains with something, but who knows..
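If someone wants to catch that state in the act, a crude watcher like the sketch below, started on the test node before the tempest run, would at least leave evidence behind. The interval and the output directory are arbitrary choices of mine, not anything tempest provides:

    # Periodically dump netfilter and interface state so that a later
    # ssh failure can be correlated with the rule set at that moment.
    mkdir -p /tmp/netstate
    while true; do
        ts=$(date +%H%M%S)
        sudo iptables-save > /tmp/netstate/iptables.$ts
        ip addr show       > /tmp/netstate/ipaddr.$ts
        ip route show      > /tmp/netstate/iproute.$ts
        sudo brctl show    > /tmp/netstate/brctl.$ts
        sleep 5
    done

Diffing the dump taken just before a failed ssh attempt against a known-good one should show whether the chains were really updated as the log claims.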
With ssh connection issues there is very little you can do as post-mortem analysis. Tempest normally deletes the related resources, so little evidence remains. If the issue is reproducible, in some cases it is enough to alter the test so that it does not destroy the evidence, but very frequently some kind of live debugging is required.

Several suspected things:
* The VM was able to acquire an address via DHCP -> successful boot, L2 connectivity present.
* No evidence found for a dead qemu; no special libvirt operation was requested before the failure.
* nova-network claims it added the floating IP to br100.
* An L3 issue / security group rules?

The basic network debugging was removed from tempest. I would recommend reverting that change, so that we at least have an idea whether the interfaces and netfilter were or were not in good shape.

I also created a VM with firewalld enabled (normally it is not in my devstack setups); the 3 mentioned test cases kept working fine even after running them for hours. However, '/var/log/firewalld' contains COMMAND_FAILED entries just as on `your` VM.

I will try to run more full tempest+nnet@F21 jobs in my environment to get a larger sample for the success rate. So far I have reproduced 0 ssh failures, so I will scan the logs on `your` machine again more carefully; maybe I missed something, or maybe those tests interfered with something less obvious. I'll also check the other gate f21 logs (~100 jobs/week) to see whether anything happened when the issue started and whether the issue still exists.

So, I have nothing useful at the moment, but I have not given up.

 http://logs.openstack.org/87/139287/14/check/check-tempest-dsvm-f21/5f3d210/console.html.gz
 https://review.openstack.org/#/c/140531/

PS: F21's HAProxy is more sensitive to services which stop listening, and the load will not be evenly balanced. For a working F21 neutron job a better listener is required: https://review.openstack.org/#/c/146039/
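If the haproxy instance has a stats socket configured (an assumption, it is not there by default), the per-server counters make the uneven balancing and the dead listeners easy to spot, e.g.:

    # Dump proxy name, server name, total sessions and status as CSV.
    # Assumes haproxy.cfg contains something like:
    #   stats socket /var/run/haproxy.sock mode 600 level admin
    echo "show stat" | sudo socat stdio /var/run/haproxy.sock \
        | cut -d, -f1,2,8,18

A backend whose session total stops growing, or whose status goes DOWN, is a service that stopped listening.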
----- Original Message -----
> From: "Ian Wienand" <iwien...@redhat.com>
> To: "Attila Fazekas" <afaze...@redhat.com>
> Cc: "Alvaro Lopez Ortega" <aort...@redhat.com>, "Jeremy Stanley" <fu...@yuggoth.org>, "Sean Dague" <s...@dague.net>, "dean Troyer" <dtro...@gmail.com>
> Sent: Friday, January 16, 2015 5:24:38 AM
> Subject: upstream f21 devstack test
>
> Hi Attila,
>
> I don't know if you've seen, but upstream f21 testing is happening for
> devstack jobs. As an experimental job I was getting good runs, but in
> the last day and a bit, all runs have started failing.
>
> The failing tests are varied; a small sample I pulled:
>
>  tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.test_compute_with_volumes
>  tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern[compute,image,network]
>  tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance[compute,image,network]
>
> The common thread is that they can't ssh to the cirros instance
> started up.
>
> So far I can not replicate this locally. I know there were some
> firewalld/neutron issues, but this is not a neutron job.
>
> Unfortunately, I'm about to head out the door on PTO until 2015-01-27.
> I don't like the idea of this being broken while I don't have time to
> look at it, so I'm hoping you can help out.
>
> There is a failing f21 machine on hold at
>
>  jenk...@xx.yy.zz.qq (sanitized)
>
> I've attached a private key that should let you log in. This
> particular run failed in:
>
>  tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.test_compute_with_volumes
>  tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario[compute,image,network,volume]
>
> Sorry I haven't got very far in debugging this. Nothing obviously
> jumped out at me in the logs, but I only had a brief look. I'm hoping
> as the best tempest guy I know you can find some time to take a look
> at this in my absence :)
>
> Thanks,
>
> -i
>
>  http://logs.openstack.org/03/147303/1/check/check-tempest-dsvm-f21/3d0c86d/console.html
>  http://logs.openstack.org/09/147209/2/check/check-tempest-dsvm-f21/83444c9/console.html
>  http://logs.openstack.org/71/141971/5/check/check-tempest-dsvm-f21/95b1574/console.html
>  https://jenkins06.openstack.org/job/check-tempest-dsvm-f21/8/console

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev