I guess it's an infra issue, specifically related to the storage, or to the network that provides the storage.
If you look at the syslog file [1], there are a lot of entries like these:

Feb 09 04:20:42 ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: iscsi_task_tx_start(2024) no more data
Feb 09 04:20:42 ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: iscsi_task_tx_start(1996) found a task 71 131072 0 0
Feb 09 04:20:42 ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: iscsi_data_rsp_build(1136) 131072 131072 0 26214471
Feb 09 04:20:42 ubuntu-xenial-rax-ord-7193667 tgtd[8542]: tgtd: __cmd_done(1281) (nil) 0x2563000 0 131072

$ grep tgtd syslog.txt.gz | wc
 139602 1710808 15699432

(A sketch for breaking that count down per minute follows the quoted message below.)

[1] http://logs.openstack.org/95/429095/2/check/gate-tempest-dsvm-neutron-dvr-ubuntu-xenial/35aa22f/logs/syslog.txt.gz

On Fri, Feb 10, 2017 at 5:59 AM, Ihar Hrachyshka <ihrac...@redhat.com> wrote:
> Hi all,
>
> I noticed lately a number of job failures in neutron gate that all
> result in job timeouts. I describe
> gate-tempest-dsvm-neutron-dvr-ubuntu-xenial job below, though I see
> timeouts happening in other jobs too.
>
> The failure mode is all operations, ./stack.sh and each tempest test
> take significantly more time (like 50% to 150% more, which results in
> job timeout triggered). An example of what I mean can be found in [1].
>
> A good run usually takes ~20 minutes to stack up devstack; then ~40
> minutes to pass full suite; a bad run usually takes ~30 minutes for
> ./stack.sh; and then 1:20h+ until it is killed due to timeout.
>
> It affects different clouds (we see rax, internap, infracloud-vanilla,
> ovh jobs affected; we haven't seen osic though). It can't be e.g. slow
> pypi or apt mirrors because then we would see slowdown in ./stack.sh
> phase only.
>
> We can't be sure that CPUs are the same, and devstack does not seem to
> dump /proc/cpuinfo anywhere (in the end, it's all virtual, so not sure
> if it would help anyway). Neither we have a way to learn whether
> slowliness could be a result of adherence to RFC1149. ;)
>
> We discussed the matter in neutron channel [2] though couldn't figure
> out the culprit, or where to go next. At this point we assume it's not
> neutron's fault, and we hope others (infra?) may have suggestions on
> where to look.
>
> [1] http://logs.openstack.org/95/429095/2/check/gate-tempest-dsvm-neutron-dvr-ubuntu-xenial/35aa22f/console.html#_2017-02-09_04_47_12_874550
> [2] http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2017-02-10.log.html#t2017-02-10T04:06:01
>
> Thanks,
> Ihar
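P.S. In case it helps narrow things down, here is a rough sketch of how one could break that tgtd count down per minute and see whether the message rate lines up with the slow tempest phases. It assumes the syslog from [1] has been fetched and decompressed to a local file named syslog.txt (the filename and the exact grep pattern are just illustrative):

  # Count tgtd messages per minute; the file is chronological, so uniq -c
  # groups consecutive lines sharing the same "Mon DD HH:MM" key.
  grep 'tgtd\[' syslog.txt \
      | awk '{ print $1, $2, substr($3, 1, 5) }' \
      | uniq -c

Each output line looks like "   1234 Feb 09 04:20"; a sudden jump in those counts would point at the window where the storage path started struggling.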
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev