Hi all,

Lately I have noticed a number of job failures in the neutron gate that all result in job timeouts. I describe the gate-tempest-dsvm-neutron-dvr-ubuntu-xenial job below, though I see timeouts happening in other jobs too.
The failure mode is that all operations, ./stack.sh and each tempest test alike, take significantly more time (roughly 50% to 150% more), which results in the job timeout being triggered. An example of what I mean can be found in [1]. A good run usually takes ~20 minutes to stack up devstack and then ~40 minutes to pass the full suite; a bad run usually takes ~30 minutes for ./stack.sh and then 1:20h+ until it is killed due to timeout.

It affects different clouds (we see rax, internap, infracloud-vanilla, and ovh jobs affected; we haven't seen osic hit though). It can't be, e.g., slow pypi or apt mirrors, because then we would see a slowdown in the ./stack.sh phase only. We can't be sure that the CPUs are the same, and devstack does not seem to dump /proc/cpuinfo anywhere (in the end, it's all virtual, so I am not sure it would help anyway). Nor do we have a way to learn whether the slowness could be a result of adherence to RFC1149. ;)

We discussed the matter in the neutron channel [2] but couldn't figure out the culprit, or where to go next. At this point we assume it's not neutron's fault, and we hope others (infra?) may have suggestions on where to look.

[1] http://logs.openstack.org/95/429095/2/check/gate-tempest-dsvm-neutron-dvr-ubuntu-xenial/35aa22f/console.html#_2017-02-09_04_47_12_874550
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2017-02-10.log.html#t2017-02-10T04:06:01

Thanks,
Ihar
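P.S. In case it helps anyone poking at this: below is a minimal sketch (plain Python, not something devstack ships today; the script name and output format are just made up for illustration) of the kind of per-node CPU dump I had in mind, so that good and bad runs could be compared after the fact.

    #!/usr/bin/env python
    # Hypothetical helper, not part of devstack: dump basic host CPU info
    # and a crude timing number so good and bad runs can be compared later.

    from __future__ import print_function

    import multiprocessing
    import platform
    import time


    def read_cpu_model():
        """Return the first 'model name' entry from /proc/cpuinfo, if any."""
        try:
            with open('/proc/cpuinfo') as f:
                for line in f:
                    if line.startswith('model name'):
                        return line.split(':', 1)[1].strip()
        except IOError:
            pass
        return 'unknown'


    def crude_cpu_benchmark(iterations=5 * 10**6):
        """Time a fixed busy loop; a slower host should report a larger value."""
        start = time.time()
        total = 0
        for i in range(iterations):
            total += i * i
        return time.time() - start


    if __name__ == '__main__':
        print('node:       %s' % platform.node())
        print('cpu model:  %s' % read_cpu_model())
        print('cpu count:  %d' % multiprocessing.cpu_count())
        print('busy loop:  %.2fs' % crude_cpu_benchmark())

Running something like this at the start of a job (and archiving the output with the logs) would at least tell us whether the slow runs land on a distinguishable flavor or host type, even if everything is virtual.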