On Monday, 20 January 2014, 15:33, Jay Pipes <jaypi...@gmail.com> wrote:
>Sorry for top-posting -- using web mail client.

No worries - it doesn't bother me.

>
>Is it possible to change the retry interval in Cirros (or cloud-init?) so that
>the backoff is less than 60 seconds?

I think the udhcpc command line parameters are baked into the image. It's part
of BusyBox, and I'm not even sure if it's configurable from a script/text file.

>
>Best,
>-jay
>
>
>On Mon, Jan 20, 2014 at 10:23 AM, Darragh O'Reilly
><dara2002-openst...@yahoo.com> wrote:
>
>>I did a test to see what the dhcp client on cirros does. I killed the dhcp
>>agent and started an instance. The instance sent the first dhcp discover after
>>about 35 sec. Then another 60 sec later, and a final one after another 60 sec.
>>
>>So a revised theory for what happened is this:
>>
>>t=0 tempest starts vm and starts polling for ACTIVE status
>>t=20 instance-->ACTIVE and tempest starts polling the floating ip for 60 sec
>>t=40 instance does a dhcp discover - no response - so sets a timer for 60 sec
>>t=45 ovs-agent sets the port vlan
>>t=80 tempest gives up and kills vm
>>t=100 instance would have sent another dhcp discover now if it had been left
>>alive
>>
>>I think it would be worth trying to change that test to poll for 120 seconds
>>instead of 60.
>>
>>
>>On Monday, 20 January 2014, 11:23, Darragh O'Reilly
>><dara2002-openst...@yahoo.com> wrote:
>>
>>Hi Salvatore,
>>>
>>>I presume it's this one?
>>>http://logs.openstack.org/38/65838/4/check/check-tempest-dsvm-neutron-isolated/d108e4a/logs/tempest.txt.gz?#_2014-01-19_20_50_14_604
>>>
>>>Is it true that the cirros image just fires off a few dhcp discovers and
>>>then gives up? If so, then maybe it did so before the tagging happened. Do
>>>we have the instance console log? It took about 45 seconds from when the
>>>port was created to when it was tagged.
>>>
>>>
>>>2014-01-19 20:48:57.412 8142 DEBUG neutron.agent.linux.ovsdb_monitor [-]
>>>Output received from ovsdb monitor: {"data":[["3602a7b2-b559-4709-9bf0-53ae2af68d06","insert","tap496b808c-b5"]],"headings":["row","action","name"]}
>>><snip>
>>>2014-01-19 20:49:41.925 8142 DEBUG neutron.agent.linux.utils [-]
>>>Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', 'set', 'Port', 'tap496b808c-b5', 'tag=64']
>>>Exit code: 0
>>>
>>>
>>>Darragh.
>>>
>>>
>>>>I have been seeing in the past 2 days timeout failures on gate jobs which I
>>>>am struggling to explain. An example is available in [1]
>>>>These are the usual failures that we associate with bug 1253896, but this
>>>>time I can verify that:
>>>>- The floating IP is correctly wired (IP and NAT rules)
>>>>- The DHCP port is correctly wired, as well as the VM port and the router
>>>>port
>>>>- The DHCP agent is correctly started for the network
>>>>
>>>>However, no DHCP DISCOVER request is sent. Only the DHCP RELEASE message is
>>>>seen.
>>>>Any help at interpreting the logs will be appreciated.
>>>>
>>>>
>>>>Salvatore
>>>>
>>>>[1] http://logs.openstack.org/38/65838
>>>
>>_______________________________________________
>>OpenStack-dev mailing list
>>OpenStack-dev@lists.openstack.org
>>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
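
For anyone who wants to play with the numbers in the revised theory quoted above, here is a minimal sketch of the timing argument. It is only a toy model, not code from tempest, neutron or cirros: the function names are made up for illustration, and the defaults (first discover at t=40, retries every 60 sec, three attempts, port tagged at t=45, polling starting at t=20) are taken from the timeline and the test described in this thread. It ignores the time needed for the DHCP exchange itself and for the floating IP to become reachable.

```python
def first_answered_discover(first_discover, retry_interval, port_ready, attempts=3):
    """Return the time of the first DHCP discover sent at or after the moment
    the port is tagged, or None if every attempt comes too early."""
    for i in range(attempts):
        t = first_discover + i * retry_interval
        if t >= port_ready:
            return t
    return None


def instance_gets_lease_in_time(poll_start, poll_window, first_discover=40,
                                retry_interval=60, port_ready=45, attempts=3):
    """True if some discover can be answered before the test stops polling.

    All defaults are assumptions lifted from the timeline in this thread.
    """
    answered = first_answered_discover(first_discover, retry_interval,
                                       port_ready, attempts)
    deadline = poll_start + poll_window
    return answered is not None and answered <= deadline


if __name__ == "__main__":
    # Compare the current 60 sec polling window with the proposed 120 sec one.
    for window in (60, 120):
        ok = instance_gets_lease_in_time(poll_start=20, poll_window=window)
        print(f"poll window {window}s: lease in time = {ok}")
```

Under these assumptions the 60 sec window ends at t=80, before the retry at t=100, while a 120 sec window (ending at t=140) would cover it.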