Re: [openstack-dev] [Neutron] Apparently weird timeout issue

Darragh O'Reilly Mon, 20 Jan 2014 07:58:03 -0800

On Monday, 20 January 2014, 15:33, Jay Pipes <[email protected]> wrote:


>Sorry for top-posting -- using web mail client.
no worries - it doesn't bother me.
>
>Is it possible to change the retry interval in Cirros (or cloud-init?) so that 
>the backoff is less than 60 seconds?
I think the udhcpc command line parameters are baked into the image. It's part 
of BusyBox, and I'm not even sure if it's configurable from a script/text file.
>
>Best,
>
>-jay
>
>
>
>
>On Mon, Jan 20, 2014 at 10:23 AM, Darragh O'Reilly 
><[email protected]> wrote:
>
>
>>I did a test to see what the dhcp client on cirros does. I killed the dhcp 
>>agent and started an instance. The instance sent the first dhcp discover after 
>>about 35 sec, then another 60 sec later, and a final one after another 60 sec.
>>
>>
>>So a revised theory for what happened is this:  
>>
>>t=0 tempest starts vm and starts polling for ACTIVE status
>>t=20 instance-->ACTIVE and tempest starts polling the floating ip for 60 sec
>>t=40 instance does a dhcp discover - no response - so sets a timer for 60 sec
>>t=45 ovs-agent sets the port vlan
>>t=80 tempest gives up and kills vm
>>t=100 instance would have sent another dhcp discover now if it had been 
>>left alive
>>
>>I think it would be worth trying to change that test to poll for 120 seconds 
>>instead of 60.
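
To make the arithmetic behind that suggestion explicit, here is a rough sketch of the timeline above as a small Python model. The constants are the approximate times from the theory, the cap of three discovers comes from what I saw in my test, and the function names are made up for illustration. It also assumes the lease is obtained instantly once a discover is answered, which is optimistic.

```python
# Hypothetical model of the timeline; all times are seconds after tempest
# starts the VM. Values are the approximate ones from the theory above.
ACTIVE_AT = 20          # instance goes ACTIVE, tempest starts polling the floating ip
FIRST_DISCOVER_AT = 40  # first dhcp discover from the instance
RETRY_INTERVAL = 60     # observed gap between discovers on cirros
PORT_TAGGED_AT = 45     # ovs-agent sets the port vlan


def first_answered_discover(first=FIRST_DISCOVER_AT, interval=RETRY_INTERVAL,
                            tagged_at=PORT_TAGGED_AT, max_tries=3):
    """Return the time of the first discover sent after the port is tagged."""
    for attempt in range(max_tries):
        sent_at = first + attempt * interval
        if sent_at >= tagged_at:
            return sent_at
    return None  # the client gave up before the port was wired


def poll_succeeds(poll_seconds, active_at=ACTIVE_AT):
    """True if a discover can be answered before tempest stops polling."""
    deadline = active_at + poll_seconds
    answered_at = first_answered_discover()
    return answered_at is not None and answered_at <= deadline


for poll_seconds in (60, 120):
    print(poll_seconds, poll_succeeds(poll_seconds))
# prints:
# 60 False
# 120 True
```

With a 60 sec poll the deadline is t=80, before the second discover at t=100. With 120 sec the deadline moves to t=140, which leaves room for that second discover.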
>>
>>
>>
>>On Monday, 20 January 2014, 11:23, Darragh O'Reilly 
>><[email protected]> wrote:
>> 
>>>Hi Salvatore,
>>>
>>>
>>>I presume it's this one? 
>>>http://logs.openstack.org/38/65838/4/check/check-tempest-dsvm-neutron-isolated/d108e4a/logs/tempest.txt.gz?#_2014-01-19_20_50_14_604
>>>
>>>
>>>Is it true that the cirros image just fires off a few dhcp discovers and 
>>>then gives up? If so, then maybe it did so before the tagging happened. Do 
>>>we have the instance console log? It took about 45 seconds from when the 
>>>port was created to when it was tagged.
>>>
>>>
>>>2014-01-19 20:48:57.412 8142 DEBUG neutron.agent.linux.ovsdb_monitor [-] 
>>>Output received from ovsdb monitor: 
>>>{"data":[["3602a7b2-b559-4709-9bf0-53ae2af68d06","insert","tap496b808c-b5"]],"headings":["row","action","name"]}
>>><snip>
>>>2014-01-19 20:49:41.925 8142 DEBUG neutron.agent.linux.utils [-] 
>>>Command: ['sudo', '/usr/local/bin/neutron-rootwrap', 
>>>'/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', 'set', 
>>>'Port', 'tap496b808c-b5', 'tag=64']
>>>Exit code: 0
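
For reference, the "about 45 seconds" figure can be checked from the two log timestamps above. This is only a quick sketch; the variable names are mine, and it treats the ovsdb monitor "insert" line as the moment the port appeared.

```python
from datetime import datetime

# Timestamp format used in the neutron agent log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# ovsdb monitor reports the tap device being inserted
port_seen = datetime.strptime("2014-01-19 20:48:57.412", FMT)
# ovs-vsctl sets tag=64 on the port
port_tagged = datetime.strptime("2014-01-19 20:49:41.925", FMT)

delay = (port_tagged - port_seen).total_seconds()
print(f"{delay:.3f}")  # prints 44.513
```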
>>>
>>>
>>>Darragh.
>>>
>>>
>>>
>>>>I have been seeing in the past 2 days timeout failures on gate jobs which I
>>>>am struggling to explain. An example is available in [1].
>>>>These are the usual failures that we associate with bug 1253896, but this
>>>>time I can verify that:
>>>>- The floating IP is correctly wired (IP and NAT rules)
>>>>- The DHCP port is correctly wired, as well as the VM port and the router
>>>>port
>>>>- The DHCP agent is correctly started for the network
>>>>
>>>>However, no DHCP DISCOVER request is sent. Only the DHCP RELEASE message is
>>>>seen.
>>>>Any help at interpreting the logs will be appreciated.
>>>>
>>>>
>>>>Salvatore
>>>>
>>>>[1] http://logs.openstack.org/38/65838
>>>
>>>
>>>
>>_______________________________________________
>>OpenStack-dev mailing list
>>[email protected]
>>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
>
>
