Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-14 Thread Mohammed Naser
Jens,

That's quite an interesting catch.  I'm reaching out to the author of
this change to get some more information.

Thanks,
Mohammed

On Tue, Nov 14, 2017 at 2:02 PM, Jens Harbott  wrote:
> 2017-11-14 16:29 GMT+00:00 Mohammed Naser :
>> Hi everyone,
>>
>> Thank you so much for the work on this; I'm sure we can make progress
>> on this together.  I have noticed that this only occurs on master and
>> never on the stable branches.  Also, it only occurs under Ubuntu (so
>> maybe something related to the mod_wsgi version?).
>>
>> Given that we don't have any "master"-built packages for Ubuntu, we
>> test against the latest release, which is Pike.
>>
>> https://github.com/openstack/puppet-openstack-integration/blob/master/manifests/repos.pp#L6-L10
>>
>> I've noticed the issue is not as prevalent in older branches but is
>> much more visible on master.
>
> So does the issue not happen at all for stable/pike, or just less often?
> Either way, that would seem to indicate an issue not with the Ubuntu
> packages, but with the way they are deployed.
>
> If you look at [1] you can see that for pike you set up the nova-api wsgi
> with workers=1 and threads=$::os_workers, which in master was swapped;
> see [2]. I'd suggest testing a revert of that change.
>
> [1] 
> https://github.com/openstack/puppet-nova/blob/stable/pike/manifests/wsgi/apache_api.pp
> [2] 
> https://github.com/openstack/puppet-nova/commit/df638e2526d2d957318519dfcfb9098cb7726095
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-14 Thread Jens Harbott
2017-11-14 16:29 GMT+00:00 Mohammed Naser :
> Hi everyone,
>
> Thank you so much for the work on this, I'm sure we can progress with
> this together.  I have noticed that this only occurs in master and
> never in the stable branches.  Also, it only occurs under Ubuntu (so
> maybe something related to mod_wsgi version?)
>
> Given that we don't have any "master" built packages for Ubuntu, we
> test against the latest release which is the pike release.
>
> https://github.com/openstack/puppet-openstack-integration/blob/master/manifests/repos.pp#L6-L10
>
> I've noticed the issue is not as present in older branches but much
> more visible in master.

So does the issue not happen at all for stable/pike, or just less often?
Either way, that would seem to indicate an issue not with the Ubuntu
packages, but with the way they are deployed.

If you look at [1] you can see that for pike you set up the nova-api wsgi
with workers=1 and threads=$::os_workers, which in master was swapped;
see [2]. I'd suggest testing a revert of that change.

[1] 
https://github.com/openstack/puppet-nova/blob/stable/pike/manifests/wsgi/apache_api.pp
[2] 
https://github.com/openstack/puppet-nova/commit/df638e2526d2d957318519dfcfb9098cb7726095
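For illustration, the swap can be sketched roughly like this (a hedged sketch; the class name and exact parameters are inferred from the linked manifests rather than verified):

```puppet
# stable/pike (per [1]): one wsgi process, $::os_workers threads
class { '::nova::wsgi::apache_api':
  workers => 1,
  threads => $::os_workers,
}

# master (after [2]): $::os_workers processes, one thread each.
# Testing a revert would mean restoring the pike values above.
# (Only one of these declarations would appear in a real manifest.)
class { '::nova::wsgi::apache_api':
  workers => $::os_workers,
  threads => 1,
}
```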



Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-14 Thread Mohammed Naser
Hi everyone,

Thank you so much for the work on this; I'm sure we can make progress
on this together.  I have noticed that this only occurs on master and
never on the stable branches.  Also, it only occurs under Ubuntu (so
maybe something related to the mod_wsgi version?).

Given that we don't have any "master"-built packages for Ubuntu, we
test against the latest release, which is Pike.

https://github.com/openstack/puppet-openstack-integration/blob/master/manifests/repos.pp#L6-L10

I've noticed the issue is not as prevalent in older branches but is
much more visible on master.

Thanks,
Mohammed

On Tue, Nov 14, 2017 at 6:21 AM, Tobias Urdin  wrote:
> Yeah, I've been scavenging the logs for any kind of indicator of what
> might have gone wrong, but I can't see anything related to a deadlock,
> even though I'm very certain that's the issue; I just don't know what's
> causing it.
>
> Perhaps we will need to recreate this issue and troubleshoot it manually.
> The apache2 mod_wsgi config should be OK according to the docs [1].
>
> Best regards
>
> [1]
> http://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIDaemonProcess.html
>
> On 11/14/2017 11:12 AM, Jens Harbott wrote:
>> 2017-11-14 8:24 GMT+00:00 Tobias Urdin :
>>> Trying to trace this, tempest calls the POST /servers//action
>>> API endpoint for the nova compute api.
>>>
>>> https://github.com/openstack/tempest/blob/master/tempest/lib/services/compute/floating_ips_client.py#L82
>>>
>>> Nova then takes the request and tries to do this floating IP association
>>> using the neutron server API.
>>>
>>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/nova/nova-api.txt.gz
>>>
>>> 2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips
>>> [req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647
>>> 2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate
>>> floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance
>>> d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
>>> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>>> timed out: ConnectTimeout: Request to
>>> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>>> timed out
>>>
>>> Checking that timestamp in the neutron-server logs:
>>> http://paste.openstack.org/show/626240/
>>>
>>> We can see that right before this timestamp, at 23:12:30.377, and then
>>> after it, at 23:12:35.611, everything seems to be fine.
>>> So there are connectivity issues to the neutron API from where the Nova
>>> API is running, causing a timeout.
>>>
>>> Now some more questions would be:
>>>
>>> * Why is the return code 400? Are we being fooled, or is it actually a
>>> connection timeout?
>>> * Is the Neutron API stuck, causing the failed connection? All traffic
>>> goes over loopback, so the chance of a problem there is very low.
>>> * Is any firewall catching this? Not likely, since the agent processes
>>> requests right before and after.
>>>
>>> I can't find anything interesting in the other system logs that could
>>> explain this.
>>> Back to the logs!
>> I'm pretty certain that this is a deadlock between nova and neutron,
>> though I cannot put my finger on the exact spot yet. But looking at
>> the neutron log that you extracted you can see that neutron indeed
>> tries to give a successful answer to the fip request just after nova
>> has given up waiting for it (seems the timeout is 30s here):
>>
>> 2017-10-29 23:12:35.932 18958 INFO neutron.wsgi
>> [req-e737b7dd-ed9c-46a7-911b-eb77efe11aa8
>> 42526a28b1a14c629b83908b2d75c647 2493426e6a3c4253a60c0b7eb35cfe19 -
>> default default] 127.0.0.1 "PUT
>> /v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c HTTP/1.1"
>> status: 200  len: 746 time: 30.4427412
>>
>> Also, looking at
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/apache_config/10-nova_api_wsgi.conf.txt.gz
>> it seems that nova-api is started with two processes and one thread;
>> I'm not sure whether that means two processes with one thread each or
>> only one thread total, but either way nova-api might be getting stuck
>> there.
>>


Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-14 Thread Tobias Urdin
Yeah, I've been scavenging the logs for any kind of indicator of what
might have gone wrong, but I can't see anything related to a deadlock,
even though I'm very certain that's the issue; I just don't know what's
causing it.

Perhaps we will need to recreate this issue and troubleshoot it manually.
The apache2 mod_wsgi config should be OK according to the docs [1].

Best regards

[1]
http://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIDaemonProcess.html

On 11/14/2017 11:12 AM, Jens Harbott wrote:
> 2017-11-14 8:24 GMT+00:00 Tobias Urdin :
>> Trying to trace this, tempest calls the POST /servers//action
>> API endpoint for the nova compute api.
>>
>> https://github.com/openstack/tempest/blob/master/tempest/lib/services/compute/floating_ips_client.py#L82
>>
>> Nova then takes the request and tries to do this floating IP association
>> using the neutron server API.
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/nova/nova-api.txt.gz
>>
>> 2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips
>> [req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647
>> 2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate
>> floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance
>> d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
>> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>> timed out: ConnectTimeout: Request to
>> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>> timed out
>>
>> Checking that timestamp in the neutron-server logs:
>> http://paste.openstack.org/show/626240/
>>
>> We can see that right before this timestamp, at 23:12:30.377, and then
>> after it, at 23:12:35.611, everything seems to be fine.
>> So there are connectivity issues to the neutron API from where the Nova
>> API is running, causing a timeout.
>>
>> Now some more questions would be:
>>
>> * Why is the return code 400? Are we being fooled, or is it actually a
>> connection timeout?
>> * Is the Neutron API stuck, causing the failed connection? All traffic
>> goes over loopback, so the chance of a problem there is very low.
>> * Is any firewall catching this? Not likely, since the agent processes
>> requests right before and after.
>>
>> I can't find anything interesting in the other system logs that could
>> explain this.
>> Back to the logs!
> I'm pretty certain that this is a deadlock between nova and neutron,
> though I cannot put my finger on the exact spot yet. But looking at
> the neutron log that you extracted you can see that neutron indeed
> tries to give a successful answer to the fip request just after nova
> has given up waiting for it (seems the timeout is 30s here):
>
> 2017-10-29 23:12:35.932 18958 INFO neutron.wsgi
> [req-e737b7dd-ed9c-46a7-911b-eb77efe11aa8
> 42526a28b1a14c629b83908b2d75c647 2493426e6a3c4253a60c0b7eb35cfe19 -
> default default] 127.0.0.1 "PUT
> /v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c HTTP/1.1"
> status: 200  len: 746 time: 30.4427412
>
> Also, looking at
> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/apache_config/10-nova_api_wsgi.conf.txt.gz
> it seems that nova-api is started with two processes and one thread;
> I'm not sure whether that means two processes with one thread each or
> only one thread total, but either way nova-api might be getting stuck there.
>




Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-14 Thread Jens Harbott
2017-11-14 8:24 GMT+00:00 Tobias Urdin :
> Trying to trace this, tempest calls the POST /servers//action
> API endpoint for the nova compute api.
>
> https://github.com/openstack/tempest/blob/master/tempest/lib/services/compute/floating_ips_client.py#L82
>
> Nova then takes the request and tries to do this floating IP association
> using the neutron server API.
>
> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/nova/nova-api.txt.gz
>
> 2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips
> [req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647
> 2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate
> floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance
> d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
> timed out: ConnectTimeout: Request to
> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
> timed out
>
> Checking that timestamp in the neutron-server logs:
> http://paste.openstack.org/show/626240/
>
> We can see that right before this timestamp, at 23:12:30.377, and then
> after it, at 23:12:35.611, everything seems to be fine.
> So there are connectivity issues to the neutron API from where the Nova
> API is running, causing a timeout.
>
> Now some more questions would be:
>
> * Why is the return code 400? Are we being fooled, or is it actually a
> connection timeout?
> * Is the Neutron API stuck, causing the failed connection? All traffic
> goes over loopback, so the chance of a problem there is very low.
> * Is any firewall catching this? Not likely, since the agent processes
> requests right before and after.
>
> I can't find anything interesting in the other system logs that could
> explain this.
> Back to the logs!

I'm pretty certain that this is a deadlock between nova and neutron,
though I cannot put my finger on the exact spot yet. But looking at
the neutron log that you extracted you can see that neutron indeed
tries to give a successful answer to the fip request just after nova
has given up waiting for it (seems the timeout is 30s here):

2017-10-29 23:12:35.932 18958 INFO neutron.wsgi
[req-e737b7dd-ed9c-46a7-911b-eb77efe11aa8
42526a28b1a14c629b83908b2d75c647 2493426e6a3c4253a60c0b7eb35cfe19 -
default default] 127.0.0.1 "PUT
/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c HTTP/1.1"
status: 200  len: 746 time: 30.4427412

Also, looking at
http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/apache_config/10-nova_api_wsgi.conf.txt.gz
it seems that nova-api is started with two processes and one thread;
I'm not sure whether that means two processes with one thread each or
only one thread total, but either way nova-api might be getting stuck there.
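The failure mode being described can be reproduced in miniature: if a pool has only one worker thread and the request handler blocks on a second task submitted to the same pool, the nested call can never be scheduled and the outer request times out. This is only an illustrative sketch in plain Python, not nova or mod_wsgi code:

```python
# Illustrative sketch (not nova/mod_wsgi code): a one-thread worker pool
# deadlocks when a task blocks on another task queued on the same pool.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutTimeout

pool = ThreadPoolExecutor(max_workers=1)

def handle_request():
    # The lone worker is busy running this handler, so the nested task
    # below stays queued and result() can only time out.
    inner = pool.submit(lambda: "neutron says 200 OK")
    return inner.result(timeout=2)

outer = pool.submit(handle_request)
try:
    outcome = outer.result(timeout=5)
except FutTimeout:
    outcome = "timed out"

print(outcome)  # prints: timed out
```

With two or more workers the nested submit completes immediately, which is why the processes/threads split could matter here.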



Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-14 Thread Tobias Urdin
Am I actually hallucinating, or is it the nova API that cannot communicate with
Keystone? I cannot substantiate this with any Keystone logs.

2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
[req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647 
2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate 
floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance 
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to 
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c 
timed out: ConnectTimeout: Request to 
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c 
timed out
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
Traceback (most recent call last):
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File 
"/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/floating_ips.py", 
line 267, in _add_floating_ip
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
fixed_address=fixed_address)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/nova/network/base_api.py", line 83, in 
wrapper
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
res = f(self, context, *args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 
1759, in associate_floating_ip
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
client.update_floatingip(fip['id'], {'floatingip': param})
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 99, 
in wrapper
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
ret = obj(*args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 935, 
in update_floatingip
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
return self.put(self.floatingip_path % (floatingip), body=body)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 99, 
in wrapper
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
ret = obj(*args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 361, 
in put
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
headers=headers, params=params)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 99, 
in wrapper
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
ret = obj(*args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 329, 
in retry_request
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
headers=headers, params=params)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 99, 
in wrapper
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
ret = obj(*args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/neutronclient/v2_0/client.py", line 280, 
in do_request
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
resp, replybody = self.httpclient.do_request(action, method, body=body)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/neutronclient/client.py", line 342, in 
do_request
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
return self.request(url, method, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/neutronclient/client.py", line 330, in 
request
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
resp = super(SessionClient, self).request(*args, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips   
File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 192, in 
request
2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
return self.session.request(url, method, **kwargs)
2017-10-29 23:12:35.521 17800 ERROR 

Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-14 Thread Tobias Urdin
Trying to trace this: tempest calls the POST /servers//action API
endpoint for the nova compute API.

https://github.com/openstack/tempest/blob/master/tempest/lib/services/compute/floating_ips_client.py#L82

Nova then takes the request and tries to do this floating IP association using
the neutron server API.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/nova/nova-api.txt.gz

2017-10-29 23:12:35.521 17800 ERROR nova.api.openstack.compute.floating_ips 
[req-7f810cc7-a498-4bf4-b27e-8fc80d652785 42526a28b1a14c629b83908b2d75c647 
2493426e6a3c4253a60c0b7eb35cfe19 - default default] Unable to associate 
floating IP 172.24.5.17 to fixed IP 10.100.0.8 for instance 
d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to 
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c 
timed out: ConnectTimeout: Request to 
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c 
timed out

Checking that timestamp in the neutron-server logs:
http://paste.openstack.org/show/626240/

We can see that right before this timestamp, at 23:12:30.377, and then after
it, at 23:12:35.611, everything seems to be fine.
So there are connectivity issues to the neutron API from where the Nova API
is running, causing a timeout.

Now some more questions would be:

* Why is the return code 400? Are we being fooled, or is it actually a
connection timeout?
* Is the Neutron API stuck, causing the failed connection? All traffic goes
over loopback, so the chance of a problem there is very low.
* Is any firewall catching this? Not likely, since the agent processes
requests right before and after.

I can't find anything interesting in the other system logs that could
explain this.
Back to the logs!

Best regards
Tobias


On 11/14/2017 08:35 AM, Tobias Urdin wrote:

Hello,

Same here; I will continue looking at this as well.

Would be great if we could get some input from a neutron dev with good insight 
into the project.


Can we backtrace the timed-out message from where it's thrown/returned?

Error: Request to 
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c 
timed out', u'code': 400}

I'm interested in why we would get a 400 code back; the floating IP
operations should be async, right? So this would be the response from the
API layer, with information from the database, which should return more
information.


Best regards

On 11/14/2017 07:41 AM, Arnaud MORIN wrote:
Hey,
Do you know if the bug appears on a specific Ubuntu / OpenStack version?
As far as I remember, it was not related to the puppet branch? I mean, the
bug exists on master but also on the newton puppet branches, right?

We are using Ubuntu in my company so we would love to see that continue ;)
I'll try to take a look again.

Cheers.

On 14 Nov 2017 00:00, "Mohammed Naser"
> wrote:
Hi,

Hope that everyone had safe travels and enjoyed their time at Sydney
(and those who weren't there enjoyed a bit of quiet time!).  I'm just
sending this email if anyone had a chance to look more into this (or
perhaps we can get some help if there are any Canonical folks on the
list?)

I would be really sad if we had to drop Ubuntu/Debian support because
we cannot test it.  I think there are a lot of users out there using
it!  I'd be more than happy to provide any assistance/information in
troubleshooting this.

Thank you,
Mohammed

On Thu, Nov 2, 2017 at 1:10 PM, Mohammed Naser 
> wrote:
> On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin 
> > wrote:
>> I've been staring at this for almost an hour now going through all the
>> logs, and I can't really pinpoint where that error message is generated.
>> I cannot find any references to the timed-out message that the API
>> returns, or to the "unable to associate" part.
>>
>> What I'm currently staring at is why the instance fixed IP 172.24.5.17
>> would be referenced as a network:router_gateway port in the OVS agent
>> logs.
>>
>> 2017-10-29 23:19:27.591 11856 INFO
>> neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
>> [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port
>> 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {},
>> 'network_qos_policy_id': None, 'qos_policy_id': None,
>> 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id':
>> 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None,
>> 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c',
>> 'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway',
>> 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3',
>> 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled':
>> False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type':
>> u'flat', 

Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-13 Thread Tobias Urdin
Hello,

Same here; I will continue looking at this as well.

Would be great if we could get some input from a neutron dev with good insight 
into the project.


Can we backtrace the timed-out message from where it's thrown/returned?

Error: Request to 
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c 
timed out', u'code': 400}

I'm interested in why we would get a 400 code back; the floating IP
operations should be async, right? So this would be the response from the
API layer, with information from the database, which should return more
information.
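On the 400 question: one plausible explanation (a hypothetical sketch of the pattern, not nova's actual code) is that the API layer catches the client-side ConnectTimeout and folds it into a generic client-facing 400, so the status code says nothing about the real transport failure:

```python
# Hypothetical sketch, not nova's actual code: mapping a client-side
# connect timeout to a client-facing HTTP 400.
class ConnectTimeout(Exception):
    """Stand-in for the neutronclient ConnectTimeout seen in the logs."""

def update_floatingip(url):
    # Stand-in for the neutron client call that never gets a connection.
    raise ConnectTimeout("Request to %s timed out" % url)

def add_floating_ip(url):
    try:
        update_floatingip(url)
        return {"code": 202}
    except ConnectTimeout as exc:
        # The transport error is folded into a 400 response body, which
        # would be consistent with the {'code': 400} seen by the caller.
        return {"code": 400, "message": str(exc)}

resp = add_floating_ip(
    "https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c")
print(resp["code"])  # prints: 400
```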


Best regards

On 11/14/2017 07:41 AM, Arnaud MORIN wrote:
Hey,
Do you know if the bug appears on a specific Ubuntu / OpenStack version?
As far as I remember, it was not related to the puppet branch? I mean, the
bug exists on master but also on the newton puppet branches, right?

We are using Ubuntu in my company so we would love to see that continue ;)
I'll try to take a look again.

Cheers.

On 14 Nov 2017 00:00, "Mohammed Naser"
> wrote:
Hi,

Hope that everyone had safe travels and enjoyed their time at Sydney
(and those who weren't there enjoyed a bit of quiet time!).  I'm just
sending this email if anyone had a chance to look more into this (or
perhaps we can get some help if there are any Canonical folks on the
list?)

I would be really sad if we had to drop Ubuntu/Debian support because
we cannot test it.  I think there are a lot of users out there using
it!  I'd be more than happy to provide any assistance/information in
troubleshooting this.

Thank you,
Mohammed

On Thu, Nov 2, 2017 at 1:10 PM, Mohammed Naser 
> wrote:
> On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin 
> > wrote:
>> I've been staring at this for almost an hour now going through all the
>> logs, and I can't really pinpoint where that error message is generated.
>> I cannot find any references to the timed-out message that the API
>> returns, or to the "unable to associate" part.
>>
>> What I'm currently staring at is why the instance fixed IP 172.24.5.17
>> would be referenced as a network:router_gateway port in the OVS agent
>> logs.
>>
>> 2017-10-29 23:19:27.591 11856 INFO
>> neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
>> [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port
>> 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {},
>> 'network_qos_policy_id': None, 'qos_policy_id': None,
>> 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id':
>> 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None,
>> 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c',
>> 'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway',
>> 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3',
>> 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled':
>> False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type':
>> u'flat', 'security_groups': []}
>>
>>
>> Anybody else seen anything interesting?
>
> Hi Tobias,
>
> Thanks for looking out.  I've spent a lot of time and haven't
> successfully identified much, but I can add the following:
>
> - This issue is intermittent in CI.
> - It does *not* happen on any specific provider; I've seen it fail on
> both OVH and Rackspace.
> - Not *all* floating IP assignments fail; if you look at the logs, you
> can see a few attach successfully before it fails.
>
> But yeah, I'm still quite at a loss and not having this coverage isn't fun.
>
>>
>>
>> On 10/30/2017 11:08 PM, Brian Haley wrote:
>>
>> On 10/30/2017 05:46 PM, Matthew Treinish wrote:
>>
>>  From a quick glance at the logs my guess is that the issue is related
>> to this stack trace in the l3 agent logs:
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146
>>
>> I'm not sure what's causing it to complain there. But, I'm on a plane
>> right now (which is why this is a top post, sorry) so I can't really dig
>> much more than that. I'll try to take a deeper look at things later when
>> I'm on solid ground. (hopefully someone will beat me to it by then though)
>>
>> I don't think that l3-agent trace is it, as the failure is coming from
>> the API.  It's actually a trace that's happening due to the async nature
>> of how the agent runs arping, fix is
>> https://review.openstack.org/#/c/507914/ but it only removes the log noise.
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
>> has some tracebacks that look config related, possible missing DB table?
>>   But I haven't looked very closely.
>>
>> -Brian
>>
>>
>> On October 31, 2017 1:25:55 AM 

Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-13 Thread Arnaud MORIN
Hey,
Do you know if the bug appears on a specific Ubuntu / OpenStack version?
As far as I remember, it was not related to the puppet branch? I mean, the
bug exists on master but also on the newton puppet branches, right?

We are using Ubuntu in my company so we would love to see that continue ;)
I'll try to take a look again.

Cheers.

On 14 Nov 2017 00:00, "Mohammed Naser"  wrote:

> Hi,
>
> Hope that everyone had safe travels and enjoyed their time at Sydney
> (and those who weren't there enjoyed a bit of quiet time!).  I'm just
> sending this email if anyone had a chance to look more into this (or
> perhaps we can get some help if there are any Canonical folks on the
> list?)
>
> I would be really sad if we had to drop Ubuntu/Debian support because
> we cannot test it.  I think there are a lot of users out there using
> it!  I'd be more than happy to provide any assistance/information in
> troubleshooting this.
>
> Thank you,
> Mohammed
>
> On Thu, Nov 2, 2017 at 1:10 PM, Mohammed Naser 
> wrote:
> > On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin 
> wrote:
> >> I've been staring at this for almost an hour now going through all the
> >> logs, and I can't really pinpoint where that error message is generated.
> >> I cannot find any references to the timed-out message that the API
> >> returns, or to the "unable to associate" part.
> >>
> >> What I'm currently staring at is why the instance fixed IP 172.24.5.17
> >> would be referenced as a network:router_gateway port in the OVS agent
> >> logs.
> >>
> >> 2017-10-29 23:19:27.591 11856 INFO
> >> neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
> >> [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port
> >> 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {},
> >> 'network_qos_policy_id': None, 'qos_policy_id': None,
> >> 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id':
> >> 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None,
> >> 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c',
> >> 'ip_address': '172.24.5.17'}], 'device_owner':
> u'network:router_gateway',
> >> 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3',
> >> 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055',
> 'port_security_enabled':
> >> False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055',
> 'network_type':
> >> u'flat', 'security_groups': []}
> >>
> >>
> >> Anybody else seen anything interesting?
> >
> > Hi Tobias,
> >
> > Thanks for taking a look.  I've spent a lot of time and I haven't
> > successfully identified much; I can add the following:
> >
> > - This issue is intermittent in CI.
> > - It does *not* happen on any specific provider; I've seen it fail on
> > both OVH and Rackspace.
> > - Not *all* floating IP assignments fail; if you look at the logs, you
> > can see it attach a few successfully before failing.
> >
> > But yeah, I'm still quite at a loss, and not having this coverage isn't fun.
> >
> >>
> >>
> >> On 10/30/2017 11:08 PM, Brian Haley wrote:
> >>
> >> On 10/30/2017 05:46 PM, Matthew Treinish wrote:
> >>
> >>  From a quick glance at the logs my guess is that the issue is related
> >> to this stack trace in the l3 agent logs:
> >>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146
> >>
> >> I'm not sure what's causing it to complain there. But, I'm on a plane
> >> right now (which is why this is a top post, sorry) so I can't really dig
> >> much more than that. I'll try to take a deeper look at things later when
>> I'm on solid ground. (hopefully someone will beat me to it by then though)
> >>
> >> I don't think that l3-agent trace is it, as the failure is coming from
> >> the API.  It's actually a trace that's happening due to the async nature
> >> of how the agent runs arping, fix is
>> https://review.openstack.org/#/c/507914/ but it only removes the log noise.
> >>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
> >> has some tracebacks that look config related, possible missing DB table?
> >>   But I haven't looked very closely.
> >>
> >> -Brian
> >>
> >>
> >> On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
> >>  wrote:
> >>
> >> Hi everyone,
> >>
> >> I'm looking for some help regarding an issue that we're having with
>> the Puppet OpenStack modules, we've had very inconsistent failures in
>> the Xenial with the following error:
> >>
> >>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
> >>
> >> http://logs.openstack.org/47/514347/1/check/puppet-
> 

Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-13 Thread Mohammed Naser
Hi,

Hope that everyone had safe travels and enjoyed their time at Sydney
(and those who weren't there enjoyed a bit of quiet time!).  I'm just
sending this email if anyone had a chance to look more into this (or
perhaps we can get some help if there are any Canonical folks on the
list?)

I would be really sad if we had to drop Ubuntu/Debian support because
we cannot test it.  I think there are a lot of users out there using
it!  I'd be more than happy to provide any assistance/information in
troubleshooting this.

Thank you,
Mohammed

On Thu, Nov 2, 2017 at 1:10 PM, Mohammed Naser  wrote:
> On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin  
> wrote:
>> I've been staring at this for almost an hour now going through all the logs
>> and I can't really pinpoint where that error message is generated.
>> Cannot find any references for the timed-out message that the API returns
>> or the unable-to-associate part.
>>
>>
>> What I'm currently staring at is why the instance fixed IP 172.24.5.17
>> would be referenced as a network:router_gateway port in the OVS agent logs.
>>
>> 2017-10-29 23:19:27.591 11856 INFO
>> neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
>> [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port
>> 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {},
>> 'network_qos_policy_id': None, 'qos_policy_id': None,
>> 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id':
>> 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None,
>> 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c',
>> 'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway',
>> 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3',
>> 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled':
>> False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type':
>> u'flat', 'security_groups': []}
>>
>>
>> Anybody else seen anything interesting?
>
> Hi Tobias,
>
> Thanks for taking a look.  I've spent a lot of time and I haven't
> successfully identified much; I can add the following:
>
> - This issue is intermittent in CI.
> - It does *not* happen on any specific provider; I've seen it fail on
> both OVH and Rackspace.
> - Not *all* floating IP assignments fail; if you look at the logs, you
> can see it attach a few successfully before failing.
>
> But yeah, I'm still quite at a loss, and not having this coverage isn't fun.
>
>>
>>
>> On 10/30/2017 11:08 PM, Brian Haley wrote:
>>
>> On 10/30/2017 05:46 PM, Matthew Treinish wrote:
>>
>>  From a quick glance at the logs my guess is that the issue is related
>> to this stack trace in the l3 agent logs:
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146
>>
>> I'm not sure what's causing it to complain there. But, I'm on a plane
>> right now (which is why this is a top post, sorry) so I can't really dig
>> much more than that. I'll try to take a deeper look at things later when
>> I'm on solid ground. (hopefully someone will beat me to it by then though)
>>
>> I don't think that l3-agent trace is it, as the failure is coming from
>> the API.  It's actually a trace that's happening due to the async nature
>> of how the agent runs arping, fix is
>> https://review.openstack.org/#/c/507914/ but it only removes the log noise.
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
>> has some tracebacks that look config related, possible missing DB table?
>>   But I haven't looked very closely.
>>
>> -Brian
>>
>>
>> On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
>>  wrote:
>>
>> Hi everyone,
>>
>> I'm looking for some help regarding an issue that we're having with
>> the Puppet OpenStack modules, we've had very inconsistent failures in
>> the Xenial with the following error:
>>
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
>> Details: {u'message': u'Unable to associate floating IP 172.24.5.17 to
>> fixed IP 10.100.0.8 for instance d265626a-77c1-4d2f-8260-46abe548293e.
>> Error: Request to
>> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>> timed out', u'code': 400}
>>
>> At this point, we're at a bit of a loss.  I've tried my best in order
>> to find the root cause however we have not been able to do this.  It
>> was persistent enough that we elected to go non-voting for our Xenial
>> 

Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-02 Thread Mohammed Naser
On Thu, Nov 2, 2017 at 1:02 PM, Tobias Urdin  wrote:
> I've been staring at this for almost an hour now going through all the logs
> and I can't really pinpoint where that error message is generated.
> Cannot find any references for the timed-out message that the API returns
> or the unable-to-associate part.
>
>
> What I'm currently staring at is why the instance fixed IP 172.24.5.17
> would be referenced as a network:router_gateway port in the OVS agent logs.
>
> 2017-10-29 23:19:27.591 11856 INFO
> neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
> [req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port
> 053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {},
> 'network_qos_policy_id': None, 'qos_policy_id': None,
> 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id':
> 'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None,
> 'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c',
> 'ip_address': '172.24.5.17'}], 'device_owner': u'network:router_gateway',
> 'physical_network': u'external', 'mac_address': 'fa:16:3e:3b:ec:c3',
> 'device': u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled':
> False, 'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type':
> u'flat', 'security_groups': []}
>
>
> Anybody else seen anything interesting?

Hi Tobias,

Thanks for taking a look.  I've spent a lot of time and I haven't
successfully identified much; I can add the following:

- This issue is intermittent in CI.
- It does *not* happen on any specific provider; I've seen it fail on
both OVH and Rackspace.
- Not *all* floating IP assignments fail; if you look at the logs, you
can see it attach a few successfully before failing.

But yeah, I'm still quite at a loss, and not having this coverage isn't fun.
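To put a number on the "intermittent" claim, a throwaway sketch like the one below could be run over the collected job logs. This is not existing tooling; the regex simply matches the error string quoted in this thread, and real log formats may differ.

```python
# Throwaway triage sketch: count "Unable to associate floating IP" failures
# in a log and list the IPs involved, to confirm the failure is intermittent
# rather than constant. The pattern matches the error text quoted in this
# thread; adjust it to whatever the actual logs contain.
import re

FAIL_RE = re.compile(r"Unable to associate floating IP (\S+)")

def summarize_failures(log_text):
    """Return a count of association failures and the floating IPs involved."""
    failed_ips = FAIL_RE.findall(log_text)
    return {'count': len(failed_ips), 'ips': failed_ips}

sample = (
    "Floating IP 172.24.5.14 associated with instance abc\n"
    "Unable to associate floating IP 172.24.5.17 to fixed IP 10.100.0.8\n"
)
print(summarize_failures(sample))  # → {'count': 1, 'ips': ['172.24.5.17']}
```

Run against a batch of job logs, a low but nonzero count per run would support the "a few succeed before one fails" observation above.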

>
>
> On 10/30/2017 11:08 PM, Brian Haley wrote:
>
> On 10/30/2017 05:46 PM, Matthew Treinish wrote:
>
>  From a quick glance at the logs my guess is that the issue is related
> to this stack trace in the l3 agent logs:
>
> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146
>
> I'm not sure what's causing it to complain there. But, I'm on a plane
> right now (which is why this is a top post, sorry) so I can't really dig
> much more than that. I'll try to take a deeper look at things later when
> I'm on solid ground. (hopefully someone will beat me to it by then though)
>
> I don't think that l3-agent trace is it, as the failure is coming from
> the API.  It's actually a trace that's happening due to the async nature
> of how the agent runs arping, fix is
> https://review.openstack.org/#/c/507914/ but it only removes the log noise.
>
> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
> has some tracebacks that look config related, possible missing DB table?
>   But I haven't looked very closely.
>
> -Brian
>
>
> On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
>  wrote:
>
> Hi everyone,
>
> I'm looking for some help regarding an issue that we're having with
> the Puppet OpenStack modules, we've had very inconsistent failures in
> the Xenial with the following error:
>
>
> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
>
> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
> Details: {u'message': u'Unable to associate floating IP 172.24.5.17 to
> fixed IP 10.100.0.8 for instance d265626a-77c1-4d2f-8260-46abe548293e.
> Error: Request to
> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
> timed out', u'code': 400}
>
> At this point, we're at a bit of a loss.  I've tried my best in order
> to find the root cause however we have not been able to do this.  It
> was persistent enough that we elected to go non-voting for our Xenial
> gates, however, with no fix ahead of us, I feel like this is a waste
> of resources and we need to either fix this or drop CI for Ubuntu.  We
> don't deploy on Ubuntu and most of the developers working on the
> project don't either at this point, so we need a bit of resources.
>
> If you're a user of Puppet on Xenial, we need your help!  Without any
> resources going to fix this, we'd unfortunately have to drop support
> for Ubuntu because of the lack of resources to maintain it (or
> assistance).  We (Puppet OpenStack team) would be more than happy to
> work together to fix this so pop-in at #puppet-openstack or reply to
> this email and let's get this issue fixed.
>
> Thanks,
>   

Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-11-02 Thread Tobias Urdin
I've been staring at this for almost an hour now going through all the logs
and I can't really pinpoint where that error message is generated.

Cannot find any references for the timed-out message that the API returns or
the unable-to-associate part.


What I'm currently staring at is why the instance fixed IP 172.24.5.17 would
be referenced as a network:router_gateway port in the OVS agent logs.

2017-10-29 23:19:27.591 11856 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
[req-7274c6f7-18ef-420d-ad5a-9d0fe4eb35c6 - - - - -] Port 
053a625c-4227-41fb-9a26-45eda7bd2055 updated. Details: {'profile': {}, 
'network_qos_policy_id': None, 'qos_policy_id': None, 'allowed_address_pairs': 
[], 'admin_state_up': True, 'network_id': 
'f9647756-41ad-4ec5-af49-daefe410815e', 'segmentation_id': None, 'fixed_ips': 
[{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c', 'ip_address': 
'172.24.5.17'}], 'device_owner': u'network:router_gateway', 'physical_network': 
u'external', 'mac_address': 'fa:16:3e:3b:ec:c3', 'device': 
u'053a625c-4227-41fb-9a26-45eda7bd2055', 'port_security_enabled': False, 
'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055', 'network_type': u'flat', 
'security_groups': []}

Anybody else seen anything interesting?
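For anyone else combing through these port-update payloads, a tiny standalone helper like this (purely illustrative, not part of neutron or any existing tooling) makes it easy to ask which device_owner a given IP is attached to:

```python
# Illustrative helper (not neutron code): given a port-update payload like
# the OVS agent log entry above, report the device_owner that a fixed IP
# of interest belongs to.

def device_owner_for_ip(port_details, ip):
    """Return the port's device_owner if `ip` is one of its fixed IPs."""
    for fixed in port_details.get('fixed_ips', []):
        if fixed.get('ip_address') == ip:
            return port_details.get('device_owner')
    return None

# Trimmed copy of the payload from the log line quoted above.
port = {
    'fixed_ips': [{'subnet_id': 'a31c7115-1f3e-4220-8bdb-981b6df2e18c',
                   'ip_address': '172.24.5.17'}],
    'device_owner': u'network:router_gateway',
    'port_id': '053a625c-4227-41fb-9a26-45eda7bd2055',
}

print(device_owner_for_ip(port, '172.24.5.17'))  # → network:router_gateway
```

Running that over each "Port ... updated" line would show whether 172.24.5.17 ever appears on a compute-owned port or only on the router gateway port.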

On 10/30/2017 11:08 PM, Brian Haley wrote:

On 10/30/2017 05:46 PM, Matthew Treinish wrote:


 From a quick glance at the logs my guess is that the issue is related
to this stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane
right now (which is why this is a top post, sorry) so I can't really dig
much more than that. I'll try to take a deeper look at things later when
I'm on solid ground. (hopefully someone will beat me to it by then though)



I don't think that l3-agent trace is it, as the failure is coming from
the API.  It's actually a trace that's happening due to the async nature
of how the agent runs arping, fix is
https://review.openstack.org/#/c/507914/ but it only removes the log noise.

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
has some tracebacks that look config related, possible missing DB table?
  But I haven't looked very closely.

-Brian




On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
 wrote:

Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures in
the Xenial with the following error:

 
http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
Details: {u'message': u'Unable to associate floating IP 172.24.5.17 to fixed
IP 10.100.0.8 for instance d265626a-77c1-4d2f-8260-46abe548293e. Error:
Request to
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss.  I've tried my best in order
to find the root cause however we have not been able to do this.  It
was persistent enough that we elected to go non-voting for our Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu.  We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help!  Without any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance).  We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed



OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-10-30 Thread Mohammed Naser
On Mon, Oct 30, 2017 at 6:07 PM, Brian Haley  wrote:
> On 10/30/2017 05:46 PM, Matthew Treinish wrote:
>>
>>  From a quick glance at the logs my guess is that the issue is related to
>> this stack trace in the l3 agent logs:
>>
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146
>>
>> I'm not sure what's causing it to complain there. But, I'm on a plane
>> right now (which is why this is a top post, sorry) so I can't really dig
>> much more than that. I'll try to take a deeper look at things later when I'm
>> on solid ground. (hopefully someone will beat me to it by then though)
>
>
> I don't think that l3-agent trace is it, as the failure is coming from the
> API.  It's actually a trace that's happening due to the async nature of how
> the agent runs arping, fix is https://review.openstack.org/#/c/507914/ but
> it only removes the log noise.

Indeed, I've reached out to Neutron team on IRC and Brian informed me
that this was just log noise.

> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz
> has some tracebacks that look config related, possible missing DB table?
> But I haven't looked very closely.

The tracebacks are because the Neutron server is started before the
MySQL database is sync'd (afaik, Ubuntu behaviour is to start services
on install, so we haven't had a chance to sync the db).  You can see
the service later restart with none of these database issues.  The
other reason to eliminate config issues is the fact that this happens
intermittently (though, often enough that we had to switch it to
non-voting).  If it were a config issue, it would fail consistently every
time.
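For what it's worth, the generic guard against this kind of start-before-sync race is a readiness poll. Here is a minimal sketch of that pattern; all names are illustrative, and this is not code from puppet-nova, neutron, or the Ubuntu packaging.

```python
# Minimal generic readiness-poll sketch (names illustrative). The idea:
# don't treat a service as usable until its dependency, e.g. a synced DB
# schema, actually is, instead of acting on it right after package install.
import time

def wait_for(probe, timeout=60.0, interval=2.0):
    """Poll `probe` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Stand-in probe that succeeds on the third attempt, mimicking a DB schema
# that only becomes available once the sync finishes.
attempts = {'n': 0}
def db_ready():
    attempts['n'] += 1
    return attempts['n'] >= 3

print(wait_for(db_ready, timeout=5.0, interval=0.01))  # → True
```

The same shape applies whether the probe is a SQL query, an HTTP health check, or a port probe; the early tracebacks here are simply what happens when that wait is skipped and the service retries on its own after a restart.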

Thank you Brian & Matthew for your help so far.

> -Brian
>
>
>> On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser
>>  wrote:
>>
>> Hi everyone,
>>
>> I'm looking for some help regarding an issue that we're having with
>> the Puppet OpenStack modules, we've had very inconsistent failures in
>> the Xenial with the following error:
>>
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
>>
>> http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
>> Details: {u'message': u'Unable to associate floating IP 172.24.5.17 to
>> fixed IP 10.100.0.8 for instance d265626a-77c1-4d2f-8260-46abe548293e.
>> Error: Request to
>> https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>> timed out', u'code': 400}
>>
>> At this point, we're at a bit of a loss.  I've tried my best in order
>> to find the root cause however we have not been able to do this.  It
>> was persistent enough that we elected to go non-voting for our Xenial
>> gates, however, with no fix ahead of us, I feel like this is a waste
>> of resources and we need to either fix this or drop CI for Ubuntu.  We
>> don't deploy on Ubuntu and most of the developers working on the
>> project don't either at this point, so we need a bit of resources.
>>
>> If you're a user of Puppet on Xenial, we need your help!  Without any
>> resources going to fix this, we'd unfortunately have to drop support
>> for Ubuntu because of the lack of resources to maintain it (or
>> assistance).  We (Puppet OpenStack team) would be more than happy to
>> work together to fix this so pop-in at #puppet-openstack or reply to
>> this email and let's get this issue fixed.
>>
>> Thanks,
>> Mohammed
>>
>>


Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-10-30 Thread Brian Haley

On 10/30/2017 05:46 PM, Matthew Treinish wrote:
 From a quick glance at the logs my guess is that the issue is related 
to this stack trace in the l3 agent logs:


http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane 
right now (which is why this is a top post, sorry) so I can't really dig 
much more than that. I'll try to take a deeper look at things later when 
I'm on solid ground. (hopefully someone will beat me to it by then though)


I don't think that l3-agent trace is it, as the failure is coming from 
the API.  It's actually a trace that's happening due to the async nature 
of how the agent runs arping, fix is 
https://review.openstack.org/#/c/507914/ but it only removes the log noise.


http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-server.txt.gz 
has some tracebacks that look config-related, possibly a missing DB table?
But I haven't looked very closely.


-Brian


On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser 
 wrote:


Hi everyone,

I'm looking for some help regarding an issue that we're having with
the Puppet OpenStack modules, we've had very inconsistent failures in
the Xenial with the following error:

 
http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
 
http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
Details: {u'message': u'Unable to associate floating IP 172.24.5.17 to fixed
IP 10.100.0.8 for instance d265626a-77c1-4d2f-8260-46abe548293e. Error:
Request to
https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
timed out', u'code': 400}

At this point, we're at a bit of a loss.  I've tried my best in order
to find the root cause however we have not been able to do this.  It
was persistent enough that we elected to go non-voting for our Xenial
gates, however, with no fix ahead of us, I feel like this is a waste
of resources and we need to either fix this or drop CI for Ubuntu.  We
don't deploy on Ubuntu and most of the developers working on the
project don't either at this point, so we need a bit of resources.

If you're a user of Puppet on Xenial, we need your help!  Without any
resources going to fix this, we'd unfortunately have to drop support
for Ubuntu because of the lack of resources to maintain it (or
assistance).  We (Puppet OpenStack team) would be more than happy to
work together to fix this so pop-in at #puppet-openstack or reply to
this email and let's get this issue fixed.

Thanks,
Mohammed








Re: [openstack-dev] [puppet][qa][ubuntu][neutron] Xenial Neutron Timeouts

2017-10-30 Thread Matthew Treinish
From a quick glance at the logs my guess is that the issue is related to this 
stack trace in the l3 agent logs:

http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/neutron/neutron-l3-agent.txt.gz?level=TRACE#_2017-10-29_23_11_15_146

I'm not sure what's causing it to complain there. But, I'm on a plane right now 
(which is why this is a top post, sorry) so I can't really dig much more than 
that. I'll try to take a deeper look at things later when I'm on solid ground. 
(hopefully someone will beat me to it by then though) 

-Matt Treinish

On October 31, 2017 1:25:55 AM GMT+04:00, Mohammed Naser  
wrote:
>Hi everyone,
>
>I'm looking for some help regarding an issue that we're having with
>the Puppet OpenStack modules, we've had very inconsistent failures in
>the Xenial with the following error:
>
>http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/
>http://logs.openstack.org/47/514347/1/check/puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/ed5a657/logs/testr_results.html.gz
>Details: {u'message': u'Unable to associate floating IP
>172.24.5.17 to fixed IP 10.100.0.8 for instance
>d265626a-77c1-4d2f-8260-46abe548293e. Error: Request to
>https://127.0.0.1:9696/v2.0/floatingips/2e3fa334-d6ac-443c-b5ba-eeb521d6324c
>timed out', u'code': 400}
>
>At this point, we're at a bit of a loss.  I've tried my best in order
>to find the root cause however we have not been able to do this.  It
>was persistent enough that we elected to go non-voting for our Xenial
>gates, however, with no fix ahead of us, I feel like this is a waste
>of resources and we need to either fix this or drop CI for Ubuntu.  We
>don't deploy on Ubuntu and most of the developers working on the
>project don't either at this point, so we need a bit of resources.
>
>If you're a user of Puppet on Xenial, we need your help!  Without any
>resources going to fix this, we'd unfortunately have to drop support
>for Ubuntu because of the lack of resources to maintain it (or
>assistance).  We (Puppet OpenStack team) would be more than happy to
>work together to fix this so pop-in at #puppet-openstack or reply to
>this email and let's get this issue fixed.
>
>Thanks,
>Mohammed
>