Public bug reported:

On our CI we see intermittent failures in random jobs, all related to fetching
public keys from metadata.
As an example I would like to show this change [1]. On top of the current
test implementation, it adds three instances and test security groups.

Sometimes random jobs like:
neutron-tempest-plugin-scenario-linuxbridge
neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid-stein
and others fail when checking SSH connectivity to a just-created instance.

* The connection failed because the instance refused public key authentication.
Example:
------------------------------------------------------------------------------------------------
2019-12-13 14:43:48,694 31953 INFO     [tempest.lib.common.ssh] Creating ssh 
connection to '172.24.5.186:22' as 'cirros' with public key authentication
2019-12-13 14:43:48,704 31953 WARNING  [tempest.lib.common.ssh] Failed to 
establish authenticated ssh connection to cirros@172.24.5.186 ([Errno None] 
Unable to connect to port 22 on 172.24.5.186). Number attempts: 1. Retry after 
2 seconds.
------------------------------------------------------------------------------------------------

* The instance console log shows that the instance failed to get the public
keys list on boot (FIP: 172.24.5.186, Instance IP: 10.1.0.10):
-------------------------------------------------------------
cirros-ds 'net' up at 11.67
checking http://169.254.169.254/2009-04-04/instance-id
successful after 1/20 tries: up 12.13. iid=i-0000003c
failed to get http://169.254.169.254/2009-04-04/meta-data/public-keys
warning: no ec2 metadata for public-keys
-------------------------------------------------------------

* In addition to the existing Neutron logs, I added more debug messages to the
Neutron Metadata Agent to find out whether the response from Nova Metadata is
empty. I then checked the Neutron Metadata Agent logs related to this instance:
-----------------------------------------------------------------------------
Dec 13 14:43:49.572244 ubuntu-bionic-rax-ord-0013383633 
neutron-metadata-agent[17234]: DEBUG neutron.agent.metadata.agent [-] REQUEST: 
HEADERS {'X-Forwarded-For': '10.1.0.10', 'X-Instance-ID': 
'e77a44fc-249f-4c85-8f9c-40f299534c12', 'X-Tenant-ID': 
'8975f89b119046b48f5a674fa6a296c3', 'X-Instance-ID-Signature': 
'908153d94493c68c9cb8fae8aa78fab18244a260d7fe55b5b707ed9b369f45cd'} DATA: b'' 
URL: http://10.210.224.88:8775/2009-04-04/meta-data/public-keys {{(pid=17720) 
_proxy_request /opt/stack/neutron/neutron/agent/metadata/agent.py:214}}
Dec 13 14:43:49.572451 ubuntu-bionic-rax-ord-0013383633 
neutron-metadata-agent[17234]: DEBUG neutron.agent.metadata.agent [-] RESPONSE: 
HEADERS: {'Content-Length': '32', 'Content-Type': 'text/plain; charset=UTF-8', 
'Connection': 'close'} DATA: b'0=tempest-keypair-test-231375855' {{(pid=17720) 
_proxy_request /opt/stack/neutron/neutron/agent/metadata/agent.py:217}}
Dec 13 14:43:49.572977 ubuntu-bionic-rax-ord-0013383633 
neutron-metadata-agent[17234]: INFO eventlet.wsgi.server [-] 10.1.0.10,<local> 
"GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200  len: 168 time: 
0.3123491
-----------------------------------------------------------------------------

The response was 200 with body: '0=tempest-keypair-test-231375855'. This is
the same key that is used for the other instances, so this part worked.
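
As a quick sanity check of the logged response, the body can be parsed as an
EC2-style public-keys listing (one 'index=name' entry per line) and its length
compared with the logged Content-Length. This is only a sketch: the function
name parse_public_keys_listing is hypothetical, and the body and header value
are copied from the log above.

```python
from typing import Dict


def parse_public_keys_listing(body: bytes) -> Dict[int, str]:
    """Parse an EC2-style public-keys listing: one 'index=name' entry per line."""
    keys = {}
    for line in body.decode("utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        index, _, name = line.partition("=")
        keys[int(index)] = name
    return keys


# Values copied from the Neutron Metadata Agent RESPONSE log line.
logged_body = b"0=tempest-keypair-test-231375855"
logged_content_length = 32

# The body is complete as far as the agent is concerned if the lengths agree.
body_is_complete = len(logged_body) == logged_content_length
keys = parse_public_keys_listing(logged_body)
```

The lengths agree, so the agent logged a complete listing with a single key
at index 0. This says nothing about what actually reached the instance.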


Conclusions:
1) Neutron metadata responds with 200
2) Nova metadata responds with 200 and valid data

Questions:
1) Is this a cirros issue? Why is there no retry?
2) Maybe it is a network issue and the data is not sent back (connection
dropped during delivery)?
3) Why don't we have more logs in cirros about this request failure?
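
To illustrate question 1, here is a minimal sketch of a metadata fetch with
retries, similar in spirit to the "1/20 tries" loop that the console log shows
for the instance-id check. This is not the cirros implementation: the names
http_get and fetch_with_retry, the default of 20 tries, and the one second
delay are all assumptions made for illustration.

```python
import http.client
import time
import urllib.request

METADATA_URL = "http://169.254.169.254/2009-04-04/meta-data/public-keys"


def http_get(url, timeout=5.0):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_retry(url, fetch=http_get, tries=20, delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a non-empty body or tries run out.

    Returns (body, attempt_number). Raises RuntimeError after the last failure.
    """
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            body = fetch(url)
            if body:
                return body, attempt
            last_error = ValueError("empty response body")
        except (OSError, http.client.HTTPException) as exc:
            # URLError and socket timeouts are OSError subclasses; a connection
            # dropped mid-body can surface as http.client.IncompleteRead.
            last_error = exc
        if attempt < tries:
            sleep(delay)
    raise RuntimeError("failed to get %s after %d tries" % (url, tries)) from last_error
```

With a loop like this, a single dropped connection would show up in the log as
a success after two or more tries instead of a missing key.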

[1] https://review.opendev.org/#/c/682369/
[2] https://review.opendev.org/#/c/698001/

** Affects: neutron
     Importance: Undecided
         Status: New

** Summary changed:

- Sometimes instance cant get public keys due to cirros
+ Sometimes instance can't get public keys due to cirros metadata request 
failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1856523

Title:
  Sometimes instance can't get public keys due to cirros metadata
  request failure

Status in neutron:
  New
