Public bug reported:

On our CI we see intermittent failures in random jobs, related to getting public keys from metadata. As an example I would like to show this change [1]. In addition to the current implementation of the tests, it adds three instances and test security groups.

Sometimes random jobs like:

neutron-tempest-plugin-scenario-linuxbridge
neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid-stein

and others fail when checking SSH connectivity to a just-created instance.

* The check failed because the instance refused public key authentication, for example:
------------------------------------------------------------------------------------------------
2019-12-13 14:43:48,694 31953 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.186:22' as 'cirros' with public key authentication
2019-12-13 14:43:48,704 31953 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.186 ([Errno None] Unable to connect to port 22 on 172.24.5.186). Number attempts: 1. Retry after 2 seconds.
------------------------------------------------------------------------------------------------

* The instance console log shows that the instance failed to get the public keys list on boot (FIP: 172.24.5.186, Instance IP: 10.1.0.10):
-------------------------------------------------------------
cirros-ds 'net' up at 11.67
checking http://169.254.169.254/2009-04-04/instance-id
successful after 1/20 tries: up 12.13. iid=i-0000003c
failed to get http://169.254.169.254/2009-04-04/meta-data/public-keys
warning: no ec2 metadata for public-keys
-------------------------------------------------------------

* In addition to the current Neutron logs, I added more debug messages to the Neutron Metadata Agent to find out whether the response from Nova Metadata is empty. Then I checked the Neutron Metadata logs related to this instance:
-----------------------------------------------------------------------------
Dec 13 14:43:49.572244 ubuntu-bionic-rax-ord-0013383633 neutron-metadata-agent[17234]: DEBUG neutron.agent.metadata.agent [-] REQUEST: HEADERS {'X-Forwarded-For': '10.1.0.10', 'X-Instance-ID': 'e77a44fc-249f-4c85-8f9c-40f299534c12', 'X-Tenant-ID': '8975f89b119046b48f5a674fa6a296c3', 'X-Instance-ID-Signature': '908153d94493c68c9cb8fae8aa78fab18244a260d7fe55b5b707ed9b369f45cd'} DATA: b'' URL: http://10.210.224.88:8775/2009-04-04/meta-data/public-keys {{(pid=17720) _proxy_request /opt/stack/neutron/neutron/agent/metadata/agent.py:214}}
Dec 13 14:43:49.572451 ubuntu-bionic-rax-ord-0013383633 neutron-metadata-agent[17234]: DEBUG neutron.agent.metadata.agent [-] RESPONSE: HEADERS: {'Content-Length': '32', 'Content-Type': 'text/plain; charset=UTF-8', 'Connection': 'close'} DATA: b'0=tempest-keypair-test-231375855' {{(pid=17720) _proxy_request /opt/stack/neutron/neutron/agent/metadata/agent.py:217}}
Dec 13 14:43:49.572977 ubuntu-bionic-rax-ord-0013383633 neutron-metadata-agent[17234]: INFO eventlet.wsgi.server [-] 10.1.0.10,<local> "GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200 len: 168 time: 0.3123491
-----------------------------------------------------------------------------

The response was 200 with body: '0=tempest-keypair-test-231375855'. This is the same key that is used for other instances, so this part worked.

Conclusions:
1) Neutron metadata responds with 200
2) Nova metadata responds with 200 and valid data

Questions:
1) Is this a cirros issue? Why is there no retry?
2) Maybe it's a network issue and the data is not sent back to the instance (connection dropped during delivery)?
3) Why don't we have more logs in cirros about this request failure?

[1] https://review.opendev.org/#/c/682369/
[2] https://review.opendev.org/#/c/698001/

** Affects: neutron
     Importance: Undecided
         Status: New

** Summary changed:

- Sometimes instance cant get public keys due to cirros
+ Sometimes instance can't get public keys due to cirros metadata request failure

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1856523

Title:
  Sometimes instance can't get public keys due to cirros metadata request failure

Status in neutron:
  New
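
Regarding questions 1) and 3), here is a minimal sketch of what "retry and log every failure" could look like on the client side. It is not cirros code and says nothing about how cirros actually fetches metadata. The helper names (http_get, fetch_with_retries, parse_public_keys), the attempt count, and the delay are all assumptions made up for illustration; only the URL and the '0=<keypair name>' body format come from the logs in this report.

```python
import http.client
import time
import urllib.request

# URL taken from the console log in this report.
PUBLIC_KEYS_URL = "http://169.254.169.254/2009-04-04/meta-data/public-keys"


def http_get(url, timeout=10.0):
    """Fetch url once and return the decoded body (no retry here)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def fetch_with_retries(url, attempts=5, delay=2.0, fetch=http_get, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times and log each failure.

    Hypothetical helper: `attempts` and `delay` are illustrative values.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        # URLError, timeouts and connection resets are OSError subclasses;
        # a body cut off mid-transfer surfaces as an HTTPException.
        except (OSError, http.client.HTTPException) as exc:
            last_error = exc
            print(f"attempt {attempt}/{attempts} for {url} failed: {exc!r}")
            if attempt < attempts:
                sleep(delay)
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error


def parse_public_keys(body):
    """Parse lines like '0=tempest-keypair-test-231375855' into {index: name}."""
    keys = {}
    for line in body.splitlines():
        index, _, name = line.partition("=")
        if name:
            keys[int(index)] = name
    return keys
```

With something like this, a single dropped connection would cost one delay instead of leaving the instance without keys, and the console log would show the reason for each failed attempt. The fetch and sleep parameters are injectable only so the retry logic can be exercised without a real metadata service.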