Public bug reported: An occasional failure found in our production environment because of a race condition in NSX plugin, which results in neutron port creation failure. Here is the root cause.
When authentication cookie of NSX controller expires, it is expected that NSX plugin client will retry with new authentication cookie after re-login. The code logic is: when a connection is acquired, if cookie is None then it initiate a new login request for new cookie. Otherwise, it continue to use old cookie to issue request, and if request failed because of authentication failure, it then checks if cookie is None then abort retrying. However, there is a race condition that after connection is acquired but before request is issued, there can be another thread failed to authenticate and resetting the cookie (which is shared data between connections to same controller) to None (so that during retry it will know to re-login after acquiring the connection). So the first thread will then abort retrying because of the logic checking the cookie and aborting if the cookie is None. This has high probability because there is a blocking operation after connection is acquired: the sock.connect(). It is likely the cookie being changed to None during this blocking operation. The fix would be simply remove the abort logic, because even if the cookie is None, it would then re-login and get a new cookie before retry. Another fix could be moving the assignment of "cookie" before the socket.connect() operation. Related logs: (note that the log is from havana version, but the code logic is the same in upstream) --------------------------------------------------------------------- 2015-01-13 06:44:21,326 64499216 INFO [neutron.plugins.nicira.api_client.client] [5672] Connection https://xxx:443 idle for 1614.74 seconds; reconnecting. 2015-01-13 06:44:21,326 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Acquired connection https://xxx:443. 14 connection(s) available. 2015-01-13 06:44:21,327 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Issuing - request GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus 2015-01-13 06:44:21,389 64496976 INFO [neutron.plugins.nicira.api_client.client] [5673] Connection https://xxx:443 idle for 5833.87 seconds; reconnecting. 2015-01-13 06:44:21,389 64496976 DEBUG [neutron.plugins.nicira.api_client.client] [5673] Acquired connection https://xxx:443. 13 connection(s) available. 2015-01-13 06:44:21,390 64496976 DEBUG [neutron.plugins.nicira.api_client.request] [5673] Issuing - request GET https://xxx:443//ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus 2015-01-13 06:44:21,394 64499216 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708' 2015-01-13 06:44:21,398 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Completed request 'GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 401 (0.07 seconds) 2015-01-13 06:44:21,399 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Released connection https://xxx:443. 14 connection(s) available. 2015-01-13 06:44:21,399 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Acquired connection https://xxx:443. 13 connection(s) available. 2015-01-13 06:44:21,400 64496176 DEBUG [neutron.plugins.nicira.api_client.request] [5674] Issuing - request POST https://xxx:443//ws.v1/login 2015-01-13 06:44:21,400 64496176 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708' 2015-01-13 06:44:21,403 64496976 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708' 2015-01-13 06:44:21,439 64496976 DEBUG [neutron.plugins.nicira.api_client.request] [5673] Completed request 'GET https://xxx:443//ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus': 401 (0.05 seconds) 2015-01-13 06:44:21,439 64496976 DEBUG [neutron.plugins.nicira.api_client.client] [5673] Released connection https://xxx:443. 14 connection(s) available. 2015-01-13 06:44:21,439 64496976 DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5673] Completed request 'GET /ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus': 401 2015-01-13 06:44:21,439 57729424 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port ... NvpPluginException: An unexpected error occurred in the NVP Plugin:An exception occured while selecting logical switch for the port 2015-01-13 06:44:21,584 64496176 DEBUG [neutron.plugins.nicira.api_client.request] [5674] Completed request 'POST https://xxx:443//ws.v1/login': 200 (0.18 seconds) 2015-01-13 06:44:21,585 64496176 DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5674] Completed request 'POST /ws.v1/login': 200 2015-01-13 06:44:21,585 64499216 DEBUG [neutron.plugins.nicira.api_client.client_eventlet] Saving new authentication cookie 'nvp_sessionid=15d4602ff7fb3dc34c00901ef80a7cc4; Path=/; secure' 2015-01-13 06:44:21,585 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Issuing - request GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus 2015-01-13 06:44:21,585 64499216 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708' 2015-01-13 06:44:21,694 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Completed request 'GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 200 (0.11 seconds) 2015-01-13 06:44:21,694 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Released connection https://xxx:443. 15 connection(s) available. 2015-01-13 06:44:21,694 64499216 DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5672] Completed request 'GET /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 200 ** Affects: neutron Importance: Undecided Status: New ** Tags: vmware ** Tags added: vmware -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1410805 Title: NSX Plugin: UnAuthorizedRequest without retry Status in OpenStack Neutron (virtual network service): New Bug description: An occasional failure found in our production environment because of a race condition in NSX plugin, which results in neutron port creation failure. Here is the root cause. When authentication cookie of NSX controller expires, it is expected that NSX plugin client will retry with new authentication cookie after re-login. The code logic is: when a connection is acquired, if cookie is None then it initiate a new login request for new cookie. Otherwise, it continue to use old cookie to issue request, and if request failed because of authentication failure, it then checks if cookie is None then abort retrying. However, there is a race condition that after connection is acquired but before request is issued, there can be another thread failed to authenticate and resetting the cookie (which is shared data between connections to same controller) to None (so that during retry it will know to re-login after acquiring the connection). So the first thread will then abort retrying because of the logic checking the cookie and aborting if the cookie is None. This has high probability because there is a blocking operation after connection is acquired: the sock.connect(). It is likely the cookie being changed to None during this blocking operation. The fix would be simply remove the abort logic, because even if the cookie is None, it would then re-login and get a new cookie before retry. Another fix could be moving the assignment of "cookie" before the socket.connect() operation. Related logs: (note that the log is from havana version, but the code logic is the same in upstream) --------------------------------------------------------------------- 2015-01-13 06:44:21,326 64499216 INFO [neutron.plugins.nicira.api_client.client] [5672] Connection https://xxx:443 idle for 1614.74 seconds; reconnecting. 2015-01-13 06:44:21,326 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Acquired connection https://xxx:443. 14 connection(s) available. 2015-01-13 06:44:21,327 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Issuing - request GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus 2015-01-13 06:44:21,389 64496976 INFO [neutron.plugins.nicira.api_client.client] [5673] Connection https://xxx:443 idle for 5833.87 seconds; reconnecting. 2015-01-13 06:44:21,389 64496976 DEBUG [neutron.plugins.nicira.api_client.client] [5673] Acquired connection https://xxx:443. 13 connection(s) available. 2015-01-13 06:44:21,390 64496976 DEBUG [neutron.plugins.nicira.api_client.request] [5673] Issuing - request GET https://xxx:443//ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus 2015-01-13 06:44:21,394 64499216 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708' 2015-01-13 06:44:21,398 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Completed request 'GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 401 (0.07 seconds) 2015-01-13 06:44:21,399 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Released connection https://xxx:443. 14 connection(s) available. 2015-01-13 06:44:21,399 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Acquired connection https://xxx:443. 13 connection(s) available. 2015-01-13 06:44:21,400 64496176 DEBUG [neutron.plugins.nicira.api_client.request] [5674] Issuing - request POST https://xxx:443//ws.v1/login 2015-01-13 06:44:21,400 64496176 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708' 2015-01-13 06:44:21,403 64496976 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708' 2015-01-13 06:44:21,439 64496976 DEBUG [neutron.plugins.nicira.api_client.request] [5673] Completed request 'GET https://xxx:443//ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus': 401 (0.05 seconds) 2015-01-13 06:44:21,439 64496976 DEBUG [neutron.plugins.nicira.api_client.client] [5673] Released connection https://xxx:443. 14 connection(s) available. 2015-01-13 06:44:21,439 64496976 DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5673] Completed request 'GET /ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus': 401 2015-01-13 06:44:21,439 57729424 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port ... NvpPluginException: An unexpected error occurred in the NVP Plugin:An exception occured while selecting logical switch for the port 2015-01-13 06:44:21,584 64496176 DEBUG [neutron.plugins.nicira.api_client.request] [5674] Completed request 'POST https://xxx:443//ws.v1/login': 200 (0.18 seconds) 2015-01-13 06:44:21,585 64496176 DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5674] Completed request 'POST /ws.v1/login': 200 2015-01-13 06:44:21,585 64499216 DEBUG [neutron.plugins.nicira.api_client.client_eventlet] Saving new authentication cookie 'nvp_sessionid=15d4602ff7fb3dc34c00901ef80a7cc4; Path=/; secure' 2015-01-13 06:44:21,585 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Issuing - request GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus 2015-01-13 06:44:21,585 64499216 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708' 2015-01-13 06:44:21,694 64499216 DEBUG [neutron.plugins.nicira.api_client.request] [5672] Completed request 'GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 200 (0.11 seconds) 2015-01-13 06:44:21,694 64499216 DEBUG [neutron.plugins.nicira.api_client.client] [5672] Released connection https://xxx:443. 15 connection(s) available. 2015-01-13 06:44:21,694 64499216 DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5672] Completed request 'GET /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 200 To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1410805/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : [email protected] Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp

