Public bug reported: ENV: stable/victoria
We have enabled DVR and have some huge virtual routers with around 60 router interfaces scheduled on around 800 compute nodes. In a large scale cloud deployment, when restart neutron agent, especially l3 agents, neutron server side will trigger too many slow DB query. And this will cause the agent restart time to be too long to operate. Error log of l3-agent restart: ``` ymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: 'SELECT ml2_port_binding_levels.port_id AS ml2_port_binding_levels_port_id, ml2_port_binding_levels.host AS ml2_port_binding_levels_host, ml2_port_binding_levels.level AS ml2_port_binding_levels_level, ml2_port_binding_levels.driver AS ml2_port_binding_levels_driver, ml2_port_binding_levels.segment_id AS ml2_port_binding_levels_segment_id, ports_1.id AS ports_1_id \nFROM (SELECT routers.id AS routers_id \nFROM routers LEFT OUTER JOIN (SELECT routerl3agentbindings.router_id AS router_id, count(routerl3agentbindings.router_id) AS count \nFROM routerl3agentbindings INNER JOIN router_extra_attributes ON routerl3agentbindings.router_id = router_extra_attributes.router_id INNER JOIN routers ON routers.id = router_extra_attributes.router_id GROUP BY routerl3agentbindings.router_id) AS anon_2 ON routers.id = anon_2.router_id) AS anon_1 INNER JOIN routerports AS routerports_1 ON anon_1.routers_id = routerports_1.router_id INNER JOIN ports AS ports_1 ON ports_1.id = routerports_1.port_id INNER JOIN ml2_port_binding_levels ON ports_1.id = ml2_port_binding_levels.port_id ORDER BY ports_1.id'] (Background on this error at: http://sqlalche.me/e/e3q8) ``` as well as ```console SELECT ml2_port_binding_levels.port_id AS ml2_port_binding_levels_port_id, ml2_port_binding_levels.host AS ml2_port_binding_levels_host, ml2_port_binding_levels.level AS ml2_port_binding_levels_level, ml2_port_binding_levels.driver AS ml2_port_binding_levels_driver, ml2_port_binding_levels.segment_id AS ml2_port_binding_levels_segment_id, ports_1.id AS ports_1_id FROM (SELECT routers.id AS routers_id FROM routers LEFT OUTER JOIN (SELECT routerl3agentbindings.router_id AS router_id, count(routerl3agentbindings.router_id) AS count FROM routerl3agentbindings INNER JOIN router_extra_attributes ON routerl3agentbindings.router_id = router_extra_attributes.router_id INNER JOIN routers ON routers.id = router_extra_attributes.router_id GROUP BY routerl3agentbindings.router_id) AS anon_2 ON routers.id = anon_2.router_id) AS anon_1 INNER JOIN routerports AS routerports_1 ON anon_1.routers_id = routerports_1.router_id INNER JOIN ports AS ports_1 ON ports_1.id = routerports_1.port_id INNER JOIN ml2_port_binding_levels ON ports_1.id = ml2_port_binding_levels.port_id; SELECT ml2_port_binding_levels.port_id AS ml2_port_binding_levels_port_id, ml2_port_binding_levels.host AS ml2_port_binding_levels_host, ml2_port_binding_levels.level AS ml2_port_binding_levels_level, ml2_port_binding_levels.driver AS ml2_port_binding_levels_driver, ml2_port_binding_levels.segment_id AS ml2_port_binding_levels_segment_id, ports_1.id AS ports_1_id FROM (SELECT DISTINCT routerports.port_id AS routerports_port_id FROM routerports WHERE routerports.router_id IN ('6e4ed0f5-e1b0-4cf1-931d-b30c93433719') AND routerports.port_type IN ('network:router_interface', 'network:ha_router_replicated_interface', 'network:router_interface_distributed')) AS anon_1 INNER JOIN ports AS ports_1 ON ports_1.id = anon_1.routerports_port_id INNER JOIN ml2_port_binding_levels ON ports_1.id = ml2_port_binding_levels.port_id ORDER BY ports_1.id; ``` from show processlist. and we saw excessive amounts of slow queries for ml2_port_binding_levels which is weird because it looks like not necessary. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1976355 Title: remove eager subquery load for PortBindingLevel Status in neutron: New Bug description: ENV: stable/victoria We have enabled DVR and have some huge virtual routers with around 60 router interfaces scheduled on around 800 compute nodes. In a large scale cloud deployment, when restart neutron agent, especially l3 agents, neutron server side will trigger too many slow DB query. And this will cause the agent restart time to be too long to operate. Error log of l3-agent restart: ``` ymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') [SQL: 'SELECT ml2_port_binding_levels.port_id AS ml2_port_binding_levels_port_id, ml2_port_binding_levels.host AS ml2_port_binding_levels_host, ml2_port_binding_levels.level AS ml2_port_binding_levels_level, ml2_port_binding_levels.driver AS ml2_port_binding_levels_driver, ml2_port_binding_levels.segment_id AS ml2_port_binding_levels_segment_id, ports_1.id AS ports_1_id \nFROM (SELECT routers.id AS routers_id \nFROM routers LEFT OUTER JOIN (SELECT routerl3agentbindings.router_id AS router_id, count(routerl3agentbindings.router_id) AS count \nFROM routerl3agentbindings INNER JOIN router_extra_attributes ON routerl3agentbindings.router_id = router_extra_attributes.router_id INNER JOIN routers ON routers.id = router_extra_attributes.router_id GROUP BY routerl3agentbindings.router_id) AS anon_2 ON routers.id = anon_2.router_id) AS anon_1 INNER JOIN routerports AS routerports_1 ON anon_1.routers_id = routerports_1.router_id INNER JOIN ports AS ports_1 ON ports_1.id = routerports_1.port_id INNER JOIN ml2_port_binding_levels ON ports_1.id = ml2_port_binding_levels.port_id ORDER BY ports_1.id'] (Background on this error at: http://sqlalche.me/e/e3q8) ``` as well as ```console SELECT ml2_port_binding_levels.port_id AS ml2_port_binding_levels_port_id, ml2_port_binding_levels.host AS ml2_port_binding_levels_host, ml2_port_binding_levels.level AS ml2_port_binding_levels_level, ml2_port_binding_levels.driver AS ml2_port_binding_levels_driver, ml2_port_binding_levels.segment_id AS ml2_port_binding_levels_segment_id, ports_1.id AS ports_1_id FROM (SELECT routers.id AS routers_id FROM routers LEFT OUTER JOIN (SELECT routerl3agentbindings.router_id AS router_id, count(routerl3agentbindings.router_id) AS count FROM routerl3agentbindings INNER JOIN router_extra_attributes ON routerl3agentbindings.router_id = router_extra_attributes.router_id INNER JOIN routers ON routers.id = router_extra_attributes.router_id GROUP BY routerl3agentbindings.router_id) AS anon_2 ON routers.id = anon_2.router_id) AS anon_1 INNER JOIN routerports AS routerports_1 ON anon_1.routers_id = routerports_1.router_id INNER JOIN ports AS ports_1 ON ports_1.id = routerports_1.port_id INNER JOIN ml2_port_binding_levels ON ports_1.id = ml2_port_binding_levels.port_id; SELECT ml2_port_binding_levels.port_id AS ml2_port_binding_levels_port_id, ml2_port_binding_levels.host AS ml2_port_binding_levels_host, ml2_port_binding_levels.level AS ml2_port_binding_levels_level, ml2_port_binding_levels.driver AS ml2_port_binding_levels_driver, ml2_port_binding_levels.segment_id AS ml2_port_binding_levels_segment_id, ports_1.id AS ports_1_id FROM (SELECT DISTINCT routerports.port_id AS routerports_port_id FROM routerports WHERE routerports.router_id IN ('6e4ed0f5-e1b0-4cf1-931d-b30c93433719') AND routerports.port_type IN ('network:router_interface', 'network:ha_router_replicated_interface', 'network:router_interface_distributed')) AS anon_1 INNER JOIN ports AS ports_1 ON ports_1.id = anon_1.routerports_port_id INNER JOIN ml2_port_binding_levels ON ports_1.id = ml2_port_binding_levels.port_id ORDER BY ports_1.id; ``` from show processlist. and we saw excessive amounts of slow queries for ml2_port_binding_levels which is weird because it looks like not necessary. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1976355/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : [email protected] Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp

