Public bug reported:

In the L3 agent's _process_routers_loop method, a GreenPool of 8 eventlet green threads is spawned. Those threads take updates off the agent's queue and process them; updates are serialized by router_id so that no two threads process the same router at the same time.
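For context, the serialization mechanism described above can be sketched roughly as follows. This is an illustrative approximation, not the agent's actual code: it uses stdlib threading/queue in place of eventlet, and the names RouterUpdateQueue, worker, spawn_pool, and POOL_SIZE are hypothetical.

```python
import threading
import queue

# Pool size from this report; the agent spawns 8 eventlet green threads,
# approximated here with OS threads.
POOL_SIZE = 8


class RouterUpdateQueue:
    """Hypothetical stand-in for the agent's update queue: hands out
    updates while guaranteeing that no two workers process the same
    router concurrently (serialization by router_id)."""

    def __init__(self):
        self._updates = queue.Queue()
        self._in_progress = set()
        self._lock = threading.Lock()

    def put(self, router_id, update):
        self._updates.put((router_id, update))

    def get(self):
        # If another worker already holds this router, requeue the
        # update and try the next one (a busy requeue; fine for a sketch).
        while True:
            router_id, update = self._updates.get()
            with self._lock:
                if router_id not in self._in_progress:
                    self._in_progress.add(router_id)
                    return router_id, update
            self._updates.put((router_id, update))

    def done(self, router_id):
        with self._lock:
            self._in_progress.discard(router_id)


def worker(updates, process_update):
    """One pool worker: pull an update, process it, release the router."""
    while True:
        router_id, update = updates.get()
        try:
            process_update(router_id, update)
        finally:
            updates.done(router_id)


def spawn_pool(updates, process_update, size=POOL_SIZE):
    """Spawn `size` daemon workers, mirroring the pool of green threads
    in _process_routers_loop."""
    threads = [
        threading.Thread(target=worker, args=(updates, process_update),
                         daemon=True)
        for _ in range(size)
    ]
    for t in threads:
        t.start()
    return threads
```

In this sketch the pool size reduction tried below (8 down to 1) is a one-line change to the `size` argument; with heavily CPU-bound per-router work, fewer concurrent workers can mean less contention.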
In an environment running on a powerful baremetal server, the agent was trying to sync roughly 600 routers after a restart; about half were HA routers and half were legacy routers. With the default GreenPool size of 8, the server ground to a halt as CPU usage skyrocketed to over 600%, the main offenders being ip, bash, keepalived and python. This was a stable/juno-based environment without the rootwrap daemon. Configuring a single router took around 60 seconds.

Changing the GreenPool size from 8 to 1 caused the agent to:

1) Configure a router in 30 seconds, a 50% improvement.
2) Reduce CPU load from 600% to 70%, freeing the machine to do other things.

I'm filing this bug so that:

1) Someone can confirm my experience in a more controlled way - for example, by graphing router configuration time and CPU load as a function of GreenPool size.
2) If my findings are confirmed on master with the rootwrap daemon, we can start considering alternatives such as multiprocessing instead of eventlet green threads, or at the very least tune the GreenPool size.

** Affects: neutron
     Importance: Undecided
         Status: New

** Tags: l3-ipam-dhcp loadimpact

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1526559

Title:
  L3 agent parallel configuration of routers might slow things down

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1526559/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp