Public bug reported:

(Loop1) https://github.com/openstack/nova/blob/af2d6c9576b1ac5f3b3768870bb15d9b5cf1610b/nova/scheduler/driver.py#L55
(Loop2) https://github.com/openstack/nova/blob/af2d6c9576b1ac5f3b3768870bb15d9b5cf1610b/nova/servicegroup/drivers/zk.py#L177
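The way the two loops combine can be sketched with a simplified model. This is not the nova code: FakeZkDriver, items_scanned, hosts_up_quadratic and hosts_up_linear are hypothetical names made up for illustration, and the assumption is that the per-host is_up check re-reads the whole group membership through get_all.

```python
class FakeZkDriver:
    """Hypothetical stand-in for a ZooKeeper-backed servicegroup driver."""

    def __init__(self, members):
        self._members = list(members)
        self.items_scanned = 0  # work counter, for illustration only

    def get_all(self, group_id):
        # Loop2: walk every member of the group on each call.
        result = []
        for member in self._members:
            self.items_scanned += 1
            result.append(member)
        return result

    def is_up(self, service):
        # Assumption: each per-host check re-reads the whole membership.
        return service["host"] in self.get_all(service["topic"])


def hosts_up_quadratic(driver, services):
    # Loop1: one is_up call per service, each of which runs Loop2.
    return [s["host"] for s in services if driver.is_up(s)]


def hosts_up_linear(driver, services, topic="compute"):
    # Alternative: fetch the membership once, then test against a set.
    up = set(driver.get_all(topic))
    return [s["host"] for s in services if s["host"] in up]


services = [{"host": f"node{i}", "topic": "compute"} for i in range(100)]
alive = [s["host"] for s in services if s["host"] != "node7"]

quad = FakeZkDriver(alive)
lin = FakeZkDriver(alive)
assert hosts_up_quadratic(quad, services) == hosts_up_linear(lin, services)
print(quad.items_scanned, lin.items_scanned)  # 9900 99
```

With 100 services and 99 live members, the per-host variant scans 100 * 99 = 9900 entries, while the single-fetch variant scans 99.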
Iterating the hosts through the ComputeFilter also has this issue, and using ComputeFilter in a loop has other performance issues as well.

The zk driver issue can be mitigated by reorganizing the code so that the filtering is done in is_up instead of in get_all. However, a better solution would be to have the scheduler use get_all, or to redesign the servicegroup management.

A better design would be to use the DB even with the zk and mc drivers, but to update it ONLY when a service actually comes up or dies. In this case the sg drivers MAY need dedicated service processes.

NOTE: The servicegroup driver concept was introduced to avoid doing 10_000 DB updates/sec at 100_000 hosts (one update per host every 10 seconds). Even if your servers are bad and every server has a 1:1000 chance to die on a given day, updating only on state changes would lead to only about 0.001 UPDATE/sec (100/day) at 100_000 hosts.

NOTE: If the up/down state is knowable just from the DB, the scheduler could eliminate the dead hosts in the first DB query, without using ComputeFilter as it is used now. (The plugins SHOULD be able to extend the base hosts query.)

** Affects: nova
     Importance: Undecided
         Status: New

** Summary changed:

- zookeper driver used with O(n^2) complexity by the scheduler
+ zookeeper driver used with O(n^2) complexity by the scheduler

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1437199

Title:
  zookeeper driver used with O(n^2) complexity by the scheduler

Status in OpenStack Compute (Nova):
  New
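The update-rate arithmetic in the first NOTE of the bug description can be checked with a few lines. This is only a back-of-the-envelope sketch: the constant names are made up here, and reading the update frequency as one heartbeat per host every 10 seconds is an assumption that matches the 10_000 updates/sec figure.

```python
HOSTS = 100_000
REPORT_INTERVAL_S = 10           # assumed: one heartbeat per host every 10 s
FAILURE_ODDS_PER_DAY = 1000      # each host has a 1:1000 chance to die per day
SECONDS_PER_DAY = 24 * 60 * 60

# Heartbeat model: every host writes to the DB once per report interval.
heartbeat_updates_per_s = HOSTS / REPORT_INTERVAL_S

# State-change model: the DB is written only when a host dies.
state_changes_per_day = HOSTS / FAILURE_ODDS_PER_DAY
state_change_updates_per_s = state_changes_per_day / SECONDS_PER_DAY

print(f"heartbeat writes: {heartbeat_updates_per_s:,.0f}/sec")
print(f"state changes:    {state_changes_per_day:.0f}/day "
      f"= {state_change_updates_per_s:.5f}/sec")
```

This counts only hosts going down. Counting each host coming back up as a second write would roughly double the state-change rate, which stays in the same order of magnitude.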