[Yahoo-eng-team] [Bug 1437199] [NEW] zookeeper driver used with O(n^2) complexity by the scheduler

Attila Fazekas Fri, 27 Mar 2015 01:31:11 -0700

Public bug reported:

(Loop1) 
https://github.com/openstack/nova/blob/af2d6c9576b1ac5f3b3768870bb15d9b5cf1610b/nova/scheduler/driver.py#L55
(Loop2) 
https://github.com/openstack/nova/blob/af2d6c9576b1ac5f3b3768870bb15d9b5cf1610b/nova/servicegroup/drivers/zk.py#L177


Iterating over the hosts through the ComputeFilter has the same issue,
and using ComputeFilter in a loop has other performance issues as well.

The zk driver issue can be mitigated by reorganizing the code so that
the filtering test is done in is_up instead of in get_all.


However, a better solution would be to have the scheduler use get_all,
or to redesign the servicegroup management.
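
To make the quadratic cost concrete, here is a minimal sketch of the two
access patterns. It does not reproduce the nova code linked above:
CountingDriver, hosts_up_per_service and hosts_up_single_pass are
hypothetical names, and the toy driver only assumes that every is_up call
walks the full member list once.

```python
# Hypothetical stand-ins for illustration; these are not nova's classes.
class CountingDriver:
    """Toy servicegroup driver that counts how many members it touches."""

    def __init__(self, live_hosts):
        self._live = list(live_hosts)
        self.touched = 0

    def get_all(self, group_id):
        # Models Loop2: one pass over every member of the group.
        members = []
        for host in self._live:
            self.touched += 1
            members.append(host)
        return members

    def is_up(self, service):
        # Assumed behaviour: each liveness check re-reads the whole group.
        return service["host"] in self.get_all(service["topic"])


def hosts_up_per_service(driver, services):
    # Models Loop1: one is_up call per service, so O(n^2) work overall.
    return [s["host"] for s in services if driver.is_up(s)]


def hosts_up_single_pass(driver, services, topic="compute"):
    # Alternative: read the membership once, then do O(1) set lookups.
    alive = set(driver.get_all(topic))
    return [s["host"] for s in services if s["host"] in alive]


services = [{"host": "node%d" % i, "topic": "compute"} for i in range(200)]
live = [s["host"] for s in services if s["host"] != "node7"]

slow = CountingDriver(live)
fast = CountingDriver(live)
assert hosts_up_per_service(slow, services) == hosts_up_single_pass(fast, services)
print(slow.touched, fast.touched)  # 39800 199
```

With 200 services and 199 live members, the per-service pattern touches
200 * 199 = 39800 entries, while the single-pass pattern touches 199.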

A better design would be to use the DB even with the zk and mc drivers,
but to update it ONLY when a service actually comes up or dies. In this
case the sg drivers MAY need dedicated service processes.

NOTE: The servicegroup driver concept was introduced to avoid doing
10_000 DB updates/sec @100_000 hosts (one update per host every 10 sec).
Even if your servers are bad and every server has a 1:1000 chance to die
on a given day, updating only on state changes would lead to only about
0.001 UPDATE/sec (100/day) @100_000 hosts.
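
The arithmetic behind the note above can be checked with a few lines. The
10 second report interval is my reading of "10/sec update freq", and only
deaths are counted (a recovery would add a second write per failed host);
the constant names are made up for this sketch.

```python
HOSTS = 100_000
REPORT_INTERVAL_S = 10           # assumed: one heartbeat per host every 10 s
DAILY_FAILURE_PROB = 1 / 1000    # the note's pessimistic failure rate
SECONDS_PER_DAY = 24 * 60 * 60

# Heartbeat model: every host writes to the DB once per interval.
heartbeat_writes_per_s = HOSTS / REPORT_INTERVAL_S

# Transition model: a write happens only when a host dies.
transitions_per_day = HOSTS * DAILY_FAILURE_PROB
transition_writes_per_s = transitions_per_day / SECONDS_PER_DAY

print(heartbeat_writes_per_s)              # 10000.0
print(round(transitions_per_day))          # 100
print(round(transition_writes_per_s, 4))   # 0.0012
```

The two models differ by roughly seven orders of magnitude in write rate.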

NOTE: If the up/down state is knowable just from the DB, the scheduler
could eliminate the dead hosts in the first DB query, without using
ComputeFilter as it is used now. (The plugins SHOULD be able to extend
the base hosts query.)
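
A rough sketch of how the proposal could look, using an in-memory SQLite
database. The services table with an "up" column, and the
record_transitions and live_hosts helpers, are assumptions made for this
example; nova's real schema and DB API are different.

```python
import sqlite3

# Hypothetical schema for illustration; nova's services table differs.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE services (host TEXT PRIMARY KEY, up INTEGER NOT NULL)")
db.executemany("INSERT INTO services VALUES (?, 1)",
               [("node%d" % i,) for i in range(5)])


def record_transitions(db, previous, current):
    """Write to the DB only for hosts whose membership changed."""
    changes = [(1, h) for h in current - previous]    # came up
    changes += [(0, h) for h in previous - current]   # died
    db.executemany("UPDATE services SET up = ? WHERE host = ?", changes)
    return len(changes)


def live_hosts(db):
    # Dead hosts are excluded by the query itself; no per-host filter runs.
    rows = db.execute("SELECT host FROM services WHERE up = 1 ORDER BY host")
    return [host for (host,) in rows]


before = {"node0", "node1", "node2", "node3", "node4"}
after = before - {"node3"}   # the membership monitor saw node3 disappear

writes = record_transitions(db, before, after)
unchanged_writes = record_transitions(db, after, after)
print(writes, unchanged_writes, live_hosts(db))
# 1 0 ['node0', 'node1', 'node2', 'node4']
```

A steady membership produces no writes at all, and the scheduler's host
query returns only live hosts.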

** Affects: nova
     Importance: Undecided
         Status: New

** Summary changed:

- zookeper driver used with O(n^2) complexity  by the scheduler
+ zookeeper driver used with O(n^2) complexity  by the scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1437199

Title:
  zookeeper driver used with O(n^2) complexity  by the scheduler

Status in OpenStack Compute (Nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1437199/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
