Hi all,

We're experiencing serious nodepool slowness under load. Nodes sit in the 
delete state for a long time (sometimes up to 20 minutes) before they actually 
get removed, and we see very similar behaviour for node creation. That 
exhausts our resources very quickly, and our throughput slows to the speed of 
a snail with heavy shopping.

To try to figure out why, I wrote a small log analysis tool; here are some 
graphs from the data.
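For the curious, the core of the tool is roughly this (a simplified sketch; the log line format is an assumption based on nodepool's TaskManager debug output, "ran task <name> in <seconds>s", so adjust the regex to your logs):

```python
# Sketch: extract per-task durations from nodepool debug logs.
# Assumes lines containing e.g. "... ran task AddFloatingIPTask in 1.25s".
import re
from collections import defaultdict

TASK_RE = re.compile(r"ran task (?P<task>\w+) in (?P<secs>[\d.]+)s")

def task_durations(lines):
    """Return {task_name: [duration_in_seconds, ...]} from log lines."""
    durations = defaultdict(list)
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            durations[m.group("task")].append(float(m.group("secs")))
    return dict(durations)
```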

Individual task time taken
https://s3.amazonaws.com/uploads.hipchat.com/8522/961402/4H008OHlrWf4NLm/task-time.png

This shows the time taken in seconds by each nodepool task (e.g. 
AddFloatingIPTask). Yes, it's slow, but it's consistent: under high load the 
tasks just get more densely packed, they don't get slower.

Nodepool task queue size
https://s3.amazonaws.com/uploads.hipchat.com/8522/961402/1S0kAiKGMMQCrpb/queue-size.png

This shows the number of individual nodepool tasks (e.g. AddFloatingIPTask) 
waiting in the queue. Guess when a load of jobs hit us!

Total node deletion time
https://s3.amazonaws.com/uploads.hipchat.com/8522/961402/ixQxq4U4C5icl2K/deletion-time.png

This shows how long nodes spend in the delete state: from the moment they 
transition from used to delete until all the delete tasks have run and the 
node is removed. Take a look at what happens when there's a lot of stuff in 
the queue. Ouchy.
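The deletion-time numbers come from pairing each node's entry into the delete state with its eventual removal. A minimal sketch of that calculation (the event-tuple shape and event names here are my own illustration, not nodepool's API):

```python
# Sketch: given (timestamp, node_id, event) tuples parsed from the logs,
# compute how long each node spent in the delete state. 'delete' marks
# entry into the delete state; 'removed' marks the node actually going away.
from datetime import datetime

def delete_durations(events):
    """Return {node_id: seconds spent in the delete state}."""
    entered = {}
    durations = {}
    for ts, node_id, event in sorted(events):
        if event == "delete":
            entered[node_id] = ts
        elif event == "removed" and node_id in entered:
            durations[node_id] = (ts - entered.pop(node_id)).total_seconds()
    return durations
```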

Our 'rate' is the default of 1.0. Any ideas or help would be appreciated!

Thanks,
Mike


_______________________________________________
OpenStack-Infra mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
