On Tue, Apr 18 2017, gordon chung wrote:

> the issue i see is not with how the sacks will be assigned to metricd 
> but how metrics (not daemon) are assigned to sacks. i don't think 
> storing value in storage object solves the issue because when would we 
> load/read it when the api and metricd processes startup? it seems this 
> would require: 1) all services to be shut down and 2) have a completely 
> clean incoming storage path. if any of the two steps aren't done, you 
> have a corrupt incoming storage. if this is a requirement and both of 
> these are done successfully, this means, any kind of 'live upgrade' is 
> impossible in gnocchi.

Live upgrade has never been supported in Gnocchi, so I don't see how
that's a problem. It'd be cool to support it for sure, but we're far
from having been able to implement it at any point in the past. So it's
not a new issue or anything like that. I really don't see a problem
with loading the number of sacks at startup.

> i did a test w/ 2 replicas (see: google sheet) and it's still
> non-uniform but better than without replicas: ~4%-30% vs ~8%-45%. we
> could also minimise the number of lock calls by dividing sacks across
> workers per agent.
>
> going to play devils advocate now, using hashring in our use case will 
> always hurt throughput (even with perfect distribution since the sack 
> contents themselves are not uniform). returning to original question, is 
> using hashring worth it? i don't think we're even leveraging the 
> re-balancing aspect of hashring.

I think it's worth it only if you use replicas – and I don't think 2 is
enough, I'd try 3 at least, and make it configurable. It'll reduce
lock contention a lot (e.g. by 17x in my previous example).
As far as I'm concerned, since the number of replicas is configurable,
you can add a knob that sets replicas=number_of_metricd_workers, which
reproduces the current behaviour – every worker tries to grab every
sack.
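
To make that concrete, here's a toy sketch – purely illustrative, not
the real Gnocchi/tooz code; the SackRing name, the "metricd-N" worker
names and the sack counts are all made up – of a sack-to-worker ring
with a replica knob:

import bisect
import hashlib


def _hash(value):
    # Stable hash so every process computes the same ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class SackRing(object):
    """Minimal consistent-hash ring mapping sacks to workers."""

    def __init__(self, workers):
        self._ring = sorted((_hash(w), w) for w in workers)
        self._keys = [k for k, _ in self._ring]

    def owners(self, sack, replicas):
        """Return the `replicas` distinct workers owning `sack`."""
        start = bisect.bisect(self._keys,
                              _hash("sack-%d" % sack)) % len(self._ring)
        owners, i = [], 0
        while len(owners) < min(replicas, len(self._ring)):
            worker = self._ring[(start + i) % len(self._ring)][1]
            if worker not in owners:
                owners.append(worker)
            i += 1
        return owners


ring = SackRing(["metricd-%d" % i for i in range(8)])
print(ring.owners(42, replicas=3))   # only 3 workers contend for sack 42
print(ring.owners(42, replicas=8))   # every worker does, as today

With replicas=3 only three workers ever contend for a given sack's
lock; with replicas=len(workers) you're back to the current behaviour.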

We're not leveraging the re-balancing aspect of the hashring, that's true.
We could probably use a much dumber system to spread sacks across
workers; we could stick to the good ol' "len(sacks) / len(workers in
the group)".
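
Something like this – again a made-up sketch, just to show the shape of
it, with illustrative worker names and sack counts:

def sacks_for_worker(worker_id, workers, num_sacks):
    """Give each worker an equal contiguous slice of the sacks."""
    workers = sorted(workers)
    index = workers.index(worker_id)
    per_worker = num_sacks // len(workers)
    start = index * per_worker
    # The last worker also picks up the remainder when the division
    # isn't exact.
    end = num_sacks if index == len(workers) - 1 else start + per_worker
    return range(start, end)


workers = ["metricd-%d" % i for i in range(8)]
print(list(sacks_for_worker("metricd-3", workers, num_sacks=128)))
# -> sacks 48..63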

But I think there are a couple of things down the road that may help us.
Using the hashring makes sure worker X does not jump from sacks [A, B,
C] to [W, X, Y, Z] but just to [A, B] or [A, B, C, X]. That should
minimize lock contention when bringing up/down new workers. I admit it's
a very marginal win, but… it comes free with it.
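
Reusing the toy SackRing sketch from above (still hypothetical worker
and sack numbers), you can count how little actually moves when a ninth
worker joins the ring:

old = SackRing(["metricd-%d" % i for i in range(8)])
new = SackRing(["metricd-%d" % i for i in range(9)])

# Count the sacks whose owner set changes when metricd-8 joins.
moved = sum(1 for sack in range(128)
            if set(old.owners(sack, replicas=3))
            != set(new.owners(sack, replicas=3)))
print("sacks whose owner set changed: %d / 128" % moved)

With the naive "equal slice" split, almost every slice boundary shifts
when the worker count changes, so most workers would have to drop and
re-acquire their sacks instead.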
Also, I envision a push-based approach in the future (to replace
metricd_processing_delay) which will require workers to register to
sacks. Making sure the rebalancing does not shake everything but is
rather smooth will also reduce the workload around that. Again, it
comes free.

-- 
Julien Danjou
# Free Software hacker
# https://julien.danjou.info
