On Tue, Apr 18 2017, gordon chung wrote:

> the issue i see is not with how the sacks will be assigned to metricd
> but how metrics (not the daemon) are assigned to sacks. i don't think
> storing the value in a storage object solves the issue, because when
> would we load/read it when the api and metricd processes start up? it
> seems this would require: 1) all services to be shut down and 2) a
> completely clean incoming storage path. if either of the two steps
> isn't done, you have corrupt incoming storage. if this is a requirement
> and both of these are done successfully, it means any kind of 'live
> upgrade' is impossible in gnocchi.
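(To make the problem concrete: a metric ends up in a sack through something like the hypothetical sketch below; the names and the hashing scheme are illustrative, not the actual Gnocchi code. The point is that every process touching incoming measures has to agree on the sack count, hence the shutdown-and-clean-storage requirement described above.)

    import hashlib
    import uuid

    NUM_SACKS = 128  # illustrative; whatever the deployment was created with


    def sack_for_metric(metric_id, num_sacks=NUM_SACKS):
        # Hypothetical scheme: hash the metric id and take it modulo the
        # number of sacks.  Every process writing or reading incoming
        # measures (API and metricd) has to agree on num_sacks, otherwise
        # they compute different sacks for the same metric and the incoming
        # storage becomes inconsistent, which is the concern above.
        digest = hashlib.md5(str(metric_id).encode()).hexdigest()
        return int(digest, 16) % num_sacks


    print(sack_for_metric(uuid.uuid4()))  # some integer in [0, NUM_SACKS)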
Live upgrade has never been supported in Gnocchi, so I don't see how that's a problem. It'd be cool to support it for sure, but we're far from having been able to implement it at any point in time in the past. So it's not a new issue or anything like that. I really don't see a problem with loading the number of sacks at startup.

> i did a test w/ 2 replicas (see: google sheet) and it's still
> non-uniform, but better than without replicas: ~4%-30% vs ~8%-45%. we
> could also minimise the number of lock calls by dividing sacks across
> workers per agent.
>
> going to play devil's advocate now: using hashring in our use case will
> always hurt throughput (even with perfect distribution, since the sack
> contents themselves are not uniform). returning to the original
> question, is using hashring worth it? i don't think we're even
> leveraging the re-balancing aspect of hashring.

I think it's worth it only if you use replicas – and I don't think 2 is enough, I'd try 3 at least, and make it configurable. It'll reduce lock contention a lot (e.g. by 17x in my previous example). As far as I'm concerned, since the number of replicas is configurable, you can add a knob that sets replicas=number_of_metricd_worker, which would reproduce the current behaviour – every worker tries to grab every sack.

We're not leveraging the re-balancing aspect of hashring, that's true. We could probably use any dumber system to spread sacks across workers; we could stick to the good ol' "len(sacks) / len(workers in the group)". But I think there are a couple of things down the road that may help us:

Using the hashring makes sure worker X does not jump from sacks [A, B, C] to [W, X, Y, Z] but just to [A, B] or [A, B, C, X]. That should minimize lock contention when bringing workers up or down. I admit it's a very marginal win, but… it comes free with it.

Also, I envision a push-based approach in the future (to replace the metricd_processing_delay) which will require workers to register to sacks. Making sure the rebalancing does not shake everything but is rather smooth will also reduce the workload around that. Again, it comes free.

--
Julien Danjou
# Free Software hacker
# https://julien.danjou.info
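To illustrate the scheme discussed above, here is a minimal, self-contained sketch of a consistent-hash ring mapping sacks to metricd workers with replicas. It is a toy implementation, not tooz's actual HashRing, and all names (Ring, owners, metricd-N) are made up for the example; the point is that replicas bound how many workers contend for a given sack, and that adding a worker only perturbs part of the mapping instead of reshuffling everything.

    import bisect
    import hashlib


    def _position(value):
        # Illustrative: md5 of a string as an integer position on the ring.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)


    class Ring(object):
        """Toy consistent-hash ring (not tooz's HashRing, just the idea)."""

        def __init__(self, workers, points_per_worker=32):
            # Each worker gets several points on the ring to smooth out
            # the distribution of sacks.
            self._ring = sorted(
                (_position("%s-%d" % (worker, i)), worker)
                for worker in workers
                for i in range(points_per_worker))

        def owners(self, sack, replicas=1):
            # Walk clockwise from the sack's position and collect the first
            # `replicas` distinct workers: those are the sack's owners.
            start = bisect.bisect(self._ring, (_position("sack-%d" % sack),))
            found = []
            for i in range(len(self._ring)):
                worker = self._ring[(start + i) % len(self._ring)][1]
                if worker not in found:
                    found.append(worker)
                    if len(found) == replicas:
                        break
            return set(found)


    workers = ["metricd-%d" % i for i in range(8)]
    ring = Ring(workers)

    # With replicas=3, at most 3 workers contend for any given sack, instead
    # of every worker trying to grab every sack.
    before = {sack: ring.owners(sack, replicas=3) for sack in range(128)}

    # Adding a 9th worker only changes the owner set of part of the sacks;
    # existing workers keep most of what they had (e.g. [A, B, C] becomes
    # [A, B, C, X] or [A, B]), rather than jumping to a disjoint set.
    bigger = Ring(workers + ["metricd-8"])
    after = {sack: bigger.owners(sack, replicas=3) for sack in range(128)}
    changed = sum(1 for sack in before if before[sack] != after[sack])
    print("owner set changed for %d/128 sacks" % changed)

With replicas=3 and 8 workers, each sack is locked by at most 3 workers instead of all 8; setting replicas equal to the number of workers degenerates into the current everyone-grabs-everything behaviour mentioned above.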
