hi, i've started to implement multiple buckets [3] and the initial tests look promising. here are some things i've done:
- dropped the scheduler process and let processing workers figure out tasks themselves
- each sack is now handled fully (not counting anything added after the processing worker starts)
- the number of sacks is static

after the above, i've been testing it and it works pretty well: i'm able to process 40K metrics, 60 points each, in 8-10mins with 54 workers, when it took significantly longer before.

the issues i've run into:

- dynamic sack size

making the number of sacks dynamic is a concern. previously, we said to put the sack size in the conf file. the concern is that changing that option incorrectly actually 'corrupts' the db to a state it cannot recover from: it will constantly have stray unprocessed measures. if we change the db path incorrectly, we don't actually corrupt anything, we just lose data. we've said we don't want sack mappings in the indexer, so it seems to me the only safe solution is to make the sack size static and only changeable by hacking?

- sack distribution

to distribute sacks across workers, i initially implemented consistent hashing (first sketch below). the issue i noticed is that because a hash ring inherently has a non-uniform distribution [1], some workers would sit idle because they were given fewer sacks while other workers were still working.

i also tried implementing jump hash [2] (second sketch below), which improved distribution and is, in theory, less memory intensive as it does not maintain a hash ring. while better at distribution, it is still not completely uniform, and similarly, the fewer sacks per worker, the worse the distribution.

lastly, i tried simple locking (third sketch below), where each worker is completely unaware of any other worker and handles all sacks: it locks the sack it is working on, so if another worker tries to work on that sack, it just skips it. this effectively puts an additional requirement on the locking system (in my case redis), as each worker will make x lock requests per cycle, where x is the number of sacks. so if we have 50 workers and 2048 sacks, that's ~102K requests per cycle (50 * 2048 = 102,400), in addition to the n lock requests per metric (10K-1M metrics?). it does guarantee, though, that if a worker is free and there is work to be done, it will do it.

i guess the question i have is: by using consistent/jump hashing, it seems we possibly gain less load on the locking system at the expense of efficiency/'speed'. the number of sacks/tasks we have is stable; it won't really change. the number of metricd workers may change, but not constantly. lastly, the number of sacks per worker will always be relatively low (10:1, 100:1, assuming a max of 2048 sacks). given these conditions, do we need consistent/jump hashing? or is it better to just modulo sacks across workers (last sketch below) to ensure 'uniform' distribution and accept that a 'larger' set of buckets gets reshuffled when workers are added?
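
for reference, here's roughly the kind of hashring lookup i was testing. this is a hand-rolled illustration, not the actual patch (the worker names, replica count, and helpers here are made up), but it reproduces the imbalance:

    import bisect
    import hashlib
    from collections import Counter

    NUM_SACKS = 2048
    WORKERS = ['worker-%d' % i for i in range(50)]
    REPLICAS = 100  # vnodes per worker; more helps, but never perfectly even

    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    # the ring: each worker appears REPLICAS times at pseudo-random positions
    ring = sorted((_hash('%s-%d' % (w, r)), w)
                  for w in WORKERS for r in range(REPLICAS))
    positions = [p for p, _ in ring]

    def lookup(sack):
        # the first vnode clockwise from the sack's hash owns the sack
        i = bisect.bisect(positions, _hash('sack-%d' % sack)) % len(ring)
        return ring[i][1]

    load = Counter(lookup(s) for s in range(NUM_SACKS))
    counts = [load.get(w, 0) for w in WORKERS]
    print('sacks per worker: min=%d max=%d' % (min(counts), max(counts)))

the min/max spread here is what leaves the lightly-loaded workers idle; [1] has the distribution numbers.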
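
jump hash is small enough to paste in full. this is a straight python port of the c++ in the paper [2] (the 64-bit masking is the only addition):

    def jump_hash(key, num_buckets):
        # jump consistent hash (lamping & veach [2]): maps a 64-bit key to a
        # bucket in [0, num_buckets) and moves a minimal number of keys when
        # num_buckets grows
        b, j = -1, 0
        while j < num_buckets:
            b = j
            key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
            j = int(float(b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
        return b

    # e.g. sack -> worker index: jump_hash(sack_number, num_workers)

one caveat worth noting: jump hash only gives you a bucket index, so the workers need to agree on a stable numbering 0..n-1, and it only handles growing/shrinking at the end of that range cheaply; dropping a worker from the middle moves more than the minimal set.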
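
the locking approach is basically the loop below running in every worker. a minimal redis-py sketch, assuming a plain SET NX as the lock (the real code would presumably go through tooz; the key names and timeouts here are made up):

    import redis

    NUM_SACKS = 2048
    client = redis.Redis()

    def process_cycle(process_sack):
        # every worker walks every sack; a sack another worker already
        # holds is simply skipped. this is where the workers * sacks lock
        # requests come from: 50 workers * 2048 sacks ~= 102K SETs/cycle.
        for sack in range(NUM_SACKS):
            key = 'sack-lock-%d' % sack
            # nx=True: acquire only if free; ex=300 so a dead worker's
            # lock eventually expires
            if client.set(key, 'worker-id', nx=True, ex=300):
                try:
                    process_sack(sack)
                finally:
                    client.delete(key)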
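
and the modulo alternative i'm asking about would be something like this: trivially uniform, no ring to maintain, at the cost of reshuffling most sacks whenever the worker count changes:

    def my_sacks(worker_index, num_workers, num_sacks=2048):
        # perfectly even (within one sack), no hashing involved
        return [s for s in range(num_sacks) if s % num_workers == worker_index]

    # the reshuffle cost, e.g. going from 50 to 51 workers:
    moved = sum(1 for s in range(2048) if s % 50 != s % 51)
    print('sacks reassigned: %d / 2048' % moved)  # nearly all of them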

[1] https://docs.google.com/spreadsheets/d/1flXw1lqao2tIc0p1baxVeJIXgzhy1Ksw3uFoiwyZkXk/edit?usp=sharing
[2] https://arxiv.org/pdf/1406.2294.pdf
[3] https://review.openstack.org/#/q/topic:buckets+project:openstack/gnocchi

--
gord
