zugnush wrote:
You could do something like this so that every process will know if the file "belongs" to it without prior coordination. It means a lot of redundant hashing, though.

    In [36]: import md5
    In [37]: pool = 11
    In [38]: process = 5
    In [39]: [f for f in glob.glob('*') if int(md5.md5(f).hexdigest(),16) % pool == process ]
    Out[39]:

You're also relying on the hashing being evenly distributed; otherwise some processes are going to sit idle even though there is still useful work to perform.
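
To make the imbalance concrete, here is a rough sketch that counts how many files each process would end up owning under the md5-modulo scheme. It uses hashlib rather than the old md5 module, and the file names, the owner() and bucket_sizes() helpers and the pool size of 11 are all made up for illustration, not taken from anyone's real setup.

```python
import hashlib
from collections import Counter


def owner(name, pool):
    """Index of the process that 'owns' name under the md5-modulo scheme."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % pool


def bucket_sizes(names, pool):
    """Number of names assigned to each process, indexed by process number."""
    counts = Counter(owner(n, pool) for n in names)
    return [counts.get(i, 0) for i in range(pool)]


# Hypothetical file names; in practice these would come from glob.glob('*').
names = ["file%03d.dat" % i for i in range(100)]
sizes = bucket_sizes(names, 11)

# With 100 files and 11 processes the split cannot be perfectly even,
# so at least one process gets more work than another.
print(sizes)
print("busiest:", max(sizes), "least busy:", min(sizes))
```

Every process can compute owner() on its own, which is the attraction of the scheme, but nothing in it lets a process with a small bucket pick up files from a process with a large one.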
In other words, why would you rely on a scheme that limits some processes to certain parts of the data? If we're already talking about trying to get away without some global lock for synchronisation this seems to go against the original intent of the problem...
n
--
http://mail.python.org/mailman/listinfo/python-list
