Partial answers are super helpful! I'm happy to break this up if it's too much for 1 question @moderators
Sam

On Sat, Feb 27, 2021 at 1:27 PM, Sam Shleifer <[email protected]> wrote:

> Hi!
>
> I am trying to use the plasma store to reduce the memory usage of a PyTorch
> dataset/dataloader combination, and have 4 questions. I don't think any of
> them require PyTorch knowledge. If you prefer to comment inline, there is a
> quip with identical content and prettier formatting here:
> https://quip.com/3mwGAJ9KR2HT
>
> *1)* My script starts the plasma-store from python with 200 GB:
>
> nbytes = (1024 ** 3) * 200
> _server = subprocess.Popen(["plasma_store", "-m", str(nbytes), "-s", path])
>
> where nbytes is chosen arbitrarily. From my experiments it seems that one
> should start the store as large as possible within the limits of /dev/shm.
> I wanted to verify whether this is actually the best practice (it would be
> hard for my app to know the storage needs up front) and also whether there
> is an automated way to figure out how much storage to allocate.
>
> *2)* Does the plasma store support simultaneous reads? My code, which has
> multiple clients all asking for the 6 arrays from the plasma-store
> thousands of times, was segfaulting with different errors, e.g.
>
> Check failed: RemoveFromClientObjectIds(object_id, entry, client) == 1
>
> until I added a lock around my client.get
>
> if self.use_lock:  # Fix segfault
>     with FileLock("/tmp/plasma_lock"):
>         ret = self.client.get(self.object_id)
> else:
>     ret = self.client.get(self.object_id)
>
> which fixes it.
>
> Here is a full traceback of the failure without the lock:
> https://gist.github.com/sshleifer/75145ba828fcb4e998d5e34c46ce13fc
>
> Is this expected behavior?
>
> *3)* Is there a simple way to add many objects to the plasma store at
> once? Right now, we are considering changing
>
> oid = client.put(array)
>
> to
>
> oids = [client.put(x) for x in array]
>
> so that we can fetch one entry at a time, but the writes are much slower.
>
> * 3a) Is there a lower-level interface for bulk writes?
> * 3b) Or is it recommended to chunk the array and have different python
> processes write simultaneously to make this faster?
>
> *4)* Is there a way to save/load the contents of the plasma-store to disk
> without loading everything into memory and then saving it to some other
> format?
>
> Replication
>
> Setup instructions for fairseq + replicating the segfault:
> https://gist.github.com/sshleifer/bd6982b3f632f1d4bcefc9feceb30b1a
>
> My code is here: https://github.com/pytorch/fairseq/pull/3287
>
> Thanks!
> Sam
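
P.S. In case it saves anyone from cloning fairseq, here is a condensed, self-contained sketch of the setup from (1) and (2). It is not the exact code from the PR: the socket path, lock path, and array shape are stand-ins I made up for this email.

import subprocess
import time

import numpy as np
import pyarrow.plasma as plasma
from filelock import FileLock  # pip install filelock

PLASMA_PATH = "/tmp/plasma"    # unix socket the store listens on (stand-in)
nbytes = (1024 ** 3) * 200     # 200 GB, chosen arbitrarily (question 1)

# (1) start the store as a subprocess with a fixed capacity
_server = subprocess.Popen(["plasma_store", "-m", str(nbytes), "-s", PLASMA_PATH])
time.sleep(2)  # give the store a moment to create the socket

client = plasma.connect(PLASMA_PATH)
object_id = client.put(np.random.randn(1024, 1024))  # stand-in for one of the 6 arrays

# (2) every dataloader worker runs this thousands of times; without the lock it segfaults
with FileLock("/tmp/plasma_lock"):  # Fix segfault
    ret = client.get(object_id)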

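P.P.S. And a toy version of the trade-off in (3), again with a made-up shape, in case the prose above is unclear:

import numpy as np
import pyarrow.plasma as plasma

client = plasma.connect("/tmp/plasma")
array = np.random.randn(10000, 128)

# current approach: one object holding the whole array, fetched wholesale
oid = client.put(array)
whole = client.get(oid)

# per-row objects would let readers fetch one entry at a time,
# but doing 10000 small puts is much slower than one big put
oids = [client.put(x) for x in array]
one_row = client.get(oids[7])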