I have been following, using, and testing Sheepdog for a while now, and I saw a commit (https://github.com/collie/sheepdog/commit/a1cf3cbd28fb09ee0cf5dba30876b0502e8023dc) whose message mentions stale object cleanup. My questions and observations follow.
I have been testing against the farm version of the code, and it looks like not all objects are cleaned up when a vdi is deleted. In several tests, the VDI_ID00000000 files are left behind after the vdi's are deleted. I ran several rounds of testing with 44 nodes: each node creates a vdi named "NODEX", where X is a number, giving 44 vdi's. I have tried this with both 128MB and 10G vdi's. Regardless of size, after deleting all vdi's I am left with 88 files of 4MB each: 88 because copies = 2, so only 44 of them have unique id's. Should these object files remain after all vdi's have been deleted?

I would also like to know whether epoch cleanup is supposed to be fixed yet. When I drop a node and then bring it back, the amount of data in the cluster, from the perspective of the os, grows even though no activity took place inside the vdi's. This appears to be limited to the node that dropped. The cluster was at epoch 1 when I dropped the node; I waited until the cluster reported being back at full size before bringing the node back. Afterwards the cluster is at epoch 3 (understood, since epoch 2 was the node dropping and epoch 3 was it rejoining), but the os reports double the space used on that node. There is no object directory for epoch 2 on it, because it was not a member of the cluster during epoch 2, but directories 1 and 3 are both the same size.

To finish my test, I deleted all vdi's. After a few minutes, "collie node info -r" shows "Total 2000572 3695091 0% 0". It reports no data, yet according to os utilities over 900GB still remains in "/sheep/obj/*". I then shut the cluster down with "collie cluster shutdown", waited a bit, and restarted it; now it shows "Total 2922982 9253980 31% 0". "collie vdi list -r" shows no vdi's, and the os still reports over 900GB. If there are no vdi's, why is there still over 900GB worth of data?
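For reference, here is roughly how the leftover objects can be counted. The commands below build a mock object store under /tmp/mockobj so they are runnable anywhere; the object IDs are made up, the two subdirectories stand in for the two replica nodes (copies = 2), and on a real node the files would live under /sheep/obj/ instead.

```shell
# Mock layout: two "nodes", each holding a replica of the same two objects.
mkdir -p /tmp/mockobj/nodeA /tmp/mockobj/nodeB
for id in 80fd3a1200000000 80fd3a1300000000; do
    touch "/tmp/mockobj/nodeA/$id" "/tmp/mockobj/nodeB/$id"   # copies = 2
done

# Total object files vs. unique object IDs across all replicas.
find /tmp/mockobj -type f | wc -l                           # prints 4
find /tmp/mockobj -type f -printf '%f\n' | sort -u | wc -l  # prints 2
```

With copies = 2 the file count is always double the unique-ID count, which matches the 88 files / 44 unique id's above.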
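On the strange du results mentioned below: if epoch directories share unchanged objects via hard links, summing each epoch directory separately double-counts data that is stored only once. A minimal sketch of that effect, using a throwaway /tmp/hlink tree rather than a real object store:

```shell
# One 4MB "object" written in epoch 1, hard-linked into epoch 3.
mkdir -p /tmp/hlink/1 /tmp/hlink/3
dd if=/dev/zero of=/tmp/hlink/1/obj bs=1M count=4 2>/dev/null
ln /tmp/hlink/1/obj /tmp/hlink/3/obj   # epoch 3 shares the epoch-1 inode

du -sh /tmp/hlink                  # whole tree: shared file counted once (~4M)
du -sh /tmp/hlink/1 /tmp/hlink/3   # per directory: 4M each, so 8M if summed
du -shl /tmp/hlink                 # -l (--count-links) counts every link (~8M)
```

So a per-epoch du can report both directories at full size even though the disk holds only one copy.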
I assume the 900GB includes roughly 21GB of data duplicated between epochs 1 and 3 on the node that dropped and came back, but I would expect the old data to be purged in this situation as well. If it would help, I have plenty of data showing which files (times and sizes) were in /sheep/obj/* at the various stages, along with what the os reported at the same time. I get strange results from various du commands, but I believe that is due to the use of hard links between epochs.

Regards,
Shawn

-- 
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog
