On 02/22/2012 10:55 PM, Shawn Moore wrote:
> I have been following, using and testing SD for a while now and saw a
> commit comment
> (https://github.com/collie/sheepdog/commit/a1cf3cbd28fb09ee0cf5dba30876b0502e8023dc)
> that mentioned stale object cleanup. My questions/observations
> follow.
>
> I have been testing against the farm version of the code and it looks
> like not all of the objects are being cleaned up when a vdi is
> deleted. I have done several tests and it seems the VDI_ID00000000
> files are left after the vdi's are deleted. I have done several
> rounds of testing with 44 nodes. If each node creates a vdi called
> "NODEX", where X is a number, I'm left with 44 vdi's. I have done it
> with 128MB vdi's and 10G vdi's. No matter the size, I'm left with 88
> files @ 4MB after all vdi's are deleted. 88 because copies = 2. Of
> the 88, only 44 have unique id's, due to copies = 2. Should these
> remaining object files be there since all vdi's have been deleted?
>
Thanks for your test. No, I don't think those objects should remain. Currently
farm only deals with stale objects in the context of recovery; neither farm nor
the simple store handles the stale objects of deleted VDIs yet. I am going to
take a look at this issue later and cook a patch to remove the stale objects of
deleted VDIs for farm (a rough sketch of the idea is appended below).

> I also would like to know if epoch cleanup is supposed to be fixed yet
> or not. When I drop a node and then put it back, the amount of data
> in the cluster, from the perspective of the os, grows even though no
> activity inside the vdi's took place. This appears to be limited to
> the node that dropped. The cluster was at epoch 1 when I dropped the
> node; I waited until the cluster reported being back to full size to
> bring the node back. Then, after the cluster reported being back to
> full size, I checked and now it's at epoch 3 (understood, because 2 was
> dropping the node and 3 was it coming back). But the os reports double
> the space for that node. There is no object directory for epoch 2 on it
> because it was not a member of the cluster during epoch 2, but
> directories 1 and 3 are both the same size. So then, to finish my
> test, I deleted all vdi's. After a few minutes, "collie node info -r"
> shows "Total 2000572 3695091 0% 0". It reports no data, but according
> to os utils, over 900GB still remains in "/sheep/obj/*". I then
> shut down the cluster ("collie cluster shutdown"), wait a few, and then
> restart it; now it shows "Total 2922982 9253980 31% 0". "collie vdi
> list -r" shows no vdi's. The os's still report over 900GB. Why, if
> there are no vdi's, is there still over 900GB worth of data? I assume
> the 900GB includes ~ 21GB of duplicated data between epochs 1 and 3 on
> the node that dropped and came back, but I would think in the end old
> data should be purged in this situation as well.
>
> I have lots of data, if desired, showing what files/times/sizes were in
> /sheep/obj/* at various stages, as well as what the os reports at the
> same time. I get strange results with various du commands, but believe
> that is due to the usage of hard links between epochs.
>

I guess in this test you used the simple store? For the simple store, stale
objects from both recovery and deleted VDIs are not handled yet.

Thanks,
Yuan
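
To make the idea of that cleanup pass concrete, here is a minimal standalone
sketch. It is not the actual patch and does not use sheepdog's internal store
API: it simply walks one object directory and unlinks every object whose VDI id
is not in a caller-supplied list of live VDI ids. The directory path, the
oid-as-filename naming, and the bit layout (24-bit VDI id at bit 32, VDI-object
bit at bit 63) are assumptions made for illustration.

/*
 * Illustrative sketch only -- NOT the actual sheepdog patch, and it does
 * not use sheepdog's internal store API.  It walks one object directory
 * (e.g. /sheep/obj/<epoch>) and unlinks every object whose VDI id is not
 * in a caller-supplied list of live VDI ids.
 *
 * Assumed layout (for illustration): object files are named with the
 * 16-hex-digit 64-bit oid, the 24-bit VDI id sits at bit 32 of the oid,
 * and bit 63 marks a VDI (inode) object rather than a data object.
 */
#include <dirent.h>
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define VDI_SPACE_SHIFT 32
#define VDI_BIT         (UINT64_C(1) << 63)

/* Extract the VDI id from an oid, for both data and VDI (inode) objects. */
static uint32_t oid_to_vid(uint64_t oid)
{
	return ((oid & ~VDI_BIT) >> VDI_SPACE_SHIFT) & 0xffffffu;
}

static int vid_is_live(uint32_t vid, const uint32_t *live, int nr)
{
	for (int i = 0; i < nr; i++)
		if (live[i] == vid)
			return 1;
	return 0;
}

/* Usage: cleanup_stale <obj_dir> [live_vid_in_hex ...] */
int main(int argc, char **argv)
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s <obj_dir> [live_vid_in_hex ...]\n",
			argv[0]);
		return 1;
	}

	int nr_live = argc - 2;
	uint32_t *live = calloc(nr_live ? nr_live : 1, sizeof(*live));
	for (int i = 0; i < nr_live; i++)
		live[i] = (uint32_t)strtoul(argv[i + 2], NULL, 16);

	DIR *dir = opendir(argv[1]);
	if (!dir) {
		perror("opendir");
		return 1;
	}

	struct dirent *d;
	while ((d = readdir(dir)) != NULL) {
		uint64_t oid;
		char path[PATH_MAX];

		/* Skip anything that is not a 16-hex-digit oid file name. */
		if (strspn(d->d_name, "0123456789abcdef") != 16 ||
		    d->d_name[16] != '\0')
			continue;
		sscanf(d->d_name, "%" SCNx64, &oid);

		/* Keep objects that still belong to a live VDI. */
		if (vid_is_live(oid_to_vid(oid), live, nr_live))
			continue;

		snprintf(path, sizeof(path), "%s/%s", argv[1], d->d_name);
		printf("removing stale object %s\n", path);
		if (unlink(path) < 0)
			perror("unlink");
	}

	closedir(dir);
	free(live);
	return 0;
}

Run as, say, "cleanup_stale /sheep/obj/<epoch>" with no live VDI ids listed, it
mimics the "all vdi's deleted" case above and would remove the leftover 4MB
objects; the real patch would of course get the set of live VDIs from the
cluster rather than from the command line.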

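On the du confusion in the quoted test: when epoch directories hard-link the
same object files, summing du over each directory counts every link separately,
so directories 1 and 3 on the rejoined node can both report the full object
size even though the data exists only once on disk. The following is a minimal
standalone sketch, again not a sheepdog tool, that counts usage once per inode;
the hard-linking between epoch directories is taken from the report above.

/*
 * Illustrative sketch only -- not a sheepdog tool.  Sums on-disk usage
 * across several directories while counting each inode once, so objects
 * hard-linked between epoch directories (e.g. /sheep/obj/00000001 and
 * /sheep/obj/00000003) are not double-counted the way separate du runs
 * can make them appear.
 */
#include <dirent.h>
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

struct seen {
	dev_t dev;
	ino_t ino;
};

static struct seen *seen;
static size_t nr_seen, alloc_seen;

/* Remember (dev, ino) pairs; return 1 if this inode was already counted. */
static int already_seen(dev_t dev, ino_t ino)
{
	for (size_t i = 0; i < nr_seen; i++)
		if (seen[i].dev == dev && seen[i].ino == ino)
			return 1;
	if (nr_seen == alloc_seen) {
		alloc_seen = alloc_seen ? alloc_seen * 2 : 1024;
		seen = realloc(seen, alloc_seen * sizeof(*seen));
		if (!seen)
			abort();
	}
	seen[nr_seen].dev = dev;
	seen[nr_seen].ino = ino;
	nr_seen++;
	return 0;
}

/* Usage: unique_du <dir> [<dir> ...] */
int main(int argc, char **argv)
{
	uint64_t per_link = 0, per_inode = 0;

	for (int i = 1; i < argc; i++) {
		DIR *dir = opendir(argv[i]);
		struct dirent *d;

		if (!dir) {
			perror(argv[i]);
			continue;
		}
		while ((d = readdir(dir)) != NULL) {
			char path[PATH_MAX];
			struct stat st;

			snprintf(path, sizeof(path), "%s/%s", argv[i], d->d_name);
			if (stat(path, &st) < 0 || !S_ISREG(st.st_mode))
				continue;

			per_link += st.st_size;          /* what per-directory du adds up */
			if (!already_seen(st.st_dev, st.st_ino))
				per_inode += st.st_size; /* what the disk actually holds */
		}
		closedir(dir);
	}

	printf("apparent bytes (per link):  %" PRIu64 "\n", per_link);
	printf("actual bytes   (per inode): %" PRIu64 "\n", per_inode);
	free(seen);
	return 0;
}

Run against /sheep/obj/00000001 and /sheep/obj/00000003 on the node that
dropped and rejoined, the "per link" and "per inode" totals should differ by
roughly the ~21GB of shared objects mentioned in the report above.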