On Tue, Jan 13, 2015 at 10:37:40AM +0900, Hitoshi Mitake wrote:
> Current sheepdog never recycles VIDs. But it will cause problems
> e.g. VID space exhaustion, too much garbage inode objects.
>
> Keeping deleted inode objects is required because living inodes
> (snapshots or clones) can point objects of the deleted inodes. So if
> every member of VDI family is deleted, it is safe to remove deleted
> inode objects.
>
> v2:
> - update test scripts

All the nodes of our test cluster panicked with the following problem:

Mar 12 00:05:03 DEBUG [main] zk_handle_notify(1216) NOTIFY
Mar 12 00:05:03 DEBUG [main] sd_notify_handler(960) op NOTIFY_VDI_ADD, size: 96, from: IPv4 ip:192.168.39.177 port:7000
Mar 12 00:05:03 DEBUG [main] do_add_vdi_state(362) 7c2b2b, 3, 0, 22, 0
Mar 12 00:05:03 DEBUG [main] do_add_vdi_state(362) 7c2b2c, 3, 0, 22, 7c2b2b
Mar 12 00:05:03 EMERG [main] update_vdi_family(127) PANIC: parent VID: 7c2b2b not found
Mar 12 00:05:03 EMERG [main] crash_handler(286) sheep exits unexpectedly (Aborted), si pid 4786, uid 0, errno 0, code -6
Mar 12 00:05:03 EMERG [main] sd_backtrace(833) sheep.c:288: crash_handler
Mar 12 00:05:03 EMERG [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x338200f4ff]
Mar 12 00:05:03 EMERG [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3381c328a4]
Mar 12 00:05:03 EMERG [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3381c34084]
Mar 12 00:05:03 EMERG [main] sd_backtrace(833) vdi.c:127: update_vdi_family
Mar 12 00:05:03 EMERG [main] sd_backtrace(833) vdi.c:398: add_vdi_state
Mar 12 00:05:03 EMERG [main] sd_backtrace(833) ops.c:711: cluster_notify_vdi_add
Mar 12 00:05:03 EMERG [main] sd_backtrace(833) group.c:975: sd_notify_handler

So I tracked it back to this patch set.

The problem this patch set tries to solve is very clear and has been
with sheepdog since its birth. It actually reveals a deficiency of our
VDI allocation algorithm. Unfortunately that algorithm is not fixable,
and we need to rethink a completely new one to replace it. One simple
rule: we can't recycle any VID once it has been created, because of the
current hash collision handling. Our current implementation forbids
recycling.

Instead of fixing the above panic bug, I'd suggest we revert this patch
set. For the problem this patch set mentions, I think we need a new
algorithm and implementation.
But before that, we should stay with the old one: it is stable and
reliable, and should work for small clusters.

What do you think, Hitoshi and Kazutaka?

Yuan
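
To illustrate the rule that a VID can't be recycled once it has been created, here is a minimal sketch. It assumes VIDs are allocated by hashing the VDI name and probing linearly past collisions, and that a lookup stops at the first free slot. This is a simplified toy model, not sheepdog's actual code; `VidTable`, `NR_VIDS`, `recycle` and the constant hash function are hypothetical names chosen for the illustration.

```python
# Toy model only (assumption): VIDs are picked by hashing the VDI name and
# probing linearly on collision. None of these names come from sheepdog.
NR_VIDS = 8  # tiny VID space so collisions are easy to trigger


class VidTable:
    def __init__(self, hash_fn):
        self.hash_fn = hash_fn
        self.slots = [None] * NR_VIDS  # index is the VID, value is the VDI name

    def _probe(self, name):
        # Visit every VID once, starting at the hashed position.
        start = self.hash_fn(name) % NR_VIDS
        for i in range(NR_VIDS):
            yield (start + i) % NR_VIDS

    def create(self, name):
        # Take the first free VID on the probe sequence.
        for vid in self._probe(name):
            if self.slots[vid] is None:
                self.slots[vid] = name
                return vid
        raise RuntimeError("VID space exhausted")

    def lookup(self, name):
        for vid in self._probe(name):
            if self.slots[vid] is None:
                return None  # a free slot is treated as the end of the chain
            if self.slots[vid] == name:
                return vid
        return None

    def recycle(self, vid):
        # Naive recycling: simply mark the VID as free again.
        self.slots[vid] = None


def demo():
    table = VidTable(hash_fn=lambda name: 3)  # force every name to collide
    vid_a = table.create("a")   # lands on the hashed slot
    vid_b = table.create("b")   # collides, probes to the next slot
    before = table.lookup("b")  # found by walking past "a"
    table.recycle(vid_a)        # free the slot at the head of the chain
    after = table.lookup("b")   # probe stops at the freed slot
    return vid_a, vid_b, before, after
```

In this model "b" still occupies its VID after "a" is recycled, but the lookup no longer reaches it because the freed slot cuts the probe chain. Under these assumptions, any scheme that frees VIDs also needs a different collision handling strategy.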
