At Tue, 21 Oct 2014 12:18:55 +0200,
Valerio Pachera wrote:
>
> 2014-10-20 9:07 GMT+02:00 Hitoshi Mitake <[email protected]>:
> > This patchset removes a bug in the recovery process. The current
> > recovery process can lose data of erasure coded VDIs when the number
> > of nodes is smaller than the number of data stripes.
> >
> > The same thing can be found here:
> > https://github.com/sheepdog/sheepdog/tree/ec-recovery
> >
> > Valerio, could you test it?
>
> It seems to work fine.
>
> I used -c 2:1 and killed all nodes but one.
> I rejoined the cluster with the second node and checked the md5sum of
> the vdi, and it matches the one calculated before killing the nodes.
>
> dog vdi read test | md5sum
> 8886bddd205a7698a8194594c76e61b5 -
>
> dog vdi read test | md5sum
> 8886bddd205a7698a8194594c76e61b5 -

Thanks for your testing, Valerio.

>
> I notice that a lot of INFO and ERROR messages get printed in sheep.log.
> In my testing environment I have only 1 vdi of 800M.
> In a real cluster with terabytes of data the log would probably become huge.

The error messages below seem to have been introduced by a trivial
mistake. I'll fix it before applying.

Thanks,
Hitoshi

>
> ...
> Oct 21 12:09:18 INFO [main] recover_object_main(908) object recovery progress 47%
> Oct 21 12:09:18 ERROR [rw 14158] sheep_exec_req(1170) failed Failed to find requested tag, remote address: 192.168.10.5:7000, op name: GET_EPOCH
> Oct 21 12:09:18 ALERT [rw 14158] rollback_vnode_info(117) cannot get epoch 0
> Oct 21 12:09:18 ALERT [rw 14158] rollback_vnode_info(118) clients may see old data
> Oct 21 12:09:18 ERROR [rw 14158] read_erasure_object(230) can not read 7c2b2500000085 idx 0
> Oct 21 12:09:18 INFO [main] recover_object_main(908) object recovery progress 48%
> Oct 21 12:09:18 ERROR [rw 13514] sheep_exec_req(1170) failed Failed to find requested tag, remote address: 192.168.10.5:7000, op name: GET_EPOCH
> Oct 21 12:09:18 ALERT [rw 13514] rollback_vnode_info(117) cannot get epoch 0
> Oct 21 12:09:18 ALERT [rw 13514] rollback_vnode_info(118) clients may see old data
> Oct 21 12:09:18 ERROR [rw 13514] read_erasure_object(230) can not read 7c2b2500000086 idx 0
> Oct 21 12:09:18 ERROR [rw 14158] sheep_exec_req(1170) failed Failed to find requested tag, remote address: 192.168.10.5:7000, op name: GET_EPOCH
> Oct 21 12:09:18 ALERT [rw 14158] rollback_vnode_info(117) cannot get epoch 0
> Oct 21 12:09:18 ALERT [rw 14158] rollback_vnode_info(118) clients may see old data
> Oct 21 12:09:18 ERROR [rw 14158] read_erasure_object(230) can not read 7c2b2500000088 idx 1
> Oct 21 12:09:18 INFO [main] recover_object_main(908) object recovery progress 50%
> ...
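
To get a rough idea of how much each message contributes to the log
volume Valerio describes, the excerpt above can be tallied per level and
function. This is only a minimal sketch and not part of sheepdog: the
line layout (timestamp, level, [thread], function(line), message) is
inferred from the quoted excerpt alone, and the names LINE_RE, tally and
SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Layout assumed from the sheep.log excerpt quoted in this thread:
#   "Oct 21 12:09:18 ERROR [rw 14158] sheep_exec_req(1170) message..."
# Other sheep.log lines may not follow it; such lines are skipped.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+"   # timestamp, e.g. "Oct 21 12:09:18"
    r"(?P<level>[A-Z]+)\s+"                   # INFO / ERROR / ALERT ...
    r"\[(?P<thread>[^\]]+)\]\s+"              # "[main]" or "[rw 14158]"
    r"(?P<func>\w+)\((?P<line>\d+)\)\s+"      # "sheep_exec_req(1170)"
    r"(?P<msg>.*)$"
)


def tally(lines):
    """Count log lines per (level, function); unparsable lines are ignored."""
    counts = Counter()
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue
        counts[(m.group("level"), m.group("func"))] += 1
    return counts


# A few lines copied from the excerpt above (wrapped lines rejoined).
SAMPLE = """\
Oct 21 12:09:18 INFO [main] recover_object_main(908) object recovery progress 47%
Oct 21 12:09:18 ERROR [rw 14158] sheep_exec_req(1170) failed Failed to find requested tag, remote address: 192.168.10.5:7000, op name: GET_EPOCH
Oct 21 12:09:18 ALERT [rw 14158] rollback_vnode_info(117) cannot get epoch 0
Oct 21 12:09:18 ALERT [rw 14158] rollback_vnode_info(118) clients may see old data
Oct 21 12:09:18 ERROR [rw 14158] read_erasure_object(230) can not read 7c2b2500000085 idx 0
""".splitlines()

if __name__ == "__main__":
    for (level, func), n in tally(SAMPLE).most_common():
        print(f"{n:4d} {level:5s} {func}")
```

Feeding the lines of a real sheep.log to tally() instead of SAMPLE would
show which functions dominate the output during recovery.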
--
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog
