On 03/19/2012 10:44 AM, HaiTing Yao wrote:
>
> On Fri, Mar 16, 2012 at 6:35 PM, Liu Yuan <[email protected]> wrote:
>
> > On 03/16/2012 04:43 PM, [email protected] wrote:
> > > From: HaiTing Yao <[email protected]>
> > >
> > > cached_epoch is a __thread variable. If it is greater than 1, formatting
> > > the cluster again leads to a permanent I/O error.
> > >
> > > Signed-off-by: HaiTing Yao <[email protected]>
> > > ---
> > >  sheep/sdnet.c |    6 +++++-
> > >  1 files changed, 5 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/sheep/sdnet.c b/sheep/sdnet.c
> > > index 5db9f29..d693858 100644
> > > --- a/sheep/sdnet.c
> > > +++ b/sheep/sdnet.c
> > > @@ -832,7 +832,11 @@ int get_sheep_fd(uint8_t *addr, uint16_t port, int node_idx, uint32_t epoch)
> > >      if (before(epoch, cached_epoch)) {
> > >          eprintf("requested epoch is smaller than the previous one: %d < %d\n",
> > >                  epoch, cached_epoch);
> > > -        return -1;
> > > +        /* cluster format again */
> > > +        if (sys->epoch == 1)
> > > +            cached_epoch = 0;
> > > +        else
> > > +            return -1;
> > >      }
> > >      if (after(epoch, cached_epoch)) {
> > >          for (i = 0; i < SD_MAX_NODES; i++) {
> >
> > Any script that can reproduce this issue?
> >
> > Thanks,
> > Yuan
>
> Please try this script, thanks.
>
> The error log looks like this:
>
> Mar 19 10:28:14 forward_write_obj_req(304) 70912800000000
> Mar 19 10:28:14 get_sheep_fd(834) requested epoch is smaller than the previous one: 1 < 2
> Mar 19 10:28:14 forward_write_obj_req(337) failed to connect to 127.0.0.1:7002
> Mar 19 10:28:14 do_io_request(785) failed: 1, 70912800000000 , 1, 129
> Mar 19 10:28:14 client_handler(557) closed connection 11
>
> test-cached.sh
>
> set -x
> sudo killall sheep
> sudo rm -rf ~/s1 ~/s2 ~/s3 ~/s4
> echo "test cached epoch" > ~/tmp-cached
> sudo sheep -d ~/s1 -z 1
> sudo sheep -d ~/s2 -z 2 -p 7002
> sudo sheep -d ~/s3 -z 3 -p 7003
> sudo sheep -d ~/s4 -z 4 -p 7004
> sleep 60
> collie cluster format
> collie vdi create v1 64M
> sleep 30
> collie vdi write v1 0 1024 < ~/tmp-cached
> ps -ef | grep "\-z 4" | awk '{print $2}' | xargs sudo kill
> sleep 60
> collie vdi write v1 0 1024 < ~/tmp-cached
> sleep 6
> collie cluster format
> collie vdi create v1 64M
> sleep 60
> collie vdi write v1 0 1024 < ~/tmp-cached
>
> Best Regards

I can't reproduce the issue, using the following script:

for i in 0 1 2 3 4 5 6; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i;sleep 1;done
collie/collie cluster format -b farm
qemu-img create -f raw sheepdog:test 10G
~/qemu-devel/qemu-io -c "write -P 0x1 0 10M" sheepdog:test
for i in 2; do pkill -f "sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i";done;
~/qemu-devel/qemu-io -c "write -P 0x2 0 10M" sheepdog:test
collie/collie cluster format -b farm
qemu-img create -f raw sheepdog:test 10G

Thanks,
Yuan
--
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog
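
For readers following the thread, the failure mode described in the patch can be sketched in isolation. This is only a minimal illustration, not the sheepdog code: check_epoch() and cluster_epoch are hypothetical stand-ins for get_sheep_fd() and sys->epoch, the bodies of before() and after() are assumed to be wrap-around serial comparisons (the thread only shows them being called), and the cache update in the after() branch is likewise an assumption because the quoted hunk is cut off there.

```c
#include <stdint.h>

/* Assumed wrap-around comparisons; the thread only shows the call sites. */
static int before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

static int after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/* Hypothetical stand-in for sys->epoch. */
static uint32_t cluster_epoch = 1;

/* Per-thread cache, as the patch description states (GCC __thread). */
static __thread uint32_t cached_epoch;

/*
 * Hypothetical helper mirroring the check quoted from get_sheep_fd().
 * patched == 0: original behaviour, always fail on a smaller epoch.
 * patched != 0: proposed behaviour, reset the cache if the cluster
 *               epoch is back at 1 (i.e. the cluster was formatted again).
 */
static int check_epoch(uint32_t epoch, int patched)
{
	if (before(epoch, cached_epoch)) {
		if (patched && cluster_epoch == 1)
			cached_epoch = 0;
		else
			return -1;
	}
	/* Assumption: the real code also drops cached fds in this branch. */
	if (after(epoch, cached_epoch))
		cached_epoch = epoch;
	return 0;
}
```

With cluster_epoch set to 2, check_epoch(2, 0) succeeds and the calling thread caches 2. Once cluster_epoch goes back to 1, check_epoch(1, 0) returns -1 on that thread every time, which matches the "requested epoch is smaller than the previous one: 1 < 2" log line, while check_epoch(1, 1) clears the stale cache and succeeds. Because the cache is per thread, whether a given run hits the error may depend on which worker thread handles the request.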
