On Mon, Jul 9, 2012 at 11:30 AM, Liu Yuan <[email protected]> wrote:
> On 07/09/2012 11:25 AM, Liu Yuan wrote:
>> On 07/09/2012 09:58 AM, Liu Yuan wrote:
>>> Got a weird segfault,
>>>
>>> (gdb) where
>>> #0 0x0000000000411936 in do_process_work (work=0xd13c70) at ops.c:992
>>> #1 0x000000000040ed05 in worker_routine (arg=0xd12a20) at work.c:171
>>> #2 0x00007f43f992c971 in start_thread (arg=<value optimized out>) at
>>> pthread_create.c:304
>>> #3 0x00007f43f8eeef3d in clone () at
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>> #4 0x0000000000000000 in ?? ()
>>>
>>> sheep.log:
>>> ...
>>> Jul 09 09:47:23 [main] client_handler(764) connection seems to be dead
>>> Jul 09 09:47:23 [main] clear_client(703) refcnt:0, fd:14, ::1:43328
>>> Jul 09 09:47:23 [main] destroy_client(672) connection from: ::1:43328
>>> Jul 09 09:47:23 [main] cdrv_cpg_deliver(448) 5
>>> Jul 09 09:47:23 [main] sd_notify_handler(851) size: 96, from: IPv4
>>> ip:127.0.0.1 port:7000
>>> Jul 09 09:47:23 [main] client_tx_handler(663) connection from: 13, ::1:43330
>>> Jul 09 09:47:23 [main] client_handler(764) connection seems to be dead
>>> Jul 09 09:47:23 [main] clear_client(703) refcnt:0, fd:13, ::1:43330
>>> Jul 09 09:47:23 [main] destroy_client(672) connection from: ::1:43330
>>> Jul 09 09:47:23 [main] listen_handler(819) accepted a new connection: 13
>>> Jul 09 09:47:23 [main] listen_handler(819) accepted a new connection: 14
>>> Jul 09 09:47:23 [block] do_process_work(990) 80, 0 , 32579 <--- XXX
>>> Jul 09 09:47:23 [main] client_rx_handler(577) connection from: 14, ::1:43337
>>> Jul 09 09:47:23 [main] queue_request(323) 2
>>> Jul 09 09:47:23 [main] crash_handler(408) sheep pid 5326 exited
>>> unexpectedly.
>>>
>>> Thanks,
>>> Yuan
>>>
>>
>> Yet another segfault.
>>
>> #0 __libc_free (mem=0x7f3301864000) at malloc.c:3709
>> 3709 malloc.c: No such file or directory.
>> in malloc.c
>> (gdb) where
>> #0 __libc_free (mem=0x7f3301864000) at malloc.c:3709
>> #1 0x00000000004090a1 in free_request (req=0x7f32fc000a00) at sdnet.c:474
>> #2 0x00000000004098bd in client_tx_handler (ci=0x7f32fc0143c0) at
>> sdnet.c:656
>> #3 0x0000000000409d32 in client_handler (fd=14, events=4,
>> data=0x7f32fc0143c0) at sdnet.c:760
>> #4 0x000000000041e470 in event_loop (timeout=-1) at event.c:179
>> #5 0x0000000000404376 in main (argc=7, argv=0x7fff9f1566a8) at sheep.c:275
>>
>
> Again and again:
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x00000000004118b4 in has_process_main (op=0x0) at ops.c:981
> 981 return !!op->process_main;

I have fixed this segfault; see my newest patch.

> (gdb) where
> #0 0x00000000004118b4 in has_process_main (op=0x0) at ops.c:981
> #1 0x00000000004057e7 in prepare_cluster_msg (req=0xb03ca0,
> sizep=0x7fff129c3640) at group.c:275
> #2 0x000000000040585c in cluster_op_done (work=0xb03d60) at group.c:290
> #3 0x000000000040ebaf in bs_thread_request_done (fd=12, events=1,
> data=0x0) at work.c:135
> #4 0x000000000041e470 in event_loop (timeout=-1) at event.c:179
> #5 0x0000000000404376 in main (argc=7, argv=0x7fff129c4e98) at sheep.c:275
>
> ==========================
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x000000000040e6d9 in __list_del (prev=0x21, next=0x0) at
> ../include/list.h:79
> 79 next->prev = prev;
> (gdb) where
> #0 0x000000000040e6d9 in __list_del (prev=0x21, next=0x0) at
> ../include/list.h:79
> #1 0x000000000040e710 in list_del (entry=0x1582420) at ../include/list.h:90
> #2 0x000000000040ece2 in worker_routine (arg=0x157aa20) at work.c:168
> #3 0x00007fd02a8c6971 in start_thread (arg=<value optimized out>) at
> pthread_create.c:304
> #4 0x00007fd029e88f3d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #5 0x0000000000000000 in ?? ()
>
>
> I can reproduce various kinds of segfaults with the following script almost
> every time; it seems that the for-0.4.0 branch is broken.
>
> ===================
>
> #!/bin/bash
>
> pkill -9 sheep
> pkill -9 collie
> rm store/* -rf
> for i in `seq 0 7`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done
> sleep 3
> collie/collie cluster format -c 3
> sleep 1
>
> for i in `seq 0 4`;do
>   collie/collie vdi create test$i 100M
> done
>
> for i in `seq 0 4`;do
>   dd if=/dev/urandom | collie/collie vdi write test$i -p 7000 &
> done
>
> sleep 3
> for i in 1 2 3 4 5; do pkill -f "sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i";sleep 3;done;
> for i in `seq 1 5`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done
>
> echo wait for object recovery to finish
> for ((;;)); do
>   if [ "$(pgrep collie)" ]; then
>     sleep 1
>   else
>     break
>   fi
> done
>
> for i in `seq 0 7`; do
>   for j in `seq 0 4`; do
>     ./collie/collie vdi read test$j -p 700$i | md5sum
>   done
> done
>
> --
> sheepdog mailing list
> [email protected]
> http://lists.wpkg.org/mailman/listinfo/sheepdog

--
Yunkai Zhang
Work at Taobao
