Re: [Gluster-devel] scrubber crash

2015-06-01 Thread Gaurav Garg


- Original Message -
From: Venky Shankar vshan...@redhat.com
To: gg...@redhat.com, anekk...@redhat.com
Cc: gluster-devel@gluster.org
Sent: Monday, June 1, 2015 3:28:21 PM
Subject: Re: [Gluster-devel] scrubber crash



On 06/01/2015 02:23 PM, Venky Shankar wrote:


 On 06/01/2015 01:09 PM, Anand Nekkunti wrote:
 Hi Venky,
 In one of the regression tests for my patch, I found a core dump from 
 the scrubber. Please have a look.

 Link: 
 http://build.gluster.org/job/rackspace-regression-2GB-triggered/9925/consoleFull

 bt for core ...

 (gdb) bt
 #0  0x7f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, 
 timer=0x0, expires=233889) at 
 /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
 #1  0x7f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980, 
 child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010, 
 pendingcheck=_gf_true)

 The crash happens when the scrubber is paused, as reconfigure() blindly 
 accesses scrubber-specific data which is not available _after_ pause.

 Thanks for reporting. I'll send a fix for this.
OK. This is not a straightforward crash. The crash is due to a race 
between CHILD_UP (which marks the subvolume as up and initializes 
essential structures _later_) and reconfigure(), which tries to access 
structures that are yet to be initialized.

For now we can introduce a delay before invoking reconfigure() (a pause in 
the test case) and work on a proper fix for this.

We don't know how much delay the test case needs. So one idea is to wait 
for a few seconds in the reconfigure function
and poll whether the timer has been initialized. If it is initialized, 
proceed further; otherwise skip.
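
The polling idea above could be sketched roughly as follows. This is only an
illustration: demo_wait_until_ready(), the ready flag and the retry parameters
are hypothetical names, not existing bit-rot code, and it assumes the CHILD_UP
path sets the flag once the timer has been created.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: poll `ready` up to max_tries times, sleeping
 * interval_ms between checks. Returns true if the flag was observed
 * set, false if the caller should skip the reschedule. */
static bool
demo_wait_until_ready(atomic_bool *ready, int max_tries, long interval_ms)
{
    struct timespec ts = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };

    assert(ready != NULL && max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        if (atomic_load(ready))
            return true;
        nanosleep(&ts, NULL); /* bounded wait, then re-check */
    }
    return atomic_load(ready);
}
```

Since reconfigure() would block its calling thread while polling, the bound
(max_tries * interval_ms) should probably stay small.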

Thoughts?

-Venky
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] scrubber crash

2015-06-01 Thread Venky Shankar



On 06/01/2015 02:23 PM, Venky Shankar wrote:



On 06/01/2015 01:09 PM, Anand Nekkunti wrote:

Hi Venky,
   In one of the regression tests for my patch, I found a core dump from 
the scrubber. Please have a look.


Link: 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/9925/consoleFull


bt for core ...

(gdb) bt
#0  0x7f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, 
timer=0x0, expires=233889) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
#1  0x7f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980, 
child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010, 
pendingcheck=_gf_true)


The crash happens when the scrubber is paused, as reconfigure() blindly 
accesses scrubber-specific data which is not available _after_ pause.


Thanks for reporting. I'll send a fix for this.
OK. This is not a straightforward crash. The crash is due to a race 
between CHILD_UP (which marks the subvolume as up and initializes 
essential structures _later_) and reconfigure(), which tries to access 
structures that are yet to be initialized.


For now we can introduce a delay before invoking reconfigure() (a pause in 
the test case) and work on a proper fix for this.
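
One possible shape for a proper fix, as a sketch only: serialize the CHILD_UP
initialization and reconfigure() on a per-child lock, and let reconfigure()
park the new value when the structures do not exist yet. All names here
(demo_child, demo_child_up, demo_reconfigure, the frequency field) are
hypothetical and do not reflect the actual bit-rot data structures.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-subvolume state. */
struct demo_child {
    pthread_mutex_t lock;
    bool initialized;       /* set once scrub structures exist */
    unsigned int frequency; /* last applied option value */
    unsigned int pending;   /* value parked while uninitialized */
    bool has_pending;
};

static void
demo_child_setup(struct demo_child *c)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    c->initialized = false;
    c->frequency = 0;
    c->pending = 0;
    c->has_pending = false;
}

/* CHILD_UP path: finish initialization, then apply any parked value. */
static void
demo_child_up(struct demo_child *c, unsigned int default_freq)
{
    pthread_mutex_lock(&c->lock);
    c->frequency = c->has_pending ? c->pending : default_freq;
    c->has_pending = false;
    c->initialized = true;
    pthread_mutex_unlock(&c->lock);
}

/* reconfigure() path: returns true if applied immediately, false if
 * the value was parked because the child is not initialized yet. */
static bool
demo_reconfigure(struct demo_child *c, unsigned int freq)
{
    bool applied;

    pthread_mutex_lock(&c->lock);
    if (c->initialized) {
        c->frequency = freq;
        applied = true;
    } else {
        c->pending = freq;
        c->has_pending = true;
        applied = false;
    }
    pthread_mutex_unlock(&c->lock);
    return applied;
}
```

With this ordering, the test case would not need a delay: whichever of the two
paths runs first, the option value ends up applied exactly once.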


Thoughts?

-Venky


Re: [Gluster-devel] scrubber crash

2015-06-01 Thread Venky Shankar



On 06/01/2015 01:09 PM, Anand Nekkunti wrote:

Hi Venky,
   In one of the regression tests for my patch, I found a core dump from the 
scrubber. Please have a look.


Link: 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/9925/consoleFull


bt for core ...

(gdb) bt
#0  0x7f89d6224731 in gf_tw_mod_timer_pending (base=0xf2fbc0, 
timer=0x0, expires=233889) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/contrib/timer-wheel/timer-wheel.c:239
#1  0x7f89c82ce7e8 in br_fsscan_reschedule (this=0x7f89c4008980, 
child=0x7f89c4011238, fsscan=0x7f89c4012290, fsscrub=0x7f89c4010010, 
pendingcheck=_gf_true) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:703


The crash happens when the scrubber is paused, as reconfigure() blindly 
accesses scrubber-specific data which is not available _after_ pause.


Thanks for reporting. I'll send a fix for this.

#2  0x7f89c82cc9d4 in reconfigure (this=0x7f89c4008980, 
options=0x7f89d3bc9558) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/features/bit-rot/src/bitd/bit-rot.c:1673
#3  0x7f89d62044cd in xlator_reconfigure_rec 
(old_xl=0x7f89c4008980, new_xl=0x7f89c409b460) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1084
#4  0x7f89d6204414 in xlator_reconfigure_rec 
(old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1070
#5  0x7f89d62045df in xlator_tree_reconfigure 
(old_xl=0x7f89c400a6c0, new_xl=0x7f89c409c500) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/options.c:1112
#6  0x7f89d61ec7bd in glusterfs_graph_reconfigure 
(oldgraph=0x7f89c4001d30, newgraph=0x7f89c4098130) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/graph.c:893
#7  0x7f89d61ec629 in glusterfs_volfile_reconfigure 
(oldvollen=932, newvolfile_fp=0x7f89c4097eb0, ctx=0xefe010,
oldvolfile=0x7f89c40608c0 volume patchy-client-0\ntype 
protocol/client\noption password 
57218e76-6f3a-4f60-8b23-a0bca58c135d\noption username 
3f24264e-5cbc-4be7-a2eb-326d804f8f90\noption transport-type 
tcp\nopti...) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/graph.c:844
#8  0x0040e27d in mgmt_getspec_cbk (req=0xf65c7c, 
iov=0xf65cbc, count=1, myframe=0x7f89d4005c58) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd-mgmt.c:1532
#9  0x7f89d5f63e90 in rpc_clnt_handle_reply (clnt=0xf65990, 
pollin=0x7f89c4060740) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:761
#10 0x7f89d5f642b0 in rpc_clnt_notify (trans=0xf674e0, 
mydata=0xf659c0, event=RPC_TRANSPORT_MSG_RECEIVED, 
data=0x7f89c4060740) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:889
#11 0x7f89d5f607e0 in rpc_transport_notify (this=0xf674e0, 
event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f89c4060740) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-transport.c:538
#12 0x7f89ca741311 in socket_event_poll_in (this=0xf674e0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2285 

#13 0x7f89ca7417cc in socket_event_handler (fd=9, idx=1, 
data=0xf674e0, poll_in=1, poll_out=0, poll_err=0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2398
#14 0x7f89d620f449 in event_dispatch_epoll_handler 
(event_pool=0xf1cc10, event=0x7f89c911ae70) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:567
#15 0x7f89d620f7a2 in event_dispatch_epoll_worker (data=0xf686a0) 
at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:669

#16 0x7f89d56ff9d1 in start_thread () from ./lib64/libpthread.so.0
#17 0x7f89d50698fd in clone () from ./lib64/libc.so.6
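
Frames #0 and #1 show gf_tw_mod_timer_pending() being called with timer=0x0
from br_fsscan_reschedule(). A minimal sketch of a defensive guard on the
caller side is below. The types and the function are hypothetical stand-ins,
not the real timer-wheel or bit-rot definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the timer and scanner structures. */
struct demo_timer {
    unsigned long expires;
};

struct demo_scanner {
    struct demo_timer *timer; /* NULL until the child is fully set up */
};

/* Returns true if the timer was rescheduled, false if the call was
 * skipped because the timer does not exist (yet, or any more). */
static bool
demo_reschedule(struct demo_scanner *scan, unsigned long expires)
{
    assert(scan != NULL);

    if (scan->timer == NULL)
        return false; /* nothing to modify; avoid the NULL dereference */

    scan->timer->expires = expires;
    return true;
}
```

A check like this only avoids the dereference. Unless it runs under the same
lock as the code that creates the timer, the race itself remains.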
