[Gluster-devel] glusterfs coredump--mempool

2019-05-21 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Hi glusterfs expert,
I am hitting a glusterfs process coredump again in my environment, shortly after the 
glusterfs process starts up. The frame's local has become NULL, but the frame does not 
seem to have been destroyed yet, since the magic number (GF_MEM_HEADER_MAGIC) in the 
mempool header is still untouched.
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --acl --volfile-server=mn-0.local 
--volfile-server=mn-1.loc'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f867fcd2971 in client3_3_inodelk_cbk (req=, 
iov=, count=, myframe=0x7f8654008830)
at client-rpc-fops.c:1510
1510  CLIENT_STACK_UNWIND (inodelk, frame, rsp.op_ret,
[Current thread is 1 (Thread 0x7f867d6d4700 (LWP 3046))]
Missing separate debuginfos, use: dnf debuginfo-install 
glusterfs-fuse-3.12.15-1.wos2.wf29.x86_64
(gdb) bt
#0  0x7f867fcd2971 in client3_3_inodelk_cbk (req=, 
iov=, count=, myframe=0x7f8654008830)
at client-rpc-fops.c:1510
#1  0x7f8685ea5584 in rpc_clnt_handle_reply 
(clnt=clnt@entry=0x7f8678070030, pollin=pollin@entry=0x7f86702833e0) at 
rpc-clnt.c:782
#2  0x7f8685ea587b in rpc_clnt_notify (trans=, 
mydata=0x7f8678070060, event=, data=0x7f86702833e0) at 
rpc-clnt.c:975
#3  0x7f8685ea1b83 in rpc_transport_notify (this=this@entry=0x7f8678070270, 
event=event@entry=RPC_TRANSPORT_MSG_RECEIVED,
data=data@entry=0x7f86702833e0) at rpc-transport.c:538
#4  0x7f8680b99867 in socket_event_poll_in (notify_handled=_gf_true, 
this=0x7f8678070270) at socket.c:2260
#5  socket_event_handler (fd=, idx=3, gen=1, 
data=0x7f8678070270, poll_in=, poll_out=,
poll_err=) at socket.c:2645
#6  0x7f8686132911 in event_dispatch_epoll_handler (event=0x7f867d6d3e6c, 
event_pool=0x55e1b2792b00) at event-epoll.c:583
#7  event_dispatch_epoll_worker (data=0x7f867805ece0) at event-epoll.c:659
#8  0x7f8684ea65da in start_thread () from /lib64/libpthread.so.0
#9  0x7f868474eeaf in clone () from /lib64/libc.so.6
(gdb) print *(call_frame_t*)myframe
$3 = {root = 0x7f86540271a0, parent = 0x0, frames = {next = 0x7f8654027898, 
prev = 0x7f8654027898}, local = 0x0, this = 0x7f8678013080, ret = 0x0,
  ref_count = 0, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 
0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
__list = {__prev = 0x0, __next = 0x0}}, __size = '\000' , __align = 0}}, cookie = 0x0, complete = _gf_false, xid = 0,
  op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, 
tv_usec = 0}, wind_from = 0x0, wind_to = 0x0, unwind_from = 0x0, unwind_to = 
0x0}
(gdb) x/4xw  0x7f8654008810
0x7f8654008810:   0xcafebabe 0x00000000 0x00000000 0x00000000
(gdb) p *(pooled_obj_hdr_t *)0x7f8654008810
$2 = {magic = 3405691582, next = 0x0, pool_list = 0x7f8654000b80, power_of_two 
= 8}
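
For reference, this is how I read that header dump: 3405691582 is 0xCAFEBABE, i.e. 
GF_MEM_HEADER_MAGIC, and the frame at 0x7f8654008830 starts right after this 32-byte 
header at 0x7f8654008810. Below is a rough sketch of the per-object header layout as I 
reconstruct it from the dump (the real definition is pooled_obj_hdr_t in mem-pool.h and 
may differ slightly), plus a hypothetical helper mirroring the check I did by hand in gdb:

/* Sketch of the per-object mempool header, reconstructed from the gdb output
 * above; see pooled_obj_hdr_t in libglusterfs/src/mem-pool.h for the real one. */
typedef struct pooled_obj_hdr_sketch {
        unsigned long                 magic;        /* GF_MEM_HEADER_MAGIC (0xCAFEBABE) while the object is live */
        struct pooled_obj_hdr_sketch *next;         /* free-list link, NULL while the object is handed out */
        void                         *pool_list;    /* per-thread pool list the object came from */
        unsigned int                  power_of_two; /* size class, here 8, presumably the 2^8 = 256-byte pool */
} pooled_obj_hdr_sketch_t;

/* Hypothetical helper: the payload (here the call frame) sits immediately
 * after the 32-byte header, so step one header back and test the magic. */
static inline int
mem_header_looks_live (void *payload)
{
        pooled_obj_hdr_sketch_t *hdr = (pooled_obj_hdr_sketch_t *) payload - 1;

        return hdr->magic == 0xCAFEBABEUL;
}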

As a debugging aid I added a "uint32_t xid" field to the _call_frame data structure and 
set it from rpcreq->xid in the __save_frame function. Normally this xid should only be 0 
immediately after the frame is obtained with create_frame from the memory pool. But in 
this core the xid is already 0 again, so it looks like the frame has been given out of 
the pool for reuse before this callback freed it. Do you have any idea how this can 
happen?
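
For clarity, the debug change is roughly the following sketch (the exact surrounding 
code in stack.h and rpc-clnt.c is omitted, and the file paths are from my tree):

/* libglusterfs/src/stack.h: extra debug field at the end of struct _call_frame. */
struct _call_frame {
        /* ... existing members unchanged ... */
        uint32_t xid;   /* debug: XID of the RPC request this frame was last saved for */
};

/* rpc/rpc-lib/src/rpc-clnt.c, inside __save_frame(): tag the frame with the
 * request XID when the request is queued for transmission. */
frame->xid = rpcreq->xid;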


cynthia

[Gluster-devel] glusterfs coredump

2019-01-22 Thread Lian, George (NSB - CN/Hangzhou)
Hi, GlusterFS expert,

We have recently encountered a coredump of the client process “glusterfs”, and it can 
be reproduced more easily when the IO load and CPU/memory load are high during 
stability testing.
Our glusterfs release is 3.12.2.
I have copied the call trace of the core dump below, and have some questions; any 
help from you would be appreciated.


1) Have you encountered a related issue? From the call trace, we can see that the 
fd variable looks abnormal in its “refcount” and “inode” fields.

For wb_inode->this, the value has become the invalid pointer 0xff00. Is 
“0xff00” some meaningful value? In every coredump that has occurred, the 
value of wb_inode->this is the same “0xff00”.



2) When I checked the source code, I found that in the function wb_enqueue_common 
__wb_request_unref is used instead of wb_request_unref, and although wb_request_unref 
is defined, it is never used!  Firstly, that seems strange; secondly, wb_request_unref 
uses a locking mechanism to avoid race conditions, while __wb_request_unref has no such 
mechanism, and there are far more call sites that reach __wb_request_unref. Will this 
lead to race issues?
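
As I understand it, the general convention in the glusterfs code base is that a 
function whose name starts with "__" assumes the caller already holds the relevant 
lock, while the un-prefixed wrapper takes the lock itself and then calls the "__" 
variant. A minimal sketch of that pattern for write-behind (names as in 
write-behind.c, bodies and signatures simplified, not the exact 3.12 source):

/* Sketch only: the "__" variant does no locking and must only be called with
 * wb_inode->lock already held by the caller. */
static wb_request_t *
__wb_request_unref (wb_request_t *req)
{
        req->refcount--;
        if (req->refcount == 0) {
                /* unlink from the wb_inode queues and free (details omitted) */
                req = NULL;
        }

        return req;
}

/* The un-prefixed wrapper is safe from unlocked context: it takes
 * wb_inode->lock itself and delegates to the "__" variant. */
static wb_request_t *
wb_request_unref (wb_request_t *req)
{
        wb_inode_t *wb_inode = req->wb_inode;

        LOCK (&wb_inode->lock);
        {
                req = __wb_request_unref (req);
        }
        UNLOCK (&wb_inode->lock);

        return req;
}

If that convention holds here, calling __wb_request_unref from wb_enqueue_common is 
only safe when that path already runs under wb_inode->lock; otherwise it would indeed 
look like a race.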

[Current thread is 1 (Thread 0x7f54e82a3700 (LWP 6078))]
(gdb) bt
#0  0x7f54e623197c in wb_fulfill (wb_inode=0x7f54d4066bd0, 
liabilities=0x7f54d0824440) at write-behind.c:1155
#1  0x7f54e6233662 in wb_process_queue (wb_inode=0x7f54d4066bd0) at 
write-behind.c:1728
#2  0x7f54e6234039 in wb_writev (frame=0x7f54d406d6c0, this=0x7f54e0014b10, 
fd=0x7f54d8019d70, vector=0x7f54d0018000, count=1, offset=33554431, 
flags=32770, iobref=0x7f54d021ec20, xdata=0x0)
at write-behind.c:1842
#3  0x7f54e6026fcb in du_writev_resume (ret=0, frame=0x7f54d0002260, 
opaque=0x7f54d0002260) at disk-usage.c:490
#4  0x7f54ece07160 in synctask_wrap () at syncop.c:377
#5  0x7f54eb3a2660 in ?? () from /lib64/libc.so.6
#6  0x in ?? ()
(gdb) p wb_inode
$6 = (wb_inode_t *) 0x7f54d4066bd0
(gdb) p wb_inode->this
$1 = (xlator_t *) 0xff00
(gdb) frame 1
#1  0x7f54e6233662 in wb_process_queue (wb_inode=0x7f54d4066bd0) at 
write-behind.c:1728
1728 in write-behind.c
(gdb) p wind_failure
$2 = 0
(gdb) p *wb_inode
$3 = {window_conf = 35840637416824320, window_current = 35840643167805440, 
transit = 35839681019027968, all = {next = 0xb000, prev = 0x7f54d4066bd000}, 
todo = {next = 0x7f54deadc0de00,
prev = 0x7f54e00489e000}, liability = {next = 0x7f5400a200, prev = 
0xb000}, temptation = {next = 0x7f54d4066bd000, prev = 0x7f54deadc0de00}, wip = 
{next = 0x7f54e00489e000, prev = 0x7f5400a200},
  gen = 45056, size = 35840591659782144, lock = {spinlock = 0, mutex = {__data 
= {__lock = 0, __count = 8344798, __owner = 0, __nusers = 8344799, __kind = 
41472, __spins = 21504, __elision = 127, __list = {
  __prev = 0xb000, __next = 0x7f54d4066bd000}},
  __size = 
"\000\000\000\000\336T\177\000\000\000\000\000\337T\177\000\000\242\000\000\000T\177\000\000\260\000\000\000\000\000\000\000\320k\006\324T\177",
 __align = 35840634501726208}},
  this = 0xff00, dontsync = -1}
(gdb) frame 2
#2  0x7f54e6234039 in wb_writev (frame=0x7f54d406d6c0, this=0x7f54e0014b10, 
fd=0x7f54d8019d70, vector=0x7f54d0018000, count=1, offset=33554431, 
flags=32770, iobref=0x7f54d021ec20, xdata=0x0)
at write-behind.c:1842
1842 in write-behind.c
(gdb) p fd
$4 = (fd_t *) 0x7f54d8019d70
(gdb) p *fd
$5 = {pid = 140002378149040, flags = -670836240, refcount = 32596, inode_list = 
{next = 0x7f54d8019d80, prev = 0x7f54d8019d80}, inode = 0x0, lock = {spinlock = 
-536740032, mutex = {__data = {
__lock = -536740032, __count = 32596, __owner = -453505333, __nusers = 
32596, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 
0x0}},
  __size = "@\377\001\340T\177\000\000\313\016\370\344T\177", '\000' 
, __align = 140002512207680}}, _ctx = 0x, xl_count = 
0, lk_ctx = 0x0, anonymous = (unknown: 3623984496)}
(gdb)


Thanks & Best Regards,
George
