Re: [Gluster-users] glusterfs client crashes
On 2/23/2016 10:27 AM, Raghavendra Gowdappa wrote:
> Came across a glibc bug which could've caused some corruptions. On googling about possible problems, we found that there is an issue (https://bugzilla.redhat.com/show_bug.cgi?id=1305406) fixed in glibc-2.17-121.el7.

We have the latest version available for CentOS 7.2 installed, which is glibc-2.17-106. It reports "Your libc is likely buggy."

I'm happy to test again once the 2.17-121 version is available for CentOS 7.2.

Thank you,

-Dj
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] glusterfs client crashes
2.17-106.el7 is the latest glibc on CentOS 7. I tried the one-liner on older versions as well, which also results in "likely buggy" according to the test.

Found this CentOS issue: https://bugs.centos.org/view.php?id=10426

# rpm -qa | grep glibc
glibc-2.17-106.el7_2.4.x86_64
glibc-common-2.17-106.el7_2.4.x86_64

# objdump -r -d /lib64/libc.so.6 | grep -C 20 _int_free | grep -C 10 cmpxchg | head -21 | grep -A 3 cmpxchg | tail -1 | (grep '%r' && echo "Your libc is likely buggy." || echo "Your libc looks OK.")
7ca3e: 48 85 c9 test %rcx,%rcx
Your libc is likely buggy.

Kind regards,
Fredrik Widlund

On Tue, Feb 23, 2016 at 4:27 PM, Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
> Came across a glibc bug which could've caused some corruptions. On googling about possible problems, we found that there is an issue (https://bugzilla.redhat.com/show_bug.cgi?id=1305406) fixed in glibc-2.17-121.el7. From the bug we found the following test script to determine whether glibc is buggy, and we ran it on our local setup using the method given in the bug:
>
> # objdump -r -d /lib64/libc.so.6 | grep -C 20 _int_free | grep -C 10 cmpxchg | head -21 | grep -A 3 cmpxchg | tail -1 | (grep '%r' && echo "Your libc is likely buggy." || echo "Your libc looks OK.")
> 7cc36: 48 85 c9 test %rcx,%rcx
> Your libc is likely buggy.
>
> Could you check if the above command on your setup gives the same output, which says "Your libc is likely buggy."?
>
> Thanks to Nithya, Krutika and Pranith for working on this.
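The package names in the rpm output above can also be compared against the release the bug report names as fixed (glibc-2.17-121.el7). The following is only a rough sketch: the helper names `parse_glibc_nvr` and `has_fix` are made up for illustration, and the comparison looks only at the version and the leading release number, so it does not follow full RPM version-ordering rules and would not notice a fix backported into a lower-numbered release. The objdump one-liner remains the more direct check.

```python
import re

# Release named in the thread as containing the fix (glibc-2.17-121.el7).
FIXED = (2, 17, 121)


def parse_glibc_nvr(package):
    """Extract (major, minor, release) from e.g. 'glibc-2.17-106.el7_2.4.x86_64'.

    Only the main 'glibc' package name is handled; sub-packages such as
    'glibc-common' are rejected on purpose.
    """
    match = re.match(r"glibc-(\d+)\.(\d+)-(\d+)", package)
    if match is None:
        raise ValueError("unrecognised glibc package name: %r" % package)
    return tuple(int(part) for part in match.groups())


def has_fix(package, fixed=FIXED):
    """Return True if the package is at or beyond the fixed release (simplified)."""
    return parse_glibc_nvr(package) >= fixed


if __name__ == "__main__":
    print(has_fix("glibc-2.17-106.el7_2.4.x86_64"))  # False
```

With the package reported in this thread the function returns False, which agrees with the "likely buggy" result of the one-liner.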
Re: [Gluster-users] glusterfs client crashes
Came across a glibc bug which could've caused some corruptions. On googling about possible problems, we found that there is an issue (https://bugzilla.redhat.com/show_bug.cgi?id=1305406) fixed in glibc-2.17-121.el7. From the bug we found the following test script to determine whether glibc is buggy, and we ran it on our local setup using the method given in the bug:

# objdump -r -d /lib64/libc.so.6 | grep -C 20 _int_free | grep -C 10 cmpxchg | head -21 | grep -A 3 cmpxchg | tail -1 | (grep '%r' && echo "Your libc is likely buggy." || echo "Your libc looks OK.")
7cc36: 48 85 c9 test %rcx,%rcx
Your libc is likely buggy.

Could you check if the above command on your setup gives the same output, which says "Your libc is likely buggy."?

Thanks to Nithya, Krutika and Pranith for working on this.

- Original Message -
> From: "Fredrik Widlund" <fredrik.widl...@gmail.com>
> To: glus...@deej.net
> Cc: gluster-users@gluster.org
> Sent: Tuesday, February 23, 2016 5:51:37 PM
> Subject: Re: [Gluster-users] glusterfs client crashes
>
> Hi,
>
> I have experienced what looks like a very similar crash. Gluster 3.7.6 on CentOS 7. No errors on the bricks or on other at the time mounted clients. Relatively high load at the time.
>
> Remounting the filesystem brought it back online.
Re: [Gluster-users] glusterfs client crashes
ccing md-cache team member for this issue.

Thanks,

~Gaurav

- Original Message -
From: "Fredrik Widlund" <fredrik.widl...@gmail.com>
To: glus...@deej.net
Cc: gluster-users@gluster.org
Sent: Tuesday, February 23, 2016 5:51:37 PM
Subject: Re: [Gluster-users] glusterfs client crashes

Hi,

I have experienced what looks like a very similar crash. Gluster 3.7.6 on CentOS 7. No errors on the bricks or on other at the time mounted clients. Relatively high load at the time.

Remounting the filesystem brought it back online.
Re: [Gluster-users] glusterfs client crashes
Hi,

I have experienced what looks like a very similar crash. Gluster 3.7.6 on CentOS 7. No errors on the bricks or on other at the time mounted clients. Relatively high load at the time.

Remounting the filesystem brought it back online.

pending frames:
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2016-02-22 10:28:45
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.6
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f83387f7012]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f83388134dd]
/lib64/libc.so.6(+0x35670)[0x7f8336ee5670]
/lib64/libc.so.6(gsignal+0x37)[0x7f8336ee55f7]
/lib64/libc.so.6(abort+0x148)[0x7f8336ee6ce8]
/lib64/libc.so.6(+0x75317)[0x7f8336f25317]
/lib64/libc.so.6(+0x7cfe1)[0x7f8336f2cfe1]
/lib64/libglusterfs.so.0(loc_wipe+0x27)[0x7f83387f4d47]
/usr/lib64/glusterfs/3.7.6/xlator/performance/md-cache.so(mdc_local_wipe+0x11)[0x7f8329c8e5f1]
/usr/lib64/glusterfs/3.7.6/xlator/performance/md-cache.so(mdc_stat_cbk+0x10c)[0x7f8329c8f4fc]
/lib64/libglusterfs.so.0(default_stat_cbk+0xac)[0x7f83387fcc5c]
/usr/lib64/glusterfs/3.7.6/xlator/cluster/distribute.so(dht_file_attr_cbk+0x149)[0x7f832ab2a409]
/usr/lib64/glusterfs/3.7.6/xlator/protocol/client.so(client3_3_stat_cbk+0x3c6)[0x7f832ad6d266]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f83385c5b80]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7f83385c5e3f]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f83385c1983]
/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0x9506)[0x7f832d261506]
/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0xc3f4)[0x7f832d2643f4]
/lib64/libglusterfs.so.0(+0x878ea)[0x7f83388588ea]
/lib64/libpthread.so.0(+0x7dc5)[0x7f833765fdc5]
/lib64/libc.so.6(clone+0x6d)[0x7f8336fa621d]

Kind regards,
Fredrik Widlund
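When comparing crashes from several clients, it can help to pull the backtrace lines from the message above apart mechanically. The sketch below is an illustration only: the `Frame` type and the helper names are invented here, and the regular expression assumes every frame has the `object(symbol+offset)[address]` shape seen in this thread. It finds the first frame after `abort` that is not inside libc, which is usually the caller worth looking at first.

```python
import re
from typing import NamedTuple, Optional

# Assumed frame shape, taken from the log in this thread:
#   /path/to/object(symbol+0xOFFSET)[0xADDRESS]   (symbol may be empty)
FRAME_RE = re.compile(
    r"^(?P<object>[^(]+)\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\)"
    r"\[(?P<address>0x[0-9a-f]+)\]$"
)


class Frame(NamedTuple):
    object: str
    symbol: Optional[str]
    offset: int
    address: int


def parse_frame(line):
    """Parse one backtrace line; return None if it does not look like a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return Frame(
        object=match["object"],
        symbol=match["symbol"] or None,
        offset=int(match["offset"], 16),
        address=int(match["address"], 16),
    )


def first_caller_after_abort(lines):
    """Return the first non-libc frame that follows the abort() frame."""
    seen_abort = False
    for frame in filter(None, (parse_frame(line) for line in lines)):
        if frame.symbol == "abort":
            seen_abort = True
        elif seen_abort and not frame.object.endswith("/libc.so.6"):
            return frame
    return None


# Frames copied from the 3.7.6 crash reported in this thread.
SAMPLE = """\
/lib64/libc.so.6(gsignal+0x37)[0x7f8336ee55f7]
/lib64/libc.so.6(abort+0x148)[0x7f8336ee6ce8]
/lib64/libc.so.6(+0x75317)[0x7f8336f25317]
/lib64/libc.so.6(+0x7cfe1)[0x7f8336f2cfe1]
/lib64/libglusterfs.so.0(loc_wipe+0x27)[0x7f83387f4d47]
""".splitlines()

if __name__ == "__main__":
    print(first_caller_after_abort(SAMPLE).symbol)  # loc_wipe
```

In both crashes in this thread the frame found this way is a "wipe" routine that frees memory (`loc_wipe` here, `client_local_wipe` in the 3.7.8 crash), with the abort raised from inside libc.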
Re: [Gluster-users] glusterfs client crashes
On 2/21/2016 2:23 PM, Dj Merrill wrote:
> Very interesting. They were reporting both bricks offline, but the processes on both servers were still running. Restarting glusterfsd on one of the servers brought them both back online.

I realize I wasn't clear in my comments yesterday and would like to elaborate on this a bit further. The "very interesting" comment was sparked because when we were running 3.7.6, the bricks were not reporting as offline when a client was having an issue, so this is new behaviour now that we are running 3.7.8 (or a different issue entirely).

The other point that I was not clear on is that we may have one client reporting the "Transport endpoint is not connected" error, but the other 40+ clients all continue to work properly. This is the case with both 3.7.6 and 3.7.8.

Curious, how can the other clients continue to work fine if both Gluster 3.7.8 servers are reporting the bricks as offline?

What does "offline" mean in this context?

Re: the server logs, here is what I've found so far listed on both gluster servers (glusterfs1 and glusterfs2):

[2016-02-21 08:06:02.785788] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:20.677010] W [socket.c:588:__socket_rwv] 0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No data available)
[2016-02-21 18:48:20.677096] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from gv0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2016-02-21 18:48:31.148564] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-21 18:48:40.941715] W [socket.c:588:__socket_rwv] 0-glusterfs: readv on (sanitized IP of glusterfs2):24007 failed (No data available)
[2016-02-21 18:48:51.184424] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:51.972068] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2016-02-21 18:48:51.980210] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2016-02-21 18:48:51.985211] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2016-02-21 18:48:51.995002] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2016-02-21 18:48:53.006079] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.018104] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.024060] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.035170] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.045637] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-gv0-client-1: changing port to 49152 (from 0)
[2016-02-21 18:48:53.051991] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-02-21 18:48:53.052439] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/export/brick1/sdb1'.
[2016-02-21 18:48:53.052486] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-02-21 18:48:53.052668] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
[2016-02-21 18:48:31.148706] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from gv0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2016-02-21 18:49:12.271865] W [socket.c:588:__socket_rwv] 0-glusterfs: readv on (sanitized IP of glusterfs2):24007 failed (No data available)
[2016-02-21 18:49:15.637745] W [socket.c:588:__socket_rwv] 0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No data available)
[2016-02-21 18:49:15.637824] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from gv0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2016-02-21 18:49:24.198431] E [socket.c:2278:socket_connect_finish] 0-glusterfs: connection to (sanitized IP of glusterfs2):24007 failed (Connection refused)
[2016-02-21 18:49:26.204811] E [socket.c:2278:socket_connect_finish] 0-gv0-client-1: connection to (sanitized IP of glusterfs2):24007 failed (Connection refused)
[2016-02-21 18:49:38.366559] I [MSGID: 108031] [afr-common.c:1883:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting local read_child gv0-client-0
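Most of the entries above are informational, so filtering for warnings and errors makes the disconnect sequence easier to follow. This is a minimal sketch that assumes the line layout visible in this thread (timestamp, one-letter level, optional MSGID, source location, domain, message); the function names are hypothetical and the pattern is not taken from any Gluster documentation.

```python
import re
from datetime import datetime

# Layout assumed from the log excerpt in this thread:
# [timestamp] LEVEL [MSGID: n]? [file:line:function] domain: message
LOG_RE = re.compile(
    r"^\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%Y-%m-%d %H:%M:%S.%f")
    entry["line"] = int(entry["line"])
    entry["msgid"] = int(entry["msgid"]) if entry["msgid"] else None
    return entry


def warnings_and_errors(lines):
    """Keep only entries logged at level W (warning) or E (error)."""
    parsed = (parse_log_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] in ("W", "E")]


# Three entries copied from the server log quoted in this thread.
SAMPLE = [
    "[2016-02-21 08:06:02.785788] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] "
    "0-glusterfs: No change in volfile, continuing",
    "[2016-02-21 18:48:20.677010] W [socket.c:588:__socket_rwv] 0-gv0-client-1: "
    "readv on (sanitized IP of glusterfs2):49152 failed (No data available)",
    "[2016-02-21 18:48:31.148564] E [MSGID: 114058] "
    "[client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1: failed to "
    "get the port number for remote subvolume. Please run 'gluster volume status' "
    "on server to see if brick process is running.",
]

if __name__ == "__main__":
    for entry in warnings_and_errors(SAMPLE):
        print(entry["timestamp"], entry["level"], entry["domain"], entry["function"])
```

Because each entry carries a parsed timestamp, the result can also be sorted with `sorted(entries, key=lambda e: e["timestamp"])`, which is useful here since one of the quoted entries appears out of chronological order.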
Re: [Gluster-users] glusterfs client crashes
On 2/21/2016 1:27 PM, Gaurav Garg wrote:
> It seems that your brick processes are offline or all brick processes have crashed. Could you paste the output of the #gluster volume status and #gluster volume info commands and attach the core file?

Very interesting. They were reporting both bricks offline, but the processes on both servers were still running. Restarting glusterfsd on one of the servers brought them both back online.

I am going to have to take a closer look at the logs on the servers.

Even after bringing them back up, the client is still reporting "Transport endpoint is not connected". Is there anything other than a reboot that will change this state on the client?

# gluster volume status
Status of volume: gv0
Gluster process                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick glusterfs1:/export/brick1/sdb1    49152     0          Y       15073
Brick glusterfs2:/export/brick1/sdb1    49152     0          Y       14068
Self-heal Daemon on localhost           N/A       N/A        Y       14063
Self-heal Daemon on glusterfs1          N/A       N/A        Y       7732

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: gv1
Gluster process                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick glusterfs1:/export/brick2/sdb2    49154     0          Y       15089
Brick glusterfs2:/export/brick2/sdb2    49157     0          Y       14073
Self-heal Daemon on localhost           N/A       N/A        Y       14063
Self-heal Daemon on glusterfs1          N/A       N/A        Y       7732

Task Status of Volume gv1
------------------------------------------------------------------------------
There are no active volume tasks

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 1d31ea3c-a240-49fe-a68d-4218ac051b6d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: glusterfs1:/export/brick1/sdb1
Brick2: glusterfs2:/export/brick1/sdb1
Options Reconfigured:
performance.cache-max-file-size: 750MB
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.quota-timeout: 30
features.quota: off
performance.io-thread-count: 16
performance.write-behind-window-size: 1GB
performance.cache-size: 1GB
nfs.volume-access: read-only
nfs.disable: on
cluster.self-heal-daemon: enable

Volume Name: gv1
Type: Replicate
Volume ID: 7127b90b-e208-4aea-a920-4db195295d7a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: glusterfs1:/export/brick2/sdb2
Brick2: glusterfs2:/export/brick2/sdb2
Options Reconfigured:
performance.cache-size: 1GB
performance.write-behind-window-size: 1GB
nfs.disable: on
nfs.volume-access: read-only
performance.cache-max-file-size: 750MB
cluster.self-heal-daemon: enable

-Dj
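Since the question in this thread is whether the bricks are really offline, the Online column of the status output above is the part worth checking automatically. The sketch below is an assumption-laden illustration: it treats any line whose last four whitespace-separated fields are TCP port, RDMA port, a Y/N flag, and a PID as a data row, and the function names are made up. Real output may wrap long brick names onto a second line, which this does not handle.

```python
def parse_status_rows(status_text):
    """Parse data rows of 'gluster volume status' text into dicts.

    A data row is assumed to end in: TCP port, RDMA port, Online (Y/N), PID.
    Header, separator, and task-status lines are skipped.
    """
    rows = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[-2] not in ("Y", "N"):
            continue
        rows.append({
            "process": " ".join(fields[:-4]),
            "tcp_port": fields[-4],
            "rdma_port": fields[-3],
            "online": fields[-2] == "Y",
            "pid": fields[-1],
        })
    return rows


def offline_processes(status_text):
    """Return the names of processes whose Online flag is not 'Y'."""
    return [row["process"] for row in parse_status_rows(status_text) if not row["online"]]


# Status block for gv0 as posted in this thread.
SAMPLE = """\
Status of volume: gv0
Gluster process                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick glusterfs1:/export/brick1/sdb1    49152     0          Y       15073
Brick glusterfs2:/export/brick1/sdb1    49152     0          Y       14068
Self-heal Daemon on localhost           N/A       N/A        Y       14063
Self-heal Daemon on glusterfs1          N/A       N/A        Y       7732

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
"""

if __name__ == "__main__":
    print(offline_processes(SAMPLE))  # []
```

For the output posted here the list is empty, matching the observation that both bricks were back online after glusterfsd was restarted.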
Re: [Gluster-users] glusterfs client crashes
Hi Dj,

It seems that your brick processes are offline or all brick processes have crashed. Could you paste the output of the #gluster volume status and #gluster volume info commands and attach the core file?

ccing dht-team member.

Thanks,

~Gaurav

- Original Message -
From: "Dj Merrill" <glus...@deej.net>
To: gluster-users@gluster.org
Sent: Sunday, February 21, 2016 10:37:02 PM
Subject: [Gluster-users] glusterfs client crashes

Several weeks ago we started seeing some weird behaviour on our Gluster client systems. Things would be working fine for several days, then the client could no longer access the Gluster filesystems, giving an error:

ls: cannot access /mnt/hpc: Transport endpoint is not connected

We were running version 3.7.6 and this version had been working fine for a few months until the above started happening. Thinking that it may be an OS or kernel update causing the issue, when 3.7.8 came out, we upgraded in hopes that the issue might be addressed, but we are still having the issue.

All client machines are running CentOS 7.2 with the latest updates, and the problem is happening on several machines. Not every Gluster client machine has had the problem, but enough different machines to make us think that this is more of a generic issue versus one that only affects specific types of machines (both Intel and AMD CPUs, different system manufacturers, etc.).

The log file included below from /var/log/glusterfs seems to be showing a crash of the glusterfs process if I am interpreting it correctly. At the top you can see an entry made on the 17th, then no further entries until the crash today on the 21st.

We would greatly appreciate any help in tracking down the cause and possible fix for this.

The only way to temporarily "fix" the machines seems to be a reboot, which allows the machines to work properly for a few days before the issue happens again (random number of days, no pattern).

[2016-02-17 23:56:39.685754] I [MSGID: 109036] [dht-common.c:8043:dht_log_new_layout_for_dir_selfheal] 0-gv0-dht: Setting layout of /tmp/ktreraya/gms-scr/tmp/123277 with [Subvol_name: gv0-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
pending frames:
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2016-02-21 08:10:40
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.8
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7ff56ddcd042]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7ff56dde950d]
/lib64/libc.so.6(+0x35670)[0x7ff56c4bb670]
/lib64/libc.so.6(gsignal+0x37)[0x7ff56c4bb5f7]
/lib64/libc.so.6(abort+0x148)[0x7ff56c4bcce8]
/lib64/libc.so.6(+0x75317)[0x7ff56c4fb317]
/lib64/libc.so.6(+0x7d023)[0x7ff56c503023]
/usr/lib64/glusterfs/3.7.8/xlator/protocol/client.so(client_local_wipe+0x39)[0x7ff5600a46b9]
/usr/lib64/glusterfs/3.7.8/xlator/protocol/client.so(client3_3_getxattr_cbk+0x182)[0x7ff5600a7f62]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7ff56db9ba20]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7ff56db9bcdf]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7ff56db97823]
/usr/lib64/glusterfs/3.7.8/rpc-transport/socket.so(+0x6636)[0x7ff5627a8636]
/usr/lib64/glusterfs/3.7.8/rpc-transport/socket.so(+0x9294)[0x7ff5627ab294]
/lib64/libglusterfs.so.0(+0x878ea)[0x7ff56de2e8ea]
/lib64/libpthread.so.0(+0x7dc5)[0x7ff56cc35dc5]
/lib64/libc.so.6(clone+0x6d)[0x7ff56c57c28d]

Thank you,

-Dj
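To compare crash dumps like the one above across clients, the header fields and the pending-frame operations can be condensed into a small summary. This is a sketch only: `summarize_crash` is a hypothetical helper, the patterns are inferred from the log text in this thread, and the sample is an abbreviated excerpt rather than the full dump. Signal 6 is SIGABRT on Linux, which matches the `abort` frame in the backtrace.

```python
import re
from collections import Counter


def summarize_crash(log_text):
    """Collect signal, crash time, version, and pending-frame op counts."""
    signal_match = re.search(r"signal received: (\d+)", log_text)
    time_match = re.search(
        r"time of crash:\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", log_text
    )
    version_match = re.search(r"package-string: glusterfs (\S+)", log_text)
    ops = Counter(re.findall(r"frame : type\(\d+\) op\((\w+)\)", log_text))
    return {
        "signal": int(signal_match.group(1)) if signal_match else None,
        "time_of_crash": time_match.group(1) if time_match else None,
        "version": version_match.group(1) if version_match else None,
        "pending_ops": ops,
    }


# Abbreviated excerpt of the 3.7.8 crash dump (most frames omitted for brevity).
SAMPLE = """\
pending frames:
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2016-02-21 08:10:40
package-string: glusterfs 3.7.8
"""

if __name__ == "__main__":
    summary = summarize_crash(SAMPLE)
    print(summary["version"], summary["signal"], summary["time_of_crash"])
    print(summary["pending_ops"].most_common())
```

The patterns use `\s*` after "time of crash:" so the same function works whether the timestamp is on the same line or the next one.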
Re: [Gluster-users] glusterfs client crashes
Hi Dj, Its seems that your brick process are offline or all brick process have crashed. Could you paste output of #gluster volume status and #gluster volume info command and attach core file. ccing dht-team member. Thanks, ~Gaurav - Original Message - From: "Dj Merrill" <glus...@deej.net> To: gluster-users@gluster.org Sent: Sunday, February 21, 2016 10:37:02 PM Subject: [Gluster-users] glusterfs client crashes Several weeks ago we started seeing some weird behaviour on our Gluster client systems. Things would be working fine for several days, then the client could no longer access the Gluster filesystems, giving an error: ls: cannot access /mnt/hpc: Transport endpoint is not connected We were running version 3.7.6 and this version had been working fine for a few months until the above started happening. Thinking that it may be an OS or kernel update causing the issue, when 3.7.8 came out, we upgraded in hopes that the issue might be addressed, but we are still getting having the issue. All client machines are running Centos 7.2 with the latest updates, and the problem is happening on several machines. Not every Gluster client machine has had the problem, but enough different machines to make us think that this is more of a generic issue versus one that only affects specific types of machines (Both Intel and AMD CPUs, different system manufacturers, etc). The log file included below from /var/log/glusterfs seems to be showing a crash of the glusterfs process if I am interpreting it correctly. At the top you can see an entry made on the 17th, then no further entries until the crash today on the 21st. We would greatly appreciate any help in tracking down the cause and possible fix for this. The only way to temporarily "fix" the machines seems to be a reboot, which allows the machines to work properly for a few days before the issue happens again (random amount of days, no pattern). 
[2016-02-17 23:56:39.685754] I [MSGID: 109036] [dht-common.c:8043:dht_log_new_layout_for_dir_selfheal] 0-gv0-dht: Setting layout of /tmp/ktreraya/gms-scr/tmp/123277 with [Subvol_name: gv0-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
pending frames:
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2016-02-21 08:10:40
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.8
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7ff56ddcd042]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7ff56dde950d]
/lib64/libc.so.6(+0x35670)[0x7ff56c4bb670]
/lib64/libc.so.6(gsignal+0x37)[0x7ff56c4bb5f7]
/lib64/libc.so.6(abort+0x148)[0x7ff56c4bcce8]
/lib64/libc.so.6(+0x75317)[0x7ff56c4fb317]
/lib64/libc.so.6(+0x7d023)[0x7ff56c503023]
/usr/lib64/glusterfs/3.7.8/xlator/protocol/client.so(client_local_wipe+0x39)[0x7ff5600a46b9]
/usr/lib64/glusterfs/3.7.8/xlator/protocol/client.so(client3_3_getxattr_cbk+0x182)[0x7ff5600a7f62]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7ff56db9ba20]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7ff56db9bcdf]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7ff56db97823]
/usr/lib64/glusterfs/3.7.8/rpc-transport/socket.so(+0x6636)[0x7ff5627a8636]
/usr/lib64/glusterfs/3.7.8/rpc-transport/socket.so(+0x9294)[0x7ff5627ab294]
/lib64/libglusterfs.so.0(+0x878ea)[0x7ff56de2e8ea]
/lib64/libpthread.so.0(+0x7dc5)[0x7ff56cc35dc5]
/lib64/libc.so.6(clone+0x6d)[0x7ff56c57c28d]

Thank you,
-Dj
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
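The two outputs requested at the top of this reply (gluster volume status and gluster volume info) can be captured in one pass and pasted back to the list. The following is only a minimal sketch, assuming Python 3.7 or later and that the gluster CLI is on the PATH of the machine where it runs (typically one of the Gluster server nodes, as root). The names COMMANDS and collect_diagnostics are illustrative and not part of GlusterFS; locating the core file is left out because its location depends on the system's core-dump settings.

```python
import subprocess

# The two commands asked for in the reply; nothing else is assumed
# about the gluster CLI.
COMMANDS = [
    ["gluster", "volume", "status"],
    ["gluster", "volume", "info"],
]


def collect_diagnostics(run=subprocess.run):
    """Run each command and return a dict mapping the command line to its output.

    `run` is injectable so the function can be exercised without a
    Gluster installation.
    """
    report = {}
    for cmd in COMMANDS:
        key = " ".join(cmd)
        try:
            result = run(cmd, capture_output=True, text=True, timeout=60)
            # Keep stderr too: an error message is also useful to the list.
            report[key] = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            report[key] = "failed to run: {}".format(exc)
    return report


def format_report(report):
    """Render the collected outputs as plain text suitable for an email."""
    parts = []
    for key, output in report.items():
        parts.append("# " + key)
        parts.append(output.rstrip())
        parts.append("")
    return "\n".join(parts)
```

Calling print(format_report(collect_diagnostics())) on a server node would produce a block that can be pasted directly into a reply.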
[Gluster-users] glusterfs client crashes
Several weeks ago we started seeing some weird behaviour on our Gluster client systems. Things would work fine for several days, then the client could no longer access the Gluster filesystems, giving an error:

ls: cannot access /mnt/hpc: Transport endpoint is not connected

We were running version 3.7.6, which had been working fine for a few months until the above started happening. Thinking that an OS or kernel update might be causing the issue, we upgraded when 3.7.8 came out in the hope that it would be addressed, but we are still having the issue.

All client machines are running CentOS 7.2 with the latest updates, and the problem is happening on several machines. Not every Gluster client machine has had the problem, but enough different machines have to make us think that this is a generic issue rather than one that only affects specific types of machines (both Intel and AMD CPUs, different system manufacturers, etc.).

The log file included below, from /var/log/glusterfs, seems to show a crash of the glusterfs process, if I am interpreting it correctly. At the top you can see an entry made on the 17th, then no further entries until the crash today, on the 21st.

We would greatly appreciate any help in tracking down the cause and a possible fix for this. The only way to temporarily "fix" the machines seems to be a reboot, which allows them to work properly for a few days before the issue happens again (a random number of days, no pattern).
[2016-02-17 23:56:39.685754] I [MSGID: 109036] [dht-common.c:8043:dht_log_new_layout_for_dir_selfheal] 0-gv0-dht: Setting layout of /tmp/ktreraya/gms-scr/tmp/123277 with [Subvol_name: gv0-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
pending frames:
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2016-02-21 08:10:40
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.8
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7ff56ddcd042]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7ff56dde950d]
/lib64/libc.so.6(+0x35670)[0x7ff56c4bb670]
/lib64/libc.so.6(gsignal+0x37)[0x7ff56c4bb5f7]
/lib64/libc.so.6(abort+0x148)[0x7ff56c4bcce8]
/lib64/libc.so.6(+0x75317)[0x7ff56c4fb317]
/lib64/libc.so.6(+0x7d023)[0x7ff56c503023]
/usr/lib64/glusterfs/3.7.8/xlator/protocol/client.so(client_local_wipe+0x39)[0x7ff5600a46b9]
/usr/lib64/glusterfs/3.7.8/xlator/protocol/client.so(client3_3_getxattr_cbk+0x182)[0x7ff5600a7f62]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7ff56db9ba20]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7ff56db9bcdf]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7ff56db97823]
/usr/lib64/glusterfs/3.7.8/rpc-transport/socket.so(+0x6636)[0x7ff5627a8636]
/usr/lib64/glusterfs/3.7.8/rpc-transport/socket.so(+0x9294)[0x7ff5627ab294]
/lib64/libglusterfs.so.0(+0x878ea)[0x7ff56de2e8ea]
/lib64/libpthread.so.0(+0x7dc5)[0x7ff56cc35dc5]
/lib64/libc.so.6(clone+0x6d)[0x7ff56c57c28d]

Thank you,
-Dj
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
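For anyone triaging a similar client log, the crash report in the message above can be summarized mechanically. Signal 6 is SIGABRT on Linux, and the frames show abort() being reached from inside libc while client_local_wipe was running. That would be consistent with glibc detecting a problem while releasing memory, although the log alone does not prove it. The following is a minimal sketch, assuming Python 3 and a log with one entry per line; summarize_crash, FRAME_RE, SIGNAL_NAMES and SAMPLE are illustrative names, and the "first named non-libc frame after abort" rule is a heuristic chosen here, not something GlusterFS defines.

```python
import re

# A few lines copied from the crash report in this thread.
SAMPLE = """\
signal received: 6
time of crash: 2016-02-21 08:10:40
package-string: glusterfs 3.7.8
/lib64/libc.so.6(gsignal+0x37)[0x7ff56c4bb5f7]
/lib64/libc.so.6(abort+0x148)[0x7ff56c4bcce8]
/lib64/libc.so.6(+0x7d023)[0x7ff56c503023]
/usr/lib64/glusterfs/3.7.8/xlator/protocol/client.so(client_local_wipe+0x39)[0x7ff5600a46b9]
"""

# Matches "library(symbol+0xoffset)[0xaddress]"; the symbol may be empty.
FRAME_RE = re.compile(
    r"^(?P<lib>/\S+?)\((?P<sym>[^()+]*)\+(?P<off>0x[0-9a-f]+)\)"
    r"\[(?P<addr>0x[0-9a-f]+)\]$"
)

# Linux signal numbers for the two most common crash signals.
SIGNAL_NAMES = {6: "SIGABRT", 11: "SIGSEGV"}


def summarize_crash(text):
    """Extract signal, crash time, package string and frames from a crash log."""
    sig = re.search(r"signal received:\s*(\d+)", text)
    when = re.search(
        r"time of crash:\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", text
    )
    pkg = re.search(r"package-string:\s*(.+)", text)

    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m["lib"], m["sym"] or None))

    # Heuristic: the first named frame outside libc after abort() is the
    # code on whose behalf libc aborted.
    caller = None
    seen_abort = False
    for lib, sym in frames:
        if sym == "abort":
            seen_abort = True
        elif seen_abort and sym and "/libc.so" not in lib:
            caller = (lib, sym)
            break

    number = int(sig.group(1)) if sig else None
    return {
        "signal": number,
        "signal_name": SIGNAL_NAMES.get(number),
        "time": when.group(1) if when else None,
        "package": pkg.group(1).strip() if pkg else None,
        "frames": frames,
        "caller_of_abort": caller,
    }
```

Running summarize_crash on the full log text from /var/log/glusterfs gives a compact record that is easier to compare across the affected machines than the raw backtraces.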