[Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-07 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Hi glusterfs experts,
Good day!
In my test environment, glusterd sometimes gets stuck and stops responding to 
any gluster commands. When I checked the issue I found that glusterd threads 8 
and 9 were handling the same socket, so I thought the patch referenced below 
should fix it; however, the issue still exists after I merged that patch. 
Looking into the code, socket_event_poll_in calls event_handled before 
rpc_transport_pollin_destroy; I think this opens a window for another poll on 
exactly the same socket, and that is what caused this glusterd hang (see the 
sketch just below).
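
To illustrate the window, here is a minimal sketch of the flow I mean. The 
function names match the backtraces below, but the bodies are paraphrased from 
my reading of the 3.12 sources, so take it as an illustration rather than the 
exact code:

    /* Suspected window in socket_event_poll_in() (socket.c, glusterfs 3.12).
     * Simplified sketch: only the ordering of the two marked calls matters. */
    static int
    socket_event_poll_in (rpc_transport_t *this, gf_boolean_t notify_handled)
    {
            socket_private_t       *priv   = this->private;
            rpc_transport_pollin_t *pollin = NULL;
            int                     ret;

            /* read a complete RPC message off the socket into pollin */
            ret = socket_proto_state_machine (this, &pollin);

            if (notify_handled && ret != -1)
                    /* (1) re-arms the fd in the epoll set: from here on,
                     * another event thread may be dispatched for this
                     * same socket... */
                    event_handled (this->ctx->event_pool, priv->sock,
                                   priv->idx, priv->gen);

            if (pollin) {
                    ret = rpc_transport_notify (this,
                                                RPC_TRANSPORT_MSG_RECEIVED,
                                                pollin);
                    /* (2) ...while this thread is still tearing down the
                     * previous message, so two threads can race on the same
                     * iobref, as in the backtraces below */
                    rpc_transport_pollin_destroy (pollin);
            }

            return ret;
    }

If event_handled() were only called after rpc_transport_pollin_destroy(), the 
fd could not be re-armed until the previous pollin was fully released, which 
should close this window.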
Also, I find there is no LOCK_DESTROY (&iobref->lock) in iobref_destroy; I 
think it is better to destroy the lock there as well (a sketch of what I mean 
follows).
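
A minimal sketch, assuming the 3.12 layout of struct iobref; the loop is 
paraphrased from iobuf.c and the LOCK_DESTROY line is the only proposed change:

    /* iobref_destroy() (iobuf.c) with the proposed LOCK_DESTROY added. */
    void
    iobref_destroy (struct iobref *iobref)
    {
            int i;

            /* drop the reference each slot holds on its iobuf */
            for (i = 0; i < iobref->alloced; i++) {
                    if (iobref->iobrefs[i])
                            iobuf_unref (iobref->iobrefs[i]);
            }

            LOCK_DESTROY (&iobref->lock);   /* proposed addition */
            GF_FREE (iobref->iobrefs);
            GF_FREE (iobref);
    }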
The GDB info captured when this issue happened is below; I would like to know 
your opinion on this issue. Thanks!

SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537

* socket: fix issue on concurrent handle of a socket



GDB INFO:
Thread 8 is blocked in pthread_cond_wait (under socket_event_poll_err), and 
thread 9 is blocked in iobref_unref, waiting on the iobref lock, I think. The 
iobref dump at the end (ref = -256, alloced = -1, used = -1) looks to me like 
memory that has already been freed and reused.
Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 
event-epoll.c:659
#8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f9ed700 (LWP 1932)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fd2b42 in __pthread_mutex_cond_lock () from 
/lib64/libpthread.so.0
#2  0x7f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#3  0x7f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0, gen=4, 
idx=27) at socket.c:1201
#4  0x7f9ee4fbf99c in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2480
#5  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edfffee84) at event-epoll.c:583
#6  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at 
event-epoll.c:659
#7  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#8  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

(gdb) thread 9
[Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 
event-epoll.c:659
#8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
(gdb) frame 2
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
944 iobuf.c: No such file or directory.
(gdb) print *iobref
$1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 222, 
__owner = -2120437760, __nusers = 1, __kind = 8960, __spins = 512,
__elision = 0, __list = {__prev = 0x4000, __next = 0x7f9ed00063b000}},
  __size = 
"\002\000\000\000\336\000\000\000\000\260\234\201\001\000\000\000\000#\000\000\000\002\000\000\000@\000\000\000\000\000\000\000\260c\000О\177",
 __align = 953482739714}}, ref = -256, iobrefs = 0x, alloced = 
-1, used = -1}
(gdb) quit
A debugging session is active.

[Gluster-devel] Weekly Untriaged Bugs

2019-04-07 Thread jenkins
[...truncated 6 lines...]
https://bugzilla.redhat.com/1688226 / core: Brick Still Died After Restart 
Glusterd & Glusterfsd Services
https://bugzilla.redhat.com/1695416 / core: client log flooding with 
intentional socket shutdown message when a brick is down
https://bugzilla.redhat.com/1691833 / core: Client sends 128KByte network 
packet for 0 length file copy
https://bugzilla.redhat.com/1695480 / core: Global Thread Pool
https://bugzilla.redhat.com/1694943 / core: parallel-readdir slows down 
directory listing
https://bugzilla.redhat.com/1696721 / geo-replication: geo-replication failing 
after upgrade from 5.5 to 6.0
https://bugzilla.redhat.com/1694637 / geo-replication: Geo-rep: Rename to an 
existing file name destroys its content on slave
https://bugzilla.redhat.com/1689981 / geo-replication: OSError: [Errno 1] 
Operation not permitted - failing with socket files?
https://bugzilla.redhat.com/1694139 / glusterd: Error waiting for job 
'heketi-storage-copy-job' to complete on one-node k3s deployment.
https://bugzilla.redhat.com/1695099 / glusterd: The number of glusterfs 
processes keeps increasing, using all available resources
https://bugzilla.redhat.com/1690454 / posix-acl: mount-shared-storage.sh does 
not implement mount options
https://bugzilla.redhat.com/1696518 / project-infrastructure: builder203 does 
not have a valid hostname set
https://bugzilla.redhat.com/1691617 / project-infrastructure: clang-scan tests 
are failing nightly.
https://bugzilla.redhat.com/1691357 / project-infrastructure: core archive link 
from regression jobs throw not found error
https://bugzilla.redhat.com/1692349 / project-infrastructure: 
gluster-csi-containers job is failing
https://bugzilla.redhat.com/1693385 / project-infrastructure: request to change 
the version of fedora in fedora-smoke-job
https://bugzilla.redhat.com/1693295 / project-infrastructure: rpc.statd not 
started on builder204.aws.gluster.org
https://bugzilla.redhat.com/1691789 / project-infrastructure: rpc-statd service 
stops on AWS builders
https://bugzilla.redhat.com/1695484 / project-infrastructure: smoke fails with 
"Build root is locked by another process"
https://bugzilla.redhat.com/1693184 / replicate: A brick process(glusterfsd) 
died with 'memory violation'
https://bugzilla.redhat.com/1696075 / replicate: Client lookup is unable to 
heal missing directory GFID entry
https://bugzilla.redhat.com/1696633 / tests: GlusterFs v4.1.5 Tests from 
/tests/bugs/ module failing on Intel
https://bugzilla.redhat.com/1694976 / unclassified: On Fedora 29 GlusterFS 4.1 
repo has bad/missing rpm signs
[...truncated 2 lines...]
