One suggestion would be to make *write* non-blocking. Any other suggestions?
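
To make the suggestion concrete, here is a minimal sketch of what I have in mind, assuming the socket fd can be switched to O_NONBLOCK and that giving up after a timeout is acceptable. The helper names set_nonblocking and write_all_with_timeout are hypothetical and not part of libguac, and I have not tried wiring this into guac_socket_fd_write_handler. Note that the timeout applies to each wait for writability, not to the whole call.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a libguac function): puts fd into
 * non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper (not a libguac function): writes all count bytes
 * to a non-blocking fd. Before each write it waits up to timeout_ms for
 * the fd to become writable. Returns 0 once everything is written, or
 * -1 on error. If the peer stops reading and a wait times out, errno is
 * set to ETIMEDOUT, so the caller can release its lock and clean up
 * instead of blocking forever inside write(). */
int write_all_with_timeout(int fd, const void* buf, size_t count,
        int timeout_ms) {

    const char* current = buf;

    while (count > 0) {

        struct pollfd pfd = { .fd = fd, .events = POLLOUT };

        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (ready < 0) {
            if (errno == EINTR)
                continue; /* Interrupted by a signal: wait again */
            return -1;
        }

        ssize_t written = write(fd, current, count);
        if (written < 0) {
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
                continue; /* Not writable after all: wait again */
            return -1;
        }

        /* Partial writes are normal on a non-blocking fd */
        current += written;
        count -= (size_t) written;
    }

    return 0;
}
```

The idea is that a thread holding the socket mutex would get an error back after the timeout rather than sitting in write() indefinitely, so threads waiting on the mutex (like the one in guac_user_abort below) could make progress.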


On Sat, Jul 25, 2020 at 4:43 AM Amarjeet Singh <[email protected]> wrote:

> Hi Team,
>
>
> More analysis on this:
>
> There are threads which are deadlocked:
>
> Thread *25377* is waiting for the mutex lock, whereas thread *25376* is
> stuck in a *write* system call. Because of this, there are connections
> stuck in CLOSE_WAIT, and guacd is not able to free the resources either.
>
> (gdb) info threads
>   Id   Target Id         Frame
>   7    Thread 0x7fb3431ce700 (LWP 25374) "guacd" 0x00007fb7ad8fcf57 in
> pthread_join () from /lib64/libpthread.so.0
> * 6    Thread 0x7fb441bcb700 (LWP 25376) "guacd" 0x00007fb7ad9026ad in write
> () from /lib64/libpthread.so.0
>   5    Thread 0x7fb4423cc700 (LWP 25377) "guacd" 0x00007fb7ad90242d in
> __lll_lock_wait () from /lib64/libpthread.so.0
>   4    Thread 0x7fb3439cf700 (LWP 25395) "guacd" 0x00007fb7ac1ed7a3 in
> select () from /lib64/libc.so.6
>   3    Thread 0x7fb3441d0700 (LWP 25396) "guacd" 0x00007fb7ac1ed7a3 in
> select () from /lib64/libc.so.6
>   2    Thread 0x7fb3449d1700 (LWP 25397) "guacd" 0x00007fb7ac1ed7a3 in
> select () from /lib64/libc.so.6
>   1    Thread 0x7fb3429cd700 (LWP 23724) "guacd" 0x00007fb7ad902b5d in
> recvmsg () from /lib64/libpthread.so.0
> (gdb) thr 5
> [Switching to thread 5 (Thread 0x7fb4423cc700 (LWP 25377))]
> #0  0x00007fb7ad90242d in __lll_lock_wait () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007fb7ad90242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb7ad8fddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x00007fb7ad8fdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007fb7ae4c5345 in guac_socket_fd_write_handler () from
> /lib64/libguac.so.17
> #4  0x00007fb7ae4c4733 in __guac_socket_write () from /lib64/libguac.so.17
> #5  0x00007fb7ae4c4770 in guac_socket_write () from /lib64/libguac.so.17
> #6  0x00007fb7ae4c4a9a in guac_socket_write_string () from
> /lib64/libguac.so.17
> #7  0x00007fb7ae4c2365 in guac_protocol_send_error () from
> /lib64/libguac.so.17
> #8  0x00007fb7ae4c63cf in vguac_user_abort () from /lib64/libguac.so.17
> #9  0x00007fb7ae4c6495 in guac_user_abort () from /lib64/libguac.so.17
> #10 0x00007fb7ae4c7aa8 in guac_user_input_thread () from
> /lib64/libguac.so.17
> #11 0x00007fb7ad8fbe25 in start_thread () from /lib64/libpthread.so.0
> #12 0x00007fb7ac1f634d in clone () from /lib64/libc.so.6
>
> *MUTEX IS OWNED BY 25376*
>
>> #2  0x00007fb7ad8fdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> (gdb) info reg
>> rax            0xfffffffffffffe00       -512
>> rbx            0x0      0
>> rcx            0xffffffffffffffff       -1
>> rdx            0x0      0
>> rsi            0x0      0
>> rdi            0x7fb7a001dc30   140426640219184
>> rbp            0x7fb4423cba00   0x7fb4423cba00
>> rsp            0x7fb4423cb9c8   0x7fb4423cb9c8
>> r8             0x7fb7a001dc30   140426640219184
>> r9             0x141d54 1318228
>> r10            0x2      2
>> r11            0x202    514
>> r12            0x0      0
>> r13            0x7fb4423cc9c0   140412182120896
>> r14            0x7fb4423cc700   140412182120192
>> r15            0x2a     42
>> rip            0x7fb7ad8fdc98   0x7fb7ad8fdc98 <pthread_mutex_lock+104>
>> eflags         0x202    [ IF ]
>> cs             0x33     51
>> ss             0x2b     43
>> ds             0x0      0
>> es             0x0      0
>> fs             0x0      0
>> gs             0x0      0
>> (gdb) print *((int*)(0x7fb7a001dc30)+2)
>> $6 = *25376*
>
>
> *STRACE of the thread is as follows:*
>
>  strace -p 25376
>> Process 25376 attached
>> write(4, "4.sync,10.1318124283;", 21
>
>
>
> Can I file a bug in JIRA?
>
> Any suggestions on how to fix the above?
>
> *NOTE*: This happens intermittently.
>
> Thanks and Regards,
> Amarjeet Singh
>
>
>
> On Fri, Jul 17, 2020 at 8:43 AM Amarjeet Singh <[email protected]>
> wrote:
>
>> Hi Team,
>>
>> *GUACD* is consuming 100% of RAM. On analysis, I found that there
>> are many processes which are not in any state [CLOSE_WAIT, ESTABLISHED, etc.]
>> but are blocked in recvmsg, waiting for the fd. This process has been there
>> for more than 2 days. Below is the backtrace of the process.
>>
>> Reading symbols from /usr/lib64/freerdp/disp.so...Reading symbols from
>>> /usr/lib64/freerdp/disp.so...(no debugging symbols found)...done.
>>> (no debugging symbols found)...done.
>>> Loaded symbols for /usr/lib64/freerdp/disp.so
>>> 0x00007fa764807b5d in recvmsg () from /lib64/libpthread.so.0
>>> Missing separate debuginfos, use: debuginfo-install
>>> accops-server-8.0.0-2.x86_64
>>> (gdb) bt
>>> #0  0x00007fa764807b5d in recvmsg () from /lib64/libpthread.so.0
>>> #1  0x0000000000404a64 in guacd_recv_fd ()
>>> #2  0x0000000000404ed9 in guacd_exec_proc ()
>>> #3  0x0000000000405297 in guacd_create_proc ()
>>> #4  0x000000000040399f in guacd_route_connection ()
>>> #5  0x0000000000403ba7 in guacd_connection_thread ()
>>> #6  0x00007fa764800e25 in start_thread () from /lib64/libpthread.so.0
>>> #7  0x00007fa7630fb34d in clone () from /lib64/libc.so.6
>>
>>
>> Please help me understand what is going wrong here. This is not
>> happening for every connection. Is there any way we can fix this?
>> There are also many connections in CLOSE_WAIT (parent process id).
>> They have been there for many days.
>>
>> Amarjeet Singh
>>
>
