Hi Nick,

This issue is fixed after setting disable-gfx to TRUE. We are using FreeRDP 2.11.0. What are the disadvantages of disabling GFX in RDP?

-Dilip

On Sun, Aug 17, 2025 at 3:03 PM Dilip Modi <dm...@zscaler.com> wrote:

> Thanks Nick.
> I have not been able to fix this issue so far and need your help with it.
> How can we avoid this crash under load while cleanup is happening?
>
> On Sat, Aug 16, 2025 at 1:47 AM Nick Couchman <vn...@apache.org> wrote:
>
>> On Mon, Aug 11, 2025 at 1:00 PM Dilip Modi <dm...@zscaler.com.invalid>
>> wrote:
>>
>>> Hello Guacamole Dev Team,
>>>
>>> I am writing to report a persistent crash issue we are experiencing
>>> with guacd under load. We have been working to debug this for a while and
>>> have applied several fixes that have improved stability, but we are still
>>> seeing one final, intermittent crash.
>>>
>>> *Summary of the Issue*
>>>
>>> guacd crashes with a SIGABRT signal, originating
>>> from __pthread_kill_implementation(), when handling a high volume of
>>> concurrent RDP sessions (around 300). The crash occurs in a generic FreeRDP
>>> worker thread, which strongly suggests heap corruption caused by a race
>>> condition or memory bug elsewhere in the application.
>>>
>>> We are using a 16-core, 128 GB system.
>>>
>>> *Environment*
>>>
>>> - *Guacamole Server Version:* 1.6.0
>>> - *FreeRDP Version:* 2.11.0
>>> - *Operating System:* RHEL 9 on x86_64
>>> - *Build:* Custom build using GCC 12.
>>>
>>> *Latest Crash Backtrace*
>>>
>>> Here is the backtrace from the most recent crash. The crash location has
>>> moved from the RDP disconnect logic to a generic worker thread after our
>>> previous fixes.
>>>
>>> Program terminated with signal SIGABRT, Aborted.
>>> #0 0x00007f67e988bedc in __pthread_kill_implementation () from /usr/lib64/libc.so.6
>>> [Current thread is 1 (Thread 0x7f646c598640 (LWP 1496945))]
>>>
>>> === bt ===
>>>
>>> #0 0x00007f67e988bedc in __pthread_kill_implementation () from /usr/lib64/libc.so.6
>>> #1 0x00007f67e983eb46 in raise () from /usr/lib64/libc.so.6
>>> #2 0x00007f67e9828833 in abort () from /usr/lib64/libc.so.6
>>> #3 0x00007f67e9829172 in __libc_message.cold () from /usr/lib64/libc.so.6
>>> #4 0x00007f67e9895f87 in malloc_printerr () from /usr/lib64/libc.so.6
>>> #5 0x00007f67e9897c70 in _int_free () from /usr/lib64/libc.so.6
>>> #6 0x00007f67e989a2c5 in free () from /usr/lib64/libc.so.6
>>> #7 0x00007f67e0465507 in BufferPool_Clear () from /opt/zscaler/lib64/libwinpr2.so.2
>>> #8 0x00007f67e04656f6 in BufferPool_Free () from /opt/zscaler/lib64/libwinpr2.so.2
>>> #9 0x00007f67e06bf71f in rfx_context_free () from /opt/zscaler/lib64/libfreerdp2.so.2
>>> #10 0x00007f67e0640003 in codecs_free () from /opt/zscaler/lib64/libfreerdp2.so.2
>>> #11 0x00007f67e0648c3d in rdp_client_disconnect () from /opt/zscaler/lib64/libfreerdp2.so.2
>>> #12 0x00007f67e0639207 in freerdp_disconnect () from /opt/zscaler/lib64/libfreerdp2.so.2
>>> #13 0x00007f67e07be54e in guac_rdp_handle_connection (client=0x7f67d4005870) at rdp.c:676
>>> #14 guac_rdp_client_thread (data=0x7f67d4005870) at rdp.c:944
>>> #15 0x00007f67e988a19a in start_thread () from /usr/lib64/libc.so.6
>>> #16 0x00007f67e990f210 in clone3 () from /usr/lib64/libc.so.6
>>>
>>> *Analysis and Troubleshooting Steps Taken*
>>>
>>> Our investigation points towards a memory corruption issue, likely a
>>> race condition exposed by the high rate of connection setup and teardown.
>>> The logs around the time of the crash show many "Handshake failed,
>>> 'connect' instruction was not received" errors, indicating this high churn.
>>>
>>> We have progressively identified and fixed several bugs:
>>>
>>> 1. *Incorrect Cleanup Order:* Initially, we found
>>> that freerdp_disconnect() was called before gdi_free(), which we
>>> corrected.
>>>
>>> *Request for Help*
>>>
>>> We would greatly appreciate it if the community could review our
>>> analysis and the suspected root cause.
>>>
>>> - Does the analysis of the race condition in print-job.c seem
>>> correct?
>>> - Are there any other known issues or areas of the code we should
>>> investigate that could cause this type of heap corruption under heavy
>>> load?
>>>
>>> We are happy to provide more detailed logs, code snippets, or run
>>> further tests as needed.
>>>
>>> Thank you for your time and assistance.
>>>
>> It'd be great if you could submit a Jira ticket for this (you'll need to
>> request a Jira account first, which you can do at the main Jira page), and
>> then create a pull request to fix this.
>>
>> https://issues.apache.org/jira/browse/GUACAMOLE
>> https://guacamole.apache.org/open-source/
>>
>> -Nick
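
As an illustration of the kind of cleanup race discussed in this thread: the backtrace aborts in free() under BufferPool_Clear() during disconnect, which would be consistent with the same resources being released more than once, although nothing in the thread confirms that this is the actual root cause. One common defensive pattern is to make teardown idempotent and guard it with a mutex. The C below is only a minimal, self-contained sketch of that pattern. demo_session, demo_session_teardown, and run_concurrent_teardown are hypothetical names invented for this example; they are not guacd or FreeRDP APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical session object; NOT a FreeRDP or guacd structure. */
typedef struct {
    pthread_mutex_t lock;  /* serializes teardown across threads */
    bool torn_down;        /* true once resources have been released */
    void *buffer_pool;     /* stand-in for codec/buffer resources */
    int free_calls;        /* how many times the pool was actually freed */
} demo_session;

void demo_session_init(demo_session *s) {
    pthread_mutex_init(&s->lock, NULL);
    s->torn_down = false;
    s->buffer_pool = malloc(4096);
    s->free_calls = 0;
}

/* Safe to call from any number of threads; only the first caller frees. */
void demo_session_teardown(demo_session *s) {
    pthread_mutex_lock(&s->lock);
    if (!s->torn_down) {
        free(s->buffer_pool);
        s->buffer_pool = NULL;
        s->free_calls++;
        s->torn_down = true;
    }
    pthread_mutex_unlock(&s->lock);
}

static void *teardown_thread(void *arg) {
    demo_session_teardown((demo_session *)arg);
    return NULL;
}

/* Runs up to nthreads concurrent teardowns on one session and returns
 * how many times the pool was really freed (expected: exactly once). */
int run_concurrent_teardown(int nthreads) {
    demo_session s;
    demo_session_init(&s);

    pthread_t *threads = malloc(sizeof(pthread_t) * (size_t)nthreads);
    assert(threads != NULL);

    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        /* Only join threads that were actually created. */
        if (pthread_create(&threads[started], NULL, teardown_thread, &s) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    /* Covers the case where no thread could be started. */
    demo_session_teardown(&s);

    free(threads);
    pthread_mutex_destroy(&s.lock);
    return s.free_calls;
}
```

Calling run_concurrent_teardown(8) exercises eight racing teardown calls against one session; with the guard in place the pool is freed once, however the threads interleave. Without the torn_down check, the same test would be a candidate for the "double free or corruption" style abort that malloc_printerr() reports.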