Hello Community,

We are currently load-testing a high-concurrency environment using
Guacamole and are seeking some advice on performance tuning. Our goal is to
support around 300 simultaneous RDP sessions on a single server.

*Environment:*

   - *Server:* 16-core CPU, 128 GB RAM
   - *Protocol:* RDP
   - *Target Load:* 300 concurrent sessions

*Problem Statement:*

During our load tests, we've observed that as the number of RDP sessions
increases, the CPU load on the server becomes a significant bottleneck.

   - At *300 concurrent sessions*, the total CPU utilization reaches *85%*.
   - At the same time, RAM utilization is only at *20%*.

This strongly suggests that we are compute-bound, not memory-bound. The
primary consumers of CPU appear to be the individual guacd processes.

*Our Investigation & Analysis:*

Our investigation led us to the threading model within guacd, specifically
in guacamole-server/src/libguac/display.c. It appears that for each
connection, guacd spawns a pool of worker threads for encoding graphical
updates, with the number of threads being equal to the number of CPU cores
on the host (guac_display_nproc()).

On our 16-core server, this leads to an explosion of threads: 300
connections * 16 threads/connection = 4800 threads.

We believe this is causing severe thread contention and context-switching
overhead, leading to the high CPU usage we're observing.
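
To check this against real numbers rather than the estimate alone, we could
sample the thread count and context-switch counters of each guacd process.
The following is a minimal sketch, assuming a Linux host with /proc mounted.
The helper names read_status_field() and worst_case_threads() are ours and
are not part of guacamole-server.

```c
#include <stdio.h>
#include <string.h>

/*
 * Reads one numeric field (for example "Threads") from /proc/<pid>/status.
 * Returns the value, or -1 if the file or the field is unavailable.
 */
long read_status_field(const char *pid, const char *field) {
    char path[64], line[256];
    snprintf(path, sizeof(path), "/proc/%s/status", pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long value = -1;
    size_t len = strlen(field);
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Match "<field>:" exactly at the start of the line */
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            if (sscanf(line + len + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }

    fclose(f);
    return value;
}

/* Upper bound on worker threads if every connection gets one per core. */
long worst_case_threads(long connections, long cores) {
    return connections * cores;
}
```

Calling read_status_field() with a guacd PID and the fields "Threads",
"voluntary_ctxt_switches" and "nonvoluntary_ctxt_switches" at two points in
time would show whether the real thread count approaches the 4800 estimate
and how quickly context switches accumulate under load.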

*Optimization planning:*

To address this, we are planning to modify
*guacamole-server/src/libguac/display.c* to limit the number of worker
threads per connection to a small, fixed number, like so:

    /*
     * For high-density servers, creating cpu_count threads per connection
     * process can lead to excessive context switching. We'll limit the
     * number of worker threads to a more conservative number. A value of
     * 1 or 2 is generally sufficient.
     */
    display->worker_thread_count = 2;

This change seems to be the most logical step to reduce thread thrashing.
Do you agree?
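
As a variation on the hard-coded value, the limit could be made tunable so
that it can be adjusted during load tests without rebuilding. This is only a
sketch of the idea: the environment variable GUACD_WORKER_THREADS and the
helpers clamp_workers() and pick_worker_count() are hypothetical and do not
exist in guacamole-server, and we assume sysconf(_SC_NPROCESSORS_ONLN) is
available on the host.

```c
#include <stdlib.h>
#include <unistd.h>

/* Clamps a requested worker count to the range [1, nproc]. */
int clamp_workers(long requested, long nproc) {
    if (nproc < 1)
        nproc = 1;
    if (requested < 1)
        return 1;
    if (requested > nproc)
        return (int) nproc;
    return (int) requested;
}

/*
 * Picks a per-connection worker count. Uses the HYPOTHETICAL environment
 * variable GUACD_WORKER_THREADS if it is set to a valid integer, otherwise
 * the supplied fallback. The result never exceeds the online CPU count.
 */
int pick_worker_count(int fallback) {
    long nproc = sysconf(_SC_NPROCESSORS_ONLN);
    long requested = fallback;

    const char *env = getenv("GUACD_WORKER_THREADS");
    if (env != NULL) {
        char *end;
        long parsed = strtol(env, &end, 10);
        /* Accept the value only if the whole string parsed as a number */
        if (end != env && *end == '\0')
            requested = parsed;
    }

    return clamp_workers(requested, nproc);
}
```

With something like this, the assignment above would become
display->worker_thread_count = pick_worker_count(2), which keeps 2 as the
default while allowing us to compare 1, 2 and 4 workers per connection
under the same load.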

*Our Questions for the Community:*

   1. Is our analysis of the CPU bottleneck due to the default threading
   model correct for a high-concurrency environment?
   2. Is the code modification shown above the recommended approach for
   scaling guacd to hundreds of sessions?
   3. Are there other known best practices, configuration changes (either
   in guacd or on the RDP server side, like color depth), or architectural
   optimizations we should consider to achieve our target of 300+ sessions?

We appreciate any insights or guidance you can provide. Thank you for your
time and for developing such a great tool.

Best regards,

-Dilip
