On Tue, Jul 29, 2025 at 4:57 AM Dilip Modi <[email protected]>
wrote:

> Hello Community,
>
> We are currently load-testing a high-concurrency environment using
> Guacamole and are seeking some advice on performance tuning. Our goal is to
> support around 300 simultaneous RDP sessions on a single server.
>
> *Environment:*
>
>    - *Server:* 16-core CPU, 128 GB RAM
>    - *Protocol:* RDP
>    - *Target Load:* 300 concurrent sessions
>
> *Problem Statement:*
>
> During our load tests, we've observed that as the number of RDP sessions
> increases, the CPU load on the server becomes a significant bottleneck.
>
>    - At *300 concurrent sessions*, the total CPU utilization reaches *85%*.
>    - At the same time, RAM utilization is only at *20%*.
>

Hello, Dilip,

I seem to recall some guidance that Mike had produced once upon a time that
stated that you would need about 1 core and 2 GB of RAM per 25 concurrent
connections. Based on that guidance, your observations are not terribly far
off:
* 300 concurrent connections = 12 vCPUs, which is 75% of your 16 cores.
Guacamole does contain some optimization based on availability of resources
to try to improve performance, so either this or some of the additional
rendering done by 1.6.0 and the GFX Pipeline support could account for the
additional 10% on top of that.
* 24 GB of RAM would be 19% of the available RAM, so, again, right in the
ballpark of the 20% of RAM you're seeing utilized.
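
To make the arithmetic above easy to redo for other connection counts, here
is a minimal sketch in C. The constants are just the rule of thumb as I
recall it (1 core and 2 GB of RAM per 25 connections), not measured values,
and the function names are made up for this example:

```c
#include <stdio.h>

/* Rule of thumb as recalled in this thread (an assumption, not a benchmark):
 * about 1 core and 2 GB of RAM per 25 concurrent connections. */
#define CONNECTIONS_PER_UNIT 25
#define CORES_PER_UNIT 1
#define RAM_GB_PER_UNIT 2

/* Number of 25-connection units needed, rounded up. */
int sizing_units(int connections) {
    return (connections + CONNECTIONS_PER_UNIT - 1) / CONNECTIONS_PER_UNIT;
}

/* Estimated cores for the given number of concurrent connections. */
int estimated_cores(int connections) {
    return sizing_units(connections) * CORES_PER_UNIT;
}

/* Estimated RAM, in GB, for the given number of concurrent connections. */
int estimated_ram_gb(int connections) {
    return sizing_units(connections) * RAM_GB_PER_UNIT;
}

/* Print the estimate as a share of the host's total capacity. */
void print_estimate(int connections, int host_cores, int host_ram_gb) {
    int cores = estimated_cores(connections);
    int ram_gb = estimated_ram_gb(connections);
    printf("%d connections: ~%d cores (%.0f%% of %d), "
           "~%d GB RAM (%.0f%% of %d GB)\n",
           connections,
           cores, 100.0 * cores / host_cores, host_cores,
           ram_gb, 100.0 * ram_gb / host_ram_gb, host_ram_gb);
}
```

For your server, print_estimate(300, 16, 128) gives the 12 cores and 24 GB
discussed above.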

>
>
> This strongly suggests that we are compute-bound, not memory-bound. The
> primary consumers of CPU appear to be the individual guacd processes.
>

Yes, I would expect this. The guacd code does very little with data - its
job is to process display data being sent by remote servers via the various
supported protocols and translate that into the Guacamole protocol. This is
much more processor-intensive than it is RAM-intensive, so this aligns with
what I would expect.


> *Our Investigation & Analysis:*
>
> Our investigation led us to the threading model within guacd, specifically
> in guacamole-server/src/libguac/display.c. It appears that for each
> connection, guacd spawns a pool of worker threads for encoding graphical
> updates, with the number of threads being equal to the number of CPU cores
> on the host (guac_display_nproc()).
>
> On our 16-core server, this leads to an explosion of threads: 300
> connections * 16 threads/connection = 4800 threads
>
> We believe this is causing severe thread contention and context-switching
> overhead, leading to the high CPU usage we're observing.
>

This is good to know - we have had a couple of other reports on the mailing
list about users running into excessive CPU utilization with the 1.6.0 code.
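
For comparison, the thread counts in your analysis can be worked out with a
small C sketch like the one below. It assumes a fixed number of workers per
connection and treats every worker as runnable at once, so it gives a
worst-case upper bound, not a measurement of actual contention. The function
names are made up for this example:

```c
#include <stdio.h>

/* Upper bound on encoder worker threads across all connections, assuming
 * each connection gets the same fixed number of workers. */
long total_worker_threads(long connections, long workers_per_connection) {
    return connections * workers_per_connection;
}

/* Worker threads per core: how oversubscribed the host would be if every
 * worker were runnable at the same moment (worst case only). */
double threads_per_core(long total_threads, long host_cores) {
    return (double) total_threads / (double) host_cores;
}

/* Print the thread budget for one combination of settings. */
void print_thread_budget(long connections, long workers, long cores) {
    long total = total_worker_threads(connections, workers);
    printf("%ld connections x %ld workers = %ld threads (%.1f per core)\n",
           connections, workers, total, threads_per_core(total, cores));
}
```

With 300 connections on 16 cores, 16 workers per connection comes to 4800
threads, while 2 workers per connection comes to 600.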


> *Optimization planning:*
>
> To address this, we are planning to modify
> *guacamole-server/src/libguac/display.c* to limit the number of worker
> threads per connection to a small, fixed number, like so:
>     /*
>      * For high-density servers, creating cpu_count threads per connection
>      * process can lead to excessive context switching. We'll limit the
>      * number of worker threads to a more conservative number. A value of
>      * 1 or 2 is generally sufficient.
>      */
>     display->worker_thread_count = 2;
>
> This change seems to be the most logical step to reduce the thread
> thrashing, do you agree?
>

Sure, at the cost of reduced overall performance of the remote connections.
But, if it resolves the issues you're seeing trying to support that large
of a connection base, and doesn't cause a significant issue for your user
base, then it should be okay.
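
If you do go this route, one option is to cap the count rather than hard-code
it, so that smaller hosts still get no more workers than they have cores.
The following is only a sketch of that idea in C, not how guacd does it: the
environment variable name and all of the function names are hypothetical,
and the default cap of 2 is simply the value from your proposal. The only
system calls used are getenv(), strtol(), and sysconf():

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical environment variable for this sketch; NOT a real guacd
 * setting. */
#define WORKER_CAP_ENV "EXAMPLE_GUAC_WORKER_CAP"

/* Clamp the detected core count to the range [1, cap]. A cap of zero or
 * less means "no cap". */
int clamp_worker_count(long detected_cores, long cap) {
    if (detected_cores < 1)
        detected_cores = 1;
    if (cap > 0 && detected_cores > cap)
        return (int) cap;
    return (int) detected_cores;
}

/* Read the cap from the environment, returning the fallback if the variable
 * is unset, empty, not a whole number, or less than 1. */
long read_worker_cap(long fallback) {
    const char* value = getenv(WORKER_CAP_ENV);
    if (value == NULL || *value == '\0')
        return fallback;

    char* end = NULL;
    long parsed = strtol(value, &end, 10);
    if (*end != '\0' || parsed < 1)
        return fallback;

    return parsed;
}

/* Worker count for one connection: online cores, capped (default cap 2). */
int choose_worker_count(void) {
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    return clamp_worker_count(cores, read_worker_cap(2));
}
```

The result of choose_worker_count() would then take the place of the literal
2 in your change, which lets you try different caps during load testing
without rebuilding.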


> *Our Questions for the Community:*
>
>    1. Is our analysis of the CPU bottleneck due to the default threading
>    model correct for a high-concurrency environment?
>    2. Is the code modification shown above the recommended approach for
>    scaling guacd to hundreds of sessions?
>    3. Are there other known best practices, configuration changes (either
>    in guacd or on the RDP server side, like color depth), or architectural
>    optimizations we should consider to achieve our target of 300+ sessions?
>
>
I'm probably not the best person to answer this - Mike and a handful of
other folks have much more detailed knowledge about the processing of the
display updates and threading than I do, but if your testing shows that it
works, and doesn't have any significant downsides, then you might be okay
doing that.

The one area where you will probably see degraded performance from limiting
worker threads like that is video playback over the remote (RDP) sessions -
with a limited worker thread count, Guacamole may struggle to keep up with
the display updates when users watch videos or do other graphics-intensive
things.

Let us know how it goes.

-Nick
