On Tue, Jul 29, 2025 at 4:57 AM Dilip Modi <[email protected]> wrote:
> Hello Community,
>
> We are currently load-testing a high-concurrency environment using
> Guacamole and are seeking some advice on performance tuning. Our goal is to
> support around 300 simultaneous RDP sessions on a single server.
>
> *Environment:*
>
> - *Server:* 16-core CPU, 128 GB RAM
> - *Protocol:* RDP
> - *Target Load:* 300 concurrent sessions
>
> *Problem Statement:*
>
> During our load tests, we've observed that as the number of RDP sessions
> increases, the CPU load on the server becomes a significant bottleneck.
>
> - At *300 concurrent sessions*, the total CPU utilization reaches *85%*.
> - At the same time, RAM utilization is only at *20%*.
>

Hello, Dilip,

I seem to recall some guidance that Mike produced a while back stating that you would need about 1 core and 2 GB of RAM per 25 concurrent connections. Based on that guidance, your observations are not terribly far off:

* 300 concurrent connections = 12 vCPUs, which is 75% of the processor. Guacamole does contain some optimizations based on the availability of resources to try to improve performance, so either those or the additional rendering done by 1.6.0 and its GFX Pipeline support could account for the additional 10% on top of that.
* 24 GB of RAM would be 19% of the available RAM, so, again, right in the ballpark of the 20% of RAM you're seeing utilized.

> This strongly suggests that we are compute-bound, not memory-bound. The
> primary consumers of CPU appear to be the individual guacd processes.
>

Yes, I would expect this. The guacd code does very little with data; its job is to process display data sent by remote servers via the various supported protocols and translate that into the Guacamole protocol. That work is much more processor-intensive than it is RAM-intensive, so this aligns with what I would expect.
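To make the rule-of-thumb arithmetic above easy to rerun for other hardware, here is a minimal C sketch. It assumes the ratios exactly as recalled in this reply (about 1 core and 2 GB of RAM per 25 concurrent connections), which are not an official Guacamole sizing formula. The macro and function names are hypothetical.

```c
#include <assert.h>

/* Rule-of-thumb ratios recalled in this thread (an assumption, not an
 * official Guacamole sizing formula): per 25 concurrent connections,
 * budget about 1 core and 2 GB of RAM. */
#define CONNECTIONS_PER_UNIT 25.0
#define CORES_PER_UNIT 1.0
#define RAM_GB_PER_UNIT 2.0

/* Hypothetical helper: estimated cores needed for `connections`. */
static double estimate_cores(int connections)
{
    assert(connections >= 0);
    return connections / CONNECTIONS_PER_UNIT * CORES_PER_UNIT;
}

/* Hypothetical helper: estimated RAM (in GB) needed for `connections`. */
static double estimate_ram_gb(int connections)
{
    assert(connections >= 0);
    return connections / CONNECTIONS_PER_UNIT * RAM_GB_PER_UNIT;
}

/* Express an estimated requirement as a percentage of what the host has. */
static double percent_of(double needed, double available)
{
    assert(available > 0.0);
    return needed / available * 100.0;
}
```

For the 16-core, 128 GB server described above, estimate_cores(300) is 12 and estimate_ram_gb(300) is 24. percent_of() turns those into 75% of the CPU and 18.75% of the RAM, which matches the rounded figures above.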
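The investigation quoted just below multiplies connections by per-connection worker threads. The following standalone C sketch puts numbers on that calculation and on a capped alternative to a hard-coded value. It does not reflect how libguac's display.c is actually structured. online_cpus(), capped_worker_count(), and total_worker_threads() are hypothetical helpers, and `_SC_NPROCESSORS_ONLN` is a widely supported sysconf() extension (Linux, macOS, the BSDs) rather than strict POSIX.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: count online CPUs, as a "one worker per core"
 * policy would. Falls back to 1 if sysconf() cannot report a value. */
static long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Hypothetical helper: worker threads for a single connection, one per
 * CPU but never more than `cap`. A cap of 0 or less disables the limit. */
static long capped_worker_count(long nproc, long cap)
{
    assert(nproc >= 1);
    if (cap > 0 && nproc > cap)
        return cap;
    return nproc;
}

/* Total worker threads on the host, given that each connection gets its
 * own pool of `per_connection` threads. */
static long total_worker_threads(long connections, long per_connection)
{
    assert(connections >= 0 && per_connection >= 0);
    return connections * per_connection;
}
```

With 300 connections on a 16-core host, the uncapped policy gives total_worker_threads(300, 16) = 4800 threads, and a cap of 2 brings that down to 600. A cap differs from a fixed value of 2 in one way: on a single-core host it still yields one thread rather than two. These functions only count threads. They say nothing about how many of those threads are runnable at the same moment, which profiling would have to show.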
> *Our Investigation & Analysis:*
>
> Our investigation led us to the threading model within guacd, specifically
> in guacamole-server/src/libguac/display.c. It appears that for each
> connection, guacd spawns a pool of worker threads for encoding graphical
> updates, with the number of threads being equal to the number of CPU cores
> on the host (guac_display_nproc()).
>
> On our 16-core server, this leads to an explosion of threads: 300
> connections * 16 threads/connection = 4800 threads
>
> We believe this is causing severe thread contention and context-switching
> overhead, leading to the high CPU usage we're observing.
>

This is good to know. We have had a couple of other reports on the mailing list from users running into excessive CPU utilization with the 1.6.0 code.

> *Optimization planning:*
>
> To address this, planning to modify
> *guacamole-server/src/libguac/display.c* to limit the number of worker
> threads per connection to a small, fixed number, like so:
>
> /*
>  * For high-density servers, creating cpu_count threads per connection
>  * process can lead to excessive context switching. We'll limit the
>  * number of worker threads to a more conservative number. A value of
>  * 1 or 2 is generally sufficient.
>  */
> display->worker_thread_count = 2;
>
> This change seems to be the most logical step to reduce the thread
> thrashing, do you agree?
>

Sure, at the cost of reduced overall performance of the remote connections. But if it resolves the issues you're seeing while trying to support a connection base that large, and doesn't cause a significant issue for your users, then it should be okay.

> *Our Questions for the Community:*
>
> 1. Is our analysis of the CPU bottleneck due to the default threading
> model correct for a high-concurrency environment?
> 2. Is the code modification shown above the recommended approach for
> scaling guacd to hundreds of sessions?
> 3. Are there other known best practices, configuration changes (either
> in guacd or on the RDP server side, like color depth), or architectural
> optimizations we should consider to achieve our target of 300+ sessions?
>

I'm probably not the best person to answer this. Mike and a handful of other folks know much more about how display updates are processed and threaded than I do. But if your testing shows that it works and doesn't have any significant downsides, then you might be okay doing that.

The one area where you will probably see degraded performance from limiting worker threads like that is graphics-heavy work such as video playback over the remote (RDP) sessions. With a limited worker thread count, Guacamole may struggle to keep up with the display updates when users watch videos or do other graphics-intensive things.

Let us know how it goes.

-Nick
