On Thu, Feb 28, 2008 at 9:52 PM, Mark Smallcombe <[EMAIL PROTECTED]> wrote:
> What tuning recommendations do you have for varnish to help it handle high
> load?

Funny you should ask, I've been spending a lot of time with Varnish in the lab. Here are a few observations I've made:

(N.B. We're using 4-CPU Xeon hardware running RHEL 4.5, which runs the 2.6.9 Linux kernel. All machines have at least 4GB RAM and run the 64-bit Varnish build, but our results are equally applicable to 32-bit builds.)

- When the cache hit ratio is very high (i.e. 100%), we discovered that Varnish's default thread_pool_max is too high. When there are too many worker threads, Varnish spends an inordinate amount of time in system call space. We're not sure whether this is due to some flaw in Varnish, our ancient Linux kernel (we were unable to test with a modern 2.6.22 or later kernel, which apparently has a better scheduler), or is just a fundamental problem when a threaded daemon like Varnish tries to service thousands of concurrent connections. After much tweaking we determined that, on our hardware, the optimal ratio of threads per CPU is about 16, or around 48-50 threads on a 4-CPU box. To avoid dropping work requests, it is also advisable to raise overflow_max to a significantly higher ratio than the default (e.g. 10000%). This will cause Varnish to consume somewhat more RAM, but will provide outstanding performance. With these tweaks, we were able to get Varnish to serve 10,000 concurrent connections, flooding a Gigabit Ethernet channel with 5 KB cached objects.

- Conversely, when the cache hit ratio is 0, the default of 100 threads is too low. (To create this scenario, we used 2 Varnish boxes: the front-end proxy was configured to "pass" all requests to an optimized backend Varnish instance that served all requests from cache.) On the same 4-CPU hardware, we found that the optimal thread_pool_max value in this situation is about 750. Again, we were able to serve 10,000 concurrent connections after optimizing the settings.
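For anyone who wants to script these settings, here is a rough Python sketch that turns the two measured cases above into varnishd "-p name=value" arguments. The workload labels and the helper name are made up for illustration, 50 is simply the upper end of the 48-50 range, and the numbers apply only to the 4-CPU RHEL 4.5 boxes described above, so treat them as a starting point rather than a recommendation.

```python
# Settings reported above for a 4-CPU RHEL 4.5 box. The workload labels
# ("all_hits", "all_misses") are hypothetical names chosen for this sketch.
MEASURED_SETTINGS = {
    # ~100% cache hit ratio: few threads, large overflow queue (percent)
    "all_hits": {"thread_pool_max": 50, "overflow_max": 10000},
    # 0% cache hit ratio (everything passed to a backend): many threads
    "all_misses": {"thread_pool_max": 750},
}


def varnishd_param_flags(workload):
    """Return varnishd '-p name=value' arguments for a measured workload."""
    if workload not in MEASURED_SETTINGS:
        raise ValueError(f"no measurements for workload {workload!r}")
    flags = []
    for name, value in MEASURED_SETTINGS[workload].items():
        flags += ["-p", f"{name}={value}"]
    return flags


if __name__ == "__main__":
    for workload in MEASURED_SETTINGS:
        print(workload, " ".join(varnishd_param_flags(workload)))
```

The resulting list can be appended to whatever varnishd command line you already use. Workloads between the two extremes were not measured, so the sketch deliberately refuses to guess at them.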
I find this interesting, because one would think that Varnish would be making the system spend much more time in the scheduler in the second scenario because it is doing significantly less work (no lookups, just handing off connections to the appropriate backend). I suspect that there may be some thread-scalability issues with the cache lookup process.

If someone with a suitably powerful lab setup (i.e. Gigabit Ethernet, big hardware) can test with a more modern Linux kernel, I'd be very interested in the results. Feel free to contact me if you need assistance with setup/analysis.

Finally: Varnish performance is absolutely atrocious on an 8-CPU RHEL 4.5 system -- so bad that I have to either turn thread_pool_max down to 4 or restrict Varnish to run on only 4 CPUs via taskset(1). I've heard that MySQL has similar problems, so I suspect that this is a Linux kernel issue.

Best regards,

--Michael
_______________________________________________
varnish-misc mailing list
[email protected]
http://projects.linpro.no/mailman/listinfo/varnish-misc
