On Jan 24, 2010, at 7:23 AM, Angelo Höngens wrote:
>> What is thread_pool_max set to?  Have you tried lowering it?   We have
>> found that on systems with very high cache-hit ratios, 16 threads per
>> CPU is the sweet spot to avoid context-switch saturation.
> 
> [ang...@nmt-nlb-03 ~]$ varnishadm -T localhost:81 param.show| grep
> thread_pool
> 
> thread_pool_add_delay      20 [milliseconds]
> thread_pool_add_threshold  2 [requests]
> thread_pool_fail_delay     200 [milliseconds]
> thread_pool_max            500 [threads]
> thread_pool_min            5 [threads]
> thread_pool_purge_delay    1000 [milliseconds]
> thread_pool_timeout        300 [seconds]
> thread_pools               2 [pools]
> 
> Thread_pool_max is set to 500 threads, but I just increased it to 4000
> (as per http://varnish.projects.linpro.no/wiki/Performance), since 'top'
> shows it using around 480-490 threads now.
> 
> You suggest lowering it; what would be the effect of that? I would think
> it would run out of threads or something. Well, we'll see what happens
> with the increased thread count.


Increasing concurrency is unlikely to solve the problem, although setting the 
number of thread pools to the number of CPUs is probably a good idea.
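If you want to try that, something like this should do it at runtime, reusing 
the admin port from your param.show output above (the value of 4 assumes a 
4-core box; substitute your own CPU count):

    varnishadm -T localhost:81 param.set thread_pools 4

If memory serves, the pool count can be raised on the fly, but a decrease 
only takes effect after a restart.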

Assuming a high hit ratio and high CPU utilization (you haven't posted either), 
lowering concurrency (i.e. reducing thread_pool_max) can reduce the CPU 
overhead incurred by context switching.
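If you want to experiment with that, something along these lines should work 
(250 is just an illustrative value, not a recommendation; localhost:81 is the 
admin port from your output above):

    # watch the context-switch rate (the 'cs' column) before and after
    vmstat 1

    varnishadm -T localhost:81 param.set thread_pool_max 250

If the cs rate and CPU utilization drop noticeably, you were context-switch 
bound.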

If maximum concurrency is reached, incoming connections are deferred to the 
TCP listen(2) backlog (the overflowed_requests counter in varnishstat increases 
when this happens).  When a request reaches the head of the queue, it is 
picked up by a worker thread.  The net effect is some additional latency, but 
probably not as much as you're experiencing if your CPU is swamped with 
context switches.
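You can keep an eye on that counter with a one-shot varnishstat; the grep 
pattern just assumes the counter name contains 'overflow':

    varnishstat -1 | grep overflow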

There are a few cases where increasing thread_pool_max can help, in particular 
when you have a high cache-miss ratio and slow origin servers.  But if CPU 
utilization is already high, it will only make the problem worse.
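A quick way to check which regime you're in (counter names as I remember them 
from Varnish 2.x; adjust the pattern if yours differ):

    varnishstat -1 | egrep 'cache_hit|cache_miss'

The hit ratio is then roughly cache_hit / (cache_hit + cache_miss).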

BTW, on FreeBSD you can view the current length of the listen(2) backlog via 
"netstat -aL".  By default, varnishd's listen(2) backlog is 512; as long as 
you don't see the queue length hit that value you should be OK.
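For example (the port name and queue numbers below are made up for 
illustration; the columns are qlen/incqlen/maxqlen):

    $ netstat -aL | grep http
    tcp4  0/0/512  *.http

As long as the first number stays well below the last, the backlog isn't your 
bottleneck.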

--Michael
