Re: Varnish restarts when all memory is allocated

2009-05-29 Thread Marco Walraven
On Tue, May 26, 2009 at 11:29:08PM +0200, Marco Walraven wrote:
 Hi,
 
 We are testing Varnish in our production environment with a 500GB storage file and 32GB of RAM. Performance is excellent while the 32GB is not yet fully allocated. The rates I am seeing are around 40-60Mbit/s, with roughly 2.2M objects in cache and a hit ratio of ~0.65; even then Varnish handles it easily. It is still warming up, though, since we have a lot of objects that need to be cached.

 The problem I am facing is that as soon as RAM is exhausted, Varnish restarts itself.

In the meantime I have been doing some tests tweaking the VM system under Linux, especially vm.min_free_kbytes, leaving some memory for pdflush and kswapd. The results are slightly better, but varnishd still starts to hog the CPUs and restarts. We also tried disabling swap entirely, and alternatively tested with a 16GB swap file on a different disk. Again slightly better results, but the same effect in the end.
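To see how close the box is to the edge while a test runs, a simple logging loop like the one below may help correlate varnishd's CPU spikes with memory pressure. This is plain shell using standard procps tools; the 5-second interval and the log path are arbitrary choices of mine, not part of our actual setup:

# Log free memory, paging activity and varnishd CPU every 5 seconds.
# The si/so columns in the vmstat line show pages swapped in/out.
while true; do
    date                  >> /var/tmp/varnish-mem.log
    free -m               >> /var/tmp/varnish-mem.log
    vmstat 1 2 | tail -1  >> /var/tmp/varnish-mem.log
    ps -C varnishd -o pid,pcpu,rss --no-headers >> /var/tmp/varnish-mem.log
    sleep 5
done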

We also ran Varnish without the file storage type, with just 8GB assigned via malloc. This ran longer than the other tests we did; varnishd did not crash, but CPU usage got extremely high (700%) and it recovered from that after a minute or two.

The Linux system runs with the following sysctl config applied:

Linux varnish001 2.6.18-6-amd64 #1 SMP Tue May 5 08:01:28 UTC 2009 x86_64 GNU/Linux

/etc/sysctl.conf

net.ipv4.ip_local_port_range = 1024 65536
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_tw_recycle = 1
net.core.netdev_max_backlog = 3
net.ipv4.tcp_no_metrics_save = 1
net.core.somaxconn = 262144
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
vm.swappiness = 0
vm.min_free_kbytes = 4194304
vm.dirty_background_ratio = 25
vm.dirty_expire_centisecs = 1000
vm.dirty_writeback_centisecs = 100
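For completeness, this is how the file is applied and individual values are checked at runtime; this is standard sysctl usage, nothing Varnish-specific:

sysctl -p /etc/sysctl.conf     # load every setting from the file
sysctl vm.min_free_kbytes      # read back a single value
sysctl -w vm.swappiness=0      # or change one on the fly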


So, yesterday I installed FreeBSD 7.2-STABLE with the latest CVSup on the second Varnish box and ran Varnish with the exact same config as on the Linux box: same setup, 16GB swap file, same arguments for varnishd, same VCL, same amount of traffic, connections, etc. I did apply the performance tuning as described on the wiki. Both systems ran OK until the moment there was little RAM left. Linux showed the exact same behaviour as before: high load, varnishd with high CPU usage, and in the end varnishd restarted with an empty cache. FreeBSD kept going as I expected it to, albeit with a higher load, and was still serving images at 60Mbit/s. I did see that it sometimes needed to recover, meaning it accepted no connections for a few moments and then carried on; brief, but enough to notice.

Both systems run varnishd as follows. I changed the number of hash buckets to 450011, as opposed to the previous tests I ran; same for lru_interval, which was 60 before and perhaps too low.

/usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -f /etc/varnish/default.vcl \
    -T 127.0.0.1:6082 -t 3600 -w 400,4000,60 -s file,500G \
    -p obj_workspace 8192 -p sess_workspace 262144 -p lru_interval 600 \
    -h classic,450011 -p sess_timeout 2 -p listen_depth 8192 \
    -p log_hashstring off -p shm_workspace 32768 -p ping_interval 10 \
    -p srcaddr_ttl 0 -p esi_syntax 1
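As a sanity check that the child actually runs with these values, they can be read back over the management port given above. param.show is the stock Varnish 2.x CLI command, so this assumes nothing beyond the -T address:

varnishadm -T 127.0.0.1:6082 param.show lru_interval
varnishadm -T 127.0.0.1:6082 param.show sess_workspace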

Below is some output from the Linux box when it started to hog the CPU, and from the FreeBSD system 15 minutes later when it was still going.

So, is this kind of setup actually possible? And if so, how do I get it running smoothly? So far FreeBSD comes pretty close, but it is not there yet.

Thanks for the help,

Marco



Linux:

Hitrate ratio:       10      100     1000
Hitrate avg:     0.6914   0.6843   0.6656

     5739         0.00         0.60 Client connections accepted
  5197130         0.00       542.72 Client requests received
  2929389         0.00       305.91 Cache hits
        0         0.00         0.00 Cache hits for pass
  2267089         0.00       236.75 Cache misses
  2267116         0.00       236.75 Backend connections success
        0         0.00         0.00 Backend connections failures
  2246281         0.00       234.57 Backend connections reuses
  2246300         1.00       234.58 Backend connections recycles
      120           ..              N struct sess_mem
       57           ..              N struct sess
  2224133           ..              N struct object
      930           ..              N struct objecthead
  4448208           ..              N struct smf
        0           ..              N small free smf
        0           ..              N large free smf
       33           ..              N struct vbe_conn
      114           ..              N struct bereq
      400           ..              N worker threads
      400

Re: Varnish restarts when all memory is allocated

2009-05-29 Thread Michael S. Fischer
I think the lesson of these cases is pretty clear: make sure your cacheable working set fits into the proxy server's available memory -- or, if you want to exceed your available memory, make sure your hit ratio is high enough that the cache server rarely resorts to paging in the data. Otherwise, your cache server will suffer I/O starvation due to excessive paging.
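To put rough numbers on that, using figures from the varnishstat output further down this thread (~2.37M objects, ~30.8GB allocated): the back-of-envelope below is only illustrative arithmetic of mine, not a measurement:

awk 'BEGIN {
    bytes_alloc = 30790995968;   # "bytes allocated" from varnishstat below
    n_obj       = 2366595;       # "N struct object" from varnishstat below
    ram         = 32 * 1024^3;   # the 32GB of RAM on the box
    avg = bytes_alloc / n_obj;
    printf "average object size:     ~%.0f KB\n", avg / 1024;
    printf "objects that fit in RAM: ~%.1f million\n", ram / avg / 1e6;
    printf "share of a 500GB cache:  ~%.0f%%\n", 100 * ram / (500 * 1024^3);
}'

In other words, only about 6% of a full 500GB cache can ever be memory-resident at once, so the hit rate on that resident fraction has to be very high before paging stops dominating.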

This is a general working principle of cache configuration, and not  
specific to Varnish.  After all, the whole point of a cache is to  
serve cacheable objects quickly.  Disk is not fast, and Varnish  
doesn't make slow disks faster.

This goal will become much easier if you can put a layer 7 switch in front of a pool of Varnish servers and route HTTP requests to them based on some attribute of the request (e.g., the URI and/or Cookie: header), thereby ensuring efficient use of your cache pool's memory.
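As one possible sketch of that idea, using HAProxy's balance uri (the backend name and 192.0.2.x addresses are made up for illustration; any layer 7 device with consistent URI hashing would do the same job):

# Hypothetical HAProxy backend: hashing the URI sends a given URL
# to the same Varnish box every time, partitioning the cached set.
backend varnish_pool
    balance uri
    server varnish001 192.0.2.11:80 check
    server varnish002 192.0.2.12:80 check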

Best regards,

--Michael

On May 29, 2009, at 3:42 AM, Marco Walraven wrote:

 We are testing Varnish in our production environment with a 500GB storage file and 32GB of RAM. [...]

 The problem I am facing is that as soon as RAM is exhausted, Varnish restarts itself. [...]

Re: Varnish restarts when all memory is allocated

2009-05-27 Thread Marco Walraven
On Wed, May 27, 2009 at 10:31:30AM +0200, Kristian Lyngstol wrote:
 Can you post the arguments you use to start varnish?

Sure; Varnish runs currently as follows:

 5117 ?  Ss  0:00 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 400,4000,60 -s file,/data/varnish/mp-varnish001/varnish_storage.bin,500G -p obj_workspace 4096 -p sess_workspace 262144 -p lru_interval 60 -h classic,4550111 -p listen_depth 8192 -p log_hashstring off -p sess_timeout 10 -p shm_workspace 32768 -p ping_interval 1 -p thread_pools 4 -p thread_pool_min 100 -p thread_pool_max 4000 -p srcaddr_ttl 0 -p esi_syntax 1
 5118 ?  Sl  0:01 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 400,4000,60 -s file,/data/varnish/mp-varnish001/varnish_storage.bin,500G -p obj_workspace 4096 -p sess_workspace 262144 -p lru_interval 60 -h classic,4550111 -p listen_depth 8192 -p log_hashstring off -p sess_timeout 10 -p shm_workspace 32768 -p ping_interval 1 -p thread_pools 4 -p thread_pool_min 100 -p thread_pool_max 4000 -p srcaddr_ttl 0 -p esi_syntax 1

I should note that I have also run with an lru_interval of 3600, which had the same effect, i.e. restarting when it hits the memory limit.
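For what it's worth, lru_interval can also be changed on the running instance through the management port instead of restarting with new arguments; param.set and param.show are the stock Varnish 2.x CLI commands:

varnishadm -T 127.0.0.1:6082 param.set lru_interval 3600
varnishadm -T 127.0.0.1:6082 param.show lru_interval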

Marco


-- 
 Terantula - Industrial Strength Open Source
 phone:+31 64 3232 400 / www: http://www.terantula.com / pgpkey: E7EE7A46
 pgp fingerprint: F2EE 122D 964C DE68 7380 6F95 3710 7719 E7EE 7A46 


Varnish restarts when all memory is allocated

2009-05-26 Thread Marco Walraven
Hi,

We are testing Varnish in our production environment with a 500GB storage file and 32GB of RAM. Performance is excellent while the 32GB is not yet fully allocated. The rates I am seeing are around 40-60Mbit/s, with roughly 2.2M objects in cache and a hit ratio of ~0.65; even then Varnish handles it easily. It is still warming up, though, since we have a lot of objects that need to be cached.

The problem I am facing is that as soon as RAM is exhausted, Varnish restarts itself. Since this looked like an IO problem, we dropped ext2 in favour of xfs, with much better results when writing to disk. However, varnishd still stops working after it gets to the 32GB RAM limit. Note that I don't see any IO until just before it hits 97% RAM usage.
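A cheap way to confirm that the trouble coincides with paging is to leave iostat running during the climb to 97%; %util and await on the disk holding the storage file (and on the swap device) should spike just before the restart. This is standard sysstat/procps tooling, nothing Varnish-specific:

iostat -x 5     # extended per-device stats every 5 seconds
vmstat 5        # si/so columns show pages swapped in/out per second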

So we thought to combine the file storage type with malloc and limit the amount of memory Varnish is allowed to allocate, starting with 5GB to see how that would work out. It turned out that it did not get limited, and from reading some posts it seems this is not needed.
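For reference, the malloc storage syntax with a size cap is just the following (a trimmed sketch of the invocation, not our full command line); as said, in our test the process's footprint was not actually limited by it:

/usr/sbin/varnishd -a :80 -f /etc/varnish/default.vcl \
    -T 127.0.0.1:6082 -s malloc,5G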

I have seen some posts about running large caches with the same kind of problems, but no real approach to a solution. What is the best way to get around this issue?

Below are the init script and the output of both varnishstat and top.

Hitrate ratio:        3        3        3
Hitrate avg:     0.6008   0.6008   0.6008

       10871         1.00         1.17 Client connections accepted
     5278218       273.99       566.76 Client requests received
     2864011       172.99       307.53 Cache hits
     2413896       101.00       259.20 Cache misses
     2413920       101.00       259.20 Backend connections success
     2391749        99.00       256.82 Backend connections reuses
     2391795        99.00       256.82 Backend connections recycles
         148           ..              N struct sess_mem
          29           ..              N struct sess
     2366595           ..              N struct object
     2364206           ..              N struct objecthead
     4733079           ..              N struct smf
           0           ..              N small free smf
           1           ..              N large free smf
          10           ..              N struct vbe_conn
          96           ..              N struct bereq
         400           ..              N worker threads
         400         0.00         0.04 N worker threads created
           2           ..              N backends
       47353           ..              N expired objects
     2090535           ..              N LRU moved objects
     5086915       265.99       546.22 Objects sent with write
       10867         1.00         1.17 Total Sessions
     5278227       273.99       566.76 Total Requests
          12         0.00         0.00 Total pipe
          13         0.00         0.00 Total pass
     2413900       101.00       259.20 Total fetch
  1865669893     97172.71    200329.64 Total header bytes
 22763257823   1297006.09   2444245.44 Total body bytes
        3335         0.00         0.36 Session Closed
     5275957       273.99       566.52 Session herd
   292178030     14367.51     31373.14 SHM records
     7758036       382.99       833.03 SHM writes
        6264         2.00         0.67 SHM flushes due to overflow
         239         0.00         0.03 SHM MTX contention
         125         0.00         0.01 SHM cycles through buffer
     4828098       201.99       518.43 allocator requests
     4733078           ..              outstanding allocations
 30790995968           ..              bytes allocated
 506079916032          ..              bytes free
         303         0.00         0.03 SMS allocator requests
      130986           ..              SMS bytes allocated
      130986           ..              SMS bytes freed
     2413909       101.00       259.20 Backend requests made
           1         0.00         0.00 N vcl total
           1         0.00         0.00 N vcl available
           1           ..              N total active purges
           1         0.00         0.00 N new purges added

top - 15:13:40 up 7 days, 33 min,  2 users,  load average: 0.14, 0.71, 0.75
Tasks: 116 total,   1 running, 115 sleeping,   0 stopped,   0 zombie
Cpu0  :  3.0%us,  1.0%sy,  0.0%ni, 93.0%id,  1.7%wa,  0.0%hi,  1.3%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32942712k total, 32777060k used,   165652k free, 2164k buffers
Swap:   506008k total,    25664k