Chris Robertson wrote:
-----Original Message-----
From: jos houtman [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 02, 2005 3:08 PM
To: [email protected]
Subject: [squid-users] What is decent/good squid performance and
architecture
Hello list,
I am running a website and have set up three squid servers as reverse proxies
to handle the images on the website.
Before I try to tweak even more, I am wondering what is considered
good performance in requests/min.
Some basic stats to give an idea:
- only image files are served
- average size 40KB
- possible number of files somewhere between 10 and 15 million (and
growing)
- the variety of files that's accessed? ...
I got these stats from a squid server that's been running for 2-3 days now.
Internal Data Structures:
2024476 StoreEntries
146737 StoreEntries with MemObjects
146721 Hot Object Cache Items
2000067 on-disk objects
Is it safe to assume that the number of images actually accessed is
about 2 million?
That is a fairly safe assumption (give or take a few thousand). I love this
list. Some of the service requirements just make me gawk. 10-15 million
images...
On our dual Xeon servers with 4GB RAM and SATA disks I can get about 250
hits/second.
On our dual Xeon server with 8GB RAM and SCSI disks I can get about 550
hits/second.
Are these decent numbers?
550 hits/second * 40KB average object size * 3 squids ≈ 515 Mbps.
Make sure you have enough upstream bandwidth before worrying about
further performance. Even at 250 hits/second (250 * 40KB * 8 bits ≈ 80 Mbps
per box) you'd be close to saturating 100BaseT on each squid box (if that's
what you're using).
I'm running aufs on the 8GB server, and diskd on the other servers.
Does that contribute to the big difference, or is it mainly the memory
and disk speed?
Given just the information above (and assuming that the OS and number of cache
disks are the same between servers), I would guess that it's just a function of
memory and disk speed (more objects cached in RAM, faster access to those not
cached).
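For what it's worth, the storage scheme is chosen per cache_dir line in squid.conf. A minimal sketch, with hypothetical paths and sizes:

```
# aufs: asynchronous disk I/O using threads inside the Squid process
cache_dir aufs /var/spool/squid 100000 16 256

# diskd: disk I/O handed off to separate diskd helper processes
cache_dir diskd /var/spool/squid 100000 16 256
```

On Linux, aufs is generally the recommended choice of the two, so the storage scheme may account for some of the gap, but the extra RAM and faster disks are the more likely explanation.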
In any case,
http://www.squid-cache.org/mail-archive/squid-users/200505/0974.html is an
example of 700 hits per second. No hardware specifics are given in the email.
There is also a patch for Squid to use epoll on Linux that at least one person
has had a good experience with:
http://www.squid-cache.org/mail-archive/squid-users/200504/0422.html.
Here's an email from Kinkie (one of the Squid devs, if I'm not mistaken) describing 500
hits/sec on a Pentium IV 3.2GHz w/2GB RAM as "not really too bad." He also has
a HowTo describing how to run multiple instances of Squid on a single box:
http://squidwiki.kinkie.it/squidwiki/MultipleInstances. If you are running out of CPU on
one processor (Squid doesn't take full advantage of multi-CPU installations), this might
be something to look into.
I think that the variety of files accessed by the clients is getting too
big (especially during peak hours) for the squid servers to cache
efficiently. I am hoping that it's possible to distribute the variety
over the squid servers, so that during normal operations each squid
server would only have to serve a third of the 2 million files.
Do you have some good ideas about how to achieve this?
Is there a way to have some kind of distribution based on the URL?
I am hoping this is possible without rewriting the web application,
and so that a failure of one server would go unnoticed by the public.
One method would be to set the cache servers up as cache peers using the
proxy-only option. The message at
http://www.squid-cache.org/mail-archive/squid-users/200506/0175.html is all
about clustering squids for internet caching, but it does imply that ICP
peering should work just fine up to 8 servers. If you want to limit what each
squid caches based on URL hierarchy, a combination of urlpath_regex acls and
the no_cache directive can do it. No promises on what that will do to
performance.
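As a sketch of that approach (the path pattern and three-way split are hypothetical, and in later Squid releases the no_cache directive was renamed to cache):

```
# On squid1: cache only URLs whose path falls in this server's share.
# Here squid1 takes image paths beginning with 0-4 (hypothetical layout).
acl my_share urlpath_regex ^/images/[0-4]
no_cache deny !my_share
```

Each of the three squids would get a different regex, so every box ends up holding roughly a third of the hot objects while still being able to serve (but not cache) requests for the other shares.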
For more explicit suggestions it would help to know how your caches are set up
currently (separate IPs with round-robin DNS? A hardware load balancer? A
software cluster?).
Another method would be CARP (the Cache Array Routing Protocol). I haven't
used it myself, but it splits the load between peers based on a hash of the
URL; basically a hash-based load-balancing algorithm.
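One possible arrangement, assuming hypothetical hostnames and Squid 2.6-style syntax (the exact cache_peer options vary between Squid versions):

```
# Front-end squid: hash each request URL onto one of three parents (CARP).
cache_peer squid1.example.com parent 3128 0 carp no-query
cache_peer squid2.example.com parent 3128 0 carp no-query
cache_peer squid3.example.com parent 3128 0 carp no-query
never_direct allow all
```

Because the hash is deterministic, a given URL always lands on the same parent, which gives exactly the per-server working-set split asked about above.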
If you have a load balancer with packet inspection capabilities you can
also direct traffic that way. On F5 BigIPs the facility is called
iRules. I'm pretty sure NetScaler can do that too.
--
Robert Borkowski