Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
In message ff646d15-26b5-4843-877f-fb8d469d2...@slide.com, Ken Brownfield writes:

It is important to be absolutely clear about what your objective is here: availability, cache-hit ratio or raw performance. The best solution will depend on what you are after.

For a lot of purposes, you will get a lot of mileage out of a number of parallel Varnish machines with DNS round-robin, which is, for all practical purposes, a zero-cost solution.

At the other end, you have a load-balancer in front of your varnishes, which gives you all sorts of neat features at a pretty steep cost.

The spectrum in between is filled with things like pound, haproxy and other open-source solutions, which may, or may not, run on their own hardware.

There is no perfect fit for all solutions in this space; you will need to make your own choice.

> Squid has a peering feature; [...]

Squid's peering feature was created for hit-rate only; the working scenario is two squids, each behind a very slow line to the internet, asking each other before they pull down a file.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
In message 4c3149fb1001151733g73f7a5dfjc84342b9df7f0...@mail.gmail.com, pub crawler writes:

> Varnish performs very well. Extending this to have a cluster functionality within Varnish I think just makes sense.

You can do some clever stuff with the hash director to distribute the content over a cluster of varnishes:

varnish1 has:

    backend webserver {...}
    backend varnish2 {...}
    backend varnish3 {...}

    acl partner {
        "varnish1 ip";
        "varnish2 ip";
        "varnish3 ip";
    }

    director h1 hash {
        { .backend = webserver; .weight = 1; }
        { .backend = varnish2; .weight = 1; }
        { .backend = varnish3; .weight = 1; }
    }

    sub vcl_fetch {
        # Responses that came from a partner varnish get a zero TTL here.
        if (beresp.http.x-partner == "yes") {
            set beresp.ttl = 0s;
            unset beresp.http.x-partner;
        }
    }

    sub vcl_deliver {
        # Mark responses that are delivered to a partner varnish.
        if (client.ip ~ partner) {
            set resp.http.x-partner = "yes";
        }
    }

On varnish2 you change the h1 director to read:

    director h1 hash {
        { .backend = varnish1; .weight = 1; }
        { .backend = webserver; .weight = 1; }
        { .backend = varnish3; .weight = 1; }
    }

On varnish3:

    director h1 hash {
        { .backend = varnish1; .weight = 1; }
        { .backend = varnish2; .weight = 1; }
        { .backend = webserver; .weight = 1; }
    }

--
Poul-Henning Kamp
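The per-node director blocks above differ only in which slot points at the origin, so they lend themselves to being generated. The following is a minimal Python sketch of that idea, not something from the original message: the function name `director_stanza` and the `.backend = ...; .weight = 1;` spelling are assumptions, and the backend names are the placeholders used in the example.

```python
def director_stanza(node, cluster, origin="webserver", name="h1"):
    """Return the hash director block for `node` (hypothetical helper).

    Every node lists the cluster members in the same order; the slot that
    would point at the node itself is filled with the origin backend.
    """
    lines = ["director %s hash {" % name]
    for member in cluster:
        backend = origin if member == node else member
        lines.append("    { .backend = %s; .weight = 1; }" % backend)
    lines.append("}")
    return "\n".join(lines)


cluster = ["varnish1", "varnish2", "varnish3"]
stanzas = {node: director_stanza(node, cluster) for node in cluster}
print(stanzas["varnish2"])
```

Keeping the member order identical on every node matters here: a given URL should hash to the same slot everywhere, and only the node that owns that slot goes to the web server for it.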
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
On Sat, Jan 16, 2010 at 1:54 AM, Bendik Heltne bhel...@gmail.com wrote:

> I must say that I am a bit confused. I don't understand the need of routing requests to different varnish servers based on hash algorithm. So I am wondering what kind of sites are we talking about?

We're talking about sites that have a hot working set much larger than the amount of RAM you can fit in a single Varnish instance (i.e., 32-64GB).

> Our Varnish servers have ~ 120.000 - 150.000 objects cached in ~ 4GB memory and the backends have a much easier life than before Varnish. We are about to upgrade RAM on the Varnish boxes, and eventually we can switch to disk cache if needed.

If you receive more than 100 requests/sec per Varnish instance and you use a disk cache, you will die.

--Michael
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
On Sat, Jan 16, 2010 at 1:59 AM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:

> director h1 hash {
>     { .backend = webserver; .weight = 1; }
>     { .backend = varnish2; .weight = 1; }
>     { .backend = varnish3; .weight = 1; }
> }

What happens when varnish2 or varnish3 dies?

--Michael
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
In message 4c3149fb1001160738l2233481dn82c34c2ba1fcc...@mail.gmail.com, pub crawler writes:

> Poul, is anyone running the hash director distribution method like you provided (in production)?

No idea...

--
Poul-Henning Kamp
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
In message d002c4031001160741q63dd5a50i6342116daba15...@mail.gmail.com, Michael Fischer writes:

> On Sat, Jan 16, 2010 at 1:59 AM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:
>
>> director h1 hash {
>>     { .backend = webserver; .weight = 1; }
>>     { .backend = varnish2; .weight = 1; }
>>     { .backend = varnish3; .weight = 1; }
>> }
>
> What happens when varnish2 or varnish3 dies?

If a particular backend in the director is unhealthy, the requests for it will be redistributed by rehashing over the healthy subset of backends. Once it becomes healthy, normality will be restored.

So everything should work out fine, for some value around 99.9% of fine.

--
Poul-Henning Kamp
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
In message d002c4031001160929p1f688fc9mcc927dda2c684...@mail.gmail.com, Michael Fischer writes:

> For instance sizes larger than 2, I think a consistent hash is needed. Otherwise, the overall hit ratio will fall dramatically upon failure of an instance as the requests are rerouted.

If you have perfect 1/3 splitting between 3 varnishes, having one die will do bad things to your hitrate until the remaining two distribute the load between them.

That's a matter of math, and has nothing to do with the hash algorithm.

--
Poul-Henning Kamp
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
On Sat, Jan 16, 2010 at 10:44 AM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:

> In message d002c4031001160929p1f688fc9mcc927dda2c684...@mail.gmail.com, Michael Fischer writes:
>
>> For instance sizes larger than 2, I think a consistent hash is needed. Otherwise, the overall hit ratio will fall dramatically upon failure of an instance as the requests are rerouted.
>
> If you have perfect 1/3 splitting between 3 varnishes, having one die will do bad things to your hitrate until the remaining two distribute the load between them.
>
> That's a matter of math, and has nothing to do with the hash algorithm.

Let me put it this way and leave the math up to you: it will be way worse if you don't use a consistent hash.

--Michael
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
On Sat, Jan 16, 2010 at 10:44 AM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:

> In message d002c4031001160929p1f688fc9mcc927dda2c684...@mail.gmail.com, Michael Fischer writes:
>
>> For instance sizes larger than 2, I think a consistent hash is needed. Otherwise, the overall hit ratio will fall dramatically upon failure of an instance as the requests are rerouted.
>
> If you have perfect 1/3 splitting between 3 varnishes, having one die will do bad things to your hitrate until the remaining two distribute the load between them.
>
> That's a matter of math, and has nothing to do with the hash algorithm.

Right, but those 2 remaining are at least still being asked for the same URLs they were prior to the 1 dying. They're just now responsible for the dead varnish's URLs in addition to their own working set. This is much better than the entire URL space being hashed against 2 buckets.

...or is my understanding of consistent hashing flawed?
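The difference being argued about can be put into numbers with a small simulation. The Python sketch below is an illustration only and does not reproduce Varnish's code: it assumes a director that hashes over all backends first and rehashes over the healthy ones only when the first choice is sick, and compares that with naively taking the hash modulo the number of survivors. The md5 hash, the URL pattern and the function names are arbitrary choices made for the example.

```python
import hashlib


def h(key):
    # Stable integer hash of a URL (md5 is an arbitrary choice here).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def pick_two_step(key, backends, healthy):
    # First choice over the full list; rehash over survivors only if it is sick.
    choice = backends[h(key) % len(backends)]
    if choice in healthy:
        return choice
    alive = [b for b in backends if b in healthy]
    return alive[h(key) % len(alive)]


def pick_plain_mod_n(key, healthy_list):
    # Naive scheme: always hash modulo the number of live backends.
    return healthy_list[h(key) % len(healthy_list)]


backends = ["varnish1", "varnish2", "varnish3"]
survivors = ["varnish1", "varnish2"]  # varnish3 has died
urls = ["/obj/%d" % i for i in range(10000)]

# Placement while all three backends are healthy.
before = {u: pick_plain_mod_n(u, backends) for u in urls}

moved_two_step = sum(
    pick_two_step(u, backends, set(survivors)) != before[u] for u in urls
)
moved_mod_n = sum(pick_plain_mod_n(u, survivors) != before[u] for u in urls)
stayed_put = all(
    pick_two_step(u, backends, set(survivors)) == before[u]
    for u in urls
    if before[u] in survivors
)
print(moved_two_step / len(urls), moved_mod_n / len(urls))
```

With a uniform hash, the two-step scheme should move roughly one third of the URLs (only those that lived on the dead node), while the naive modulo scheme should move roughly two thirds, including many URLs whose original node is still alive.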
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
In message dcccdf791001161258s3e960aa8t3cd379e42d760...@mail.gmail.com, David Birdsong writes:

> Right, but those 2 remaining are at least still being asked for the same url's they were prior to the 1 dying.

Correct, the hashing is canonical in the sense that if the configured backend is up, all traffic for its objects will be sent to it.

--
Poul-Henning Kamp
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
On Sat, Jan 16, 2010 at 1:19 PM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:

> In message dcccdf791001161258s3e960aa8t3cd379e42d760...@mail.gmail.com, David Birdsong writes:
>
>> Right, but those 2 remaining are at least still being asked for the same url's they were prior to the 1 dying.
>
> Correct, the hashing is canonical in the sense that if the configured backend is up, all traffic for its objects will be sent to it.

Are you saying that the default hash is not a mod-n-type algorithm? If not, what happens when the failed backend is restored to service?

--Michael
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
On Sat, Jan 16, 2010 at 1:37 PM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:

>> Are you saying that the default hash is not a mod-n-type algorithm?
>
> Well, it is mod-n, with the footnote that n has nothing to do with the number of backends, because these have a configurable weight.
>
>> If not, what happens when the failed backend is restored to service?
>
> It's probably simplest to paraphrase the code:
>
>     Calculate hash over full complement of backends.
>     Is the selected backend sick
>         Calculate hash over subset of healthy backends

Ah, ok. That should behave reasonably in the event of a backend failure if you're implementing Varnish tiers. Thanks for the clarification.

--Michael
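The paraphrase above can be turned into a short runnable sketch. The Python below is only one plausible reading of it, not Varnish's implementation: it assumes "mod-n" means the hash is reduced modulo the sum of the weights and then mapped onto the backends in order. The use of `zlib.crc32`, the function names and the weights are all assumptions made for illustration.

```python
import zlib


def weighted_pick(key_hash, members):
    """Pick from a list of (name, weight) pairs using key_hash mod total weight."""
    total = sum(weight for _, weight in members)
    if total <= 0:
        raise ValueError("no usable backends")
    slot = key_hash % total
    for name, weight in members:
        if slot < weight:
            return name
        slot -= weight


def choose_backend(url, members, is_healthy):
    key_hash = zlib.crc32(url.encode())
    # Step 1: hash over the full complement of backends.
    first = weighted_pick(key_hash, members)
    if is_healthy(first):
        return first
    # Step 2: the selected backend is sick, so hash over the healthy subset.
    healthy = [(name, weight) for name, weight in members if is_healthy(name)]
    return weighted_pick(key_hash, healthy)


members = [("webserver", 1), ("varnish2", 1), ("varnish3", 2)]


def all_up(name):
    return True


def varnish3_down(name):
    return name != "varnish3"


print(choose_backend("/index.html", members, all_up))
print(choose_backend("/index.html", members, varnish3_down))
```

Under this reading, a URL whose first choice is healthy never moves, and when a sick backend recovers its URLs go straight back to it, which matches the "normality will be restored" behaviour described earlier in the thread.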
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
Thanks again Poul for all you do.

How does Varnish handle the hashing and locating of data when a backend returns to the pool? Wouldn't the hashing be wrong for previously loaded items, since a machine has returned and the pool widens?

Just trying to figure out the implications of this because in our environment we regularly find ourselves pulling servers offline. Wondering if the return of a Varnish would operate like a cold-cache miss or what magic in Varnish deals with the change in hashing per se.

> It's probably simplest to paraphrase the code:
>
>     Calculate hash over full complement of backends.
>     Is the selected backend sick
>         Calculate hash over subset of healthy backends
Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?
In message 4c3149fb1001161400n38a1ef1al18985bc3ad1ad...@mail.gmail.com, pub crawler writes:

> Just trying to figure out the implications of this because in our environment we regularly find ourselves pulling servers offline. Wondering if the return of a Varnish would operate like a cold-cache miss or what magic in Varnish deals with the change in hashing per se.

There is no built-in magic for that[1].

One of the really powerful things Varnish can do is change VCL code on the fly, instantly. So it is possible to start your Varnish with one VCL program, and have a small script change to another one some minutes later.

You can use that to start with a VCL where it only uses its neighbors as backends, and then some minutes later, when the cache has the most common objects loaded, switch to another VCL that goes directly to the backend.

If you want to get fancy, you can use VCL restarts to ask the neighbors and, if they don't have it, go directly to the backend on restart.

Poul-Henning

[1] In general Varnish has no built-in magic; all the magic is your responsibility to write in the VCL code :-)

--
Poul-Henning Kamp
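The "small script" mentioned above could look something like the following Python sketch. It assumes the management interface is reachable through `varnishadm -T` and uses the `vcl.load` and `vcl.use` management commands; the address `localhost:6082`, the configuration name `direct`, the file path and the five-minute delay are all hypothetical placeholders, and the helper names are invented for the example.

```python
import subprocess


def vcl_switch_commands(admin_addr, config_name, vcl_path):
    """Build the two varnishadm invocations that load and then activate a VCL file."""
    base = ["varnishadm", "-T", admin_addr]
    return [
        base + ["vcl.load", config_name, vcl_path],
        base + ["vcl.use", config_name],
    ]


def switch_vcl(admin_addr, config_name, vcl_path, run=subprocess.check_call):
    # `run` is injectable so the commands can be inspected without a live varnishd.
    for command in vcl_switch_commands(admin_addr, config_name, vcl_path):
        run(command)


warmup_delay_seconds = 300  # hypothetical: how long to serve from neighbors first
commands = vcl_switch_commands("localhost:6082", "direct", "/etc/varnish/direct.vcl")
for command in commands:
    print(" ".join(command))
```

A wrapper would start varnishd with the neighbors-only VCL, sleep for `warmup_delay_seconds`, and then call `switch_vcl(...)`; how long the warm-up needs to be depends on the traffic and is something to measure rather than guess.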
push button lru nuking
I'm trying to hack my way around to a push-button LRU-nuking feature. The short description of how I'm doing it follows; I'll explain why farther down.

I have a job that watches sm_bfree / (sm_bfree + sm_balloc). Once storage file utilization is past some percentage (yet to be determined), I connect to the upstream load balancers and slowly drain traffic away from varnish. Once traffic is off and I can beat the hell out of that box, it's time to free up some space. In the past this has been done with restarts. Upon restart, the cache hit ratio is destroyed, but the box can keep up and rebuild the cache in a stable way.

What I'd like to do is dump everything in the storage files that has a very low obj.hits. LRU nuking on the surface seems like the best thing to initiate, but it usually only kicks in pretty late and puts the machine into a state that is unstable while serving. While not serving, I don't know how to kick it off; furthermore, I want it to run hard and free up a lot more space than it usually does, i.e. the cache file is ~200GB and I'd like it to run until sm_bfree is around 50GB.

My idea is to load balance as I've described above: pull 50GB of trash files through the cache plus enough to kick off LRU, purge the trash files, monitor sm_bfree, and once it's high enough instruct the upstream load balancers to start sending traffic gently for a warm-up period. Rinse and repeat into infinity, replacing the SSD storage drives as they fail.

Is this crazy? Am I uninformed on a better way?

Also, I've had to keep making my trash files smaller and smaller.
I started with a 10 and 1G files which crashed varnish immediately, then reduced to 500MB files and successfully pulled 200 through - then crashed both my python interpreter (libcurl) and varnish: varnishd[2664]: Child (14772) Panic message: Assert error in STV_alloc(), stevedore.c line 183:#012 Condition((st) != NULL) not true.#012thread = (cache-worker)#012Backtrace:#012 0x421f95: pan_ic+85#012 0x4369e5: STV_alloc+125#012 0x41a1b6: FetchBody+496#012 0x4114dd: cnt_fetch+63d#012 0x412a3d: CNT_Session+35d#012 0x424273: wrk_do_cnt_sess+93#012 0x42362e: wrk_thread_real+26e#012 0x7f2cf51b83da: _end+7f2cf4b47c1a#012 0x7f2cf4a862bd: _end+7f2cf4415afd#012sp = 0x7f2ced387008 {#012 fd = 58, id = 58, xid = 1454039386,#012 client = 127.0.0.1:7057,#012 step = STP_FETCH,#012 handling = deliver,#012 err_code = 200, err_reason = (null),#012 restarts = 0, esis = 0#012 ws = 0x7f2ced387078 { #012 id = sess,#012{s,f,r,e} = {0x7f2ced387800,+144,(nil),+4096},#012 },#012 http[req] = {#012 ws = 0x7f2ced387078[sess]#012 GET,#012 /lru.10.cache.buster.80.12994,#012 HTTP/1.1,#012 User-Agent: PycURL/7.18.2,#012 Host: localhost:6081,#012 Accept: */*,#012 },#012 worker = 0x7ef439f06390 {#012ws = 0x7ef439f068f0 { #012 id = wrk,#012 {s,f,r,e} = {0x7ef439f03350,+2143,(nil),+4096},#012},#012http[bereq] = {#012 ws = 0x7ef439f068f0[wrk]#012GET,#012 /lru.10.cache.buster.80.12994,#012HTTP/1.1,#012 User-Agent: PycURL/7.18.2,#012Host: localhost:6081,#012 Accept: */*,#012X-Varnish: 1454039386,#012 X-Forwarded-For: 127.0.0.1,#012},#012http[beresp] = {#012 ws = 0x7ef439f068f0[wrk]#012HTTP/1.1,#012 200,#012OK,#012Server: nginx/0.7.64,#012 Date: Sat, 16 Jan 2010 21:11:09 GMT,#012Content-Type: application/octet-stream,#012Content-Length: 524288000,#012 Last-Modified: Sat, 16 Jan 2010 21:08:11 GMT,#012 Connection: keep-alive,#012Accept-Ranges: bytes,#012 X-Varnish-IP: 127.0.0.1,#012X-Varnish-Port: 6081,#012 },#012},#012 Are big files bad? 
I expect that I'll have to close a pretty big gap normally, given that my 4 storage files are 75GB each (SSD). I'd like to start this process before LRU nuking happens on its own while varnish is not unloaded by upstream load balancers. My guess, based on loose recollection, is that varnish will start LRU nuking at 90% capacity. It may just prove not feasible given that I'll have to pull roughly 60GB through to achieve the goal; perhaps freeing up a smaller percentage would be acceptable too, though.

I'm still playing with this, but wanted to share my uber-hacky idea and let you guys tear it apart if it's a dumb idea.

Why: Identifying the working set has been difficult. It's large, and the long tail is very long. I've tried adaptive TTLs to constantly expire objects that shouldn't be in cache:

- in vcl_fetch: set every new object to a 2hr ttl.
- in vcl_hit: if obj.hits == N, then obj.ttl = 36 hours, where N is some number that is high enough to cache.

Another permutation: update the VCL every 30 mins such that obj.ttl was set to expire exactly at the trough of traffic (2300 - 2350 PST):

- in vcl_hit: if obj.hits == N, then obj.ttl = 12h or 10h, or 3h (depending on time of day).

This just ended up affecting cache hit ratio such that it was never favorable and the box was just
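The watcher job described at the top of this message can be sketched in a few lines of Python. This is a rough sketch under stated assumptions: it expects `varnishstat -1` style text where each line starts with a counter name followed by its integer value, the sample numbers are made up, and the 0.85 threshold is a placeholder since the message leaves the percentage undetermined. Note that sm_bfree / (sm_bfree + sm_balloc) is the free fraction; utilization is the complement, which is what the sketch computes.

```python
def parse_varnishstat(text):
    """Parse `varnishstat -1` style lines into {counter_name: int_value}."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def storage_utilization(counters):
    # Fraction of the storage file that is allocated: balloc / (balloc + bfree).
    allocated = counters["sm_balloc"]
    free = counters["sm_bfree"]
    return allocated / (allocated + free)


def should_drain(counters, threshold=0.85):
    # threshold is a hypothetical value; tune it to start before LRU nuking does.
    return storage_utilization(counters) >= threshold


# Made-up sample: 240 GiB allocated, 60 GiB free.
sample = """\
sm_balloc   257698037760   .   bytes allocated
sm_bfree    64424509440    .   bytes free
"""
counters = parse_varnishstat(sample)
print(storage_utilization(counters), should_drain(counters))
```

In a real job the text would come from running `varnishstat -1` on the box, and a positive `should_drain` result would trigger the step that takes the node out of the upstream load balancers.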
Re: push button lru nuking
On Sat, Jan 16, 2010 at 4:27 PM, Michael Fischer mich...@dynamine.net wrote:

> On Sat, Jan 16, 2010 at 4:16 PM, Michael Fischer mich...@dynamine.net wrote:
>
>> On Sat, Jan 16, 2010 at 4:10 PM, David Birdsong david.birds...@gmail.com wrote:
>>
>>> On Sat, Jan 16, 2010 at 3:55 PM, Michael Fischer mich...@dynamine.net wrote:
>>>
>>>> This scheme seems very baroque. Why not just reduce the size of your caches so you don't page-thrash and let Varnish's builtin LRU algorithm handle the eviction?
>>>
>>> Then I won't be able to cache nearly as much. I want to originate as much content as possible on the varnish servers, i.e. reduce backend fetches. There is no way I could fit any useful amount of my working set into a storage that could handle the evictions without spending an unreasonable amount of money (basically fit it in RAM). I'd love to be proven wrong though.
>>>
>>> As far as random reads go, the SSDs are really good; it's just the writes that kill me. Right now a mostly filled cache server with ~80-160GB allocated can maintain between 90-92% cache hit ratio at 400-500Mb/sec. When it fills up completely, evictions cause the machine to keel over: the parent can't ping the child, health checks fail, general badness. I'd like to let the eviction run under supervision (automated supervision) and augment the eviction such that it buys back a few hours, not minutes.
>>
>> What OS are you running?
>
> This might be one of those rare cases where a little more swappiness (i.e., aggressiveness of the pageout algorithm) might buy you something. This page may be useful if you're running on Linux:
>
> http://www.westnet.com/~gsmith/content/linux-pdflush.htm
>
> --Michael

Yes, this page was very helpful back when I had hopes of tuning my way around this load problem.