Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Poul-Henning Kamp
In message ff646d15-26b5-4843-877f-fb8d469d2...@slide.com, Ken Brownfield writes:

It is important to be absolutely clear about what your objective is here:
availability, cache-hit ratio, or raw performance. The best solution will
depend on what you are after.

For a lot of purposes, you will get a lot of mileage out of a number of
parallel Varnish machines behind DNS round-robin, which is, for all
practical purposes, a zero-cost solution.

At the other end, you can have a load balancer in front of your Varnishes,
which gives you all sorts of neat features at a pretty steep cost.

The spectrum in between is filled with things like pound, haproxy and other
open-source solutions, which may or may not run on their own hardware.

No solution in this space is a perfect fit for everybody; you will
need to make your own choice.

Squid has a peering feature; [...]

Squid's peering feature was created for hit rate only; the working scenario
is two Squids, each behind a very slow line to the Internet, asking each
other before they pull down a file.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Poul-Henning Kamp
In message 4c3149fb1001151733g73f7a5dfjc84342b9df7f0...@mail.gmail.com, pub crawler writes:

Varnish performs very well.  Extending this to have a cluster
functionality within Varnish I think just makes sense.  

You can do some clever stuff with the hash director to distribute the
content over a cluster of varnishes:

varnish1 has:

backend webserver { ... }    # the real origin
backend varnish2  { ... }    # peer cache
backend varnish3  { ... }    # peer cache

acl partner {
    "varnish1 ip";           # placeholders: the peers' own addresses
    "varnish2 ip";
    "varnish3 ip";
}

director h1 hash {
    { .backend = webserver; .weight = 1; }
    { .backend = varnish2;  .weight = 1; }
    { .backend = varnish3;  .weight = 1; }
}

sub vcl_fetch {
    # The object was fetched from a peer Varnish, which owns it;
    # do not store a second copy here.
    if (beresp.http.x-partner == "yes") {
        set beresp.ttl = 0s;
        unset beresp.http.x-partner;
    }
}

sub vcl_deliver {
    # Mark responses delivered to a peer Varnish, so the peer
    # knows not to cache them again.
    if (client.ip ~ partner) {
        set resp.http.x-partner = "yes";
    }
}

On varnish2 you change the h1 director to read:

director h1 hash {
    { .backend = varnish1;  .weight = 1; }
    { .backend = webserver; .weight = 1; }
    { .backend = varnish3;  .weight = 1; }
}

On varnish3:

director h1 hash {
    { .backend = varnish1;  .weight = 1; }
    { .backend = varnish2;  .weight = 1; }
    { .backend = webserver; .weight = 1; }
}
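
For the failure handling discussed later in the thread (rehashing over
the healthy subset) to kick in, the peer backends need health probes so
Varnish knows when one of them is sick. A minimal sketch of one peer
definition, in Varnish 2.x syntax with placeholder address and thresholds:

backend varnish2 {
    .host = "192.0.2.12";    # placeholder: varnish2's address
    .port = "80";
    .probe = {
        .url = "/";          # any cheap URL the peer always answers
        .interval = 5s;
        .timeout = 1s;
        .window = 8;         # look at the last 8 probes...
        .threshold = 6;      # ...and require 6 good ones for "healthy"
    }
}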

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Michael Fischer
On Sat, Jan 16, 2010 at 1:54 AM, Bendik Heltne bhel...@gmail.com wrote:

 I must say that i am a bit confused.
 I don't understand the need of routing requests to different varnish
 servers based on hash algorithm. So I am wondering what kind of sites
 are we talking about?


We're talking about sites that have a hot working set much larger than the
amount of RAM you can fit in a single Varnish instance (i.e., 32-64GB).

Our Varnish servers have ~ 120.000 - 150.000 objects cached in ~ 4GB
 memory and the backends have a much easier life than before Varnish.
 We are about to upgrade RAM on the Varnish boxes, and eventually we
 can switch to disk cache if needed.


If you receive more than 100 requests/sec per Varnish instance and you use a
disk cache, you will die.

--Michael


Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Michael Fischer
On Sat, Jan 16, 2010 at 1:59 AM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:

director h1 hash {
    { .backend = webserver; .weight = 1; }
    { .backend = varnish2;  .weight = 1; }
    { .backend = varnish3;  .weight = 1; }


What happens when varnish2 or varnish3 dies?

--Michael


Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Poul-Henning Kamp
In message 4c3149fb1001160738l2233481dn82c34c2ba1fcc...@mail.gmail.com, pub crawler writes:
Poul,  is anyone running the hash director distribution method like
you provided (in production)?

No idea...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Poul-Henning Kamp
In message d002c4031001160741q63dd5a50i6342116daba15...@mail.gmail.com, Michael Fischer writes:

On Sat, Jan 16, 2010 at 1:59 AM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:

director h1 hash {
    { .backend = webserver; .weight = 1; }
    { .backend = varnish2;  .weight = 1; }
    { .backend = varnish3;  .weight = 1; }


What happens when varnish2 or varnish3 dies?

If a particular backend in the director is unhealthy, the requests
for it will be redistributed by rehashing over the healthy subset
of backends.  Once it becomes healthy again, normality will be restored.

So everything should work out fine, for some value around 99.9% of fine.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Poul-Henning Kamp
In message d002c4031001160929p1f688fc9mcc927dda2c684...@mail.gmail.com, Michael Fischer writes:

For instance sizes larger than 2, I think a consistent hash is needed.
 Otherwise, the overall hit ratio will fall dramatically upon failure of an
instance as the requests are rerouted.

If you have perfect 1/3 splitting between 3 varnishes, having one die
will do bad things to your hitrate until the remaining two distribute
the load between them.

That's a matter of math, and has nothing to do with the hash algorithm.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Michael Fischer
On Sat, Jan 16, 2010 at 10:44 AM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:

 In message d002c4031001160929p1f688fc9mcc927dda2c684...@mail.gmail.com, Michael Fischer writes:

 For instance sizes larger than 2, I think a consistent hash is needed.
 Otherwise, the overall hit ratio will fall dramatically upon failure of an
 instance as the requests are rerouted.

 If you have perfect 1/3 splitting between 3 varnishes, having one die
 will do bad things to your hitrate until the remaining two distribute
 the load between them.

 That's a matter of math, and has nothing to do with the hash algorithm.


Let me put it this way and leave the math up to you:  it will be way worse
if you don't use a consistent hash.

--Michael


Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread David Birdsong
On Sat, Jan 16, 2010 at 10:44 AM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:
 In message d002c4031001160929p1f688fc9mcc927dda2c684...@mail.gmail.com, Michael Fischer writes:

For instance sizes larger than 2, I think a consistent hash is needed.
 Otherwise, the overall hit ratio will fall dramatically upon failure of an
instance as the requests are rerouted.

 If you have perfect 1/3 splitting between 3 varnishes, having one die
 will do bad things to your hitrate until the remaining two distribute
 the load between them.

 That's a matter of math, and has nothing to do with the hash algorithm.
Right, but those two remaining are at least still being asked for the
same URLs they were prior to the one dying.  They're just now
responsible for the dead varnish's URLs in addition to their own
working set.  This is much better than the entire URL space being
hashed against 2 buckets.
  ...or is my understanding of consistent hashing flawed?
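
To put rough numbers on that (assuming a uniform hash and three equally
weighted caches): if the director rehashes only the dead node's requests
over the healthy subset, or you use a consistent hash, roughly 1/3 of the
URL space changes owner. With a naive mod-n rehash from 3 buckets down to
2, a URL keeps its bucket only when

    hash mod 3 == hash mod 2

which holds for just 2 out of every 6 hash values, so roughly 2/3 of the
whole URL space moves and goes cold at once.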




Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Poul-Henning Kamp
In message dcccdf791001161258s3e960aa8t3cd379e42d760...@mail.gmail.com, David Birdsong writes:

Right, but those 2 remaining are at least still being asked for the
same url's they were prior to the 1 dying.

Correct, the hashing is canonical in the sense that if the
configured backend is up, all traffic for its objects will be
sent to it.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Michael Fischer
On Sat, Jan 16, 2010 at 1:19 PM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:

 In message dcccdf791001161258s3e960aa8t3cd379e42d760...@mail.gmail.com, David Birdsong writes:

 Right, but those 2 remaining are at least still being asked for the
 same url's they were prior to the 1 dying.

 Correct, the hashing is canonical in the sense that if the
 configured backend is up, all traffic for its objects will be
 sent to it.


Are you saying that the default hash is not a mod-n-type algorithm?

If not, what happens when the failed backend is restored to service?

--Michael


Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Michael Fischer
On Sat, Jan 16, 2010 at 1:37 PM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:

Are you saying that the default hash is not a mod-n-type algorithm?

 Well, it is mod-n, with the footnote that n has nothing to do with
 the number of backends, because these have a configurable weight.

 If not, what happens when the failed backend is restored to service?

 It's probably simplest to paraphrase the code:

Calculate hash over full complement of backends.
Is the selected backend sick
Calculate hash over subset of healthy backends


Ah, ok.  That should behave reasonably in the event of a backend failure if
you're implementing Varnish tiers.  Thanks for the clarification.

--Michael


Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread pub crawler
Thanks again Poul for all you do.

How does Varnish handle the hashing and locating of data when a
backend returns to the pool?  Wouldn't the hashing be wrong for previously
loaded items, since a machine has returned and the pool has widened?

Just trying to figure out the implications of this because in our
environment we regularly find ourselves pulling servers offline.
Wondering if the return of a Varnish would operate like a cold-cache
miss or what magic in Varnish deals with the change in hashing per se.

 It's probably simplest to paraphrase the code:

        Calculate hash over full complement of backends.
        Is the selected backend sick
                Calculate hash over subset of healthy backends



Re: Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

2010-01-16 Thread Poul-Henning Kamp
In message 4c3149fb1001161400n38a1ef1al18985bc3ad1ad...@mail.gmail.com, pub crawler writes:

Just trying to figure out the implications of this because in our
environment we regularly find ourselves pulling servers offline.
Wondering if the return of a Varnish would operate like a cold-cache
miss or what magic in Varnish deals with the change in hashing per se.

There is no built-in magic for that[1].

One of the really powerful things Varnish can do is change VCL code
on the fly, instantly.

So it is possible to start your Varnish with one VCL program, and have
a small script change to another one some minutes later.

You can use that to start with a VCL that only uses its neighbors
as backends, and then, some minutes later when the cache has the most
common objects loaded, switch to another VCL that goes directly to
the backend.
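
A minimal sketch of what those two VCLs could look like, assuming a
hypothetical "peers" director that lists only the neighbor caches; the
switch itself can be done from the management CLI (vcl.load / vcl.use)
without restarting varnishd:

# warmup.vcl - active while this node's cache refills
sub vcl_recv {
    set req.backend = peers;        # serve misses from the neighbor caches
}

# normal.vcl - loaded a few minutes later
sub vcl_recv {
    set req.backend = webserver;    # back to fetching from the real origin
}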

If you want to get fancy, you can use VCL restarts: ask the neighbors
first and, if they don't have it, go directly to the backend on the
restart.
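
A hedged sketch of the restart variant, using the same beresp-era VCL as
the hash example earlier in the thread, and assuming the neighbors are
configured to answer misses with an error status during the warm-up
window instead of fetching themselves:

sub vcl_recv {
    if (req.restarts == 0) {
        set req.backend = peers;        # first ask the neighbor caches
    } else {
        set req.backend = webserver;    # on restart, go straight to the origin
    }
}

sub vcl_fetch {
    # The neighbor signalled a miss (assumed here to be an error status);
    # retry the request against the real backend.
    if (req.restarts == 0 && beresp.status >= 400) {
        return (restart);
    }
}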

Poul-Henning

[1] In general, Varnish has no built-in magic; all the magic is your
responsibility to write in the VCL code :-)

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


push button lru nuking

2010-01-16 Thread David Birdsong
I'm trying to hack my way to a push-button-like LRU nuking feature.
A short description of how I'm doing it follows; I'll explain why
farther down.

I have a job that watches sm_bfree / (sm_bfree + sm_balloc).  Once
storage file utilization is past some percentage (yet to be determined),
I connect to the upstream load balancers and slowly drain traffic away
from varnish.

Once traffic is off and I can beat the hell out of that box, it's time
to free up some space.  In the past this has been done with restarts.
After a restart the cache hit ratio is destroyed, but the box can keep
up and rebuild the cache in a stable way.  What I'd like to do instead
is dump everything in the storage files that has a very low obj.hits.
LRU nuking on the surface seems like the best thing to initiate, but it
usually only kicks in pretty late and puts the machine into an unstable
state while it is serving.  While it is not serving, I don't know how
to kick it off; furthermore, I want it to run hard and free up lots
more space than it usually does.

i.e. the cache file is ~200GB, and I'd like it to run until sm_bfree is around 50GB.

My idea is to drain traffic as I've described above, pull 50GB of
trash files through the cache (plus enough to kick off LRU), purge the
trash files, monitor sm_bfree, and once it's high enough instruct the
upstream load balancers to start sending traffic gently for a warm-up
period.  Rinse and repeat into infinity, replacing the SSD storage
drives as they fail.  Is this crazy?  Am I uninformed about a better way?
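
One way the "purge the trash files" step could be done from inside VCL,
assuming the Varnish 2.0-era purge_url() function, a hypothetical
/purge-trash control URL, and the /lru.* prefix the trash objects in the
panic below already use:

acl purgers {
    "127.0.0.1";                    # only accept purges from localhost
}

sub vcl_recv {
    if (req.url == "/purge-trash" && client.ip ~ purgers) {
        purge_url("^/lru\.");       # drop every cached trash object
        error 200 "Trash purged.";
    }
}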

Also, I've had to keep making my trash files smaller and smaller.  I
started with 10 and 1G files, which crashed varnish immediately, then
reduced to 500MB files and successfully pulled 200 through - which then
crashed both my Python interpreter (libcurl) and varnish:
varnishd[2664]: Child (14772) Panic message: Assert error in STV_alloc(), stevedore.c line 183:
  Condition((st) != NULL) not true.
thread = (cache-worker)
Backtrace:
  0x421f95: pan_ic+85
  0x4369e5: STV_alloc+125
  0x41a1b6: FetchBody+496
  0x4114dd: cnt_fetch+63d
  0x412a3d: CNT_Session+35d
  0x424273: wrk_do_cnt_sess+93
  0x42362e: wrk_thread_real+26e
  0x7f2cf51b83da: _end+7f2cf4b47c1a
  0x7f2cf4a862bd: _end+7f2cf4415afd
sp = 0x7f2ced387008 {
  fd = 58, id = 58, xid = 1454039386,
  client = 127.0.0.1:7057,
  step = STP_FETCH,
  handling = deliver,
  err_code = 200, err_reason = (null),
  restarts = 0, esis = 0
  ws = 0x7f2ced387078 {
    id = sess,
    {s,f,r,e} = {0x7f2ced387800,+144,(nil),+4096},
  },
  http[req] = {
    ws = 0x7f2ced387078[sess]
      GET,
      /lru.10.cache.buster.80.12994,
      HTTP/1.1,
      User-Agent: PycURL/7.18.2,
      Host: localhost:6081,
      Accept: */*,
  },
  worker = 0x7ef439f06390 {
    ws = 0x7ef439f068f0 {
      id = wrk,
      {s,f,r,e} = {0x7ef439f03350,+2143,(nil),+4096},
    },
    http[bereq] = {
      ws = 0x7ef439f068f0[wrk]
        GET,
        /lru.10.cache.buster.80.12994,
        HTTP/1.1,
        User-Agent: PycURL/7.18.2,
        Host: localhost:6081,
        Accept: */*,
        X-Varnish: 1454039386,
        X-Forwarded-For: 127.0.0.1,
    },
    http[beresp] = {
      ws = 0x7ef439f068f0[wrk]
        HTTP/1.1,
        200,
        OK,
        Server: nginx/0.7.64,
        Date: Sat, 16 Jan 2010 21:11:09 GMT,
        Content-Type: application/octet-stream,
        Content-Length: 524288000,
        Last-Modified: Sat, 16 Jan 2010 21:08:11 GMT,
        Connection: keep-alive,
        Accept-Ranges: bytes,
        X-Varnish-IP: 127.0.0.1,
        X-Varnish-Port: 6081,
    },
  },

Are big files bad?  I expect that I'll normally have to close a pretty
big gap, given that my 4 storage files are 75GB each (SSD).  I'd like
to start this process before LRU nuking kicks in on its own while
varnish has not been unloaded by the upstream load balancers.  My guess,
based on loose recollection, is that varnish will start LRU nuking at 90%
capacity.  It may just prove not feasible given that I'll have to pull
roughly 60GB through to achieve the goal... perhaps freeing up a
smaller percentage would be acceptable too, though.  I'm still playing
with this, but wanted to share my uber-hacky idea and let you guys
tear it apart if it's a dumb idea.

Why:
Identifying the working set has been difficult.  It's large, and the long
tail is very long.  I've tried adaptive TTLs to constantly expire objects
that shouldn't be in cache:

  in vcl_fetch: set every new object to a 2-hour TTL.
  in vcl_hit: if obj.hits == N, then set obj.ttl to 36 hours, where N is
some number that is high enough to be worth caching (see the sketch
after this list).

Another permutation: update the VCL every 30 minutes so that obj.ttl
was set to expire exactly at the trough of traffic (2300 - 2350 PST):
  in vcl_hit: if obj.hits == N, then set obj.ttl to 12h, 10h, or 3h
(depending on time of day).
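
A sketch of that first permutation in VCL, with N as a placeholder
threshold and the 2h/36h values from the description above:

sub vcl_fetch {
    # every newly fetched object starts out with a short TTL
    set beresp.ttl = 2h;
}

sub vcl_hit {
    # objects that prove popular get promoted to a long TTL
    if (obj.hits == 10) {       # N = 10 is a placeholder
        set obj.ttl = 36h;
    }
}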

This just ended up affecting cache hit ratio such that it was never
favorable and the box was just 

Re: push button lru nuking

2010-01-16 Thread David Birdsong
On Sat, Jan 16, 2010 at 4:27 PM, Michael Fischer mich...@dynamine.net wrote:
 On Sat, Jan 16, 2010 at 4:16 PM, Michael Fischer mich...@dynamine.net
 wrote:

 On Sat, Jan 16, 2010 at 4:10 PM, David Birdsong david.birds...@gmail.com
 wrote:

 On Sat, Jan 16, 2010 at 3:55 PM, Michael Fischer mich...@dynamine.net
 wrote:
  This scheme seems very baroque.  Why not just reduce the size of your
  caches so you don't page-thrash and let Varnish's builtin LRU algorithm
  handle the eviction?

 Then I wont be able to cache nearly as much.  I want to originate as
 much content as possible on the varnish servers ie. reduce backend
 fetches.  There is no way I could fit any useful amount of my working
 set into a storage that could handle the evictions without spending an
 unreasonable amount of money (basically fit it in RAM.)  -I'd love to
 be proven wrong though.  As far as random reads go, the SSD's are
 really good; it's just the writes that kill me.

 Right now a mostly filled cache server with ~80-160GB allocated can
 maintain between 90-92% cache hit ratio at 400-500Mb/sec.  When it
 fills up completely eviction cause the machine to keel over, parent
 can't ping the child, health checks fail -general badness.  I'd like
 to let the eviction run under supervision (automated supervision) and
 augment the eviction such that it buys back a few hours not minutes.

 What OS are you running?  This might be one of those rare cases where a
 little more swappiness (i.e., aggressiveness of the pageout algorithm)
 might buy you something.

 This page may be useful if you're running on Linux:
 http://www.westnet.com/~gsmith/content/linux-pdflush.htm
 --Michael

Yes, this page was very helpful back when I had hopes of tuning my way
around this load problem.