Re: Varnish use for purely binary files

2010-01-20 Thread Laurence Rowe
2010/1/19 pub crawler :
> Wanted to inject another heady discussion item into this thread and
> see if the idea is confirmed in other folks' current architectures.
> Sorry in advance for being verbose.
>
> Often web servers (my experience) are smaller servers, less RAM and
> fewer CPUs than the app servers and databases.  A typical webserver
> might be a 2GB or 4GB machine with a dual CPU.  But, the disk storage
> on any given webserver will far exceed the RAM in the machine. This
> means disk IO even when attempting to cache as much as possible on a
> webserver, due to the limited RAM.
>
> In this "normal" web server size model, simply plugging a bigger RAM
> Varnish in upstream means less disk IO, faster web servers, less
> memory consumption managing threads, etc.  This is the well-proven basic
> Varnish adopter model.
>
> Here's a concept that is not specific to the type of data being stored
> in Varnish:
>
> With some additional hashing in the mix, you could limit your large
> Varnish cache server to the most heavily and repetitively accessed
> items, and use the hash to route to the backend webservers, where
> ideally you hit a smaller Varnish instance with the item cached on the
> 2-4GB webserver downstream, and have it talk to the webserver directly
> on localhost if it didn't have the data.

Given that you've already taken care of the common requests upstream,
you are unlikely to see much benefit from any form of caching -
performance will be determined by disk seek time. I suspect you would
see much more of a benefit in moving to SSDs for storage. Even cheap
MLC SSDs like Intel's X25-M will give great read performance.

Laurence


Re: Varnish use for purely binary files

2010-01-19 Thread Michael S. Fischer
On Jan 19, 2010, at 12:46 AM, Poul-Henning Kamp wrote:

> In message , "Michael S. 
> Fis
> cher" writes:
> 
>> Does Varnish already try to utilize CPU caches efficiently by employing =
>> some sort of LIFO thread reuse policy or by pinning thread pools to =
>> specific CPUs?  If not, there might be some opportunity for optimization =
>> there.
> 
> You should really read the varnish_perf.pdf slides I linked to yesterday...

They appear to only briefly mention the LIFO issue (in one bullet point toward 
the end), and do not discuss the CPU affinity issue.

--Michael


Re: Varnish use for purely binary files

2010-01-19 Thread Poul-Henning Kamp
In message , "Michael S. Fis
cher" writes:

>Does Varnish already try to utilize CPU caches efficiently by employing
>some sort of LIFO thread reuse policy or by pinning thread pools to
>specific CPUs?  If not, there might be some opportunity for optimization
>there.

You should really read the varnish_perf.pdf slides I linked to yesterday...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Varnish use for purely binary files

2010-01-18 Thread pub crawler
Wanted to inject another heady discussion item into this thread and
see if the idea is confirmed in other folks' current architectures.
Sorry in advance for being verbose.

Often web servers (my experience) are smaller servers, less RAM and
fewer CPUs than the app servers and databases.  A typical webserver
might be a 2GB or 4GB machine with a dual CPU.  But, the disk storage
on any given webserver will far exceed the RAM in the machine. This
means disk IO even when attempting to cache as much as possible on a
webserver, due to the limited RAM.

In this "normal" web server size model, simply plugging a bigger RAM
Varnish in upstream means less disk IO, faster web servers, less
memory consumption managing threads, etc.  This is the well-proven basic
Varnish adopter model.

Here's a concept that is not specific to the type of data being stored
in Varnish:

With some additional hashing in the mix, you could limit your large
Varnish cache server to the most heavily and repetitively accessed
items, and use the hash to route to the backend webservers, where
ideally you hit a smaller Varnish instance with the item cached on the
2-4GB webserver downstream, and have it talk to the webserver directly
on localhost if it didn't have the data.

Anyone doing anything remotely like this?  Lots of big RAM
installations for Varnish.  I like the Google or mini Google model of
many smaller machines distributing the load.  Seem feasible?  2-4GB
machines are very affordable compared to the 16GB and above machines.
Certainly more collective horsepower with the individual smaller
servers - perhaps a better performance-per-watt also (another one of
my interests).

Thanks again everyone.  I enjoy hearing about all the creative ways
folks are using Varnish in their very different environments.  The
more scenarios for Varnish, the more adoption and ideally the more
resources and expertise that become available for future development.

There would also need to be some sort of cache pruning - beyond me at
the moment - to keep Varnish from being overpopulated with unused items,
and similarly to keep the webservers from wasting RAM on them.

Simple concept and probably very typical.  Oh yeah, plus it scales
horizontally on lower cost dummy server nodes.


Re: Varnish use for purely binary files

2010-01-18 Thread Michael S. Fischer
On Jan 18, 2010, at 4:35 PM, Poul-Henning Kamp wrote:

> In message <97f066dd-4044-46a7-b3e1-34ce928e8...@slide.com>, Ken Brownfield writes:
> 
>> Ironically and IMHO, one of the barriers to Varnish scalability
>> is its thread model, though this problem strikes in the thousands
>> of connections.
> 
> It's only a matter of work to pool slow clients in Varnish into
> eventdriven writer clusters, but so far I have not seen a
> credible argument for doing it.
> 
> A thread is pretty cheap to have around if it doesn't do anything,
> and the varnish threads typically do not do anything during the
> delivery-phase:  They are stuck in the kernel in a writev(2) 
> or sendfile(2) system call.

Does Varnish already try to utilize CPU caches efficiently by employing some 
sort of LIFO thread reuse policy or by pinning thread pools to specific CPUs?  
If not, there might be some opportunity for optimization there.
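
For reference, pinning a worker thread to one CPU on Linux looks roughly
like the sketch below (glibc-specific, and only an illustration of the
question being asked - not a claim about what Varnish does):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Pin the calling thread to a single CPU so it keeps reusing that
     * CPU's caches.  Linux/glibc only; illustrative, not Varnish code. */
    static int
    pin_self_to_cpu(int cpu)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return (pthread_setaffinity_np(pthread_self(), sizeof(set), &set));
    }

    static void *
    worker(void *arg)
    {
        int cpu = *(int *)arg;

        if (pin_self_to_cpu(cpu) != 0)
            fprintf(stderr, "could not pin to CPU %d\n", cpu);
        /* ... accept and serve requests on this CPU ... */
        return (NULL);
    }

    int
    main(void)
    {
        pthread_t t;
        int cpu = 0;

        pthread_create(&t, NULL, worker, &cpu);
        pthread_join(t, NULL);
        return (0);
    }

A LIFO reuse policy would be the complementary half: hand new work to the
most recently idled thread, whose stack and data are still warm in cache.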

--Michael


Re: Varnish use for purely binary files

2010-01-18 Thread Michael S. Fischer
On Jan 18, 2010, at 4:15 PM, Ken Brownfield wrote:

> Ironically and IMHO, one of the barriers to Varnish scalability is its thread 
> model, though this problem strikes in the thousands of connections.

Agreed.  In an early thread on varnish-misc in February 2008 I concluded that 
reducing thread_pool_max to well below the default value (to 16 threads/CPU) 
was instrumental in attaining maximum performance on high-hit-ratio workloads.  
 (This was with Varnish 1.1; things may have changed since then but the theory 
remains.)

Funny how there's always a tradeoff:

Overcommit ->  page-thrashing death
Undercommit -> context-switch death

:)

--Michael


Re: Varnish use for purely binary files

2010-01-18 Thread Poul-Henning Kamp
In message <97f066dd-4044-46a7-b3e1-34ce928e8...@slide.com>, Ken Brownfield writes:

>Ironically and IMHO, one of the barriers to Varnish scalability
>is its thread model, though this problem strikes in the thousands
>of connections.

It's only a matter of work to pool slow clients in Varnish into
eventdriven writer clusters, but so far I have not seen a
credible argument for doing it.

A thread is pretty cheap to have around if it doesn't do anything,
and the varnish threads typically do not do anything during the
delivery-phase:  They are stuck in the kernel in a writev(2) 
or sendfile(2) system call.

In terms of machine resources, there is no cheaper way to do it.
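
As a minimal illustration of that delivery phase (Linux-style sendfile(2)
signature assumed; this is a sketch, not Varnish source):

    #include <sys/sendfile.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* One worker thread delivering one object: a single loop around
     * sendfile(2), with the thread spending nearly all of its time
     * blocked in the kernel while the data moves. */
    int
    deliver_file(int client_fd, int file_fd, size_t len)
    {
        off_t off = 0;

        while ((size_t)off < len) {
            ssize_t n = sendfile(client_fd, file_fd, &off, len - off);

            if (n <= 0)
                return (-1);    /* client went away or I/O error */
        }
        return (0);
    }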

An important but not often spotted advantage is that the object
overhead does not depend on the size of the object:  a 1
megabyte object takes exactly as few resources as a 1 byte object.

If you change to an eventdriven model, you will have many more
system-calls, scaling O(n) with object sizes, and you will
get a lot more locking in the kernel, resulting in contention
on fd's and pcbs.

At the higher level, you will have threads getting overwhelmed
if/when we misestimate the amount of bandwidth they have to deal
with, and you will need complicated code to mitigate this.

For 32-bit machines, having thousands of threads is an issue, because
you run out of address-space, but on a 64-bit system, having 1000
threads or even 10k threads is not really an issue.

Again: Don't let the fact that people have done this simple data-moving
job wrong in the past mislead you into thinking it cannot be done right.

The trick to getting high performance is not doing work you don't
need to do; no architecture or performance trick can ever beat that.

Poul-Henning

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Varnish use for purely binary files

2010-01-18 Thread Poul-Henning Kamp
In message <87f6439f-76fe-416c-b750-5a53a9712...@dynamine.net>, "Michael S. Fischer" writes:

>I'm merely contending that the small amount of added
>latency for a cache hit, where neither server is operating at full
>capacity, is not enough to significantly affect the user experience.

Which, translated to plain English, becomes:

If you don't need varnish, you don't need varnish.

I'm not sure how much useful information that statement contains :-)


-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Varnish use for purely binary files

2010-01-18 Thread Michael S. Fischer
On Jan 18, 2010, at 4:06 PM, Poul-Henning Kamp wrote:

> In message <02d0ec1a-d0b0-40ee-b278-b57714e54...@dynamine.net>, "Michael S. Fischer" writes:
> 
>> But we are not discussing serving dynamic content in this thread
>> anyway.  We are talking about binary files, aren't we?  Yes?  Blobs
>> on disk?  Unless everyone is living on a different plane than me,
>> then I think that's what we're talking about.
>> 
>> For those you should be using a general purpose webserver.  There's
>> no reason you can't run both side by side.  And I stand by my
>> original statement about their performance relative to Varnish.
> 
> Why would you use a general purpose webserver, if Varnish can
> deliver 80 or 90% of your content much faster and much cheaper ?

There's no question that Varnish is faster and that it can handle more peak 
requests per second than a general-purpose webserver at a near-100% cache hit 
rate.  I'm merely contending that the small amount of added latency for a cache 
hit, where neither server is operating at full capacity, is not enough to 
significantly affect the user experience.

There are many competing factors that need to go into the planning process 
other than pure peak capacity, among them the cache hit ratio, the cost of a 
cache miss, and where your money is better spent: installing RAM in cache 
servers or in origin servers.

--Michael



Re: Varnish use for purely binary files

2010-01-18 Thread Ken Brownfield
On Jan 18, 2010, at 4:03 PM, Michael S. Fischer wrote:
>> Does [Apache] perform "well" for static files in the absence of any other 
>> function?  Yes.  Would I choose it for anything other than an application 
>> server?  No.  There are much better solutions out there, and the proof is in 
>> the numbers.
> 
> 
> Not sure what you mean here... at my company it's used for everything but 
> proxying (because Apache's process model is contraindicated at high 
> concurrencies if you want to support Keep-Alive connections).  And we serve a 
> lot of traffic at very low latencies.   

The concurrency issue is really Apache's Achilles' heel.  Real-world example: 
being limited to 70 concurrent application workers on a 16GB machine is a bad 
joke.  Like you said, simultaneous (especially slow) connections will kill 
Apache dead very quickly.  mpm_event could be a huge boon (if Apache insists on 
continuing with a pure process/thread model) but I'm not sure it's ever going 
to "arrive".

Ironically and IMHO, one of the barriers to Varnish scalability is its thread 
model, though this problem strikes in the thousands of connections.

Apache is fast at pure static, but it isn't the fastest.  nginx/lighttpd/thttpd 
can be simpler and faster, and they don't rely on the process/thread model.

But whether or not you need that 100th percentile of speed depends on a huge 
number of variables.  Apache's ubiquity is a strong argument, and it would take 
heavy loads to differentiate Apache from the others above.
-- 
Ken

> --Michael



Re: Varnish use for purely binary files

2010-01-18 Thread Poul-Henning Kamp
In message <364f5e3e-0d1e-4c95-b101-b7a00c276...@slide.com>, Ken Brownfield writes:

>A cache hit under Varnish will be comparable in latency to a
>dedicated static server hit, regardless of the backend.

Only provided the "dedicated static server" is written to work in
a modern SMP/VM system, which few, if any, of them are.

Poul-Henning

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Varnish use for purely binary files

2010-01-18 Thread Poul-Henning Kamp
In message <02d0ec1a-d0b0-40ee-b278-b57714e54...@dynamine.net>, "Michael S. Fischer" writes:

>But we are not discussing serving dynamic content in this thread
>anyway.  We are talking about binary files, aren't we?  Yes?  Blobs
>on disk?  Unless everyone is living on a different plane than me,
>then I think that's what we're talking about.
>
>For those you should be using a general purpose webserver.  There's
>no reason you can't run both side by side.  And I stand by my
>original statement about their performance relative to Varnish.

Why would you use a general purpose webserver, if Varnish can
deliver 80 or 90% of your content much faster and much cheaper ?

It sounds to me like you have not done your homework with respect
to Varnish.

For your information, here is the approximate sequence of system calls
Varnish performs for a cache hit:

read(get the HTTP request)
timestamp
timestamp
timestamp
timestamp
writev  (write the response)

With some frequency, depending on your system and OS, you will also
see a few mutex operations.

The difference between the first and the last timestamp is typically
on the order of 10-20 microseconds.  The middle two timestamps
are mostly for my pleasure and could be optimized out, if they
made any difference.
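
In rough C, a sketch of that sequence (illustrative only - the buffer
handling and function shape are invented for this sketch, it is not
Varnish source):

    #include <sys/uio.h>
    #include <time.h>
    #include <unistd.h>

    /* Approximate shape of a cache hit: one read(2) for the request,
     * a couple of clock_gettime(2) timestamps for the log, and one
     * writev(2) pushing the stored headers and body out in a single
     * call.  The object lookup itself happens in memory, with no
     * system calls. */
    void
    serve_hit(int fd, void *hdr, size_t hdrlen, void *body, size_t bodylen)
    {
        char req[8192];
        struct timespec t_start, t_end;
        struct iovec iov[2];

        (void)read(fd, req, sizeof req);         /* get the HTTP request */
        clock_gettime(CLOCK_REALTIME, &t_start); /* request received */

        /* ... hash lookup of the object, all in memory ... */

        clock_gettime(CLOCK_REALTIME, &t_end);   /* response ready */
        iov[0].iov_base = hdr;                   /* stored response headers */
        iov[0].iov_len = hdrlen;
        iov[1].iov_base = body;                  /* object body */
        iov[1].iov_len = bodylen;
        (void)writev(fd, iov, 2);                /* write the response */
    }

The gap between the first and last timestamps is the 10-20 microseconds
mentioned above.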

This is why people who run synthetic benchmarks do insane amounts
of req/s on Varnish boxes, for values of insane >> 100,000.

I suggest you look at how many system calls your "general purpose
webserver" makes, and how much time it spends, doing the same job.

Once you have done that, I can recommend you read the various
architect's notes I've written, and maybe browse through

http://phk.freebsd.dk/pubs/varnish_perf.pdf

Where you decide to deposit your "conventional wisdom" afterwards
is for you to decide, but it is unlikely to be applicable.

Poul-Henning

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Varnish use for purely binary files

2010-01-18 Thread Michael S. Fischer
On Jan 18, 2010, at 3:54 PM, Ken Brownfield wrote:

> Adding unnecessary software overhead will add latency to requests to the 
> filesystem, and obviously should be avoided.  However, a cache in front of a 
> general web server will 1) cause an object miss to have additional latency 
> (though small) and 2) guarantee object-hit latency will be as low as possible.  A 
> cache in front of a dedicated static file server is unnecessary, but 
> worst-case would introduce additional latency only for cache misses.

Agreed.  This is what I was trying to communicate all along.  It was my 
understanding that this was what the thread was about.

>  Does [Apache] perform "well" for static files in the absence of any other 
> function?  Yes.  Would I choose it for anything other than an application 
> server?  No.  There are much better solutions out there, and the proof is in 
> the numbers.


Not sure what you mean here... at my company it's used for everything but 
proxying (because Apache's process model is contraindicated at high 
concurrencies if you want to support Keep-Alive connections).  And we serve a 
lot of traffic at very low latencies.   

--Michael


Re: Varnish use for purely binary files

2010-01-18 Thread Ken Brownfield
> Let me clear, in case I have not been clear enough already:
> 
> I am not talking about the edge cases of those low-concurrency, high-latency, 
> scripted-language webservers that are becoming tied to web application 
> frameworks like Rails and Django and that are the best fit for front-end 
> caching because they are slow at serving dynamic content.  
> 
> But we are not discussing serving dynamic content in this thread anyway.  We 
> are talking about binary files, aren't we?  Yes?  Blobs on disk?  Unless 
> everyone is living on a different plane than me, then I think that's what 
> we're talking about.
> 
> For those you should be using a general purpose webserver.  There's no reason 
> you can't run both side by side.  And I stand by my original statement about 
> their performance relative to Varnish.

Definitely wasn't clear until now.

But now I'm not sure what we're discussing, since comparing the performance of 
a reverse-proxy cache to an origin server is rather pointless.

A cache hit under Varnish will be comparable in latency to a dedicated static 
server hit, regardless of the backend.  The rate of misses will determine 
whether a dedicated static server would be required, and this is a growth path 
that many companies follow.
-- 
Ken

> --Michael


Re: Varnish use for purely binary files

2010-01-18 Thread Ken Brownfield
On Jan 18, 2010, at 3:16 PM, Michael S. Fischer wrote:
> On Jan 18, 2010, at 3:08 PM, Ken Brownfield wrote:
> 
>> In the real world, sites run their applications through web servers, and 
>> this fact does (and should) guide the decision on the base web server to 
>> use, not static file serving.
> 
> I meant webservers that more than 50%+ of the world uses, which do not 
> include those.

Depends on whether you mean 50% of companies, or 50% of web property traffic.  
The latter?  Definitely.

>  I was assuming, perhaps incorrectly, that the implementor would have at 
> least the wisdom/laziness to use a popular general-purpose webserver such as 
> Apache for the purpose of serving static objects from the filesystem.   And 
> that's not even really a stretch as it's the default for most servers.

This is true, though default Apache configurations run the gamut from clean to
bloated (>1ms variation, I would say).

> 
>> (Though nginx may have an on-disk cache?  And don't get me started on Apache 
>> caching. :-)
> 
> Doctor, heal thyself before you call me inexperienced.  Using 
> application-level caching for serving objects from the filesystem rarely 
> works, which is the main point of Varnish.  Just because *you* can't get good 
> performance out of Apache doesn't mean it's not worth using.

I'm not sure what your definition of application-level is, here.  Much of 
Apache's functionality could be considered an application.  But if you mean an 
embedded app running "inside" Apache, then that distinction has almost no 
bearing on whether file serving "works" or not -- an app can serve files just 
as fast as Apache, assuming C/C++.

Adding unnecessary software overhead will add latency to requests to the 
filesystem, and obviously should be avoided.  However, a cache in front of a 
general web server will 1) cause an object miss to have additional latency 
(though small) and 2) guarantee object-hit latency will be as low as possible.  A 
cache in front of a dedicated static file server is unnecessary, but worst-case 
would introduce additional latency only for cache misses.

I'm not sure what your comment on Apache is about, since I never said Apache 
isn't worth using.  I've been using it in production for 11+ years now.  Does 
it perform "well" for static files in the absence of any other function?  Yes.  
Would I choose it for anything other than an application server?  No.  There 
are much better solutions out there, and the proof is in the numbers.
-- 
Ken

> --Michael




Re: Varnish use for purely binary files

2010-01-18 Thread pub crawler
> The average workload of a cache hit, last I looked, was 7 system
> calls, with typical service times, from request received from kernel
> until response ready to be written to kernel, of 10-20 microseconds.

Well that explains some of the performance difference in Varnish (in
our experience) versus web servers.

7 calls isn't much and you said MICROSECONDS.   :)

> I don't know if that is THE best performance, but I know of a lot
> of software doing a lot worse.

I haven't done the reproducible testing to share with everyone yet.
But using 3rd party remotely hosted analysis services we know for
certain our page elements are starting faster and the average object
load time has gone down significantly.   We were using one or more of
the fast webservers and still are - just behind Varnish now :)


Re: Varnish use for purely binary files

2010-01-18 Thread Michael S. Fischer
On Jan 18, 2010, at 3:47 PM, Poul-Henning Kamp wrote:

> In message , "Michael S. 
> Fis
> cher" writes:
> 
>> That's why you don't use those webservers as origin servers for
>> that purpose.  But you don't use Varnish for it either.  It's not
>> an origin server anyway.
> 
> Actually, for protocol purposes, Varnish is an origin server.
> 
> If you read RFC2616 very carefully, you can find the one place where
> they failed to evict server-side caches from the text, when they
> realized that a cache under the control of the webmaster is
> indistinguishable from a webserver, for protocol purposes.


I meant it for practical purposes, Poul-Henning.  But I'm sure you knew that. :)

--Michael


Re: Varnish use for purely binary files

2010-01-18 Thread Michael S. Fischer
On Jan 18, 2010, at 3:37 PM, pub crawler wrote:

>> Differences in latency of serving static content can vary widely based on
>> the web server in use, easily tens of milliseconds or more.  There are
>> dozens of web servers out there, some written in interpreted languages, many
>> custom-written for a specific application, many with add-ons and modules and
> 
> Most webservers as shipped are simply not very speedy.   Nginx,
> Cherokee, Lighty are three exceptions :)
> Latency is all over the place in web server software.  Caching is still
> a black art, no matter whether you are talking about having one or
> lacking one :)  Ten milliseconds is easily wasted in a web server,
> connection pooling, negotiating the transfer, etc.  Most sites have so
> many latency issues and such a lack of performance.  

Let me be clear, in case I have not been clear enough already:

I am not talking about the edge cases of those low-concurrency, high-latency, 
scripted-language webservers that are becoming tied to web application 
frameworks like Rails and Django and that are the best fit for front-end 
caching because they are slow at serving dynamic content.  

But we are not discussing serving dynamic content in this thread anyway.  We 
are talking about binary files, aren't we?  Yes?  Blobs on disk?  Unless 
everyone is living on a different plane than me, then I think that's what we're 
talking about.

For those you should be using a general purpose webserver.  There's no reason 
you can't run both side by side.  And I stand by my original statement about 
their performance relative to Varnish.

--Michael


Re: Varnish use for purely binary files

2010-01-18 Thread Poul-Henning Kamp
In message , "Michael S. Fis
cher" writes:

>That's why you don't use those webservers as origin servers for
>that purpose.  But you don't use Varnish for it either.  It's not
>an origin server anyway.

Actually, for protocol purposes, Varnish is an origin server.

If you read RFC2616 very carefully, you can find the one place where
they failed to evict server-side caches from the text, when they
realized that a cache under the control of the webmaster is
indistinguishable from a webserver, for protocol purposes.

Poul-Henning

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Varnish use for purely binary files

2010-01-18 Thread Poul-Henning Kamp
In message <4c3149fb1001181416r7cd1c1c2n923a438d6a0df...@mail.gmail.com>, pub crawler writes:

>So far Varnish is performing very well for us as a web server of these
>cached objects.   The connection time for an item out of Varnish is
>noticeably faster than with web servers we have used - even where the
>items have been cached.  We are mostly using 3rd party tools like
>webpagetest.org to look at the item times.

The average workload of a cache hit, last I looked, was 7 system
calls, with typical service times, from request received from kernel
until response ready to be written to kernel, of 10-20 microseconds.

Compared to the amount of work real webservers do for the same task,
that is essentially nothing.

I don't know if that is THE best performance, but I know of a lot
of software doing a lot worse.

Try running varnishhist if you have not already :-)

Poul-Henning

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Varnish use for purely binary files

2010-01-18 Thread pub crawler
> Differences in latency of serving static content can vary widely based on
> the web server in use, easily tens of milliseconds or more.  There are
> dozens of web servers out there, some written in interpreted languages, many
> custom-written for a specific application, many with add-ons and modules and

Most webservers as shipped are simply not very speedy.   Nginx,
Cherokee, Lighty are three exceptions :)
Latency is all over the place in web server software.  Caching is still
a black art, no matter whether you are talking about having one or
lacking one :)  Ten milliseconds is easily wasted in a web server,
connection pooling, negotiating the transfer, etc.  Most sites have so
many latency issues and such a lack of performance.  Most folks seem
to just ignore it though and think all is well with low performance.

That's why Varnish and the folks here are so awesome.   A band of data
crushers, bandwidth abusers and RAM junkies with lower latency in
mind.  Latency is an ugly multiplier - it gets multiplied by every
request, multiple requests per user, multiplied by all the use in a
period of time.   If your page has 60 elements to be served and you
add a mere 5ms to each element that's 300ms of latency just on serving
static items.  There are other scenarios too like dealing with people
on slow connections (if your audience has lots of these).

>  If you're serving pure static content with no need for application logic,
> then yes, there is little benefit to choosing a two-tier infrastructure when
> a one-tier out-of-the-box nginx/lighttpd/thttpd will do just fine.  But, if
> your content does not fit in memory, you're back to reverse-Squid or
> Varnish.  (Though nginx may have an on-disk cache?  And don't get me started
> on Apache caching. :-)

Static sites, if they are big enough, will still be aided in scaling by
fronting them with Varnish or a similar cache front end.  A small
for-instance might be offloading images, or items that require longer
connection timeouts, to Varnish - perhaps reducing the disk IO and
cutting the open connections on your web server.  You could obviously
do the same by dissecting your site into multiple servers and dividing
the load ~ but you lose some of the functionality that is appealing in
Varnish, like the ability to dynamically adjust traffic, load,
direction, etc. within Varnish.  Unsure whether anything similar exists
in Nginx - but then you are turning a web server into something else,
likely with some performance reduction.

Mind you,  most people here *I think* are dealing with big scaling -
busy sites, respectable and sometimes awe-inspiring amounts of data.
Then there are those slow-as-can-be app servers they might have to work
around too.  So the scale of latency issues is a huge cost center for
most folks.

Plenty of papers have been written about latency and the user
experience.  The slower the load, the less people interact with - and,
in commerce terms, spend at - the site.


Re: Varnish use for purely binary files

2010-01-18 Thread Michael S. Fischer
On Jan 18, 2010, at 3:08 PM, Ken Brownfield wrote:

>> I have a hard time believing that any difference in the total response time 
>> of a cached static object between Varnish and a general-purpose webserver 
>> will be statistically significant, especially considering typical Internet 
>> network latency.  If there's any difference it should be well under a 
>> millisecond.
> 
> I would suggest that you get some real-world experience, or at least do some 
> research in this area.  Like your earlier assertion, this is patently untrue 
> as a general conclusion.

> Differences in latency of serving static content can vary widely based on the 
> web server in use, easily tens of milliseconds or more.  There are dozens of 
> web servers out there, some written in interpreted languages, many 
> custom-written for a specific application, many with add-ons and modules and 
> other hijinx that can affect the latency of serving static content.

That's why you don't use those webservers as origin servers for that purpose.  
But you don't use Varnish for it either.  It's not an origin server anyway.

> In the real world, sites run their applications through web servers, and this 
> fact does (and should) guide the decision on the base web server to use, not 
> static file serving.

I meant webservers that 50%+ of the world uses, which do not include 
those.  I was assuming, perhaps incorrectly, that the implementor would have at 
least the wisdom/laziness to use a popular general-purpose webserver such as 
Apache for the purpose of serving static objects from the filesystem.   And 
that's not even really a stretch as it's the default for most servers.

>  (Though nginx may have an on-disk cache?  And don't get me started on Apache 
> caching. :-)

Doctor, heal thyself before you call me inexperienced.  Using application-level 
caching for serving objects from the filesystem rarely works, which is the main 
point of Varnish.  Just because *you* can't get good performance out of Apache 
doesn't mean it's not worth using.

--Michael


Re: Varnish use for purely binary files

2010-01-18 Thread Ken Brownfield
> I have a hard time believing that any difference in the total response time 
> of a cached static object between Varnish and a general-purpose webserver 
> will be statistically significant, especially considering typical Internet 
> network latency.  If there's any difference it should be well under a 
> millisecond.

I would suggest that you get some real-world experience, or at least do some 
research in this area.  Like your earlier assertion, this is patently untrue as 
a general conclusion.

Differences in latency of serving static content can vary widely based on the 
web server in use, easily tens of milliseconds or more.  There are dozens of 
web servers out there, some written in interpreted languages, many 
custom-written for a specific application, many with add-ons and modules and 
other hijinx that can affect the latency of serving static content.

Additionally, very few of these implement their own managed cache; the rest 
accidentally rely on filesystem cache which may or may not perform with low or 
predictable latency, and may not be large enough for a working set.

In the real world, sites run their applications through web servers, and this 
fact does (and should) guide the decision on the base web server to use, not 
static file serving.

Thus the primary importance IMHO of software like reverse-Squid and Varnish.  
If you're serving pure static content with no need for application logic, then 
yes, there is little benefit to choosing a two-tier infrastructure when a 
one-tier out-of-the-box nginx/lighttpd/thttpd will do just fine.  But, if your 
content does not fit in memory, you're back to reverse-Squid or Varnish.  
(Though nginx may have an on-disk cache?  And don't get me started on Apache 
caching. :-)
-- 
Ken


> --Michael




Re: Varnish use for purely binary files

2010-01-18 Thread Michael S. Fischer
On Jan 18, 2010, at 2:16 PM, pub crawler wrote:

>> Most kernels cache recently-accessed files in RAM, and so common web servers 
>> such as Apache can already serve up static objects very quickly if they 
>> are located in the buffer cache.  (Varnish's apparent speed is largely 
>> based on the same phenomenon.)  If the data is already cached in the origin 
>> server's buffer caches, then interposing an additional caching layer may 
>> actually be somewhat harmful because it will add some additional latency.
> 
> So far Varnish is performing very well for us as a web server of these
> cached objects.   The connection time for an item out of Varnish is
> noticeably faster than with web servers we have used - even where the
> items have been cached.  We are mostly using 3rd party tools like
> webpagetest.org to look at the item times.
> 
> Varnish is good as a slice in a few different place in a cluster and a
> few more when running distributed geographic clusters.   Aside from
> Nginx or something highly optimized I am fairly certain Varnish
> provides faster serving of cached objects as an out of the box default
> experience.  I'll eventually find some time to test it in our
> environment against web servers we use.

I have a hard time believing that any difference in the total response time of 
a cached static object between Varnish and a general-purpose webserver will be 
statistically significant, especially considering typical Internet network 
latency.  If there's any difference it should be well under a millisecond.

--Michael


Re: Varnish use for purely binary files

2010-01-18 Thread pub crawler
> Most kernels cache recently-accessed files in RAM, and so common web servers 
> such as Apache can already serve up static objects very quickly if they are 
> located in the buffer cache.  (Varnish's apparent speed is largely based on 
> the same phenomenon.)  If the data is already cached in the origin server's 
> buffer caches, then interposing an additional caching layer may actually be 
> somewhat harmful because it will add some additional latency.

So far Varnish is performing very well for us as a web server of these
cached objects.   The connection time for an item out of Varnish is
noticeably faster than with web servers we have used - even where the
items have been cached.  We are mostly using 3rd party tools like
webpagetest.org to look at the item times.

> If you've evenly distributed your objects among a number of origin servers, 
> assuming they do nothing but serve up these static objects, and the origin 
> servers have a sum total of RAM larger than your caching servers, then you 
> might be better off just serving directly from the origin servers.

Varnish is good as a slice in a few different place in a cluster and a
few more when running distributed geographic clusters.   Aside from
Nginx or something highly optimized I am fairly certain Varnish
provides faster serving of cached objects as an out of the box default
experience.  I'll eventually find some time to test it in our
environment against web servers we use.

> On the other hand, there are some use cases, such as edge-caching, where 
> interposing a caching layer can be quite helpful even if the origin servers 
> are fast, because making the object available closer to the

Edge caching and distributed cache front ends are exactly what's
needed.  It's a poor mans CDN but can be very effective if done well.

The question I posed is to see whether this type of use (almost purely
binary) is being done and scaling well at large scale (50GB and
beyond).  Binary data usually poses more overhead as the data is
larger - fewer stored elements in RAM, often it can't be compressed
further, more FIFO-type purging because of this, etc.

-Paul


Re: Varnish use for purely binary files

2010-01-18 Thread Michael S. Fischer
On Jan 18, 2010, at 12:58 PM, pub crawler wrote:

> This is an inquiry for the Varnish community.
> 
> Wondering how many folks are using Varnish purely for binary storage
> and caching (graphic files, archives, audio files, video files, etc.)?
> 
> Interested specifically in large Varnish installations with either
> high number of files or where files are large in size.
> 
> Can anyone out there using Varnish for such care to say they are?

I guess it depends on your precise configuration.

Most kernels cache recently-accessed files in RAM, and so common web servers 
such as Apache can already serve up static objects very quickly if they are 
located in the buffer cache.  (Varnish's apparent speed is largely based on the 
same phenomenon.)  If the data is already cached in the origin server's buffer 
caches, then interposing an additional caching layer may actually be somewhat 
harmful because it will add some additional latency.

If you've evenly distributed your objects among a number of origin servers, 
assuming they do nothing but serve up these static objects, and the origin 
servers have a sum total of RAM larger than your caching servers, then you 
might be better off just serving directly from the origin servers.

On the other hand, there are some use cases, such as edge-caching, where 
interposing a caching layer can be quite helpful even if the origin servers are 
fast, because making the object available closer to the requestor can conserve 
network latency.  (In fact, overcommit may be OK in this situation if the I/O 
queue depth is reasonably shallow and you can guarantee that any additional I/O 
overhead is less than the network latency incurred by having to go to the origin 
server.)

--Michael

