Range Requests

2008-09-08 Thread DHF
Does varnish support range requests?  I want each range request of a
large file to be treated as a unique web object in varnish, and I'm
wondering if it's possible.  Are the headers used in the hashing?  Or, a
better question: can the Range request header be included in the
hash?  Would that work, or am I missing some pieces that would screw up
the plan?
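One possible approach (a hedged, untested sketch, assuming the Varnish 2.x `vcl_hash` syntax) is to append the Range header to the hash input alongside the default URL and Host components:

```vcl
sub vcl_hash {
    # Default hash inputs: URL plus Host header.
    set req.hash += req.url;
    set req.hash += req.http.host;
    # Assumption: appending the Range header makes each distinct
    # byte-range request its own cache object.
    set req.hash += req.http.Range;
    hash;
}
```

Note that for this to work Varnish would also have to forward the Range header to the backend unmodified, so the backend's 206 response matches the range that was hashed.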

Thanks,

--DHF

___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: Security doubt about Varnish and firewall.

2008-04-22 Thread DHF
andan andan wrote:
 We have a security doubt: should we install Varnish inside or outside
 the firewall?

I run varnish on many Linux boxes with Netfilter default log-and-drop
rules and have not seen a performance problem.

 For better performance, we consider that the best choice is outside,
 but for
 obvious security reasons, the better is putting it into a DMZ.

This depends on your particular environment.  What kind of hardware are
you using?  What kind of firewall is it?  How much traffic can the
firewall handle?  How much traffic do you usually see to the backend
server?  Where is the backend server located?  What is your reason for
using a reverse proxy?  What is the expected hit ratio on the cache? 
What kind of content are you delivering?  Do you have any network
operations tasks that require you to collect data from the server in a
fashion that requires it to be behind the firewall?

If the backend server is through the firewall, it could be beneficial to
have your varnish box outside the firewall; you could restrict access
to the backend server to only the varnish server's IP, or to an internal
IP on a separate network, and then run iptables or ipfw on the varnish
server itself.

 Any suggestions? Somebody has Varnish outside the firewall?

I have found no reason not to use ipfw or iptables on deployed servers;
the benefit, in my opinion, outweighs the performance loss.  With a
minimal ruleset the performance impact is so small it's hard to measure
until you reach huge packets-per-second or connections-per-second rates
(assuming your hardware isn't a few years away from collecting a
pension).  I have never seen a production box reach the limits of
iptables packets per second, because whatever process is on the box
(apache, varnish, squid, mysql, etc.) will long ago have melted down
into a pile of smoldering ruin due to high load, and iptables
performance becomes irrelevant.

--Dave



Re: Varnish crashing when system starts to swap

2008-04-18 Thread DHF
Calle Korjus wrote:
 This is our startup command:

 /opt/varnish/sbin/varnishd -a :80 -p lru_interval 3600 -f 
 /opt/varnish/conf/default.vcl -T 127.0.0.1:6082 -t 3600 -w 128,1000,60 -u 
 varnish -g varnish -s file,/srv/varnish/varnish_storage.bin,30G -P 
 /var/run/varnish.pid

 Varnish looks fine until it's had about 1.5 million requests, then we can see 
 the kswapd0 and kswapd1 start working and load average rises to about 200 and 
 the machine gets totally unresponsive. Top shows a lot of cpu being spent on 
 i/o waits and the varnish child process restarts sometimes. In the best case the 
 process restarts and the server starts behaving within 5 minutes, but 
 sometimes varnish dies completely. One thing we have noticed is that the 
 reserved memory for varnish keeps rising and when it crashes it is usually 
 around 14G.
   
I would try lowering the storage file size to within your total system
RAM, subtracting some memory for buffers, cache, and apache, and see if
it still spirals into swap hell.  You could also try setting rlimits for
the varnish user, though I don't know whether settings in
/etc/security/limits.conf apply to privilege-dropped processes.

 The varnish storage file is running on the same physical disk as the system 
 and the swap, could that be the problem? Should varnish really allocate so 
 much memory so that the system starts to swap to disk?
   
I think what is happening is that your hit ratio is low and your
storage size is quite large, so varnish has enough objects marked as hot
that it's trying to hold them all in memory.  I don't know for sure; I
could be way off.  I think if you restrict the storage size there will
be increased disk activity as you churn the cache, but you won't be
churning swap space as well, and you shouldn't exhaust the virtual
memory of the system.  I'd have to test that, though.

--Dave



Re: regsub, string concatenation?

2008-04-18 Thread DHF
Jon Drukman wrote:
 i'm trying to rewrite all incoming URLs to include the http host header 
 as part of the destination url.  example:

 incoming: http://site1.com/someurl
 rewritten: http://originserver.com/site/site1.com/someurl

 incoming: http://site2.com/otherurl
 rewritten: http://originserver.com/site/site2.com/someurl

 the originserver is parsing the original hostname out of the requested 
 url.  works great with one hardcoded host:

 set req.url = regsub(req.url, "^", "/site/site1.com");

 i can't get it to use the submitted http host though...

   set req.url = regsub(req.url, "^", "/site/" + req.http.host);

 varnish complains about the plus sign.  is there some way to do this 
 kind of string concatenation in the replacement?
   
Try this:

set req.url = "/site/" req.http.host "/" req.url;

The extra "/" by itself might not be necessary.  Set will allow you to
concatenate strings, but I'm not sure the regsub will.  I think this will
give you what you're looking for; let me know.
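In context, a minimal sketch (assuming the rewrite should run for every request, in `vcl_recv`):

```vcl
sub vcl_recv {
    # Prepend the Host header to the URL, e.g.
    # Host: site1.com, URL /someurl  ->  /site/site1.com/someurl
    set req.url = "/site/" req.http.host req.url;
}
```

Since req.url already begins with a slash, the standalone "/" from the one-liner above is dropped here.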

--Dave



Re: cache empties itself?

2008-04-08 Thread DHF
Ricardo Newbery wrote:

 On Apr 7, 2008, at 10:30 PM, DHF wrote:

 Ricardo Newbery wrote:
 On Apr 7, 2008, at 5:22 PM, Michael S. Fischer wrote:


 Sure, but this is also the sort of content that can be cached back
 upstream using ordinary HTTP headers.



 No, it cannot.  Again, the use case is dynamically-generated 
 content  that is subject to change at unpredictable intervals but 
 which is  otherwise fairly static for some length of time, and 
 where serving  stale content after a change is unacceptable.  
 Ordinary HTTP headers  just don't solve that use case without 
 unnecessary loading of the  backend.

 Isn't this what if-modified-since requests are for?  304 not modified 
 is a pretty small request/response, though I can understand the 
 tendency to want to push it out to the frontend caches.  I would 
 think the management overhead of maintaining two separate expirations 
 wouldn't be worth the extra hassle just to save yourself some ims 
 requests to a backend.  Unless of course varnish doesn't support ims 
 requests in a usable way, I haven't actually tested it myself.


 Unless things have changed recently, Varnish support for IMS is 
 mixed.  Varnish supports IMS for cache hits but not for cache misses 
 unless you tweak the vcl to pass them in vcl_miss.  Varnish will not 
 generate an IMS to revalidate its own cache.
Good to know.

 Also it is not necessarily true that generating a 304 response is 
 always light impact.  I'm not sure about the Drupal case, but at least 
 for Plone there can be a significant performance hit even when just 
 calculating the Last-Modified date.  The hit is usually lighter than 
 that required for generating the full response but for high-traffic 
 sites, it's still a significant consideration.

 But the most significant issue is that IMS doesn't help in the 
 slightest to lighten the load of *new* requests to your backend.  IMS 
 requests are only helpful if you already have the content in your own 
 browser cache -- or in an intermediate proxy cache server (for proxies 
 that support IMS to revalidate their own cache).
The intermediate proxy was the case I was thinking about, but you are 
correct: if there is no intermediate proxy and the varnish frontends 
don't revalidate with IMS requests, then the whole plan is screwed.
 Regarding the potential management overhead... this is not relevant to 
 the question of whether this strategy would increase your site's 
 performance.  Management overhead is a separate question, and not an 
 easy one to answer in the general case.  The overhead might be a 
 problem for some.  But I know in my own case, the overhead required to 
 manage this sort of thing is actually pretty trivial.
How do you manage the split TTLs?  Do you send a purge after a page has 
changed, or have you crafted another way to force a revalidation of 
cached objects?
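For the purge-after-change approach, a commonly cited sketch from this era (hedged; assuming Varnish 2.x VCL, where a PURGE request is looked up and the matching object's TTL is zeroed in `vcl_hit`) looks roughly like:

```vcl
acl purgers {
    "127.0.0.1";    # assumption: purges are only sent from localhost
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed.";
        }
        # Look the object up so vcl_hit/vcl_miss can handle it.
        lookup;
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        # Expire the cached object immediately.
        set obj.ttl = 0s;
        error 200 "Purged.";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        error 404 "Not in cache.";
    }
}
```

The CMS would then issue a `PURGE /some/url` request against the cache whenever a page changes, forcing the next client request to revalidate against the backend.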

--Dave

 Ric







Re: cache empties itself?

2008-04-07 Thread DHF
Ricardo Newbery wrote:
 On Apr 7, 2008, at 5:22 PM, Michael S. Fischer wrote:

   
 Sure, but this is also the sort of content that can be cached back
 upstream using ordinary HTTP headers.
 


 No, it cannot.  Again, the use case is dynamically-generated content  
 that is subject to change at unpredictable intervals but which is  
 otherwise fairly static for some length of time, and where serving  
 stale content after a change is unacceptable.  Ordinary HTTP headers  
 just don't solve that use case without unnecessary loading of the  
 backend.
   
Isn't this what if-modified-since requests are for?  304 not modified is 
a pretty small request/response, though I can understand the tendency to 
want to push it out to the frontend caches.  I would think the 
management overhead of maintaining two separate expirations wouldn't be 
worth the extra hassle just to save yourself some ims requests to a 
backend.  Unless of course varnish doesn't support ims requests in a 
usable way, I haven't actually tested it myself.

--Dave


Re: cache empties itself?

2008-04-04 Thread DHF
Sascha Ottolski wrote:
 Am Freitag 04 April 2008 18:11:23 schrieb Michael S. Fischer:
   
 Ah, I see.

 The problem is that you're basically trying to compensate for a
 congenital defect in your design: the network storage (I assume NFS)
 backend.  NFS read requests are not cacheable by the kernel because
 another client may have altered the file since the last read took
 place.

 If your working set is as large as you say it is, eventually you will
 end up with a low cache hit ratio on your Varnish server(s) and
 you'll be back to square one again.

 The way to fix this problem in the long term is to split your file
 library into shards and put them on local storage.

 Didn't we discuss this a couple of weeks ago?
 

 exactly :-) what can I say, I did analyze the logfiles, and learned that 
 despite the fact that a lot of the accesses are truly random, there is 
 still a good amount of the requests concentrated on a smaller set of the 
 images. of course, the set is changing over time, but that's what a 
 cache can handle perfectly.

 and my experiences seem to prove my theory: if varnish keeps running 
 like it is now for about 18 hours *knock on wood*, the cache hit rate 
 is close to 80 %! and that takes so much pressure from the backend that 
 the overall performance is just awesome.

 putting the files on local storage just doesn't scale well. I'm more 
 thinking about splitting the proxies like discussed on the list before: 
 a loadbalancer could distribute the URLs in a way that each cache holds 
 its own share of the objects.
   
By putting intermediate caches between the file storage and the client,
you are essentially just spreading the storage locally between cache
boxes, so if this method doesn't scale then you are still in need of a
design change, and frankly so am I :)

What you need to model is the popularity curve for your content.  If
your images do not fit an 80/20 rule of popularity, i.e. if 20% of your
images soak up less than 80% of requests, then you will spend more time
thrashing the caches than serving the content, and Michael is right: you
would be better served to dedicate web servers with local storage and
shard your images across them.  If 80% of your content is rarely viewed,
then using the same amount of hardware as caching accelerators, you
will see an increase in throughput due to more hardware serving a
smaller number of images.  It all depends on your content and your
users' viewing habits.

--Dave



Re: the most basic config

2008-04-03 Thread DHF
Sascha Ottolski wrote:
 now, could someone help me interpreting the hitrate ratio and avg?

 Hitrate ratio:   10  100  360
 Hitrate avg: 0.3366   0.3837   0.4636
   
Hit rate is the number of hits divided by the number of requests.  Hits 
are requests for objects that are in the cache; misses are requests that 
go to the backend, and the more misses, the lower your hitrate average.  
The "Hitrate ratio" line shows the time windows, in seconds, over which 
each average below it is computed, so 0.4636 means roughly 46% of 
requests over the last 360 seconds were cache hits.  The lower your 
hitrate average, the lower your performance.

--Dave


Re: cache empties itself?

2008-04-03 Thread DHF
Sascha Ottolski wrote:
 how can this be? My varnish runs for about 36 hours now. yesterday 
 evening, the resident memory size was like 10 GB, which is still way 
 below the available 32. later that evening, I stopped letting requests 
 to the proxy over night. now I came back, let the requests back in, and 
 am wondering that I see a low cache hit rate. looking a bit closer it 
 appears as if the cache got smaller over night; now the process only 
 consumes less than 1 GB of resident memory, which fits the 
 reported bytes allocated in the stats.

 can I somehow find out why my cached objects were expired? I have a 
 varnishlog -w running all the time, so the information might be there. 
 but, what to look for, and even more important, how can I prevent that 
 expiration? I started the daemon with

 -p default_ttl=31104000

 to make it cache very aggressively...
   
There could be a lot of factors.  Is apache setting a max-age on the 
items?  As it says in the man page:

 default_ttl
   The default time-to-live assigned to objects if neither the backend
   nor the configuration assign one.  Note that changes to this
   parameter are not applied retroactively.

Is this running on a test machine in a lab where you can control the 
requests this box gets?  If so, you should run some tests to make sure 
that you really are caching objects.  Run wireshark on the apache server 
listening on port 80, and using curl send two requests for the same 
object, and make sure that only one request hits the apache box.  If 
that's working like you expect, and the Age header is incrementing, then 
you need to run some tests using a typical workload that your apache 
server expects to see.  Are you setting cookies on this site?

I think what is happening is that you are setting a max-age on objects 
from apache (which you can verify using curl, netcat, telnet, whatever 
you like), and varnish is honoring that setting and expiring items as 
instructed.  I'm not awesome with varnishtop and varnishlog yet, so I'm 
probably not the one to ask about getting those to show you an object's 
attributes; anyone care to assist on that front?

--Dave


Re: cache empties itself?

2008-04-03 Thread DHF
Michael S. Fischer wrote:
 On Thu, Apr 3, 2008 at 10:26 AM, Sascha Ottolski [EMAIL PROTECTED] wrote:
   
  All this with 1.1.2. It's vital to my setup to cache as many objects as
  possible, for a long time, and that they really stay in the cache. Is
  there anything I could do to prevent the cache being emptied? May be
  I've been bitten by a bug and should give the trunk a shot?
 

 Just set the Expires: headers on the origin (backend) server responses
 to now + 10 years or something.
   
If you're not using PHP or some other CGI app, you can set headers using
mod_headers in apache; if you are running a web app, just set the
headers within the app itself.  You can also explicitly set the TTL on
objects in the cache using VCL code, but setting the headers on the
backend makes more sense, since it keeps the policy with the content and
leaves varnish free to just cache.  If you have your heart set on making
varnish do the work you could add something like this:

sub vcl_fetch {
    if (!obj.valid) {
        error;
    }
    if (!obj.cacheable) {
        pass;
    }
    if (obj.http.Set-Cookie) {
        pass;
    }
    if (req.url ~ "\.(jpg|jpeg|gif|png)$") {
        set obj.ttl = 31449600s;
    }
    insert;
}

But I would first look at getting apache to set the age correctly and
leave varnish to do what it's good at.

--Dave



Re: cache empties itself?

2008-04-03 Thread DHF
Sascha Ottolski wrote:
 however, my main problem is currently that the varnish childs keep 
 restarting, and that this empties the cache, which effectively renders 
 the whole setup useless for me :-( if the cache has filled up, it works 
 great, if it restarts empty, obviously it doesn't.

 is there anything I can do to prevent such restarts?
   
Varnish doesn't just restart on its own.  Check to make sure you aren't
sending a kill signal if you are running logrotate through a cronjob.
I'm not sure whether a HUP will empty the cache or not.

--Dave
