Re: The cache deny QUERY change... partial rollback?

2008-12-11 Thread Alex Rousskov
On Mon, 2008-12-01 at 15:34 +0100, Henrik Nordstrom wrote:
> After analyzing a large cache with significantly declining hit ratio
> over the last months I have come to the conclusion that the removal of
> cache deny QUERY can have a very negative impact on hit ratio. This is
> due to a number of flash video sites (youtube, google, various porno
> sites etc.) that include per-view unique query parameters in the URL
> and respond with a cacheable response.
>
> Because of this I suggest that we add back the cache deny rule in the
> recommended config, but leave the refresh_pattern change as-is.

Taking all the responses on this thread into account, it seems like what
you are proposing should be done as a temporary solution to the problem.

Does this affect both Squid versions?

Thank you,

Alex.




The cache deny QUERY change... partial rollback?

2008-12-01 Thread Henrik Nordstrom
After analyzing a large cache with significantly declining hit ratio
over the last months I have come to the conclusion that the removal of
cache deny QUERY can have a very negative impact on hit ratio. This is
due to a number of flash video sites (youtube, google, various porno
sites etc.) that include per-view unique query parameters in the URL
and respond with a cacheable response.

Because of this I suggest that we add back the cache deny rule in the
recommended config, but leave the refresh_pattern change as-is.

People running reverse proxies or combating these cache-busting sites
using store rewrites know how to change the cache rules, while many
users running general proxy servers are quite negatively impacted by
these sites if caching of query URLs is allowed.
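
For context, the rule under discussion is conventionally written in squid.conf as "acl QUERY urlpath_regex cgi-bin \?" followed by "cache deny QUERY". The Python sketch below only approximates what that ACL would match. The function name and the example URLs are illustrative assumptions, and Squid's own matching may differ in edge cases.

```python
import re
from urllib.parse import urlsplit

# Approximation of: acl QUERY urlpath_regex cgi-bin \?
# The ACL matches when either pattern is found in the URL path.
QUERY_PATTERN = re.compile(r"cgi-bin|\?")

def denied_by_query_rule(url: str) -> bool:
    """Return True if 'cache deny QUERY' would refuse to cache this URL."""
    parts = urlsplit(url)
    # Rebuild path plus query string, which is what the regex is tested against.
    urlpath = parts.path + ("?" + parts.query if parts.query else "")
    return QUERY_PATTERN.search(urlpath) is not None

# Hypothetical URLs: a per-view video URL, a CGI script, and a static object.
for example in (
    "http://video.example.com/get_video?id=42&session=abc123",
    "http://www.example.com/cgi-bin/report",
    "http://www.example.com/static/logo.png",
):
    print(example, denied_by_query_rule(example))
```

With the rule in place, the first two URLs are never stored, so per-view video URLs cannot fill the cache. The cost is that well-behaved query URLs are not cached either.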

Regards
Henrik




Re: The cache deny QUERY change... partial rollback?

2008-12-01 Thread Adrian Chadd
2008/12/1 Henrik Nordstrom:
> After analyzing a large cache with significantly declining hit ratio
> over the last months I have come to the conclusion that the removal of
> cache deny QUERY can have a very negative impact on hit ratio. This is
> due to a number of flash video sites (youtube, google, various porno
> sites etc.) that include per-view unique query parameters in the URL
> and respond with a cacheable response.
>
> Because of this I suggest that we add back the cache deny rule in the
> recommended config, but leave the refresh_pattern change as-is.
>
> People running reverse proxies or combating these cache-busting sites
> using store rewrites know how to change the cache rules, while many
> users running general proxy servers are quite negatively impacted by
> these sites if caching of query URLs is allowed.

Hm, that's kind of interesting actually. What's it displacing from the
cache? Is the drop of hit ratio due to the removal of other cacheable
large objects, or other cacheable small objects? Is it -just- flash
video that's exhibiting this behaviour?

Are you able to put up some examples and statistics? I really think
the right thing to do here is look at what various sites are doing and
try to open a dialogue with them. Chances are they don't really know
exactly how to (ab)use HTTP to get the semantics they want whilst
retaining control over their content.



Adrian


Re: The cache deny QUERY change... partial rollback?

2008-12-01 Thread Henrik Nordstrom
Mon 2008-12-01 at 09:40 -0500, Adrian Chadd wrote:

> Hm, that's kind of interesting actually. What's it displacing from the
> cache? Is the drop of hit ratio due to the removal of other cacheable
> large objects, or other cacheable small objects? Is it -just- flash
> video that's exhibiting this behaviour?

The studied cache is using LRU, and these flash videos effectively
reduce the cache size by filling the cache with large objects that are
never referenced again.
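
The effect can be sketched with a small simulation. This is a synthetic illustration, not data from the studied cache: the trace, object sizes, URL shapes and capacity below are all assumptions chosen to make the effect visible.

```python
from collections import OrderedDict

def lru_hit_ratio(trace, capacity):
    """Replay (url, size) requests through a size-bounded LRU cache."""
    cache = OrderedDict()   # url -> size, least recently used first
    used = hits = 0
    for url, size in trace:
        if url in cache:
            hits += 1
            cache.move_to_end(url)      # mark as most recently used
            continue
        if size > capacity:
            continue                    # object can never fit
        while used + size > capacity:   # evict from the LRU end
            _, freed = cache.popitem(last=False)
            used -= freed
        cache[url] = size
        used += size
    return hits / len(trace)

def make_trace(rounds, with_videos):
    """Hypothetical workload: 60 one-unit pages, 20 requested per round in
    rotation; optionally one 30-unit video per round whose query string is
    unique, so it is never requested again."""
    trace = []
    for r in range(rounds):
        start = (r % 3) * 20
        trace += [(f"/page/{i}", 1) for i in range(start, start + 20)]
        if with_videos:
            trace.append((f"/get_video?id=1&session={r}", 30))
    return trace

CAPACITY = 100
clean = lru_hit_ratio(make_trace(30, with_videos=False), CAPACITY)
polluted = lru_hit_ratio(make_trace(30, with_videos=True), CAPACITY)
print(f"hit ratio without videos: {clean:.2f}, with videos: {polluted:.2f}")
```

In this constructed trace the 60 pages fit comfortably in a cache of 100 units, but once each round also stores a one-shot video, every page is pushed out before it is requested again and the hit ratio falls from 0.90 to 0.00. Real traffic is less regular, so the drop would be smaller, but the mechanism is the same.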

> Are you able to put up some examples and statistics?

I'll try.

> I really think
> the right thing to do here is look at what various sites are doing and
> try to open a dialogue with them. Chances are they don't really know
> exactly how to (ab)use HTTP to get the semantics they want whilst
> retaining control over their content.

Probably true. Based on the URL styles there seem to be only two or
three of these authentication/session schemes.

Regards
Henrik




Re: The cache deny QUERY change... partial rollback?

2008-12-01 Thread Tres Seaver

Henrik Nordstrom wrote:
> After analyzing a large cache with significantly declining hit ratio
> over the last months I have come to the conclusion that the removal of
> cache deny QUERY can have a very negative impact on hit ratio. This is
> due to a number of flash video sites (youtube, google, various porno
> sites etc.) that include per-view unique query parameters in the URL
> and respond with a cacheable response.
>
> Because of this I suggest that we add back the cache deny rule in the
> recommended config, but leave the refresh_pattern change as-is.
>
> People running reverse proxies or combating these cache-busting sites
> using store rewrites know how to change the cache rules, while many
> users running general proxy servers are quite negatively impacted by
> these sites if caching of query URLs is allowed.

Having a single recommended config seems dubious: I for one never
run Squid as a forward proxy, for instance. We should probably split
apart the default / recommended forward and reverse configurations
(which are just starting points, right?) and document how to tell which
one to start with.


Tres.


Re: The cache deny QUERY change... partial rollback?

2008-12-01 Thread Henrik Nordstrom
Mon 2008-12-01 at 11:00 -0500, Tres Seaver wrote:

> Having a single recommended config seems dubious: I for one never
> run Squid as a forward proxy, for instance. We should probably split
> apart the default / recommended forward and reverse configurations
> (which are just starting points, right?) and document how to tell which
> one to start with.

The example/default configuration shipped with Squid is that of a normal
proxy. Reverse proxies do need some changes to that config; that is
unavoidable.

Also, if your site is sane then you use query parameters in a sane
manner and this whole discussion is irrelevant.

Regards
Henrik



Re: The cache deny QUERY change... partial rollback?

2008-12-01 Thread Amos Jeffries
> Mon 2008-12-01 at 09:40 -0500, Adrian Chadd wrote:
>
> > Hm, that's kind of interesting actually. What's it displacing from the
> > cache? Is the drop of hit ratio due to the removal of other cacheable
> > large objects, or other cacheable small objects? Is it -just- flash
> > video that's exhibiting this behaviour?
>
> The studied cache is using LRU, and these flash videos effectively
> reduce the cache size by filling the cache with large objects that are
> never referenced again.
>
> > Are you able to put up some examples and statistics?
>
> I'll try.
>
> > I really think
> > the right thing to do here is look at what various sites are doing and
> > try to open a dialogue with them. Chances are they don't really know
> > exactly how to (ab)use HTTP to get the semantics they want whilst
> > retaining control over their content.
>
> Probably true. Based on the URL styles there seem to be only two or
> three of these authentication/session schemes.
>
> Regards
> Henrik


A global blockade is a little harsh when it's only a few offenders.
If we can locate a pattern to match just these sites while any dialog is
going on I'd be happy to support a reversal for just them. That would keep
most of the main bandwidth gains from doing it in the first place.
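
A targeted rule of this kind could look roughly like the Python sketch below. The domain names are placeholders, since the thread does not name the hostnames involved, and the function is an illustration of the idea rather than Squid's ACL logic.

```python
from urllib.parse import urlsplit

# Hypothetical offender list; replace with domains found in real logs.
OFFENDER_DOMAINS = {"video.example.com", "cdn.example.net"}

def should_deny_caching(url: str) -> bool:
    """Deny caching only for query URLs served by a listed domain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    on_offender = any(
        host == domain or host.endswith("." + domain)
        for domain in OFFENDER_DOMAINS
    )
    return on_offender and bool(parts.query)

print(should_deny_caching("http://video.example.com/get?id=1&sig=abc"))
print(should_deny_caching("http://news.example.org/search?q=squid"))
```

Query URLs from every other site stay cacheable, which is the bandwidth gain the narrower rule is meant to preserve. The list has to be maintained as offenders appear and disappear.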

Amos





Re: The cache deny QUERY change... partial rollback?

2008-12-01 Thread Henrik Nordstrom
Tue 2008-12-02 at 12:35 +1300, Amos Jeffries wrote:

> A global blockade is a little harsh when it's only a few offenders.
> If we can locate a pattern to match just these sites while any dialog is
> going on I'd be happy to support a reversal for just them. That would keep
> most of the main bandwidth gains from doing it in the first place.

In the analyzed cache there were no identified query objects > 10 MB
without session identifiers in the query parameters.

These objects came from a wide range of sites, with some being more
prominent than others.

The majority were flash videos, but not all. There were also software
downloads and some other data.

Among the flash video sites, there were about 3 different styles in how
the query parameters were encoded, suggesting that there are about as
many providers of the software used. The styles may instead be related
to CDNs (not sure, as it's impossible to tell from the URL alone).

Regards
Henrik




Re: The cache deny QUERY change... partial rollback?

2008-12-01 Thread Mark Nottingham
Hmm. Given that heap GDSF out-performs LRU in the common case, and  
there's a crashing bug in LRU at the moment anyway, maybe the best  
thing to do is to change the default replacement policy -- and always  
compile in the heap algorithms?
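
To illustrate why a size-aware policy copes better with large one-shot objects, here is a sketch of textbook Greedy-Dual-Size-Frequency with a uniform cost of 1. It is not Squid's heap implementation: it uses a linear scan instead of a heap, and the trace, sizes and capacity are assumptions. In squid.conf the policy is selected with the cache_replacement_policy directive (for example "cache_replacement_policy heap GDSF").

```python
def gdsf_hit_ratio(trace, capacity):
    """Replay (url, size) requests through a textbook GDSF cache.

    Priority = L + frequency / size, where L is the priority of the most
    recently evicted object. The lowest-priority object is evicted first.
    """
    cache = {}          # url -> [priority, frequency, size]
    used = hits = 0
    inflation = 0.0     # the "L" term
    for url, size in trace:
        if url in cache:
            hits += 1
            entry = cache[url]
            entry[1] += 1
            entry[0] = inflation + entry[1] / size
            continue
        if size > capacity:
            continue    # object can never fit
        while used + size > capacity:
            # Linear scan for the minimum; a real cache would use a heap.
            victim = min(cache, key=lambda u: cache[u][0])
            inflation, _, freed = cache.pop(victim)
            used -= freed
        cache[url] = [inflation + 1 / size, 1, size]
        used += size
    return hits / len(trace)

def make_trace(rounds):
    """Hypothetical workload: 60 one-unit pages, 20 requested per round in
    rotation, plus one never-repeated 30-unit video per round."""
    trace = []
    for r in range(rounds):
        start = (r % 3) * 20
        trace += [(f"/page/{i}", 1) for i in range(start, start + 20)]
        trace.append((f"/get_video?id=1&session={r}", 30))
    return trace

gdsf_ratio = gdsf_hit_ratio(make_trace(30), capacity=100)
print(f"GDSF hit ratio with videos: {gdsf_ratio:.2f}")
```

In this trace the one-shot videos always carry the lowest priority, so they are evicted before any page and the hit ratio settles at about 0.86. How closely real traffic follows this pattern is a separate question that only measurements can answer.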



On 02/12/2008, at 2:05 AM, Henrik Nordstrom wrote:

> Mon 2008-12-01 at 09:40 -0500, Adrian Chadd wrote:
>
> > Hm, that's kind of interesting actually. What's it displacing from the
> > cache? Is the drop of hit ratio due to the removal of other cacheable
> > large objects, or other cacheable small objects? Is it -just- flash
> > video that's exhibiting this behaviour?
>
> The studied cache is using LRU, and these flash videos effectively
> reduce the cache size by filling the cache with large objects that are
> never referenced again.
>
> > Are you able to put up some examples and statistics?
>
> I'll try.
>
> > I really think
> > the right thing to do here is look at what various sites are doing and
> > try to open a dialogue with them. Chances are they don't really know
> > exactly how to (ab)use HTTP to get the semantics they want whilst
> > retaining control over their content.
>
> Probably true. Based on the URL styles there seem to be only two or
> three of these authentication/session schemes.
>
> Regards
> Henrik


--
Mark Nottingham