Re: [squid-users] Using Digests to reduce traffic between peers, Parent - Sibling configuration question

2015-10-27 Thread Jester Purtteman
> -Original Message-
> From: squid-users [mailto:squid-users-boun...@lists.squid-cache.org] On
> Behalf Of Amos Jeffries
> Sent: Tuesday, October 27, 2015 5:57 AM
> To: squid-users@lists.squid-cache.org
> Subject: Re: [squid-users] Using Digests to reduce traffic between peers,
> Parent - Sibling configuration question
> 
> On 27/10/2015 5:14 p.m., Jester Purtteman wrote:
> > So, here's my theory: set up expensive-server so that it caches
> > EVERYTHING, all of it, and catalogs it with this Digest.  It doesn't
> > expire anything, ever; the only way something gets released from that
> > cache is when the drive starts running out of room.  Its digest is
> > then sent to cheap-server, which doesn't cache ANYTHING, NOTHING.
> > When a request comes through from a client, expensive-server checks
> > the refresh rules, and if it isn't too stale it gets served just like
> > it does now, but if it IS expired, it then asks cheap-server, "hey,
> > how expired is this?", and cheap-server (which has all the bandwidth
> > it could ever want) grabs the content and digests it.  If the digest
> > for the new retrieval matches something in the digest sent by
> > expensive-server, then cheap-server sends up a message that says
> > "it's still fresh, the content was written by lazy people or idiots,
> > carry on".
> 
> 
> You just described what HTTP/1.1 revalidation requests do. They show up
> in your logs as REFRESH_*. They do have to send HTTP headers around to
> make it work, which is a little more expensive than the Digest would be,
> but the result is far more reliable and accurate.
> 
> 
> The Cache Digest is just a one-way hash of the URL entries in the cache
> index. It is for reducing ICP queries to a peer proxy (i.e. the frontend
> cheap server). If you don't have a cache at both ends of the connection
> it is not useful.
> And like ICP it is full of false matches when used with modern HTTP/1.1.
> 
> Amos
> 
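A minimal sketch of the revalidation exchange described above, as it would
appear on the wire (the URL and validator value are invented for
illustration):

    GET /images/bookshelf.jpg HTTP/1.1
    Host: example.com
    If-None-Match: "5e1a-52b8f7c2"

    HTTP/1.1 304 Not Modified
    ETag: "5e1a-52b8f7c2"

The 304 response carries no body, so only headers cross the expensive
link; Squid logs such a revalidation as TCP_REFRESH_UNMODIFIED.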

What I'm hearing is that there are facilities for handling that already;
don't crack open digest*.cc anytime soon :)

Thank you, Amos, both for the response and for years of diligent effort.  I
have probably read hundreds if not thousands of your responses by now; your
efforts are appreciated by a quiet mob of people scratching their heads in
the wilderness.  I have other questions, but they're unrelated, so I'll let
this thread go.



Re: [squid-users] Using Digests to reduce traffic between peers, Parent - Sibling configuration question

2015-10-27 Thread Alex Rousskov
On 10/26/2015 10:14 PM, Jester Purtteman wrote:

> I have been wrestling with squid for a while and my reading has brought
> “Cache-Digests” to my attention.  I suspect the answer is “that would be
...
> As far as I can tell from (very limited) experimenting and reading, this
> doesn’t **appear** to be how it works,


Hello Jester,

With a few configuration adjustments and code modifications, you can
make Cache Digests help with your use case. Cache Digests make decisions
based on request URLs. It sounds like you want to make decisions based
on response body as well. It is possible to change the code to do that,
but it will be a lot of non-trivial work and there will always be some
false positives because Cache Digests are not meant to give
always-precise answers.


As Amos mentioned, there are existing/standard HTTP mechanisms that are
meant to decrease pointless fetches across expensive links. However,
just like Cache Digests, "as is", they may not work well in your use
case. Those mechanisms make decisions based on origin-server-supplied
headers such as ETags. As you said, that information may be missing or
false in many responses.

Just like with Cache Digests, with a few configuration adjustments and
code modifications, you can make those standard mechanisms work better
for you. For example, you can teach Squid to generate its own ETag-like
content+header checksums that can be used in conditional HTTP requests
that Squid understands. None of this is easy, but it is doable.
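A sketch of what such a Squid-generated validator might look like on the
wire. Nothing below exists in Squid today; the validator format is invented
purely to illustrate the idea:

    GET /images/bookshelf.jpg HTTP/1.1
    Host: example.com
    If-None-Match: "squid-body-sha256:9f2c4a..."

    HTTP/1.1 304 Not Modified

Here the checksum would be computed by the caching proxy itself from the
stored body and headers, and the peer proxy would re-fetch the object,
recompute the checksum, and answer 304 when they match, regardless of what
the origin server's own (missing or false) validators say.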


There have been many proposals on how to solve this problem. I do not
think there is a single winning approach. Everybody seems to experiment
with their own tweaks of the existing tools and standards.

If you are looking for a solution that will cost you a few days/weeks of
development and sysadmin work, I do not think there is one. If you are
willing and able to invest a lot more, then I recommend that you
estimate the expected savings _before_ you invest in an expensive
solution. Getting reliable estimates is a complicated project on its
own, but it is still a lot cheaper than investing months into a solution
that does not meet your needs.


HTH,

Alex.



Re: [squid-users] Using Digests to reduce traffic between peers, Parent - Sibling configuration question

2015-10-27 Thread Jester Purtteman
That's kind of what I figured.  My thinking is that there is probably a
higher fraction of content tags that would match, but it would take some
care to calculate a checksum that wasn't in fact larger than the data
itself.  In any event, it seems plausible to me that it is possible.
Sounds like the cache digest code may be the right spot to start reading.
If I get anywhere, I'll keep you guys in the loop... more importantly,
I'll ask a bunch of questions :)

Jester Purtteman, PE
OptimERA Inc,
(907) 581-4983 - Office
(360) 701-7875 - Cell



[squid-users] Using Digests to reduce traffic between peers, Parent - Sibling configuration question

2015-10-26 Thread Jester Purtteman
Greetings,

 

I have been wrestling with squid for a while and my reading has brought
"Cache-Digests" to my attention.  I suspect the answer is "that would be
neat, but that's not how it works", but I thought I'd ask a few questions.
I am running an ISP in a remote area only served by satellite links.  The
layout of my network is approximately (yes, very much simplifying here):

 


Internet --- (cheap link) --- Servers --- (satellite link) --- Servers --- Clients

 

We use the servers on the cheap link to perform some basic tunneling and
administration, and to host some of our websites, nothing too fancy.  The
servers behind the nightmare satellite link provide RADIUS, Squid, a web
based login system, and some primitive SCADA that we use to monitor the
system.

 

The wall I am running up against is this whole issue of caching dynamic
content, and browsers that do goofy things like asking for pages to be
served un-cacheable.  It is pretty clear the bookshelf missing a leg that
was posted on Craigslist in 2012 is never, ever going to sell, but Chrome
just can't be sure enough, I guess.  So, like any good amateur Squid admin,
I violate HTTP standards (I know, for shame :)!) and have it cache things
for a few minutes (so you may get shown the same ad twice, so what?).  I
have been pondering things like Riverbed and a bunch of other technologies,
but in the final analysis, they tend to only really work well when you're
doing Samba or something else with CRAZY repetition in the datastream and
small byte shifts.  Oh, it's good voodoo when it works, but it's not really
applicable to the HTTP caching problem, especially when they want about
3 kidneys, an arm, and a firstborn child every year for a license.
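For reference, this kind of standards-bending is usually done with
refresh_pattern overrides in squid.conf. A minimal sketch; the pattern and
numbers are illustrative, not a recommendation:

    # Treat matching objects as fresh for at least 10 minutes and at most
    # a day, even when the response headers say otherwise.
    refresh_pattern -i \.(jpg|jpeg|png|gif)$ 10 90% 1440 override-expire ignore-reload

override-expire lets the minimum age win over the server-supplied Expires
header, and ignore-reload ignores client no-cache/reload requests; both
knowingly violate the HTTP standard, which is exactly the trade-off
described above.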

 

Then I read the Cache-Digests section, and I *think* it either does
something very cool, or is perhaps a bit of hacking away from doing
something very cool.  So, I thought I'd ask for thoughts.  I am wondering
whether it is practical with the existing layout, theoretically possible,
or just a plain bad idea to use digests to refresh content on the
expensive side of the link.  The idea would go something like this: we'd
have a server on the cheap link, a server at the expensive end of the
link, and a VPN-type tunnel between them.  I can tell you from much
practice that OpenVPN and some compression can get this part done.  We'll
call them cheap-server, expensive-server, and clients.  The layout
becomes:

 

Internet --- cheap-server --- Satellite --- expensive-server --- Clients

 

expensive-server is a transparent proxy using tproxy and exists already.
It has a pretty poor cache rate, mostly because of my ham-handed inability
to write good cache rules, but partly because the content providers of
this world need that jpg that hasn't changed since 2006 to go out with
"no-cache" set (grr ;).
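For reference, the interception described here is Squid's tproxy mode; a
minimal sketch of the relevant squid.conf line (the port number is
illustrative, and the matching iptables/routing setup is omitted):

    http_port 3129 tproxy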

 

So, here's my theory: set up expensive-server so that it caches
EVERYTHING, all of it, and catalogs it with this Digest.  It doesn't
expire anything, ever; the only way something gets released from that
cache is when the drive starts running out of room.  Its digest is then
sent to cheap-server, which doesn't cache ANYTHING, NOTHING.  When a
request comes through from a client, expensive-server checks the refresh
rules, and if it isn't too stale it gets served just like it does now,
but if it IS expired, it then asks cheap-server, "hey, how expired is
this?", and cheap-server (which has all the bandwidth it could ever want)
grabs the content and digests it.  If the digest for the new retrieval
matches something in the digest sent by expensive-server, then
cheap-server sends up a message that says "it's still fresh, the content
was written by lazy people or idiots, carry on".
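A rough squid.conf sketch of the plumbing for that split, leaving aside
whether digests can actually be made to flow in this direction (hostnames,
sizes, and ports are invented):

    # on expensive-server (client side): cache aggressively, fetch via the
    # peer; digest_generation needs a Squid built with --enable-cache-digests
    cache_dir aufs /var/spool/squid 200000 16 256
    digest_generation on
    cache_peer cheap-server.example.net parent 3128 0 no-query default

    # on cheap-server (internet side): pass everything through, cache nothing
    cache deny all

As Amos notes above, stock Cache Digests only advertise what a peer holds
so the requester can skip ICP queries; the "re-fetch and compare" step in
this theory is the part that would need new code.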

 

As far as I can tell from (very limited) experimenting and reading, this
doesn't *appear* to be how it works, but I may well just have this messed
up.  So, I thought I'd ask: is that the idea, and is it possible,
plausible, on the road map, or just plain insane?  I'm not a gifted coder,
but in a pinch I can usually do more good than harm; I'm just wondering
if this is worth digging into.  Curious what your thoughts are on this,
thank you!
