Re: [squid-users] Using Digests to reduce traffic between peers, Parent - Sibling configuration question
> -----Original Message-----
> From: squid-users [mailto:squid-users-boun...@lists.squid-cache.org] On
> Behalf Of Amos Jeffries
> Sent: Tuesday, October 27, 2015 5:57 AM
> To: squid-users@lists.squid-cache.org
> Subject: Re: [squid-users] Using Digests to reduce traffic between peers,
> Parent - Sibling configuration question
>
> On 27/10/2015 5:14 p.m., Jester Purtteman wrote:
> > So, here's my theory: Set up expensive-server so that it caches
> > EVERYTHING, all of it, and catalogs it with this Digest. It doesn't
> > expire anything, ever; the only way something gets released from that
> > cache is when the drive starts running out of room. Its digest is
> > then sent to cheap-server, which doesn't cache ANYTHING, NOTHING.
> > When a request comes through from a client, expensive-server checks
> > the refresh rules, and if it isn't too stale it gets served just like
> > it does now. But if it IS expired, expensive-server asks cheap-server
> > "hey, how expired is this?" and cheap-server (which has all the
> > bandwidth it could ever want) grabs the content and digests it. If
> > the digest for the new retrieval matches something in the digest sent
> > by expensive-server, then cheap-server sends back a message that says
> > "it's still fresh, the content was written by lazy people or idiots,
> > carry on".
>
> You just described what HTTP/1.1 revalidation requests do. They show up in
> your logs as REFRESH_*. Though they have to send HTTP headers around to get
> it to work, which is a little more expensive than the Digest would be, the
> result is far more reliable and accurate.
>
> The Cache Digest is just a one-way hash of the URL entries in the cache
> index. It is for reducing ICP queries to a peer proxy (i.e. the frontend
> cheap server). If you don't have a cache at both ends of the connection,
> it is not useful. And like ICP, it is full of false matches when used with
> modern HTTP/1.1.
>
> Amos

What I'm hearing is: there are facilities for handling that already, so
don't crack open digest*.cc anytime soon :)

Thank you Amos, both for the response and for years of diligent effort. I
have probably read hundreds if not thousands of your responses now; your
efforts are appreciated by a quiet mob of people scratching their heads in
the wilderness. I have other questions, but they're unrelated, so I'll let
this thread go.
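(For illustration: the HTTP/1.1 revalidation exchange Amos refers to is a
conditional request/response pair, roughly like the following. The URL and
validator values here are made up.)

    GET /images/bookshelf.jpg HTTP/1.1
    Host: example.org
    If-None-Match: "5e1f-52a8c3b2"
    If-Modified-Since: Tue, 14 Feb 2006 08:30:00 GMT

    HTTP/1.1 304 Not Modified
    ETag: "5e1f-52a8c3b2"

Only headers cross the expensive link; when the upstream answers 304, the
cached body is served locally and the access log records one of the
REFRESH_* tags (e.g. TCP_REFRESH_UNMODIFIED).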
Re: [squid-users] Using Digests to reduce traffic between peers, Parent - Sibling configuration question
On 10/26/2015 10:14 PM, Jester Purtteman wrote:
> I have been wrestling with squid for a while and my reading has brought
> "Cache-Digests" to my attention. I suspect the answer is "that would be ...
> As far as I can tell from (very limited) experimenting and reading, this
> doesn't *appear* to be how it works,

Hello Jester,

With a few configuration adjustments and code modifications, you can make
Cache Digests help with your use case. Cache Digests make decisions based on
request URLs. It sounds like you want to make decisions based on the response
body as well. It is possible to change the code to do that, but it will be a
lot of non-trivial work, and there will always be some false positives
because Cache Digests are not meant to give always-precise answers.

As Amos mentioned, there are existing/standard HTTP mechanisms that are meant
to decrease pointless fetches across expensive links. However, just like
Cache Digests "as is", they may not work well in your use case. Those
mechanisms make decisions based on origin-server-supplied headers such as
ETags. As you said, that information may be missing or false in many
responses. Just like with Cache Digests, with a few configuration adjustments
and code modifications, you can make those standard mechanisms work better
for you. For example, you can teach Squid to generate its own ETag-like
content+header checksums that can be used in conditional HTTP requests that
Squid understands. None of this is easy, but it is doable.

There have been many proposals on how to solve this problem. I do not think
there is a single winning approach. Everybody seems to experiment with their
own tweaks of the existing tools and standards.

If you are looking for a solution that will cost you a few days/weeks of
development and sysadmin work, I do not think there is one. If you are
willing and able to invest a lot more, then I recommend that you estimate the
expected savings _before_ you invest in an expensive solution. Getting
reliable estimates is a complicated project on its own, but it is still a lot
cheaper than investing months into a solution that does not meet your needs.

HTH,

Alex.
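(To make the "ETag-like content+header checksum" idea concrete, here is a
minimal standalone sketch. This is not Squid code; the function names and the
choice of FNV-1a are illustrative assumptions, and a real implementation
would likely use a stronger hash.)

    // Derive an ETag-like validator from selected response headers plus the
    // body, so that two fetches of byte-identical content yield the same
    // validator even when the origin sends no ETag of its own.
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

    // FNV-1a 64-bit: fold a chunk of data into a running hash.
    static std::uint64_t fnv1a(std::uint64_t h, const std::string &data) {
        for (unsigned char c : data) {
            h ^= c;
            h *= 1099511628211ULL; // FNV-1a 64-bit prime
        }
        return h;
    }

    // Checksum over the headers that affect meaning, then the entity body.
    std::uint64_t contentValidator(const std::vector<std::string> &headers,
                                   const std::string &body) {
        std::uint64_t h = 14695981039346656037ULL; // FNV-1a offset basis
        for (const auto &hdr : headers)
            h = fnv1a(h, hdr);
        return fnv1a(h, body);
    }

    int main() {
        // Hypothetical response: one relevant header plus a body.
        const std::vector<std::string> headers = {"Content-Type: image/jpeg"};
        const std::string body = "...jpeg bytes...";
        std::cout << "validator: " << std::hex
                  << contentValidator(headers, body) << "\n";
    }

In the scheme discussed here, the bandwidth-rich side would compute such a
validator over a fresh fetch and compare it against the one stored with the
cached copy; a match means the body did not change even though the origin's
freshness metadata said it might have.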
Re: [squid-users] Using Digests to reduce traffic between peers, Parent - Sibling configuration question
That's kind of what I figured. My thinking is that there is probably a higher
fraction of content tags that would match, but it would take some care to
calculate a checksum that wasn't in fact larger than the data itself. In any
event, it seems plausible to me that it is possible. Sounds like the cache
digest code may be the right spot to start reading. If I get anywhere, I'll
keep you guys in the loop... more importantly, I'll ask a bunch of questions
:)

Jester Purtteman, PE
OptimERA Inc
(907) 581-4983 - Office
(360) 701-7875 - Cell

-----Original Message-----
From: Alex Rousskov [mailto:rouss...@measurement-factory.com]
Sent: Tuesday, October 27, 2015 10:09 AM
To: squid-users@lists.squid-cache.org
Cc: Jester Purtteman <jes...@optimera.us>
Subject: Re: [squid-users] Using Digests to reduce traffic between peers,
Parent - Sibling configuration question

[snip - full quote of Alex's reply above]
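(For a rough sense of scale on the "checksum larger than the data" worry: a
per-object validator like the 64-bit one sketched above costs 8 bytes per
object, and a Cache Digest spends on the order of 5 bits per cached URL --
Squid's digest_bits_per_entry directive defaults to 5. So an index of, say,
10 million objects compresses to roughly:

    10,000,000 entries x 5 bits = 50,000,000 bits ~= 6 MB

Either is tiny next to the content itself; the real cost is in shipping and
recomputing the digests, not in storing them.)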
[squid-users] Using Digests to reduce traffic between peers, Parent - Sibling configuration question
Greetings,

I have been wrestling with squid for a while and my reading has brought
"Cache-Digests" to my attention. I suspect the answer is "that would be neat,
but that's not how it works", but I thought I'd ask a few questions.

I am running an ISP in a remote area only served by satellite links. The
layout of my network is approximately (yes, very much simplifying here):

    Internet --- Servers --- Satellite --- Servers --- Clients

We use the servers on the cheap link to perform some basic tunneling,
administration, and hosting of some of our websites, nothing too fancy. The
servers behind the nightmare satellite link provide RADIUS, Squid, a
web-based login system, and some primitive SCADA that we use to monitor the
system.

The wall that I am running up against is this whole issue of caching dynamic
content, and browsers that do goofy things like asking for pages to be served
un-cacheable. It is pretty clear the bookshelf missing a leg on craigslist
that was posted in 2012 is never, ever going to sell, but Chrome just can't
be sure enough, I guess. So, like any good amateur Squid admin, I violate
HTTP standards (I know, for shame :)!) and have it cache things for a few
minutes (so you may get shown the same ad twice, so what?).

I have been pondering things like Riverbed and a bunch of other technologies,
but in the final analysis, they tend to only really work well when you're
doing SAMBA or something else with CRAZY repetition in the datastream and
small byte shifts. Oh, it's good voodoo when it works, but it's not really
applicable to the caching-HTTP problem, especially when they want about 3
kidneys, an arm, and a firstborn child every year for a license.

Then I read the Cache-Digests section, and I *think* it either does something
very cool, or is perhaps a bit of hacking away from doing something very
cool. So, I thought I'd ask for thoughts. I am wondering whether it is
possible using the existing layout, theoretically possible, or just a plain
bad idea to use digests to refresh content on the expensive side of the link.

The idea would go something like this: we'd have a server on the cheap link,
a server at the expensive end of the link, and a VPN-type tunnel between
them. I can tell you from much practice that OpenVPN and some compression can
get this part done. We'll call them cheap-server, expensive-server, and
clients. The layout becomes:

    Internet --- cheap-server --- Satellite --- expensive-server --- Clients

expensive-server is a transparent proxy using tproxy and exists already. It
has a pretty poor cache rate, mostly because of my ham-handed inability to
write good cache rules, but partly because the content providers of this
world need that jpg that hasn't changed since 2006 to go out with "no-cache"
set (grr ;).

So, here's my theory: Set up expensive-server so that it caches EVERYTHING,
all of it, and catalogs it with this Digest. It doesn't expire anything,
ever; the only way something gets released from that cache is when the drive
starts running out of room. Its digest is then sent to cheap-server, which
doesn't cache ANYTHING, NOTHING. When a request comes through from a client,
expensive-server checks the refresh rules, and if it isn't too stale it gets
served just like it does now. But if it IS expired, expensive-server asks
cheap-server "hey, how expired is this?" and cheap-server (which has all the
bandwidth it could ever want) grabs the content and digests it. If the digest
for the new retrieval matches something in the digest sent by
expensive-server, then cheap-server sends back a message that says "it's
still fresh, the content was written by lazy people or idiots, carry on".
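(For reference, the split described above maps onto squid.conf roughly as
follows. Hostnames, sizes, and patterns are placeholders; this is a sketch of
the intent, not a tested configuration, and it leans on the standard
revalidation path rather than on Cache Digests.)

    # expensive-server (client side of the satellite): cache aggressively
    # and evict only under disk pressure (Squid's normal LRU behavior).
    cache_dir aufs /var/spool/squid 200000 16 256
    maximum_object_size 512 MB
    # Stretch freshness so stale-but-unchanged objects are revalidated,
    # not refetched (this is the standards-bending part).
    refresh_pattern -i \.(jpg|png|gif)$ 1440 90% 525600 override-expire ignore-reload
    refresh_pattern . 10 90% 4320
    # Route misses and revalidations through the cheap end of the link.
    cache_peer cheap-server.example.net parent 3128 0 no-query
    never_direct allow all

    # cheap-server (Internet side): forward and revalidate, keep no cache.
    cache deny all

With this shape, expensive-server keeps everything it has seen and sends
small conditional requests upstream when refresh_pattern says an object is
stale; only changed bodies travel back over the satellite.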
As far as I can tell from (very limited) experimenting and reading, this
doesn't *appear* to be how it works, but I may well just have this messed up.
So I thought I'd ask: is that the idea? Is it possible, plausible, on the
road map, or just plain insane? I'm not a gifted coder, but in a pinch I can
usually do more good than harm; I'm just wondering whether this is worth
digging into. Curious what your thoughts are on this, thank you!