[RFC] post-cache REQMOD
Hello,

I propose adding support for a third adaptation vectoring point: post-cache REQMOD. Services at this new point receive cache miss requests and may adapt them as usual. If a service satisfies the request, the service response may get cached by Squid. As you know, Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.

We have received many requests for post-cache adaptation support throughout the years, and I personally resisted the temptation of adding another layer of complexity (albeit an optional one) because it is a lot of work and because many use cases could be addressed without post-cache adaptation support.

The last straw (and the motivation for this RFC) was PageSpeed[1] integration. With PageSpeed, one can generate various variants of "optimized" content. For example, mobile users may receive smaller images. Apache and Nginx support PageSpeed modules. It is possible to integrate Squid with PageSpeed (and similar services) today, but it is not possible for Squid to _cache_ those generated variants unless one is willing to pay for another round trip to the origin server to get exactly the same unoptimized content.

The only way to support Squid caching of PageSpeed variants without repeated round trips to the origin server is using two Squids. The parent Squid would cache origin server responses while the child Squid would adapt the parent's responses and cache adapted content. Needless to say, running two Squids (each with its own cache) instead of one adds significant performance/administrative overheads and complexity.

As far as internals are concerned, I am currently thinking of launching an adaptation job for this vectoring point from FwdState::Start(). This way, its impact on the rest of Squid would be minimal and some adapters might even affect FwdState routing decisions. The initial code name for the new class is MissReqFilter, but that may change.

The other candidate location for plugging in the new vectoring point is the Server class. However, that class is already complex. It handles communication with the next hop (with child classes doing protocol-specific work and confusing things further) as well as the pre-cache RESPMOD vectoring point with caching initiation on top of that. The Server code already has trouble distinguishing various content streams it has to juggle. I am worried that adding another vectoring point there would make that complexity significantly worse.

It is possible that we would be able to refactor/encapsulate some of the code so that it can be reused in both the existing Server and the new MissReqFilter classes. I will look out for such opportunities while trying to keep the overall complexity in check.

Any objections to adding post-cache REQMOD or better implementation ideas?

Thank you,

Alex.

[1] https://developers.google.com/speed/pagespeed/
Re: [RFC] post-cache REQMOD
On 11/07/2014 10:15 a.m., Alex Rousskov wrote:
> Hello,
>
> I propose adding support for a third adaptation vectoring point:
> post-cache REQMOD. Services at this new point receive cache miss
> requests and may adapt them as usual. If a service satisfies the
> request, the service response may get cached by Squid. As you know,
> Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.

Just to clarify: you mean this to be the vectoring point which receives MISS-only traffic, whereas the existing one(s) receive HIT+MISS traffic?

> We have received many requests for post-cache adaptation support
> throughout the years, and I personally resisted the temptation of adding
> another layer of complexity (albeit an optional one) because it is a lot
> of work and because many use cases could be addressed without post-cache
> adaptation support.
>
> The last straw (and the motivation for this RFC) was PageSpeed[1]
> integration. With PageSpeed, one can generate various variants of
> "optimized" content. For example, mobile users may receive smaller
> images. Apache and Nginx support PageSpeed modules. It is possible to
> integrate Squid with PageSpeed (and similar services) today, but it is
> not possible for Squid to _cache_ those generated variants unless one is
> willing to pay for another round trip to the origin server to get
> exactly the same unoptimized content.

Can you show how they are violating standard HTTP variant caching? HTTPbis should probably be informed of the problem.

If it is actually within the standard, then it would seem to be a missing feature of Squid to cache them properly. We would be better off fixing Squid to cache more compliant traffic.

> The only way to support Squid caching of PageSpeed variants without
> repeated round trips to the origin server is using two Squids. The
> parent Squid would cache origin server responses while the child Squid
> would adapt parent's responses and cache adapted content. Needless to
> say, running two Squids (each with its own cache) instead of one adds
> significant performance/administrative overheads and complexity.
>
> As far as internals are concerned, I am currently thinking of launching
> adaptation job for this vectoring point from FwdState::Start(). This
> way, its impact on the rest of Squid would be minimal and some adapters
> might even affect FwdState routing decisions. The initial code name for
> the new class is MissReqFilter, but that may change.

Given that FwdState is the global selector that determines where MISS content comes from, this sounds reasonable. I think the best position is after the miss_access tests. We need to split the miss_access lookup off into an async step anyway, so that it can be a slow lookup.

> The other candidate location for plugging in the new vectoring point is
> the Server class. However, that class is already complex. It handles
> communication with the next hop (with child classes doing
> protocol-specific work and confusing things further) as well as
> pre-cache RESPMOD vectoring point with caching initiation on top of
> that. The Server code already has trouble distinguishing various content
> streams it has to juggle. I am worried that adding another vectoring
> point there would make that complexity significantly worse.

Agreed. Bad idea.

> It is possible that we would be able to refactor/encapsulate some of the
> code so that it can be reused in both the existing Server and the new
> MissReqFilter classes. I will look out for such opportunities while
> trying to keep the overall complexity in check.
>
> Any objections to adding post-cache REQMOD or better implementation ideas?

Just the above details about variant caching.

Amos
Re: [RFC] post-cache REQMOD
On 07/10/2014 09:12 PM, Amos Jeffries wrote:
> On 11/07/2014 10:15 a.m., Alex Rousskov wrote:
>> I propose adding support for a third adaptation vectoring point:
>> post-cache REQMOD. Services at this new point receive cache miss
>> requests and may adapt them as usual. If a service satisfies the
>> request, the service response may get cached by Squid. As you know,
>> Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.
>
> Just to clarify you mean this to be the vectoring point which receives
> MISS-only traffic, as the existing one(s) receive HIT+MISS traffic?

+ pre-cache REQMOD receives HIT+MISS request traffic.
+ pre-cache RESPMOD receives MISS response traffic.
* post-cache REQMOD receives MISS request traffic.
- post-cache RESPMOD receives HIT+MISS response traffic.

All four vectoring points are documented in RFC 3507. Squid currently supports the first two. I propose adding support for the third. Each of the four points (and probably others!) is useful for some use cases.

Besides getting a different HIT/MISS traffic mixture, there are other differences between these vectoring points. For example, pre-cache REQMOD responses go straight to the HTTP client while post-cache REQMOD responses may be cached by Squid (this example is about the request satisfaction mode of REQMOD).

>> We have received many requests for post-cache adaptation support
>> throughout the years, and I personally resisted the temptation of adding
>> another layer of complexity (albeit an optional one) because it is a lot
>> of work and because many use cases could be addressed without post-cache
>> adaptation support.
>>
>> The last straw (and the motivation for this RFC) was PageSpeed[1]
>> integration. With PageSpeed, one can generate various variants of
>> "optimized" content. For example, mobile users may receive smaller
>> images. Apache and Nginx support PageSpeed modules. It is possible to
>> integrate Squid with PageSpeed (and similar services) today, but it is
>> not possible for Squid to _cache_ those generated variants unless one is
>> willing to pay for another round trip to the origin server to get
>> exactly the same unoptimized content.
>
> Can you show how they are violating standard HTTP variant caching? the
> HTTPbis should probably be informed of the problem.
> If it is actually within standard then it would seem to be a missing
> feature of Squid to cache them properly. We could improve better by
> fixing Squid to cache more compliant traffic.

This is unrelated to caching rules. HTTP does not have a notion of creating multiple responses from a single response, and that is exactly what PageSpeed and similar adaptations do: for example, they convert a large origin server image or page into several variants, one for each class of clients.

Does this clarify?

Thank you,

Alex.
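
The traffic mix seen by the four RFC 3507 vectoring points listed above can be written down as a small lookup table. The following is only an illustrative sketch: the `VectoringPoint` class, the `points_for` helper, and the `supported` flag (which records the status described in this thread, not the state of any particular Squid release) are assumptions made for the example and are not Squid code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectoringPoint:
    name: str
    message: str       # "request" or "response"
    sees_hits: bool    # receives traffic for transactions that hit the cache
    sees_misses: bool  # receives traffic for transactions that miss the cache
    supported: bool    # status as described in this thread (assumption)


# Transcribed from the "+ / * / -" list in the message above.
POINTS = [
    VectoringPoint("pre-cache REQMOD", "request", True, True, True),
    VectoringPoint("pre-cache RESPMOD", "response", False, True, True),
    VectoringPoint("post-cache REQMOD", "request", False, True, False),
    VectoringPoint("post-cache RESPMOD", "response", True, True, False),
]


def points_for(message: str, cache_hit: bool) -> list[str]:
    """Return the names of the points that would see this kind of message."""
    return [
        p.name
        for p in POINTS
        if p.message == message
        and (p.sees_hits if cache_hit else p.sees_misses)
    ]
```

For instance, `points_for("request", False)` lists both REQMOD points, which shows that post-cache REQMOD sees only a subset of what pre-cache REQMOD sees.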
Re: [RFC] post-cache REQMOD
On 11/07/2014 4:27 p.m., Alex Rousskov wrote:
> On 07/10/2014 09:12 PM, Amos Jeffries wrote:
>> Can you show how they are violating standard HTTP variant caching? the
>> HTTPbis should probably be informed of the problem.
>> If it is actually within standard then it would seem to be a missing
>> feature of Squid to cache them properly. We could improve better by
>> fixing Squid to cache more compliant traffic.
>
> This is unrelated to caching rules. HTTP does not have a notion of
> creating multiple responses from a single response, and that is exactly
> what PageSpeed and similar adaptations do: For example, they convert a
> large origin server image or page into several variants, one for each
> class of clients.

Indeed. So this implementation of PageSpeed, by requiring HTTP agents to transform traffic mid-transit from single to multiple responses, is a violation.

Or, PageSpeed is completely unnecessary mid-transit and has nothing to do with Squid and caching, which can cache either the small shrunk objects or the single large one just fine. It is instead the attempt to perform end-server operations in a proxy/gateway which is driving this change.

Amos
Re: [RFC] post-cache REQMOD
Post-cache REQMOD and post-cache RESPMOD are a must for Squid. The PageSpeed example is also very good.

I must note that similar features are already integrated into other commercial products, for example:

http://www.citrix.com/products/bytemobile-adaptive-traffic-management/tech-info.html
("Web and video optimization" -> "Quality Aware Transcoding", "Smartphone Application Acceleration" and "Web Optimization")

The PageSpeed example fits better to a post-cache RESPMOD feature. Is the post-cache REQMOD just a first step to support all post-cache vectoring points?
Re: [RFC] post-cache REQMOD
> The only way to support Squid caching of PageSpeed variants without
> repeated round trips to the origin server is using two Squids. The
> parent Squid would cache origin server responses while the child Squid
> would adapt parent's responses and cache adapted content. Needless to
> say, running two Squids (each with its own cache) instead of one adds
> significant performance/administrative overheads and complexity.

Not such a big deal. I have been doing exactly this for years already, in my open proxy, which is used by mobile users. (Just a hobby.)

client-squid-ziproxy-squid-web

Some functions of PageSpeed are performed in the sandwiched ziproxy, like graphics compression to various levels, or gzip of plain pages. This needs a multi-core CPU, of course.

The more professional solution using a PageSpeed module combined with Squid would only be of practical advantage for large-scale users, like ISPs, in need of best performance and flexibility.

--
Kind regards,
Reiner Karlsberg
Re: [RFC] post-cache REQMOD
On 07/11/2014 05:27 AM, Tsantilas Christos wrote:

> The PageSpeed example fits better to a post-cache RESPMOD feature.

I do not think so. Post-cache RESPMOD does not allow Squid to cache the adapted variants. Please let me know if I missed how post-cache RESPMOD can do that.

The key here is that PageSpeed and similar services want to create (and cache) many adapted responses out of a single virgin response. Neither HTTP itself nor the Squid architecture supports that well. Post-cache REQMOD allows basic PageSpeed support (the first request for "small" adapted content gets "large" virgin content, but the second request for small content fetches it from the PageSpeed cache, storing it in the Squid cache).

To optimize PageSpeed support further (so that the first request can get small content), we will need to add another generally useful feature, but I would rather not bring it into this discussion (there will be a separate RFC if we get that far).

The alternative is to create a completely new interface (not a true vectoring point) that allows an adaptation service to push multiple adapted responses into the Squid cache _and_ tell Squid which of those responses to use for the current request. While I have considered proposing that, I still think we would be better off supporting "standard" and "well understood" building blocks (such as standard adaptation vectoring points) rather than such highly specialized interfaces. Please let me know if you disagree.

> Is
> the post-cache REQMOD just a first step to support all post-cache
> vectoring points?

You can certainly view it that way, but I do not propose or promise adding post-cache RESPMOD :-).

Thank you,

Alex.
Re: [RFC] post-cache REQMOD
On 07/11/2014 03:16 AM, Amos Jeffries wrote:
> On 11/07/2014 4:27 p.m., Alex Rousskov wrote:
>> This is unrelated to caching rules. HTTP does not have a notion of
>> creating multiple responses from a single response, and that is exactly
>> what PageSpeed and similar adaptations do: For example, they convert a
>> large origin server image or page into several variants, one for each
>> class of clients.

> Indeed. So this implementation of PageSpeed by requiring HTTP agents to
> transform traffic mid-transit from single to multiple responses is a
> violation.

I do not think so. This is just a content adaptation outside of the HTTP domain. If this is a "violation", then dreaming of unicorns is an HTTP violation because HTTP does not have a notion of a unicorn.

> Or, PageSpeed is completely unnecessary mid-transit and has nothing to
> do with Squid and caching.

Evidently, folks deploying proxies say otherwise. We can tell them that they are "doing it wrong", but I do not think we should in this case. For some use cases, PageSpeed adaptation is the right thing to do. Also, this is not necessarily used "mid-transit" as discussed below.

> Which can cache either the small shrunk
> objects, or the single large one just fine. It is instead the attempt to
> perform end-server operations in a proxy/gateway which is driving this
> change.

The need to better serve a diverse population of web clients is driving this change. An attempt to confine useful content adaptations to "end-servers" is futile at best.

BTW, many want to deploy PageSpeed at the reverse proxy, which is an end-server from the HTTP client point of view, so even if we adopt the "only end-servers can do that!" law, Squid should still support this kind of adaptation.

Cheers,

Alex.
Re: [RFC] post-cache REQMOD
On 07/11/2014 07:34 AM, Reiner Karlsberg wrote:
>>> The only way to support Squid caching of PageSpeed variants without
>>> repeated round trips to the origin server is using two Squids. The
>>> parent Squid would cache origin server responses while the child Squid
>>> would adapt parent's responses and cache adapted content. Needless to
>>> say, running two Squids (each with its own cache) instead of one adds
>>> significant performance/administrative overheads and complexity.

> Not such a big deal.

Agreed. For you and many others, it is not a big deal indeed.

> The more professional solution using a PageSpeed module combined with
> squid
> would only be of practical advantage for large scale users, like ISPs,
> in need of best performance and flexibility.

Exactly! The "need of best performance" [and least administrative overhead] is what we are trying to accommodate in this case. Since many Squid improvements during the last 5+ years are sponsored by "large scale users", I do not think it is a good idea to reject their requests just because they are irrelevant to others. What do you think?

Cheers,

Alex.
Re: [RFC] post-cache REQMOD
On 07/11/2014 05:47 PM, Alex Rousskov wrote:
> On 07/11/2014 05:27 AM, Tsantilas Christos wrote:
>
>> The PageSpeed example fits better to a post-cache RESPMOD feature.
>
> I do not think so. Post-cache RESPMOD does not allow Squid to cache the
> adapted variants. Please let me know if I missed how post-cache RESPMOD
> can do that.

I misread the problem you want to solve. I had in mind a proxy that caches the original content and then adapts the cached content according to client rules. But you want to cache adapted content.

However, I am still not sure I understand how post-cache REQMOD will help. Assume the following scenario:
 - Client A requests the original web page
 - Client B requests the optimized web page (spaces and comments removed)

I am expecting a solution that stores two copies of the web page in the cache: the optimized copy and the original copy.

One solution could be a mechanism similar to the Vary header, for example defining an ICAP header that should be included in Vary. I have not looked at the StoreID feature, but it can probably be used for the same purpose.

> The key here is that PageSpeed and similar services want to create (and
> cache) many adapted responses out of a single virgin response. Neither
> HTTP itself nor the Squid architecture support that well. Post-cache
> REQMOD allows basic PageSpeed support (the first request for "small"
> adapted content gets "large" virgin content, but the second request for
> small content fetches it from the PageSpeed cache, storing it in Squid
> cache).
>
> To optimize PageSpeed support further (so that the first request can get
> small content), we will need to add another generally useful feature,
> but I would rather not bring it into this discussion (there will be a
> separate RFC if we get that far).

I probably did not understand well how PageSpeed works or what a PageSpeed cache means. But in the above scenario, it looks like Squid will store only one version of the content (the small content). Is this all that is required? What am I missing?

> The alternative is to create a completely new interface (not a true
> vectoring point) that allows an adaptation service to push multiple
> adapted responses into the Squid cache _and_ tell Squid which of those
> responses to use for the current request. While I have considered
> proposing that, I still think we would be better off supporting
> "standard" and "well understood" building blocks (such as standard
> adaptation vectoring points) rather than such highly-specialized
> interfaces. Please let me know if you disagree.
>
>> Is the post-cache REQMOD just a first step to support all post-cache
>> vectoring points?
>
> You can certainly view it that way, but I do not propose or promise
> adding post-cache RESPMOD :-).
>
> Thank you,
>
> Alex.
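
The Vary-like idea raised above (keeping the original and the optimized copy of the same URL side by side in the cache) can be illustrated with a toy cache whose key includes the value of a selecting request header. This is a simplified sketch, not how Squid implements Vary or StoreID: the `X-Variant` header name and the `cache_key`, `store`, and `lookup` helpers are hypothetical, and the list of selecting headers is passed in explicitly instead of being learned from a response's Vary header.

```python
def cache_key(url, request_headers, vary_on):
    """Build a key from the URL plus the values of the selecting headers."""
    # Header names are case-insensitive, so compare them in lowercase.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    selectors = tuple(
        (name, lowered.get(name, ""))  # a missing header counts as empty
        for name in sorted(h.lower() for h in vary_on)
    )
    return (url, selectors)


cache = {}  # toy in-memory store: key -> response body


def store(url, request_headers, vary_on, body):
    cache[cache_key(url, request_headers, vary_on)] = body


def lookup(url, request_headers, vary_on):
    # Returns None on a miss.
    return cache.get(cache_key(url, request_headers, vary_on))
```

With such a key, storing a page once under `X-Variant: original` and once under `X-Variant: optimized` yields two independent cache entries for the same URL, which is the outcome the scenario above asks for.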
Re: [RFC] post-cache REQMOD
On 07/11/2014 11:46 AM, Tsantilas Christos wrote:
> On 07/11/2014 05:47 PM, Alex Rousskov wrote:
>> On 07/11/2014 05:27 AM, Tsantilas Christos wrote:
>>
>>> The PageSpeed example fits better to a post-cache RESPMOD feature.
>>
>> I do not think so. Post-cache RESPMOD does not allow Squid to cache the
>> adapted variants. Please let me know if I missed how post-cache RESPMOD
>> can do that.
>
> I did not read correctly the problem you want to solve. I had in my mind
> a proxy which caches original content and then adapts the cached content
> according to client rules.
> But you want to cache adapted content.

Both, actually.

> However still I am not sure I can understand how the post-cache reqmod
> will help.
> Assume the following scenario:
> - Client A requests original web page
> - Client B requests optimized web page (removed spaces and comments)

There are several scenarios and tricks here. I am not going to cover all of them, but here is one possibility:

* The "A" response gets cached as usual. When it goes through the pre-cache RESPMOD adaptation service, it gets optimized, and the optimized variant gets stored in the PageSpeed cache (inside the adapter).

* The "B" request misses the Squid cache (HTTP Vary rules, StoreID, and/or a pre-cache REQMOD service ensure a miss). That miss request reaches the post-cache REQMOD service. The service finds a matching optimized response in the PageSpeed cache and satisfies the miss request with that. Squid caches the service response.

* The Squid cache now contains both response variants. Any future request for the original or optimized page will be served from the Squid cache.

Again, this is just one scenario/possibility. A post-cache REQMOD service can be used for many other things, of course. It allows limiting adaptation to misses and caching adapted miss responses without going all the way to the origin server. In fact, one could use it to implement a whole web server backed by a Squid cache! The existing pre-cache REQMOD service can also be used as a web server replacement, but it cannot take advantage of the Squid cache.

Hope this clarifies,

Alex.
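The three-step scenario above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not Squid code: the `Origin`, `VariantService`, and `Proxy` classes, their method names, and the "original"/"optimized" variant labels are all hypothetical, and the string replacement merely stands in for whatever PageSpeed would actually do.

```python
class Origin:
    """Stand-in origin server that counts how often it is contacted."""

    def __init__(self):
        self.fetches = 0

    def fetch(self, url):
        self.fetches += 1
        return f"<large page for {url}>"


class VariantService:
    """Hypothetical adapter acting at two vectoring points."""

    def __init__(self):
        self.variants = {}  # (url, variant) -> adapted body (the adapter's own cache)

    def respmod_precache(self, url, virgin_body):
        # Pre-cache RESPMOD: remember an optimized variant, pass the virgin body through.
        self.variants[(url, "optimized")] = virgin_body.replace("large", "small")
        return virgin_body

    def reqmod_postcache(self, url, variant):
        # Post-cache REQMOD: satisfy a cache miss if a matching variant is known.
        return self.variants.get((url, variant))


class Proxy:
    """Toy cache keyed by (url, variant); real variant selection is more involved."""

    def __init__(self, origin, service):
        self.origin = origin
        self.service = service
        self.cache = {}

    def get(self, url, variant):
        key = (url, variant)
        if key in self.cache:
            return self.cache[key], "HIT"
        # Cache miss: the request reaches the post-cache REQMOD service first.
        body = self.service.reqmod_postcache(url, variant)
        if body is not None:
            self.cache[key] = body  # the service response gets cached
            return body, "SERVICE"
        # The service could not satisfy the miss: go to the origin server.
        body = self.service.respmod_precache(url, self.origin.fetch(url))
        self.cache[(url, "original")] = body  # only the virgin response is cached here
        return body, "ORIGIN"
```

With this model, a request for the original page followed by a request for the optimized page results in a single origin fetch, and later requests for either variant are cache hits.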
Re: [RFC] post-cache REQMOD
On 12/07/2014 3:24 a.m., Alex Rousskov wrote:
> On 07/11/2014 03:16 AM, Amos Jeffries wrote:
>> On 11/07/2014 4:27 p.m., Alex Rousskov wrote:
>>> This is unrelated to caching rules. HTTP does not have a notion of
>>> creating multiple responses from a single response, and that is exactly
>>> what PageSpeed and similar adaptations do: For example, they convert a
>>> large origin server image or page into several variants, one for each
>>> class of clients.
>
>> Indeed. So this implementation of PageSpeed by requiring HTTP agents to
>> transform traffic mid-transit from single to multiple responses is a
>> violation.
>
> I do not think so. This is just a content adaptation outside of the HTTP
> domain. If this is a "violation", then dreaming of unicorns is an HTTP
> violation because HTTP does not have a notion of a unicorn.
>
>> Or, PageSpeed is completely unnecessary mid-transit and has nothing to
>> do with Squid and caching.
>
> Evidently, folks deploying proxies say otherwise. We can tell them that
> they are "doing it wrong", but I do not think we should in this case.
> For some use cases, PageSpeed adaptation is the right thing to do. Also,
> this is not necessarily used "mid-transit" as discussed below.

We've been advising alternatives to post-cache vector points, as you say, for some time. This does not appear to be much different from the other requests. There is an alternative.

The question is whether the benefit gained by avoiding one more piece of software in the installed chain outweighs the development effort and maintenance of a new vector point in Squid code. IMHO they are pretty evenly balanced at present. So it's up to you whether you want to take on that workload.

My opinion on where and how to integrate was mentioned in the initial email.

>> Which can cache either the small shrunk
>> objects, or the single large one just fine. It is instead the attempt to
>> perform end-server operations in a proxy/gateway which is driving this
>> change.
>
> The need to better serve a diverse population of web clients is driving
> this change. An attempt to confine useful content adaptations to
> "end-servers" is futile at best. BTW, many want to deploy PageSpeed at
> the reverse proxy, which is an end-server from the HTTP client point of
> view so even if we adopt the "only end-servers can do that!" law, Squid
> should still support this kind of adaptation.

Perhaps. Don't let my doubts there block you though.

Amos
Re: [RFC] post-cache REQMOD
On 12/07/2014 2:47 a.m., Alex Rousskov wrote:
> On 07/11/2014 05:27 AM, Tsantilas Christos wrote:
>
>> The PageSpeed example fits better to a post-cache RESPMOD feature.
>
> I do not think so. Post-cache RESPMOD does not allow Squid to cache the
> adapted variants. Please let me know if I missed how post-cache RESPMOD
> can do that.

post-cache RESPMOD should be caching the large unfiltered object and Squid cache supplying it to the adaptation module for each adaptation task. Adaptation operations on each request, but no upstream contact necessary.

With post-cache REQMOD, the cache stores the shrunk version of objects. Adaptors at this point cannot pull from cache, so they request the larger object from upstream on each MISS in order to adapt before caching. Adaptation operations only on MISS, but upstream fetch of the unfiltered large object on each adaptation.

I guess you are assuming that the ICAP service stores the unfiltered object in its own cache and delivers the shrunk objects to Squid as reply responses to REQMOD. This is the only case in which post-cache REQMOD is more efficient overall than pre-cache REQMOD.

> The key here is that PageSpeed and similar services want to create (and
> cache) many adapted responses out of a single virgin response. Neither
> HTTP itself nor the Squid architecture support that well. Post-cache
> REQMOD allows basic PageSpeed support (the first request for "small"
> adapted content gets "large" virgin content, but the second request for
> small content fetches it from the PageSpeed cache, storing it in Squid
> cache). To optimize PageSpeed support further (so that the first request
> can get small content), we will need to add another generally useful
> feature, but I would rather not bring it into this discussion (there
> will be a separate RFC if we get that far).
>
> The alternative is to create a completely new interface (not a true
> vectoring point) that allows an adaptation service to push multiple
> adapted responses into the Squid cache _and_ tell Squid which of those
> responses to use for the current request. While I have considered
> proposing that, I still think we would be better off supporting
> "standard" and "well understood" building blocks (such as standard
> adaptation vectoring points) rather than such highly-specialized
> interfaces. Please let me know if you disagree.

IMHO if this feature is provided the persons requesting it will find that PageSpeed works no faster than pre-created shrunk variants over standard HTTP with working Vary caching. They are saving cheap disk storage by spending expensive CPU and network latency. Probably in an effort to speed up all those very old proxies that incorrectly implement Cache-Control:no-cache.

Vary caching is after all the design of providing client-specific variants without all the work of realtime adaptation.

(that's just me getting old though as I look despairingly on an ever more inefficient network - "back in my day...").

>> Is
>> the post-cache REQMOD just a first step to support all post-cache
>> vectoring points?
>
> You can certainly view it that way, but I do not propose or promise
> adding post-cache RESPMOD :-).
>
>
> Thank you,
>
> Alex.

Amos
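The trade-off discussed in this message (adapting on every request versus adapting once and caching the result) can be illustrated with a small counting sketch. It is a simplified model under loud assumptions: the `count_adaptations` function and the variant labels are hypothetical, origin fetches and adapter-side caches are not modeled, and nothing here reflects measured Squid or PageSpeed behavior.

```python
def count_adaptations(requests, cache_adapted):
    """Count how many times a hypothetical adapter runs for a request stream.

    requests: iterable of (url, variant) pairs; the "original" variant
        needs no adaptation.
    cache_adapted: False models adapting after the cache on every request
        (the post-cache RESPMOD style); True models storing each adapted
        variant in the cache so it is produced only once.
    """
    adapted_cache = set()
    runs = 0
    for url, variant in requests:
        if variant == "original":
            continue  # virgin content is served as is
        if cache_adapted and (url, variant) in adapted_cache:
            continue  # adapted variant served from the cache
        runs += 1
        if cache_adapted:
            adapted_cache.add((url, variant))
    return runs


# Three mobile requests and one request for the original image.
stream = [("/img", "mobile")] * 3 + [("/img", "original")]
per_request_runs = count_adaptations(stream, cache_adapted=False)
cached_variant_runs = count_adaptations(stream, cache_adapted=True)
```

In this model the adapter runs once per adapted request in the first case and once per distinct variant in the second; whether that saving outweighs the extra cache storage is exactly the judgment call debated in the thread.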
Re: [RFC] post-cache REQMOD
On 07/11/2014 08:19 PM, Amos Jeffries wrote:
> On 12/07/2014 2:47 a.m., Alex Rousskov wrote:
>> On 07/11/2014 05:27 AM, Tsantilas Christos wrote:
>>
>>> The PageSpeed example fits better to a post-cache RESPMOD feature.
>>
>> I do not think so. Post-cache RESPMOD does not allow Squid to cache the
>> adapted variants. Please let me know if I missed how post-cache RESPMOD
>> can do that.
>
> post-cache RESPMOD should be caching the large unfiltered object

post-cache RESPMOD cannot cache anything because it operates on the other side of the cache. Among the two RESPMODs, only pre-cache RESPMOD can cache. There is no need for RESPMOD to cache the large unfiltered object. That happens without adaptation.

> and
> Squid cache supplying it to the adaptation module for each adaptation task.

That is possible if one implements post-cache RESPMOD. Unfortunately, both large Squid hits and PageSpeed adaptations are too slow to be used like that. For performance reasons, folks want to cache the adapted variants in the Squid cache.

> With post-cache REQMOD, the cache stores the shrunk version of objects.
> Adaptors at this point cannot pull from cache, so they request the larger
> object from upstream on each MISS in order to adapt before caching.
> Adaptation operations only on MISS, but upstream fetch of the
> unfiltered large object on each adaptation.

No, it does not work like that. Please see my description in the response to Christos (and also below) for details on how a post-cache REQMOD may work to accommodate PageSpeed needs.

> I guess you are assuming that the ICAP service stores the unfiltered
> object in its own cache and delivers the shrunk objects to Squid as
> reply responses to REQMOD. This is the only case in which post-cache
> REQMOD is more efficient overall than pre-cache REQMOD.

No, the pre-cache RESPMOD service temporarily stores the small adapted content (possibly many variants!) in its own cache, not the large virgin content. The post-cache REQMOD service delivers the right adapted variant to Squid, which caches it in the Squid cache and delivers it to the client.

> IMHO if this feature is provided the persons requesting it will find
> that PageSpeed works no faster than pre-created shrunk variants over
> standard HTTP with working Vary caching. They are saving cheap disk
> storage by spending expensive CPU and network latency. Probably in an
> effort to speed up all those very old proxies that incorrectly implement
> Cache-Control:no-cache.

Or there may be no good place to store pre-created shrunk variants and/or pick the right variant on the origin server. Or the proxy people may not have much control over the origin server, even if both belong to the same [large] organization. Or the server is too far, too slow, too overloaded with more important stuff, etc.

> Vary caching is after all the design of providing client-specific
> variants without all the work of realtime adaptation.

The solution I sketched may use Vary when storing variants in Squid cache. Vary is a useful mechanism and post-cache REQMOD does not preclude its use.

Hope this clarifies,

Alex.
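As a rough illustration of how Vary could keep such variants apart in a cache, here is a minimal sketch of a Vary-based lookup key. It is an assumption-laden toy: `variant_key` is a hypothetical helper, not how Squid computes its store keys, and it ignores the special `Vary: *` case and any header value normalization beyond trimming whitespace.

```python
def variant_key(url, vary, request_headers):
    """Build a cache key from the URL plus the request headers named by Vary.

    vary: value of the response's Vary header, e.g. "User-Agent, Accept-Encoding".
    request_headers: mapping of request header names to values.
    """
    # Header names are case-insensitive, so compare them in lower case.
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    headers = {k.lower(): v.strip() for k, v in request_headers.items()}
    # A missing request header is represented by an empty string.
    return (url, tuple((name, headers.get(name, "")) for name in names))


mobile = variant_key("/page", "User-Agent", {"User-Agent": "ExamplePhone/1.0"})
desktop = variant_key("/page", "User-Agent", {"User-Agent": "ExampleDesktop/1.0"})
mobile_again = variant_key("/page", "user-agent", {"user-agent": "ExamplePhone/1.0"})
```

Two requests that differ in a header listed in Vary map to different keys, so the original and the adapted response can coexist in the cache, while requests that agree on those headers share one entry.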