[RFC] post-cache REQMOD

2014-07-10 Thread Alex Rousskov
Hello,

I propose adding support for a third adaptation vectoring point:
post-cache REQMOD. Services at this new point receive cache miss
requests and may adapt them as usual. If a service satisfies the
request, the service response may get cached by Squid. As you know,
Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.


We have received many requests for post-cache adaptation support
throughout the years, and I personally resisted the temptation of adding
another layer of complexity (albeit an optional one) because it is a lot
of work and because many use cases could be addressed without post-cache
adaptation support.

The last straw (and the motivation for this RFC) was PageSpeed[1]
integration. With PageSpeed, one can generate various variants of
"optimized" content. For example, mobile users may receive smaller
images. Apache and Nginx support PageSpeed modules. It is possible to
integrate Squid with PageSpeed (and similar services) today, but it is
not possible for Squid to _cache_ those generated variants unless one is
willing to pay for another round trip to the origin server to get
exactly the same unoptimized content.

The only way to support Squid caching of PageSpeed variants without
repeated round trips to the origin server is using two Squids. The
parent Squid would cache origin server responses while the child Squid
would adapt parent's responses and cache adapted content. Needless to
say, running two Squids (each with its own cache) instead of one adds
significant performance/administrative overheads and complexity.


As far as internals are concerned, I am currently thinking of launching
an adaptation job for this vectoring point from FwdState::Start(). This
way, its impact on the rest of Squid would be minimal and some adapters
might even affect FwdState routing decisions. The initial code name for
the new class is MissReqFilter, but that may change.



The other candidate location for plugging in the new vectoring point is
the Server class. However, that class is already complex. It handles
communication with the next hop (with child classes doing
protocol-specific work and confusing things further) as well as
pre-cache RESPMOD vectoring point with caching initiation on top of
that. The Server code already has trouble distinguishing various content
streams it has to juggle. I am worried that adding another vectoring
point there would make that complexity significantly worse.

It is possible that we would be able to refactor/encapsulate some of the
code so that it can be reused in both the existing Server and the new
MissReqFilter classes. I will look out for such opportunities while
trying to keep the overall complexity in check.


Any objections to adding post-cache REQMOD or better implementation ideas?


Thank you,

Alex.
[1] https://developers.google.com/speed/pagespeed/


Re: [RFC] post-cache REQMOD

2014-07-10 Thread Amos Jeffries
On 11/07/2014 10:15 a.m., Alex Rousskov wrote:
> Hello,
> 
> I propose adding support for a third adaptation vectoring point:
> post-cache REQMOD. Services at this new point receive cache miss
> requests and may adapt them as usual. If a service satisfies the
> request, the service response may get cached by Squid. As you know,
> Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.

Just to clarify you mean this to be the vectoring point which receives
MISS-only traffic, as the existing one(s) receive HIT+MISS traffic?


> We have received many requests for post-cache adaptation support
> throughout the years, and I personally resisted the temptation of adding
> another layer of complexity (albeit an optional one) because it is a lot
> of work and because many use cases could be addressed without post-cache
> adaptation support.
> 
> The last straw (and the motivation for this RFC) was PageSpeed[1]
> integration. With PageSpeed, one can generate various variants of
> "optimized" content. For example, mobile users may receive smaller
> images. Apache and Nginx support PageSpeed modules. It is possible to
> integrate Squid with PageSpeed (and similar services) today, but it is
> not possible for Squid to _cache_ those generated variants unless one is
> willing to pay for another round trip to the origin server to get
> exactly the same unoptimized content.

Can you show how they are violating standard HTTP variant caching? The
HTTPbis WG should probably be informed of the problem.
If it is actually within the standard, then it would seem that Squid is
missing a feature needed to cache them properly. We would do better to
fix Squid so that it caches more compliant traffic.

> 
> The only way to support Squid caching of PageSpeed variants without
> repeated round trips to the origin server is using two Squids. The
> parent Squid would cache origin server responses while the child Squid
> would adapt parent's responses and cache adapted content. Needless to
> say, running two Squids (each with its own cache) instead of one adds
> significant performance/administrative overheads and complexity.
> 
> 
> As far as internals are concerned, I am currently thinking of launching
> adaptation job for this vectoring point from FwdState::Start(). This
> way, its impact on the rest of Squid would be minimal and some adapters
> might even affect FwdState routing decisions. The initial code name for
> the new class is MissReqFilter, but that may change.
> 

Given that FwdState is the global selector to determine where MISS
content comes from this sounds reasonable.

I think the best position is after the miss_access tests. We need to split
the miss_access lookup off into an async step to be a slow lookup anyway.
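
To make the suggested ordering concrete, here is a minimal Python sketch (a toy model, not Squid code) of a miss path in which the miss_access check runs before the post-cache REQMOD step. The names `handle_request`, `miss_allowed`, `post_cache_reqmod`, and `fetch_from_origin` are hypothetical and are not Squid APIs; the dict-based cache is also an assumption made for illustration.

```python
from typing import Callable, Dict, Optional


def handle_request(
    url: str,
    cache: Dict[str, str],
    miss_allowed: Callable[[str], bool],
    post_cache_reqmod: Callable[[str], Optional[str]],
    fetch_from_origin: Callable[[str], str],
) -> str:
    """Toy miss path: cache lookup, miss_access, post-cache REQMOD, forward."""
    # Hits never reach the post-cache vectoring point.
    if url in cache:
        return cache[url]
    # Hypothetical stand-in for the miss_access check.
    if not miss_allowed(url):
        return "403 Forbidden"
    # The service sees only allowed misses and may satisfy the request itself.
    body = post_cache_reqmod(url)
    if body is None:
        body = fetch_from_origin(url)
    # Either way, the response may be cached.
    cache[url] = body
    return body


calls = []  # records which URLs actually reached the origin


def origin(url: str) -> str:
    calls.append(url)
    return "origin:" + url


def adapter(url: str) -> Optional[str]:
    # Satisfies only image URLs; None means "not satisfied, forward as usual".
    return "adapted:" + url if url.endswith(".img") else None


cache: Dict[str, str] = {}
first = handle_request("/a.img", cache, lambda u: True, adapter, origin)
second = handle_request("/a.img", cache, lambda u: True, adapter, origin)
third = handle_request("/b.html", cache, lambda u: True, adapter, origin)
denied = handle_request("/c.html", cache, lambda u: False, adapter, origin)
```

In this model a denied miss never reaches the adaptation service, which is the practical effect of placing the new vectoring point after the miss_access tests.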


> 
> The other candidate location for plugging in the new vectoring point is
> the Server class. However, that class is already complex. It handles
> communication with the next hop (with child classes doing
> protocol-specific work and confusing things further) as well as
> pre-cache RESPMOD vectoring point with caching initiation on top of
> that. The Server code already has trouble distinguishing various content
> streams it has to juggle. I am worried that adding another vectoring
> point there would make that complexity significantly worse.

Agreed. Bad idea.

> 
> It is possible that we would be able to refactor/encapsulate some of the
> code so that it can be reused in both the existing Server and the new
> MissReqFilter classes. I will look out for such opportunities while
> trying to keep the overall complexity in check.
> 
> 
> Any objections to adding post-cache REQMOD or better implementation ideas?

Just the above details about variant caching.

Amos



Re: [RFC] post-cache REQMOD

2014-07-10 Thread Alex Rousskov
On 07/10/2014 09:12 PM, Amos Jeffries wrote:
> On 11/07/2014 10:15 a.m., Alex Rousskov wrote:
>> I propose adding support for a third adaptation vectoring point:
>> post-cache REQMOD. Services at this new point receive cache miss
>> requests and may adapt them as usual. If a service satisfies the
>> request, the service response may get cached by Squid. As you know,
>> Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.
> 
> Just to clarify you mean this to be the vectoring point which receives
> MISS-only traffic, as the existing one(s) receive HIT+MISS traffic?

+ pre-cache REQMOD receives HIT+MISS request traffic.
+ pre-cache RESPMOD receives MISS response traffic.
* post-cache REQMOD receives MISS request traffic.
- post-cache RESPMOD receives HIT+MISS response traffic.

All four vectoring points are documented in RFC 3507. Squid currently
supports the first two. I propose adding support for the third. Each of
the four points (and probably others!) is useful for some use cases.

Besides getting different HIT/MISS traffic mixture, there are other
differences between these vectoring points. For example, pre-cache
REQMOD responses go straight to the HTTP client while post-cache REQMOD
responses may be cached by Squid (this example is about request
satisfaction mode of REQMOD).
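
As a rough illustration of the traffic mix listed above, the following Python sketch returns the vectoring points that a single transaction would pass, in order, depending on whether the request hits the cache. The `Point` enum and `points_seen` function are hypothetical names invented for this sketch, and the model assumes no service satisfies the request (no request satisfaction mode).

```python
from enum import Enum
from typing import List


class Point(Enum):
    PRE_CACHE_REQMOD = "pre-cache REQMOD"
    PRE_CACHE_RESPMOD = "pre-cache RESPMOD"
    POST_CACHE_REQMOD = "post-cache REQMOD"
    POST_CACHE_RESPMOD = "post-cache RESPMOD"


def points_seen(cache_hit: bool) -> List[Point]:
    """Return the vectoring points one transaction passes, in order."""
    seen = [Point.PRE_CACHE_REQMOD]            # every request, hit or miss
    if not cache_hit:
        seen.append(Point.POST_CACHE_REQMOD)   # miss requests only
        seen.append(Point.PRE_CACHE_RESPMOD)   # miss responses only
    seen.append(Point.POST_CACHE_RESPMOD)      # every response, hit or miss
    return seen


hit_path = points_seen(cache_hit=True)
miss_path = points_seen(cache_hit=False)
```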


>> We have received many requests for post-cache adaptation support
>> throughout the years, and I personally resisted the temptation of adding
>> another layer of complexity (albeit an optional one) because it is a lot
>> of work and because many use cases could be addressed without post-cache
>> adaptation support.
>>
>> The last straw (and the motivation for this RFC) was PageSpeed[1]
>> integration. With PageSpeed, one can generate various variants of
>> "optimized" content. For example, mobile users may receive smaller
>> images. Apache and Nginx support PageSpeed modules. It is possible to
>> integrate Squid with PageSpeed (and similar services) today, but it is
>> not possible for Squid to _cache_ those generated variants unless one is
>> willing to pay for another round trip to the origin server to get
>> exactly the same unoptimized content.
> 
> Can you show how they are violating standard HTTP variant caching? the
> HTTPbis should probably be informed of the problem.
> If it is actually within standard then it would seem to be a missing
> feature of Squid to cache them properly. We could improve better by
> fixing Squid to cache more compliant traffic.

This is unrelated to caching rules. HTTP does not have a notion of
creating multiple responses from a single response, and that is exactly
what PageSpeed and similar adaptations do: For example, they convert a
large origin server image or page into several variants, one for each
class of clients.


Does this clarify?


Thank you,

Alex.



Re: [RFC] post-cache REQMOD

2014-07-11 Thread Amos Jeffries
On 11/07/2014 4:27 p.m., Alex Rousskov wrote:
> On 07/10/2014 09:12 PM, Amos Jeffries wrote:
>> On 11/07/2014 10:15 a.m., Alex Rousskov wrote:
>>> I propose adding support for a third adaptation vectoring point:
>>> post-cache REQMOD. Services at this new point receive cache miss
>>> requests and may adapt them as usual. If a service satisfies the
>>> request, the service response may get cached by Squid. As you know,
>>> Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.
>>
>> Just to clarify you mean this to be the vectoring point which receives
>> MISS-only traffic, as the existing one(s) receive HIT+MISS traffic?
> 
> + pre-cache REQMOD receives HIT+MISS request traffic.
> + pre-cache RESPMOD receives MISS response traffic.
> * post-cache REQMOD receives MISS request traffic.
> - post-cache RESPMOD receives HIT+MISS response traffic.
> 
> All four vectoring points are documented in RFC 3507. Squid currently
> supports the first two. I propose adding support for the third. Each of
> the four points (and probably others!) is useful for some use cases.
> 
> Besides getting different HIT/MISS traffic mixture, there are other
> differences between these vectoring points. For example, pre-cache
> REQMOD responses go straight to the HTTP client while post-cache REQMOD
> responses may be cached by Squid (this example is about request
> satisfaction mode of REQMOD).
> 
> 
>>> We have received many requests for post-cache adaptation support
>>> throughout the years, and I personally resisted the temptation of adding
>>> another layer of complexity (albeit an optional one) because it is a lot
>>> of work and because many use cases could be addressed without post-cache
>>> adaptation support.
>>>
>>> The last straw (and the motivation for this RFC) was PageSpeed[1]
>>> integration. With PageSpeed, one can generate various variants of
>>> "optimized" content. For example, mobile users may receive smaller
>>> images. Apache and Nginx support PageSpeed modules. It is possible to
>>> integrate Squid with PageSpeed (and similar services) today, but it is
>>> not possible for Squid to _cache_ those generated variants unless one is
>>> willing to pay for another round trip to the origin server to get
>>> exactly the same unoptimized content.
>>
>> Can you show how they are violating standard HTTP variant caching? the
>> HTTPbis should probably be informed of the problem.
>> If it is actually within standard then it would seem to be a missing
>> feature of Squid to cache them properly. We could improve better by
>> fixing Squid to cache more compliant traffic.
> 
> This is unrelated to caching rules. HTTP does not have a notion of
> creating multiple responses from a single response, and that is exactly
> what PageSpeed and similar adaptations do: For example, they convert a
> large origin server image or page into several variants, one for each
> class of clients.
> 


Indeed. So this implementation of PageSpeed by requiring HTTP agents to
transform traffic mid-transit from single to multiple responses is a
violation.

Or, PageSpeed is completely unnecessary mid-transit and has nothing to
do with Squid and caching. Which can cache either the small shrunk
objects, or the single large one just fine. It is instead the attempt to
perform end-server operations in a proxy/gateway which is driving this
change.

Amos


Re: [RFC] post-cache REQMOD

2014-07-11 Thread Tsantilas Christos

Post-cache REQMOD and post-cache RESPMOD are a must for Squid.

The PageSpeed example is also very good. I must note that similar
features are already integrated into other commercial products, for
example:


http://www.citrix.com/products/bytemobile-adaptive-traffic-management/tech-info.html 
  ("Web and video optimization" -> "Quality Aware Transcoding", 
"Smartphone Application Acceleration" and "Web Optimization")


The PageSpeed example is a better fit for a post-cache RESPMOD feature. Is
post-cache REQMOD just a first step towards supporting all post-cache
vectoring points?









Re: [RFC] post-cache REQMOD

2014-07-11 Thread Reiner Karlsberg




> The only way to support Squid caching of PageSpeed variants without
> repeated round trips to the origin server is using two Squids. The
> parent Squid would cache origin server responses while the child Squid
> would adapt parent's responses and cache adapted content. Needless to
> say, running two Squids (each with its own cache) instead of one adds
> significant performance/administrative overheads and complexity.

Not such a big deal. I have been doing exactly this for years in my open
proxy, which is used by mobile users (just a hobby):
client-squid-ziproxy-squid-web

Some PageSpeed functions are performed by the sandwiched ziproxy, such as
graphics compression to various levels and gzip compression of plain pages.

It needs a multi-core CPU, of course.

The more professional solution, a PageSpeed module combined with Squid,
would only be of practical advantage for large-scale users, such as ISPs,
that need the best performance and flexibility.

--
Kind regards,



Reiner Karlsberg







Re: [RFC] post-cache REQMOD

2014-07-11 Thread Alex Rousskov
On 07/11/2014 05:27 AM, Tsantilas Christos wrote:

> The PageSpeed example fits better to a post-cache RESPMOD feature. 

I do not think so. Post-cache RESPMOD does not allow Squid to cache the
adapted variants. Please let me know if I missed how post-cache RESPMOD
can do that.

The key here is that PageSpeed and similar services want to create (and
cache) many adapted responses out of a single virgin response. Neither
HTTP itself nor the Squid architecture supports that well. Post-cache
REQMOD allows basic PageSpeed support (the first request for "small"
adapted content gets "large" virgin content, but the second request for
small content fetches it from the PageSpeed cache, storing it in Squid
cache). To optimize PageSpeed support further (so that the first request
can get small content), we will need to add another generally useful
feature, but I would rather not bring it into this discussion (there
will be a separate RFC if we get that far).

The alternative is to create a completely new interface (not a true
vectoring point) that allows an adaptation service to push multiple
adapted responses into the Squid cache _and_ tell Squid which of those
responses to use for the current request. While I have considered
proposing that, I still think we would be better off supporting
"standard" and "well understood" building blocks (such as standard
adaptation vectoring points) rather than such highly-specialized
interfaces. Please let me know if you disagree.


> Is
> the post-cache REQMOD just a first step to support all post-cache
> vectoring points?

You can certainly view it that way, but I do not propose or promise
adding post-cache RESPMOD :-).


Thank you,

Alex.






Re: [RFC] post-cache REQMOD

2014-07-11 Thread Alex Rousskov
On 07/11/2014 03:16 AM, Amos Jeffries wrote:
> On 11/07/2014 4:27 p.m., Alex Rousskov wrote:
>> This is unrelated to caching rules. HTTP does not have a notion of
>> creating multiple responses from a single response, and that is exactly
>> what PageSpeed and similar adaptations do: For example, they convert a
>> large origin server image or page into several variants, one for each
>> class of clients.


> Indeed. So this implementation of PageSpeed by requiring HTTP agents to
> transform traffic mid-transit from single to multiple responses is a
> violation.

I do not think so. This is just a content adaptation outside of the HTTP
domain. If this is a "violation", then dreaming of unicorns is an HTTP
violation because HTTP does not have a notion of a unicorn.


> Or, PageSpeed is completely unnecessary mid-transit and has nothing to
> do with Squid and caching. 

Evidently, folks deploying proxies say otherwise. We can tell them that
they are "doing it wrong", but I do not think we should in this case.
For some use cases, PageSpeed adaptation is the right thing to do. Also,
this is not necessarily used "mid-transit" as discussed below.


> Which can cache either the small shrunk
> objects, or the single large one just fine. It is instead the attempt to
> perform end-server operations in a proxy/gateway which is driving this
> change.

The need to better serve a diverse population of web clients is driving
this change. An attempt to confine useful content adaptations to
"end-servers" is futile at best. BTW, many want to deploy PageSpeed at
the reverse proxy, which is an end-server from the HTTP client point of
view so even if we adopt the "only end-servers can do that!" law, Squid
should still support this kind of adaptation.


Cheers,

Alex.



Re: [RFC] post-cache REQMOD

2014-07-11 Thread Alex Rousskov
On 07/11/2014 07:34 AM, Reiner Karlsberg wrote:

>>> The only way to support Squid caching of PageSpeed variants without
>>> repeated round trips to the origin server is using two Squids. The
>>> parent Squid would cache origin server responses while the child Squid
>>> would adapt parent's responses and cache adapted content. Needless to
>>> say, running two Squids (each with its own cache) instead of one adds
>>> significant performance/administrative overheads and complexity.

> Not such a big deal.

Agreed. For you and many others, it is not a big deal indeed.


> The more professional solution using a  PageSpeed module combined with
> squid
> would only be of practical advantage for large scale users, like ISPs,
> in need of best performance and flexibility.

Exactly! The "need of best performance" [and least administrative
overhead] is what we are trying to accommodate in this case. Since many
Squid improvements during the last 5+ years are sponsored by "large
scale users", I do not think it is a good idea to reject their requests
just because they are irrelevant to others. What do you think?


Cheers,

Alex.



Re: [RFC] post-cache REQMOD

2014-07-11 Thread Tsantilas Christos

On 07/11/2014 05:47 PM, Alex Rousskov wrote:
> On 07/11/2014 05:27 AM, Tsantilas Christos wrote:
>
>> The PageSpeed example fits better to a post-cache RESPMOD feature.
>
> I do not think so. Post-cache RESPMOD does not allow Squid to cache the
> adapted variants. Please let me know if I missed how post-cache RESPMOD
> can do that.


I misread the problem you want to solve. I had in mind a proxy that
caches the original content and then adapts the cached content
according to client rules.

But you want to cache adapted content.


However, I am still not sure I understand how post-cache REQMOD
will help.

Assume the following scenario:
   - Client A requests the original web page
   - Client B requests an optimized web page (spaces and comments removed)

I am expecting a solution that stores two copies of the web page in the
cache: the optimized copy and the original copy.
One solution could be a mechanism similar to Vary headers: for example,
define an ICAP header that is included in Vary.
I have not looked at the StoreID feature, but it can probably be used for
the same purpose.
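
The Vary-like idea can be sketched as a cache whose key combines the URL with a variant label taken from a header. This is only an illustration of the keying idea, not how Squid, Vary, or StoreID actually work; the `X-Variant` header name, `cache_key`, and `VariantCache` are all made up for this sketch.

```python
from typing import Dict, Mapping, Optional, Tuple

VARIANT_HEADER = "X-Variant"  # hypothetical header name, used only in this sketch


def cache_key(url: str, headers: Mapping[str, str]) -> Tuple[str, str]:
    """Build a key from the URL plus the variant label ('original' if absent)."""
    return (url, headers.get(VARIANT_HEADER, "original"))


class VariantCache:
    """Toy cache that stores one entry per (URL, variant) pair."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def put(self, url: str, headers: Mapping[str, str], body: str) -> None:
        self._store[cache_key(url, headers)] = body

    def get(self, url: str, headers: Mapping[str, str]) -> Optional[str]:
        return self._store.get(cache_key(url, headers))


vc = VariantCache()
vc.put("/page", {}, "<p>  original  </p>")
vc.put("/page", {VARIANT_HEADER: "optimized"}, "<p>original</p>")

for_client_a = vc.get("/page", {})
for_client_b = vc.get("/page", {VARIANT_HEADER: "optimized"})
unknown = vc.get("/page", {VARIANT_HEADER: "tiny"})
```

With such a key, clients A and B from the scenario above would each be served their own copy, and a request for a variant that was never stored is simply a miss.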





> The key here is that PageSpeed and similar services want to create (and
> cache) many adapted responses out of a single virgin response. Neither
> HTTP itself nor the Squid architecture support that well. Post-cache
> REQMOD allows basic PageSpeed support (the first request for "small"
> adapted content gets "large" virgin content, but the second request for
> small content fetches it from the PageSpeed cache, storing it in Squid
> cache). To optimize PageSpeed support further (so that the first request
> can get small content), we will need to add another generally useful
> feature, but I would rather not bring it into this discussion (there
> will be a separate RFC if we get that far).


I probably did not understand how PageSpeed works or what a
PageSpeed cache means. But in the above scenario, it looks like Squid will
store only one version of the content (the small content).

Is this the only requirement?
What am I missing?




Re: [RFC] post-cache REQMOD

2014-07-11 Thread Alex Rousskov
On 07/11/2014 11:46 AM, Tsantilas Christos wrote:
> On 07/11/2014 05:47 PM, Alex Rousskov wrote:
>> On 07/11/2014 05:27 AM, Tsantilas Christos wrote:
>>
>>> The PageSpeed example fits better to a post-cache RESPMOD feature.
>>
>> I do not think so. Post-cache RESPMOD does not allow Squid to cache the
>> adapted variants. Please let me know if I missed how post-cache RESPMOD
>> can do that.
> 
> I did not read correctly the problem you want to solve. I had in my mind
> a proxy which cache original content and then adapts the cached content
> according client rules.
> But you want to cache adapted content.

Both, actually.



> However still I am not sure I can understand how the post-cache reqmod
> will help.
> Assume the following scenario:
>    - Client A requests original web page
>    - Client B requests optimized web page (removed spaces and comments)

There are several scenarios and tricks here. I am not going to cover all
of them, but here is one possibility:

* The "A" response gets cached as usual. When it goes through the
pre-cache RESPMOD adaptation service, it gets optimized, and the
optimized variant gets stored in the PageSpeed cache (inside the adapter).

* The "B" request misses the Squid cache (HTTP Vary rules, StoreID,
and/or a pre-cache REQMOD service ensures a miss). That miss request
reaches the post-cache REQMOD service. The service finds a matching
optimized response in the PageSpeed cache and satisfies the miss request
with that. Squid caches the service response.

* The Squid cache now contains both response variants. Any future
request for the original or optimized page will be served from the Squid
cache.
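
To make the three steps above concrete, here is a toy Python model of the
flow. It is only a sketch under loud assumptions: Origin, Adapter, Proxy,
optimize() and the wants_optimized flag are hypothetical names made up for
illustration, not Squid, ICAP/eCAP, or PageSpeed APIs, and the dictionary
lookups stand in for whatever HTTP Vary rules, StoreID, or a pre-cache
REQMOD service would do to separate the variants.

```python
import re


class Origin:
    """Stand-in for the origin server; counts round trips."""

    def __init__(self):
        self.fetches = 0

    def fetch(self, url):
        self.fetches += 1
        return "<html> <!-- note -->  <p>page at %s</p> </html>" % url


def optimize(body):
    """Stand-in for a PageSpeed-like step: drop comments, squeeze spaces."""
    without_comments = re.sub(r"<!--.*?-->", "", body)
    return re.sub(r"\s+", " ", without_comments).strip()


class Adapter:
    """One hypothetical service attached at two vectoring points."""

    def __init__(self):
        self.variants = {}  # url -> optimized body (the adapter's own cache)

    def precache_respmod(self, url, virgin_body):
        # Remember the optimized variant; pass the virgin response through.
        self.variants[url] = optimize(virgin_body)
        return virgin_body

    def postcache_reqmod(self, url, wants_optimized):
        # Satisfy the miss if an optimized variant is ready, else decline.
        return self.variants.get(url) if wants_optimized else None


class Proxy:
    """Toy cache with a post-cache REQMOD hook on the miss path."""

    def __init__(self, origin, adapter):
        self.origin = origin
        self.adapter = adapter
        self.cache = {}  # (url, wants_optimized) -> body

    def get(self, url, wants_optimized=False):
        key = (url, wants_optimized)
        if key in self.cache:
            return self.cache[key]  # hit: no adaptation, no origin contact
        body = self.adapter.postcache_reqmod(url, wants_optimized)
        if body is not None:
            self.cache[key] = body  # the service response gets cached
            return body
        # The service declined: forward the miss to the origin as usual.
        virgin = self.adapter.precache_respmod(url, self.origin.fetch(url))
        self.cache[(url, False)] = virgin
        return virgin


origin = Origin()
proxy = Proxy(origin, Adapter())
a = proxy.get("/page")                        # "A": miss, one origin fetch
b = proxy.get("/page", wants_optimized=True)  # "B": miss, service satisfies it
proxy.get("/page")                            # hit
proxy.get("/page", wants_optimized=True)      # hit
print(origin.fetches)
```

In this model the origin is contacted once for all four requests, and the
optimized variant ends up in the proxy cache without a second round trip.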


Again, this is just one scenario/possibility. A post-cache REQMOD
service can be used for many other things, of course. It allows limiting
adaptation to misses and caching adapted miss responses without going
all the way to the origin server.

In fact, one could use it to implement a whole web server backed by a
Squid cache! The existing pre-cache REQMOD service can also be used as a
web server replacement, but it cannot take advantage of the Squid cache.


Hope this clarifies,

Alex.



Re: [RFC] post-cache REQMOD

2014-07-11 Thread Amos Jeffries
On 12/07/2014 3:24 a.m., Alex Rousskov wrote:
> On 07/11/2014 03:16 AM, Amos Jeffries wrote:
>> On 11/07/2014 4:27 p.m., Alex Rousskov wrote:
>>> This is unrelated to caching rules. HTTP does not have a notion of
>>> creating multiple responses from a single response, and that is exactly
>>> what PageSpeed and similar adaptations do: For example, they convert a
>>> large origin server image or page into several variants, one for each
>>> class of clients.
> 
> 
>> Indeed. So this implementation of PageSpeed by requiring HTTP agents to
>> transform traffic mid-transit from single to multiple responses is a
>> violation.
> 
> I do not think so. This is just a content adaptation outside of the HTTP
> domain. If this is a "violation", then dreaming of unicorns is an HTTP
> violation because HTTP does not have a notion of a unicorn.
> 
> 
>> Or, PageSpeed is completely unnecessary mid-transit and has nothing to
>> do with Squid and caching. 
> 
> Evidently, folks deploying proxies say otherwise. We can tell them that
> they are "doing it wrong", but I do not think we should in this case.
> For some use cases, PageSpeed adaptation is the right thing to do. Also,
> this is not necessarily used "mid-transit" as discussed below.
> 

We've been advising alternatives to post-cache vector points, as you say,
for some time. This does not appear to be much different from the other
requests. There is an alternative. The question is whether the benefit
gained by one more piece of software in the installed chain outweighs
the development effort and maintenance of a new vector point in Squid code.
 IMHO they are pretty evenly balanced at present. So it's up to you
whether you want to take on that workload. My opinion on where and how
to integrate was mentioned in the initial email.


> 
>> Which can cache either the small shrunk
>> objects, or the single large one just fine. It is instead the attempt to
>> perform end-server operations in a proxy/gateway which is driving this
>> change.
> 
> The need to better serve a diverse population of web clients is driving
> this change. An attempt to confine useful content adaptations to
> "end-servers" is futile at best. BTW, many want to deploy PageSpeed at
> the reverse proxy, which is an end-server from the HTTP client point of
> view so even if we adopt the "only end-servers can do that!" law, Squid
> should still support this kind of adaptation.
> 

Perhaps. Don't let my doubts there block you though.

Amos


Re: [RFC] post-cache REQMOD

2014-07-11 Thread Amos Jeffries
On 12/07/2014 2:47 a.m., Alex Rousskov wrote:
> On 07/11/2014 05:27 AM, Tsantilas Christos wrote:
> 
>> The PageSpeed example fits better to a post-cache RESPMOD feature. 
> 
> I do not think so. Post-cache RESPMOD does not allow Squid to cache the
> adapted variants. Please let me know if I missed how post-cache RESPMOD
> can do that.

post-cache RESPMOD should be caching the large unfiltered object and
Squid cache supplying it to the adaptation module for each adaptation task.
 Adaptation operations on each request, but no upstream contact necessary.

With post-cache REQMOD, the cache stores the shrunk version of objects.
Adaptors at this point cannot pull from cache, so they request the larger
object from upstream on each MISS in order to adapt before caching.
 Adaptation operations only on MISS, but upstream fetch of the
unfiltered large object on each adaptation.

I guess you are assuming that the ICAP service stores the unfiltered
object in its own cache and delivers the shrunk objects to Squid as
reply responses to REQMOD. This is the only case in which post-cache
REQMOD is more efficient overall than pre-cache REQMOD.



> 
> The key here is that PageSpeed and similar services want to create (and
> cache) many adapted responses out of a single virgin response. Neither
> HTTP itself nor the Squid architecture support that well. Post-cache
> REQMOD allows basic PageSpeed support (the first request for "small"
> adapted content gets "large" virgin content, but the second request for
> small content fetches it from the PageSpeed cache, storing it in Squid
> cache). To optimize PageSpeed support further (so that the first request
> can get small content), we will need to add another generally useful
> feature, but I would rather not bring it into this discussion (there
> will be a separate RFC if we get that far).
> 
> The alternative is to create a completely new interface (not a true
> vectoring point) that allows an adaptation service to push multiple
> adapted responses into the Squid cache _and_ tell Squid which of those
> responses to use for the current request. While I have considered
> proposing that, I still think we would be better off supporting
> "standard" and "well understood" building blocks (such as standard
> adaptation vectoring points) rather than such highly-specialized
> interfaces. Please let me know if you disagree.
> 

IMHO if this feature is provided the persons requesting it will find
that PageSpeed works no faster than pre-created shrunk variants over
standard HTTP with working Vary caching. They are saving cheap disk
storage by spending expensive CPU and network latency. Probably in an
effort to speed up all those very old proxies that incorrectly implement
Cache-Control:no-cache.

Vary caching is after all the design of providing client-specific
variants without all the work of realtime adaptation.

(that's just me getting old though, as I look despairingly on an ever more
inefficient network - "back in my day...").

> 
>> Is
>> the post-cache REQMOD just a first step to support all post-cache
>> vectoring points?
> 
> You can certainly view it that way, but I do not propose or promise
> adding post-cache RESPMOD :-).
> 
> 
> Thank you,
> 
> Alex.


Amos




Re: [RFC] post-cache REQMOD

2014-07-11 Thread Alex Rousskov
On 07/11/2014 08:19 PM, Amos Jeffries wrote:
> On 12/07/2014 2:47 a.m., Alex Rousskov wrote:
>> On 07/11/2014 05:27 AM, Tsantilas Christos wrote:
>>
>>> The PageSpeed example fits better to a post-cache RESPMOD feature. 
>>
>> I do not think so. Post-cache RESPMOD does not allow Squid to cache the
>> adapted variants. Please let me know if I missed how post-cache RESPMOD
>> can do that.
> 
> post-cache RESPMOD should be caching the large unfiltered object 

Post-cache RESPMOD cannot cache anything because it operates on the
other side of the cache. Of the two RESPMODs, only pre-cache RESPMOD
can cache.

There is no need for RESPMOD to cache the large unfiltered object. That
happens without adaptation.


> and
> Squid cache supplying it to the adaptation module for each adaptation task.

That is possible if one implements post-cache RESPMOD. Unfortunately,
both large Squid hits and PageSpeed adaptations are too slow to be used
like that. For performance reasons, folks want to cache the adapted
variants in the Squid cache.


> With post-cache REQMOD, the cache stores the shrunk version of objects.
> Adaptors at this point cannot pull from cache, so they request the larger
> object from upstream on each MISS in order to adapt before caching.
>  Adaptation operations only on MISS, but upstream fetch of the
> unfiltered large object on each adaptation.

No, it does not work like that. Please see my description in the
response to Christos (and also below) for details on how a post-cache
REQMOD may work to accommodate PageSpeed needs.


> I guess you are assuming that the ICAP service stores the unfiltered
> object in its own cache and delivers the shrunk objects to Squid as
> reply responses to REQMOD. This is the only case in which post-cache
> REQMOD is more efficient overall than pre-cache REQMOD.

No, the pre-cache RESPMOD service temporarily stores the small adapted
content (possibly many variants!) in its own cache, not the large virgin
content. The post-cache REQMOD service delivers the right adapted variant
to Squid, which caches it in the Squid cache and delivers it to the client.


> IMHO if this feature is provided the persons requesting it will find
> that PageSpeed works no faster than pre-created shrunk variants over
> standard HTTP with working Vary caching. They are saving cheap disk
> storage by spending expensive CPU and network latency. Probably in an
> effort to speed up all those very old proxies that incorrectly implement
> Cache-Control:no-cache.

Or there may be no good place to store pre-created shrunk variants
and/or pick the right variant on the origin server. Or the proxy people
may not have much control over the origin server, even if both belong to
the same [large] organization. Or the server is too far, too slow, too
overloaded with more important stuff, etc.


> Vary caching is after all the design of providing client-specific
> variants without all the work of realtime adaptation.

The solution I sketched may use Vary when storing variants in Squid
cache. Vary is a useful mechanism and post-cache REQMOD does not
preclude its use.
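
As a rough illustration of how Vary could keep the variants apart, here is
a minimal Python sketch of a Vary-derived cache key. It assumes a
hypothetical X-Device-Class request header (for example, one added by a
pre-cache REQMOD service) and a made-up variant_key() helper. It is not how
Squid computes its store keys; it only shows the general HTTP idea of
selecting a stored response by the request headers named in Vary.

```python
def variant_key(url, vary, request_headers):
    """Build a cache key from the URL plus the request headers listed in Vary."""
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    if "*" in names:
        return None  # "Vary: *" cannot be matched by a later request
    # Header names are case-insensitive, so normalize them before lookup.
    request = {k.lower(): v.strip() for k, v in request_headers.items()}
    return (url, tuple((n, request.get(n, "")) for n in names))


vary = "X-Device-Class"  # hypothetical header, assumed for this sketch
mobile = variant_key("/logo.png", vary, {"X-Device-Class": "mobile"})
desktop = variant_key("/logo.png", vary, {"x-device-class": "desktop"})
again = variant_key("/logo.png", vary,
                    {"X-Device-Class": "mobile", "Accept": "*/*"})
```

Here mobile and again map to the same key, while desktop gets its own, so
each adapted variant delivered by the service could be stored and hit
separately.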


Hope this clarifies,

Alex.