Re: [squid-dev] [RFC] client header mangling

2016-06-10 Thread Alex Rousskov
On 06/07/2016 05:09 AM, Amos Jeffries wrote:
> I've been looking at ways to resolve the long Vary discussion going on
> in squid-users with a patch that we can accept into mainline. What they
> (joe and Yuri) have at present works, but only with extra
> request_header_replace config preventing integrity problems.

Disclaimer: I have not read the Vary discussion on squid-users. I
suspect your RFC is generic enough to ignore that discussion as far as
the RFC is concerned.


> One way to make useful progress would be to finally add the recurring
> request for request_header_access/replace to work on client messages in
> a pre-cache doCallouts hook rather than only a post-cache hook.

I assume that by "finally add" you meant something like "finally implement".


> I am imagining this being done on the adapted request headers after
> ICAP, eCAP and URL-rewrite have all done their things. 

Sounds good.


> And using the
> same request_header_* directive ACL lists as for outbound traffic.

I am not sure exactly what you mean, but which headers/transactions to
mangle should be up to the Squid admin. The pre- and post-cache mangling
will often differ. The pre- and post-cache mangling API will be very
similar, of course, but we should not restrict the admin to a single set
of rules that is always applied on both sides of the cache.

One elegant way to implement this would be to add a vectoring-point ACL
that will match "pre-cache" and "post-cache" vectoring points (at
least). That way, you do not need to add new directives but admins can
mangle headers differently on each side. I suspect these ACLs would be
useful in other contexts as well.
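
To make the vectoring-point idea concrete, here is a minimal Python sketch (a model, not Squid code). The names `Rule`, `applicable_rules`, and the "pre-cache"/"post-cache" strings are hypothetical and are not existing squid.conf syntax. It only illustrates a single rule list in which each rule is tagged with the points at which it should match.

```python
from dataclasses import dataclass

# Hypothetical vectoring-point names; not actual squid.conf syntax.
PRE_CACHE = "pre-cache"
POST_CACHE = "post-cache"

@dataclass(frozen=True)
class Rule:
    header: str        # header field the rule targets
    action: str        # "deny" (strip the field) or "replace"
    points: frozenset  # vectoring points at which the rule matches
    value: str = ""    # replacement value, used only by "replace"

def applicable_rules(rules, point):
    """Return the rules whose vectoring-point tag matches `point`."""
    return [r for r in rules if point in r.points]

# One shared rule list; the admin decides per rule where it applies.
RULES = [
    Rule("User-Agent", "replace", frozenset({PRE_CACHE}), "generic/1.0"),
    Rule("X-Internal-Tag", "deny", frozenset({POST_CACHE})),
    Rule("Via", "deny", frozenset({PRE_CACHE, POST_CACHE})),
]
```

With this shape, no new directive is needed: the same list yields different effective rules on each side of the cache.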


As for "eCAP versus squid.conf" mangling, I suggest the following rules
of thumb:

0. Message body mangling belongs to eCAP/ICAP.

1. If header mangling decisions require information contained in the
message body, such mangling belongs to eCAP/ICAP.

2. Header field mangling that cannot be expressed using "add field",
"delete field", or "a regex substitution of the field value" operations
belongs to eCAP/ICAP.

3. All other mangling actions can be supported directly in squid.conf,
at any vectoring point.
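
As an illustration of rule 2, the three primitive operations could be modeled roughly like this in Python. This is a sketch under simplifying assumptions: headers are a plain dict (so repeated fields are ignored), and the function names and sample header values are made up for the example.

```python
import re

def add_field(headers, name, value):
    """Return a copy of `headers` with `name` set to `value`."""
    out = dict(headers)
    out[name] = value
    return out

def delete_field(headers, name):
    """Return a copy without `name` (case-insensitive match)."""
    return {k: v for k, v in headers.items() if k.lower() != name.lower()}

def substitute_value(headers, name, pattern, replacement):
    """Return a copy with a regex substitution applied to the value of `name`."""
    return {
        k: re.sub(pattern, replacement, v) if k.lower() == name.lower() else v
        for k, v in headers.items()
    }

# Hypothetical request headers, for illustration only.
request = {
    "Host": "example.com",
    "Accept-Encoding": "gzip, deflate, br",
    "X-Debug": "1",
}
mangled = substitute_value(request, "Accept-Encoding", r",\s*br\b", "")
mangled = delete_field(mangled, "X-Debug")
mangled = add_field(mangled, "X-Mangled", "pre-cache")
```

Anything that does not reduce to a sequence of these three calls (for example, a decision that depends on the message body) would fall under rules 0 and 1 instead.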


Thank you,

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [RFC] client header mangling

2016-06-09 Thread Amos Jeffries
On 10/06/2016 9:49 a.m., Eliezer Croitoru wrote:
> I am trying to understand, so bear with me for a couple of seconds.
> I have seen that there are pages/servers which do not list
> User-Agent in the Vary response header while still taking it into account.
>
> The caching side of the picture is storing an object which will never be
> served.
> The HIT ratio is a whole other part of the picture.
>
> Since I am not familiar with the code but am trying to understand: what
> happens currently?

Currently, Squid uses the real client headers or the ICAP/eCAP-adapted
headers when looking up the cached object's variant.

The admin might be fixing the actual response using
request_header_access/replace to craft what the server sees. But that
does not help Squid with the variants. Altering the headers prior to
cache lookup is needed for that, which today means ICAP/eCAP are needed.
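
A rough Python model of why the lookup side matters. This is not how Squid actually computes its variant keys; `variant_key` and `normalize` are hypothetical helpers, and the URL and User-Agent values are invented. It only shows that mangling applied before the lookup collapses many variants into one, while mangling applied after the lookup cannot.

```python
import hashlib

def variant_key(url, vary, request_headers):
    """Build a cache key from the URL plus the request headers named in Vary."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    parts = [url]
    for name in sorted(h.strip().lower() for h in vary.split(",")):
        parts.append(f"{name}={lowered.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()

def normalize(request_headers):
    """Hypothetical pre-cache mangling: collapse User-Agent to one value."""
    out = dict(request_headers)
    out["User-Agent"] = "generic"
    return out

URL = "http://example.com/page"
VARY = "User-Agent"
clients = [{"User-Agent": f"Browser/{n}.0"} for n in range(100)]

# Keys computed from the real client headers: one variant per client.
raw_keys = {variant_key(URL, VARY, c) for c in clients}
# Keys computed after pre-cache mangling: a single shared variant.
mangled_keys = {variant_key(URL, VARY, normalize(c)) for c in clients}
```

In this model the 100 distinct clients produce 100 keys without mangling and a single key with it.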

I'm proposing making the header alterations affect input as well as output.


> How many lookups are done per request?

One, or two if it is a Vary response.

> Do we run an object lookup after the response headers were received from
> the server?

No.

> Can we predict a Vary object based on the request only? (I assume that it
> would be an estimate and not an absolute certainty, if at all.)

No.

> 
> Also, let's say we have a 1k page ahead: would we want it to be fetched
> from the disk/RAM store rather than from the origin server after we told
> it we want the object?

This is not relevant. Size of the objects and where each would be placed
is not relevant to the problem.

The issue is that there is a huge, possibly infinite, number of such
objects wasting filenum slots in the cache. There are only 2^25
object slots per cache location, so these huge sets of variants can
really cramp the storage even if each object is only 1 byte in size.
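
A back-of-the-envelope Python sketch of the slot pressure, taking the 2^25 figure above at face value. The per-header counts are invented for illustration and are not measurements from any real cache.

```python
SLOTS_PER_CACHE_LOCATION = 2 ** 25  # figure quoted above

def variants_per_url(distinct_values_per_header):
    """Upper bound on variants: product of distinct values of each varied header."""
    total = 1
    for count in distinct_values_per_header:
        total *= count
    return total

# Hypothetical: 5000 distinct User-Agent strings and 50 distinct
# Accept-Language values seen for a single URL.
per_url = variants_per_url([5000, 50])

# How many such URLs would exhaust every slot, regardless of object size.
urls_to_fill = SLOTS_PER_CACHE_LOCATION // per_url
```

Under these made-up numbers, a little over a hundred URLs would be enough to use up every slot.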


> 
> I am almost sure that lowering the number of objects stored on disk and
> in RAM should be a goal in itself if we cannot "dig" them up from RAM or
> disk later for any use.

Yes, but not relevant to the current decision.

The question here is whether we should allow pre-cache header mangling
by the admin as a way to reduce the number of objects stored for Vary
responses.

The alternative is requiring them to use ICAP or eCAP to do it,
possibly asking someone to write an eCAP module.


> 
> A request_header_replace can work only for "generic" ones such as
> without a language preference such as "br" added to some requests by
> browser add-ons.

No. It can and will work for any header. It just requires the admin to
know which ones to modify and when, which is still somewhat hard for
unusual headers.

That is part of why I RFC'd it rather than going ahead and proposing a
patch.

> 
> Now a step further, I can write a tiny ICAP service that will
> "handle" common Vary headers from Firefox and other browsers to test
> how it affects caches in general.

I'd rather use eCAP for this than ICAP. But if you think you can do it
and want to try, then we can work on the details of what the code needs
to do.

Amos



Re: [squid-dev] [RFC] client header mangling

2016-06-09 Thread Eliezer Croitoru
I am trying to understand, so bear with me for a couple of seconds.
I have seen that there are pages/servers which do not list User-Agent in the
Vary response header while still taking it into account.

The caching side of the picture is storing an object which will never be served.
The HIT ratio is a whole other part of the picture.

Since I am not familiar with the code but am trying to understand: what happens
currently?
How many lookups are done per request? Do we run an object lookup after the
response headers were received from the server?
Can we predict a Vary object based on the request only? (I assume that it would
be an estimate and not an absolute certainty, if at all.)
Also, let's say we have a 1k page ahead: would we want it to be fetched from
the disk/RAM store rather than from the origin server after we told it we want
the object?

I am almost sure that lowering the number of objects stored on disk and in RAM
should be a goal in itself if we cannot "dig" them up from RAM or disk later
for any use.

A request_header_replace can work only for "generic" ones such as without a 
language preference such as "br" added to some requests by browser add-ons.

Now a step further, I can write a tiny ICAP service that will "handle" common
Vary headers from Firefox and other browsers to test how it affects caches in
general.

Eliezer


Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: elie...@ngtech.co.il


-----Original Message-----
From: squid-dev [mailto:squid-dev-boun...@lists.squid-cache.org] On Behalf Of 
Amos Jeffries
Sent: Tuesday, June 7, 2016 2:10 PM
To: Squid Developers
Subject: [squid-dev] [RFC] client header mangling

I've been looking at ways to resolve the long Vary discussion going on in 
squid-users with a patch that we can accept into mainline. What they (joe and 
Yuri) have at present works, but only with extra request_header_replace config 
preventing integrity problems.

One way to make useful progress would be to finally add the recurring request 
for request_header_access/replace to work on client messages in a pre-cache 
doCallouts hook rather than only a post-cache hook.

I am imagining this being done on the adapted request headers after ICAP, eCAP 
and URL-rewrite have all done their things. And using the same request_header_* 
directive ACL lists as for outbound traffic.

Any alternative ideas or objections?

Amos



[squid-dev] [RFC] client header mangling

2016-06-07 Thread Amos Jeffries
I've been looking at ways to resolve the long Vary discussion going on
in squid-users with a patch that we can accept into mainline. What they
(joe and Yuri) have at present works, but only with extra
request_header_replace config preventing integrity problems.

One way to make useful progress would be to finally add the recurring
request for request_header_access/replace to work on client messages in
a pre-cache doCallouts hook rather than only a post-cache hook.

I am imagining this being done on the adapted request headers after
ICAP, eCAP and URL-rewrite have all done their things. And using the
same request_header_* directive ACL lists as for outbound traffic.

Any alternative ideas or objections?

Amos
