Advancing IPv6 properly

2007-02-12 Thread Amos Jeffries
I have been giving the branch a lot of thought and a little testing 
recently.


I have come to the conclusion that the IPv6 branch under squid3 as it 
stands is in need of quite a makeover. I'd like your ideas on this:


   Creating an IPAddress class to replace all the nasty macros
currently used in the (incomplete) preparation of HEAD for the future
ipv6 branch merge.



I am not certain at this point if the creation of such a class warrants
its own branch. One of the things I'd like guidance on is whether to
branch at all and, if so, from where it would be best to branch.



My reasons:
 -  The dev guides I have read on the site say that squid3 is meant to
be a C++ program (objects and explicit types, _NO_ macros) but has not
been fully ported yet. This would form another small step in
that upgrade path.


 - The current ipv6 branch uses macros exclusively to enable a smooth
upgrade once the ipv6 side of it is going. This is built into the nature
of the branch and is partially moved up to HEAD already in preparation.
Not a nice method of transition in an object-based app.


 - Most of the ipv6 mods still need to be finished and tested anyway,
so will not suffer greatly from the shrinkage this would cause.



Is it worth it? And would anyone with more knowledge of the future code
than I have like to hazard a guess at an expected timespan for it?



Amos


Re: [squid-users] Re: redirect program technicalities

2007-02-12 Thread Henrik Nordstrom
Mon 2007-02-12 at 13:14 +0530, Siddhesh PaiRaikar wrote:
 On 2/12/07, Siddhesh PaiRaikar [EMAIL PROTECTED] wrote:
  hi..
 
  Can someone please tell me .. that when squid passes on the URL to another
  program such as say squidguard...
 
  does it write to stdin or somewhere else... and also when it gets back a URL
  from the external program.. does it get it on stdout...

Yes; more precisely, it writes to an IPC channel connected to the helper's
stdin, and receives responses on an IPC channel connected to the helper's
stdout.

  as when i studied the code of squidguard i found out that squidguard writes
  the URLs to and retrieves them from stdout and stdin respectively but i am
  unable to find that matching code in squid... i tried a lot..

redirect.c, calling helper.c functions for talking to the helper, using
ipc.c functions for setting up the communication.
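
For anyone wanting to see the helper side of that exchange, here is a
minimal sketch of a redirector in Python. It assumes the classic
one-request-per-line format (URL, client address, ident, method) with an
empty reply line meaning "no change". The function names, the
blocked.example rule and the proxy.example target are made up for
illustration; this is not squidGuard's code.

```python
import sys


def rewrite(line):
    """Return the replacement URL for one request line, or "" for no change."""
    fields = line.split()
    if not fields:
        return ""
    url = fields[0]
    # Hypothetical rule: send anything under blocked.example to a notice page.
    if url.startswith("http://blocked.example/"):
        return "http://proxy.example/denied.html"
    return ""


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Answer each request line read from stdin with exactly one line on stdout."""
    for line in stdin:
        stdout.write(rewrite(line) + "\n")
        stdout.flush()  # the proxy waits for the reply, so never leave it buffered
```

Calling serve() at the bottom of the script turns it into a helper; the
flush after every reply matters because the other end of the pipe is
blocked waiting for that line.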

Regards
Henrik




more website work

2007-02-12 Thread Adrian Chadd
I've just modified Duane's html generating scripts (which are in
~adrian/pages-db for now) to spit out stuff into the new website directory.

http://new.squid-cache.org/ now has working support, related
software/writings and logfile script pages.




Adrian



Re: vary stuff

2007-02-12 Thread Henrik Nordstrom
Mon 2007-02-12 at 20:46 +0800, Adrian Chadd wrote:
 I'm about to start implementing replacement memory-only store client
 primitives and I'm not fully on top of how the vary code abuses
 store objects to do its thing in store.c.

Heh, "abuses" is a good description.

It actually doesn't do very much with the store objects as such; most of
the magic currently takes place in the request and is used by the
storeLookupByRequestMethod call.

 Would you mind if the Vary support was culled out of the storage work
 branch until I've tidied up the storage manager layer somewhat?

No problem. It's not really that tricky a thing to support. The tricky
part was getting it into Squid-2 without a suitable store interface or
even an intermediary layer.

The things you need to remember about Vary:ing objects and HTTP caching
in general:

0. Caching specifications in HTTP are primarily concerned with GET
requests resulting in 200 OK or derived responses (i.e. 206/304) and
variants of that 200 OK, with N variants per URI on the server identified
uniquely by ETag and/or Content-Location. There are some odd twists like
POST, which may return a cacheable 200 OK suitable for later GET requests
of the same URI (doubt this is used anywhere btw).

1. The client-intermediary lookup API needs to be async for it to be
able to do the vary dance. May need multiple store lookups and possibly
a conditional upstream request to find the correct response.

2. In the optimal world each variant has a unique ETag identifying the
response entity (body + entity headers). Such objects may be shared by
multiple requests thanks to If-None-Match 304 replies building up the
knowledge of the Vary logic in the cache. Responses not having an ETag
are identified by their request headers selected by Vary and are unique
for that request header combination.

3. The vary dance has two different but related results:

a) On a cache hit (matching request found), the result is the
matching response entity (headers + body), based on previously seen
request headers and Vary responses and the object (ETag or unique) this
maps to.

b) On a cache miss (no matching request headers + Vary response header
pair found), one needs to find a list of the ETags of the currently
cached variants (fresh and expired alike) of the URI. This list is used
for building an If-None-Match conditional request to find out which (if
any) cached variant is valid for this request.
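
To make the two outcomes concrete, here is a rough Python sketch of the
bookkeeping involved. It is an illustration only and not how store.c
works: the class name, its three dictionaries and the method names are
invented, request header names are assumed to be lower-cased already, and
freshness, expiry and objects without an ETag are ignored.

```python
class VariantIndex:
    """Per-URI bookkeeping for the hit and miss outcomes of a variant lookup."""

    def __init__(self):
        self.vary_names = {}  # uri -> tuple of request header names from Vary
        self.by_request = {}  # (uri, selected header values) -> ETag
        self.by_etag = {}     # (uri, ETag) -> cached response entity

    def _key(self, uri, request_headers):
        names = self.vary_names.get(uri, ())
        return uri, tuple(request_headers.get(n, "") for n in names)

    def store(self, uri, request_headers, vary, etag, entity):
        """Remember a 200 response together with the Vary header it carried."""
        self.vary_names[uri] = tuple(h.strip().lower() for h in vary.split(","))
        self.by_request[self._key(uri, request_headers)] = etag
        self.by_etag[(uri, etag)] = entity

    def lookup(self, uri, request_headers):
        """Return ("hit", entity) or ("miss", If-None-Match value or None)."""
        etag = self.by_request.get(self._key(uri, request_headers))
        if etag is not None and (uri, etag) in self.by_etag:
            return "hit", self.by_etag[(uri, etag)]
        # Miss: offer every cached variant of this URI to the origin server.
        etags = sorted(e for (u, e) in self.by_etag if u == uri)
        return "miss", (", ".join(etags) if etags else None)

    def learn_304(self, uri, request_headers, etag):
        """A 304 naming `etag` maps this header combination to a held variant."""
        self.by_request[self._key(uri, request_headers)] = etag
```

The learn_304 step is what lets one stored entity be shared by several
request header combinations, as described in point 2.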

A twist here is that many server implementers, mainly of dynamic gzip
content-encoding (which really, really should be done as
transfer-encoding), don't understand HTTP that well and mess up wrt
ETag and Content-Location. Due to this we need a blacklist where ETag
alone isn't trusted but must be combined with the Accept-Encoding request
header to identify the variants of the URI. The Content-Location
problem will bite us the day we start to follow the RFC and correctly
invalidate variants on changes, and I have not yet identified whether a
similar workaround is possible.


Some words on ETag vs Content-Location:

This whole dance is based on the server-driven content negotiation
scheme, which is thought of as a server having multiple variants of the
same object, differing in format (e.g. gif/jpeg/png), language (e.g.
sv/en/de) or encoding (e.g. identity/gzip/deflate), each stored as a unique
file in the http directory of the server and each accessible separately
by a unique URI.

Content-Location defines the exact origin of the response. ETag
identifies the exact version of the response.

ETag is guaranteed to be unique across all variants and, for a strong
ETag, across all versions of the URI, so the protocol focuses on ETag when
mapping relations between requests and responses.

Content-Location is mainly used in invalidations to make sure all users
get the most recently seen version of a variant.



There are still some small details wrt freshness of Vary:ing objects
where I have not fully understood how it's supposed to work. In the
worst case we may need to maintain freshness separately per request header
combination.


Regards
Henrik





Re: Advancing IPv6 properly

2007-02-12 Thread Henrik Nordstrom
Mon 2007-02-12 at 21:34 +1300, Amos Jeffries wrote:

 Creating an IPAddress class to replace all the nasty macros
 currently used in the (incomplete) preparation of HEAD for the future
 ipv6 branch merge.

Yes, what I proposed when the branch was created..

 I am not certain at this point if the creation of such a class warrants
 its own branch. One of the things I'd like guidance on is whether to
 branch at all and, if so, from where it would be best to branch.

Depends on what you think is easiest for what you think of doing.

If fixing up the existing branch in place so that it makes sense feels
doable, then do so.

If it's easier for you to start over from scratch and then bring over
the pieces which make sense from the existing branch then start a new
branch.

 My reasons:
   -  The dev guides I have read on the site say that squid3 is meant to
 be a C++ program (objects and explicit types, _NO_ macros) but has not
 been fully ported yet. This would form another small step in
 that upgrade path.

Yes.

   - The current ipv6 branch uses macros exclusively to enable a smooth
 upgrade once the ipv6 side of it is going. This is built into the nature
 of the branch and is partially moved up to HEAD already in preparation.
 Not a nice method of transition in an object-based app.

Yes.

   - Most of the ipv6 mods still need to be finished and tested anyway,
 so will not suffer greatly from the shrinkage this would cause.

Quite likely.

 Is it worth it? And would anyone with more knowledge of the future code
 than I have like to hazard a guess at an expected timespan for it?

I would think much of the groundwork of identifying IPv4 dependent parts
has been done already in the current ipv6 branch. So it shouldn't be
much more than implementing the class, and a big search/replace to make
use of it and to fix up a few remaining #ifdefs.

I would suspect doing it in the existing branch is easier, as the code
sections that need to be touched are already identified there. And
hopefully you may find that some of the transition can even be done
incrementally by redefining the macros to make use of the new class, but
you know this code better than anyone else around here.
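
As a rough picture of what "one class instead of macros" could look
like, here is a small sketch. It is written in Python purely to show the
shape of the idea (the real class would be C++ inside squid3), and the
method names and the choice to hold IPv4 as a v4-mapped IPv6 address are
assumptions made for illustration, not a design anyone has agreed on.

```python
import ipaddress


class IPAddress:
    """One address type for both families; IPv4 is held as ::ffff:a.b.c.d."""

    def __init__(self, text):
        addr = ipaddress.ip_address(text)
        if addr.version == 4:
            # Store IPv4 in its v4-mapped IPv6 form so callers see one type.
            addr = ipaddress.IPv6Address("::ffff:" + str(addr))
        self._addr = addr

    def is_ipv4(self):
        return self._addr.ipv4_mapped is not None

    def __eq__(self, other):
        return isinstance(other, IPAddress) and self._addr == other._addr

    def __hash__(self):
        return hash(self._addr)

    def __str__(self):
        mapped = self._addr.ipv4_mapped
        return str(mapped) if mapped is not None else str(self._addr)
```

Callers compare, hash and print addresses without ever testing the
family themselves, which is the job the #ifdefs and macros do today.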

Regards
Henrik




Re: vary stuff

2007-02-12 Thread Adrian Chadd
On Mon, Feb 12, 2007, Henrik Nordstrom wrote:

  Would you mind if the Vary support was culled out of the storage work
  branch until I've tidied up the storage manager layer somewhat?
 
 No problem. It's not really that tricky a thing to support. The tricky
 part was getting it into Squid-2 without a suitable store interface or
 even an intermediary layer.

Hm! I'll give it a shot this evening. I wouldn't mind it if you yanked
the code out of the storework branch before me though.. ;)

(And thanks for the caching description here!)

 1. The client-intermediary lookup API needs to be async for it to be
 able to do the vary dance. May need multiple store lookups and possibly
 a conditional upstream request to find the correct response.

We'll have to do this to support a number of 'other' things, such as
the types of storedirs people have wanted in the past, e.g. md5-based
reiserfs access: performance may suffer but the memory footprint will
be drastically smaller!

From what I've heard from others (as I don't have a commercial web cache lab
here) commercial caches treat Vary content very, very simplistically. We might
want to re-evaluate how we handle Vary, e.g. allowing for Vary header contents
to be 'normalised' (e.g. Vary: Accept-Encoding shouldn't vary based on the
verbatim header contents, as UAs are pretty arbitrary with their Accept-Encoding
headers; instead tokenise Accept-Encoding into a number of states and vary
based on those states.)
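
A sketch of that normalisation idea, in Python for brevity. The three
states (gzip, deflate, identity), their priority order and the function
name are assumptions picked for illustration; "*" and less common codings
are deliberately ignored to keep the number of states small.

```python
def normalise_accept_encoding(value):
    """Collapse an Accept-Encoding header into one of a few coarse states."""
    accepted = set()
    for item in (value or "").split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        q = 1.0
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # unparsable weight: treat the coding as refused
        if token and q > 0:
            accepted.add(token)
    if "gzip" in accepted or "x-gzip" in accepted:
        return "gzip"
    if "deflate" in accepted:
        return "deflate"
    return "identity"
```

The cache would then key variants on the returned state rather than on
the raw header, so "gzip, deflate" and "deflate;q=0.5, gzip" land on the
same stored object.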

But ok. If you or I or someone else feels up to it then let's yank the Vary
support out of the SF storework branch and leave it out until we've got the
rest of the store manager sorted. It does mean we might have trouble finding
testers (wikimedia probably won't test, as they really do want Vary support) but
I'll see what I can do.




Adrian



Re: icap in squid3

2007-02-12 Thread Alex Rousskov
On Mon, 2007-02-12 at 08:38 +0100, Axel Westerhold wrote:

 Well, the syntax you are proposing is somewhat limited.
 
 Here are my comments:
 
 1.) cn=%u assumes that the used username equals the assigned CN which is
 most of the time wrong. Normally the UID (or in AD the samaccountname) is
 used for authentication. This will lead to a failure using cn=%u

 2.) The given URI is not flexible enough as it assumes that all user objects
 will always be located within the same root object. The used syntax provides
 fast access because it will avoid search operations but will fail as soon as
 the object is located in a sub OU or a totally different tree.

 3.) LDAP allows for all kinds of unicode chars we would need to encode
 properly. While this is definitely possible I wonder if there really should
 be another encoding scheme implemented in squid.

It seems to me that Jeremy is not proposing any syntax, encoding, or URI
format (except perhaps for some default values). He wants to add the ability
to use any URI, with any LDAP (or non-LDAP) tags. The patch gives the user a
set of supported substitutions. The user can use whatever substitution
codes they need in whatever opaque text filling they need. Please see
below for examples.

We should be able to agree on that set without much trouble because
adding more substitutions is not a problem. For example, if somebody
needs a username without a domain, there should be a substitution for
it.

If there are more than 5-7 substitutions, we may want to argue whether
single letter %S substitutions are better than easier-to-remember and
harder-to-mistype ${LongNames}. I would probably vote for the latter,
but it is not a big deal.

Alternative encodings should be supported, of course, perhaps as
$encodingName(string-to-be-encoded) substitution, where
string-to-be-encoded may have variable substitutions? Again, there is an
example below.

 What I think might work better is as follows:
 
 A.) A user authenticates using a proper DN authenticating against an LDAP
 Server.  In this case the username will be the DN and can be transmitted.
 
 B.) The user authenticates using a uid (samaccountname). Either this uid is
 already usable on its own and we can transfer it without any changes, just
 encoding it base64 if requested (which will keep us out of trouble with
 UTF-8 or Unicode chars). In case we get this stuff from a windows user
 sending us a domain prefix, we should be able to split the username into
 domain and username. The hard part will be to find some kind of abstraction
 for how to transfer this.

Encoding aside, can the above two requirements be expressed as a set of
substitutions?

 What we definitely need are the following configuration entries:
 
 A.) Do we need to split the username into parts and if so using which
 separator? ('' = off or '\' or '+' etc.)

Can the separator be up to the admin? Do we need to define it?

 B.) The X- Header used to transfer the username (bare username without any
 instruction on how to use it): X-Authenticated-User, X-User, X-MyUser, X-Blah
 etc.

Agreed. The icap_client_username_header option controls that now, but
please see (C) below.

 C.)  The X-Header used to transfer the prefix if any.

Should we just support an arbitrary set of user-configurable header
names, with user-configurable values? If we add substitutions support,
then Squid should not really care about the meaning of the header. For
example,

 icap_client_add_header X-Username $base64($UserName)
 icap_client_add_header X-Prefix bar=$base64($Foo+$Bar)foo=blah
 ...
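
To show how little machinery such substitutions need, here is a Python
sketch that expands the two example values above. The $Name and
$encoder(...) syntax is taken from the proposal; icap_client_add_header
itself is only a suggested option, and the function name, the ENCODERS
table and the rule that unknown variables expand to an empty string are
assumptions made for the example.

```python
import base64
import re

# Encoders available as $name(...); only base64 is sketched here.
ENCODERS = {
    "base64": lambda s: base64.b64encode(s.encode("utf-8")).decode("ascii"),
}

# Either $encoder(argument) with no nested parentheses, or a plain $Variable.
_TOKEN = re.compile(r"\$(\w+)\(([^()]*)\)|\$(\w+)")


def expand(template, variables):
    """Expand $Variable and $encoder(...) codes inside an opaque template."""

    def replace(match):
        encoder_name, argument, var_name = match.groups()
        if var_name is not None:
            return variables.get(var_name, "")
        encoder = ENCODERS.get(encoder_name)
        if encoder is None:
            return match.group(0)  # unknown encoder: leave the text untouched
        # Expand variables inside the argument first, then encode the result.
        return encoder(expand(argument, variables))

    return _TOKEN.sub(replace, template)
```

Everything outside the substitution codes passes through untouched, so
Squid never has to know what the header means.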

 D.) Something to force base64 Encoding on above headers

See above for a suggestion.

 This ensures that the ICAP client gets all the info we might have for the
 user authenticating. This works fine if the ICAP client only deals with
 one squid and its auth scheme. As soon as we have x squids authenticating
 against various sources but only one ICAP client, we need to add some
 additional information for the client to find the correct auth source. So we
 need to tell the ICAP client the auth used (LDAP, WINNT etc.) and where the
 target (hostname:port) is. From there the client should use all the info
 received to build its internal request.

Can substitutions handle this? Or do we need dynamic support for
selecting an appropriate set of X-Headers, depending on how the user
authenticated?

Cheers,

Alex.


 On 09.02.2007 at 21:55, Jeremy Hall wrote from
 [EMAIL PROTECTED]:
 
  Hello,
  
  I can't remember.  What was the decided path for what was once the
  icap_auth_scheme? I recall there was some concern about my suggestion of
  having the ability to use ldap://hostname/cn=%u,dc=%d,dc=name,dc=int
  
  but I don't remember what the outcome was.
  
  _J



Re: icap in squid3

2007-02-12 Thread Axel Westerhold
Hi Alex,

  


On 13.02.2007 at 6:03, Alex Rousskov wrote from
[EMAIL PROTECTED]:

 On Mon, 2007-02-12 at 08:38 +0100, Axel Westerhold wrote:
 
 Well, the syntax you are proposing is somewhat limited.
 
 Here are my comments:
 
 1.) cn=%u assumes that the used username equals the assigned CN which is
 most of the time wrong. Normally the UID (or in AD the samaccountname) is
 used for authentication. This will lead to a failure using cn=%u
 
 2.) The given URI is not flexible enough as it assumes that all user objects
 will always be located within the same root object. The used syntax provides
 fast access because it will avoid search operations but will fail as soon as
 the object is located in a sub OU or a totally different tree.
 
 3.) LDAP allows for all kinds of unicode chars we would need to encode
 properly. While this is definitely possible I wonder if there really should
 be another encoding scheme implemented in squid.
 
 It seems to me that Jeremy is not proposing any syntax, encoding, or URI
 format (except perhaps for some default values). He wants to add the ability
 to use any URI, with any LDAP (or non-LDAP) tags. The patch gives the user a
 set of supported substitutions. The user can use whatever substitution
 codes they need in whatever opaque text filling they need. Please see
 below for examples.

My point was that there is no use for a fixed LDAP template due to the way
LDAP is normally built. Let's move on to your proposal about substitution and
encoding.
 
 We should be able to agree on that set without much trouble because
 adding more substitutions is not a problem. For example, if somebody
 needs a username without a domain, there should be a substitution for
 it.
 
 If there are more than 5-7 substitutions, we may want to argue whether
 single letter %S substitutions are better than easier-to-remember and
 harder-to-mistype ${LongNames}. I would probably vote for the latter,
 but it is not a big deal.
 
 Alternative encodings should be supported, of course, perhaps as
 $encodingName(string-to-be-encoded) substitution, where
 string-to-be-encoded may have variable substitutions? Again, there is an
 example below.
 

From what I can see, the following info is available (if at all):

From Login:

Username
Prefix
Group

and that's it. Maybe my lack of imagination makes me miss some of the
additions.

One comment on a nice feature I would like to have but am still considering
for security reasons:

When an ICAP server requires auth for mapping users to rules/policies you
sometimes run into a problem with sources which can't auth or destinations
you do not want to require auth for. While you can use ACLs to get this
done easily on squid, sometimes the ICAP clients won't play ball. As a result
some destinations are not using the ICAP virus scanner/content system to
make it work. So, maybe, but just as a thought, it would be nice to use ACLs
to automatically assign a username for those services so that they can be
properly matched.



See my further comments below


 What I think might work better is as follows:
 
 A.) A user authenticates using a proper DN authenticating against an LDAP
 Server.  In this case the username will be the DN and can be transmitted.
 
 B.) The user authenticates using a uid (samaccountname). Either this uid is
 already usable on its own and we can transfer it without any changes, just
 encoding it base64 if requested (which will keep us out of trouble with
 UTF-8 or Unicode chars). In case we get this stuff from a windows user
 sending us a domain prefix, we should be able to split the username into
 domain and username. The hard part will be to find some kind of abstraction
 for how to transfer this.
 
 Encoding aside, can the above two requirements be expressed as a set of
 substitutions?
 

They can but (see below)

 What we definitely need are the following configuration entries:
 
 A.) Do we need to split the username into parts and if so using which
 separator? ('' = off or '\' or '+' etc.)
 
 Can the separator be up to the admin? Do we need to define it?
 

Must be configurable, so that an empty string turns splitting off and a
non-empty string turns it on and defines the separator. Samba, for instance,
allows the separator to be changed. Also, this gives squid some flexibility.
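
In code the rule is tiny. A Python sketch follows; the function name and
the (prefix, username) return shape are assumptions, and a login that
does not contain the separator is treated as having no prefix.

```python
def split_username(login, separator=""):
    """Return (prefix, username); an empty separator disables splitting."""
    if not separator or separator not in login:
        return "", login
    # Split on the first occurrence only, so the username may contain it again.
    prefix, _, username = login.partition(separator)
    return prefix, username
```

With the separator set to '\' a Windows-style DOMAIN\user login yields
both parts; with the default empty string the login is passed on as is.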

 B.) The X- Header used to transfer the username (bare username without any
 instruction on how to use it): X-Authenticated-User, X-User, X-MyUser, X-Blah
 etc.
 
 Agreed. The icap_client_username_header option controls that now, but
 please see (C) below.
 
 C.)  The X-Header used to transfer the prefix if any.
 
 Should we just support an arbitrary set of user-configurable header
 names, with user-configurable values? If we add substitutions support,
 then Squid should not really care about the meaning of the header. For
 example,
 
  icap_client_add_header X-Username $base64($UserName)
  icap_client_add_header X-Prefix bar=$base64($Foo+$Bar)foo=blah
  ...
 

I like this from a technical point of view. But I can also see my