Re: concurrency attribute and questions.

2009-04-05 Thread Amos Jeffries

louis gonzales wrote:

List,
1) for the concurrency attribute does this simply indicate how many
items in a batch will be sent to the external helper?


Sort of. The strict definition of 'batch' does not apply. Better to 
think of it as a window max-size.


So from 0 up to N (the concurrency value) items will be passed straight 
to the helper before Squid starts waiting for their replies to free up 
slots.
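For illustration, a hypothetical squid.conf line enabling a window of 6 (the helper name and path here are invented, not from this thread):

```
external_acl_type my_check concurrency=6 children=1 %LOGIN %URI /usr/local/bin/my_check_helper
acl checked external my_check
http_access allow checked
```

With concurrency=6 Squid keeps up to 6 lookups in flight to that one helper process at a time.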




1.1) assuming concurrency is set to 6 for example, and let's assume
a user's browser session sends out 7 actual URL's through the proxy
request - does this mean 6 will go to the first instance of the
external helper, and the 7th will go to a second instance of the
helper?


1-6 will go straight through, probably with the IDs 1-6.
#7 may or may not go straight through, depending on whether one of the 
first 6 has finished at that time.
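Concretely, the exchange with a concurrent helper looks something like this (channel IDs are assigned by Squid; the URLs are invented for illustration):

```
Squid -> helper:   0 http://example.com/a.html
                   1 http://example.com/b.png
                   2 http://example.com/c.css
helper -> Squid:   1 OK
                   0 OK
                   2 ERR
```

The helper may answer in any order; the leading channel ID is what lets Squid match each reply to its request.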




1.1.1) Assuming the 6 from the first part of the batch return OK and
the 7th returns ERR, will the user's browser session, render the 6
and not render the 7th?


Depends entirely on how the ERR/OK results are used in squid.conf.

(you might be denying on OK or allowing on ERR).


 More importantly, how does Squid know that
the two batches - one of 6, and one of 1, for the 7 total - all
came from the same browser session?


There is no such thing as a browser session to Squid.

Each request is a separate object. These 7 may happen to come from the 
same IP, but for all Squid cares they may come from different software, 
or even from more than one IP entirely.




What I have currently:
- openldap with postgresql, used for my user database, which permits
me to use the auth_param squid_ldap_auth module to authenticate my
users with.
- a postgresql database storing my acl's for the given user database

Process:
Step1: user authenticates through squid_ldap_auth
Step2: the user's requested URL (and obviously all images, content, ...)
gets passed to the external helper
Step3: the external helper checks those URLs against the database for
the specific user and then determines OK or ERR

Issue1:
How do I have the user's requested URL (and all images, content, ...)
get passed as a batch/bundle to a single external helper instance, so I
can collectively determine OK or ERR?

Any ideas?  Is the concurrency attribute to declare a maximum number
of requests that go to a single external helper instance?


The number of *parallel* requests the helper can process. Most helpers 
shipped with Squid are non-parallel (concurrency=1).



 So if I
set concurrency to 15, should I have the external helper read count++
while STDIN lines come in, until no more, then I know I have X number
in a batch/bundle?


Depends on the language your helper is coded in. As long as it can 
process 15 lines of input in parallel without mixing anything up.


Sounds like a Perl helper; Perl can do parallel just fine with no 
special reads needed. But it must handle the extra ID token at the 
start of each line properly.
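A minimal sketch of that ID handling, written in Python here for illustration (the same logic applies in Perl); the verdict logic is a placeholder, a real helper would consult its database:

```python
import sys

def handle_line(line):
    # Each request line starts with a channel ID, e.g.
    # "3 user3 GET http://example.com/". Split the ID off,
    # decide the answer, and echo the same ID back.
    channel_id, _, rest = line.strip().partition(" ")
    verdict = "OK" if rest else "ERR"   # placeholder for the real ACL lookup
    return "%s %s\n" % (channel_id, verdict)

if __name__ == "__main__":
    for line in sys.stdin:
        sys.stdout.write(handle_line(line))
        sys.stdout.flush()  # unbuffered replies, or Squid will stall waiting
```

This sequential loop is enough when each lookup is fast; truly parallel handling needs threads or forked children, but the ID-in, ID-out contract stays the same.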




Obviously there is no way to predetermine how many URLs/URIs will
need to be checked against the database, so if I set concurrency to
1024, presuming that to be high enough that no single request will max
it out, can I just count++, and when the external helper is done
counting STDIN readlines, process them to determine OK or ERR for
that specific request?


additional point to this:
  the ttl=N option will cache the OK/ERR result for that lookup for N 
seconds. This can greatly reduce the number of tests passed to the 
helper even further.
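For illustration, a hypothetical external_acl_type line with result caching (the helper name, path, and timeout values are invented):

```
# Cache OK verdicts for 300s and ERR verdicts for 60s, per lookup key:
external_acl_type my_check ttl=300 negative_ttl=60 concurrency=6 %LOGIN %URI /usr/local/bin/my_check_helper
```

Repeated requests for the same %LOGIN/%URI pair within the TTL are answered from Squid's cache without consulting the helper at all.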




Issue2:
I'd like to just have a single external helper instance start up, that
can fork() and deal with each URL/URI request, however, I'm not sure
Squid in its current incarnation passes enough information OR doesn't
permit specific enough passback (from the helper) information, to make
this happen.


Squid passes an ID at the start of each line of input. As long as the 
result comes back on the stdout of the helper process Squid itself 
forked, with that same ID at the front, Squid does not care about the 
order of responses.


You will need to make sure your parallel children's stdout/stderr write 
to your parent helper's stdout/stderr. But it should be possible.
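A rough sketch of one way to do that, in Python for illustration and using a thread pool rather than fork() (check_request is a stand-in for the real database lookup). The key point is that every worker writes its "ID OK/ERR" line to the one shared output stream, guarded by a lock so lines never interleave:

```python
import sys
import threading
from queue import Queue

write_lock = threading.Lock()

def check_request(rest):
    # Stand-in for the real per-user ACL lookup.
    return "OK"

def answer(channel_id, rest, out):
    verdict = check_request(rest)
    with write_lock:                    # emit one whole line at a time
        out.write("%s %s\n" % (channel_id, verdict))
        out.flush()

def worker(queue, out):
    while True:
        line = queue.get()
        if line is None:                # shutdown sentinel
            break
        channel_id, _, rest = line.strip().partition(" ")
        answer(channel_id, rest, out)
        queue.task_done()

def serve(inp, out, nworkers=4):
    queue = Queue()
    threads = [threading.Thread(target=worker, args=(queue, out))
               for _ in range(nworkers)]
    for t in threads:
        t.start()
    for line in inp:                    # fan requests out to the workers
        queue.put(line)
    for _ in threads:                   # one sentinel per worker
        queue.put(None)
    for t in threads:
        t.join()

if __name__ == "__main__":
    serve(sys.stdin, sys.stdout)
```

With fork() instead of threads the same rule applies: the children inherit the parent's stdout, so as long as each child writes complete, flushed lines carrying the channel ID, Squid can sort the replies out.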



Amos
--
Please be using
  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE13
  Current Beta Squid 3.1.0.6


Re: concurrency attribute and questions.

2009-04-05 Thread louis gonzales
Amos,
Thanks for your responses - they make things clearer for me, so now I
can ask better questions :)  What I'd like to do is have my Perl
helper fork as necessary, rather than starting up N external_acl_type
instances (children=50 or children=100), which is not efficient: with
an indeterminate number of users, 50 or 100 may not be enough, or at
times may be too many.

What settings in the squid.conf line tell Squid the external helper
will fork to handle subsequent objects?  Below is my current line:
external_acl_type eXhelperI children=1 %LOGIN %METHOD %{Host}
/usr/lib/squid/eXhelper.pl

Since I set children=1, only one eXhelper.pl starts up with Squid,
with the idea in mind that eXhelper forks child processes as
necessary.  I'm still trying to determine what state information Squid
passes to the external helper besides the %LOGIN/%METHOD... [below
you mentioned an ID token - are you referring to the %LOGIN ID token,
or something else?].

I understand that Squid forks eXhelper.pl, which means Squid owns
the ppid (parent process ID) of eXhelper.pl - ideally I'd like to
have this single child then fork subprocesses too.  Currently I'm
uncertain what input trigger (or signal), if any, exists to have the
single external helper fork a subprocess to check the object, and how
to uniquely ensure the OK or ERR goes back to the calling ID
token.

Thank you again - I did write some comments below.

On Sun, Apr 5, 2009 at 6:05 AM, Amos Jeffries squ...@treenet.co.nz wrote:
 louis gonzales wrote:

 List,
 1) for the concurrency attribute does this simply indicate how many
 items in a batch will be sent to the external helper?

 Sort of. The strict definition of 'batch' does not apply. Better to think of
 it as a window max-size.
Louis: So should I have the Perl helper buffer the data passed to
it, rather than reading line by line - and if buffering, what are the
start and end identifiers?


 So from 0 up to N (the concurrency value) items will be passed straight to the helper
 before Squid starts waiting for their replies to free up slots.
Louis: If the 0-(jth) objects belong to a specific user's request and
the (jth)-Nth belong to a different user's request, assuming concurrency
is set to N, how does one differentiate in the external helper which set
belongs to whom? (I'm using the %LOGIN parameter so I know which userID,
as authenticated by LDAP, is making the request.) In other words, after
I've determined OK for the 0-(jth) and ERR for the (jth)-Nth, the
specific instance of the helper will need to return two different
values.  Basically my helper checks each of the Squid-passed
(URL/%LOGIN) pairs against the ACLs in the postgresql database.

My use case guarantees the only end-user application will be a web
browser, so with that assumption, when the end user opens
www.foxnews.com, for instance, there is a multitude of objects.  So my
specific question is: when Squid goes to retrieve all of these objects
for the requesting user, does Squid - a) with concurrency set high
enough, send all of these objects to the same external helper instance
and await a single OK or ERR?  and b) with concurrency off, send them
one-to-one, object to external helper instance, awaiting OK or ERR for
each?



 1.1) assuming concurrency is set to 6 for example, and let's assume
 a user's browser session sends out 7 actual URL's through the proxy
 request - does this mean 6 will go to the first instance of the
 external helper, and the 7th will go to a second instance of the
 helper?

 1-6 will go straight through, probably with the IDs 1-6.
 #7 may or may not go straight through, depending on whether one of the first 6
 has finished at that time.
Louis: is it ever possible with concurrency enabled, that objects from
two different users will enter into a single external helper instance?



 1.1.1) Assuming the 6 from the first part of the batch return OK and
 the 7th returns ERR, will the user's browser session, render the 6
 and not render the 7th?

 Depends entirely on how the ERR/OK results are used in squid.conf.

 (you might be denying on OK or allowing on ERR).


  More importantly, how does Squid know that
 the two batches - one of 6, and one of 1, for the 7 total - all
 came from the same browser session?

 There is no such thing as a browser session to Squid.

 Each request is a separate object. These 7 may happen to come from the same IP,
 but for all Squid cares they may come from different software, or even from
 more than one IP entirely.
Louis: right, but Squid obviously has to know which IP the request
came from in order to serve the page(s), so when the external helper
returns the OK or ERR, certainly those will trace back along the path
they came by, to the correct requesting application (browser or other).



 What I have currently:
 - openldap with postgresql, used for my user database, which permits
 me to use the auth_param squid_ldap_auth module to authenticate my
 users with.
 - a postgresql database storing my acl's for 

concurrency attribute and questions.

2009-04-04 Thread louis gonzales
List,
1) for the concurrency attribute does this simply indicate how many
items in a batch will be sent to the external helper?

1.1) assuming concurrency is set to 6 for example, and let's assume
a user's browser session sends out 7 actual URL's through the proxy
request - does this mean 6 will go to the first instance of the
external helper, and the 7th will go to a second instance of the
helper?

1.1.1) Assuming the 6 from the first part of the batch return OK and
the 7th returns ERR, will the user's browser session, render the 6
and not render the 7th?  More importantly, how does Squid know that
the two batches - one of 6, and one of 1, for the 7 total - all
came from the same browser session?

What I have currently:
- openldap with postgresql, used for my user database, which permits
me to use the auth_param squid_ldap_auth module to authenticate my
users with.
- a postgresql database storing my acl's for the given user database

Process:
Step1: user authenticates through squid_ldap_auth
Step2: the user's requested URL (and obviously all images, content, ...)
gets passed to the external helper
Step3: the external helper checks those URLs against the database for
the specific user and then determines OK or ERR

Issue1:
How do I have the user's requested URL (and all images, content, ...)
get passed as a batch/bundle to a single external helper instance, so I
can collectively determine OK or ERR?

Any ideas?  Is the concurrency attribute to declare a maximum number
of requests that go to a single external helper instance?  So if I
set concurrency to 15, should I have the external helper read count++
while STDIN lines come in, until no more, then I know I have X number
in a batch/bundle?

Obviously there is no way to predetermine how many URLs/URIs will
need to be checked against the database, so if I set concurrency to
1024, presuming that to be high enough that no single request will max
it out, can I just count++, and when the external helper is done
counting STDIN readlines, process them to determine OK or ERR for
that specific request?

Issue2:
I'd like to just have a single external helper instance start up, that
can fork() and deal with each URL/URI request, however, I'm not sure
Squid in its current incarnation passes enough information OR doesn't
permit specific enough passback (from the helper) information, to make
this happen.

Any deeper insights, would be tremendously appreciated.

Thanks,

-- 
Louis Gonzales
BSCS EMU 2003
HP Certified Professional
louis.gonza...@linuxlouis.net