Re: BZR Errors

2012-02-17 Thread Pieter De Wit

On 11/02/2012 20:49, Pieter De Wit wrote:

Hi All,

Following the wiki instructions from:

http://wiki.squid-cache.org/BzrInstructions?action=show&redirect=Squid3VCS 



I am running into the following errors (clean BZR install on Ubuntu 
12.04 dev branch)


:~/source$ bzr init-repo --1.14 squid
Shared repository with trees (format: 1.14 or 1.9)
Location:
  shared repository: squid
:~/source/squid$ bzr branch --bind 
http://bzr.squid-cache.org/bzr/squid3/trunk

Branched 12041 revisions.
New branch bound to http://bzr.squid-cache.org/bzr/squid3/trunk
pieter@srv1:~/source/squid$ bzr branch trunk PDW
bzr: ERROR: Not a branch: "/home/pieter/source/squid/trunk/".

This is the step that creates a branch to hack on.

Any ideas ?

Thanks !

Pieter
Turns out something is lacking in the "plain" method; the "cbranch" 
worked fine.
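
For anyone hitting the same thing, the working step was along these 
lines ("cbranch" comes from the bzrtools plugin; the branch name is 
just an example):

:~/source/squid$ bzr cbranch trunk PDW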


Cheers,

Pieter


Re: Storing of information

2012-02-17 Thread Pieter De Wit



ICAP is not ideal for implementing control points other than at the
beginning of messages. It would be awkward and performance-expensive to
use ICAP for quota control if you want to change things in the middle of
a transaction. ICAP also lacks proactive notifications from the ICAP
server to Squid, which would be nice for certain quota operations.

Compared to ICAP, eCAP has a much lower performance overhead, but has a
similar beginning-of-message design limitation. If you want to do this
using loadable modules, we could discuss extending eCAP, but I am not
sure it is the best approach.


I recommend considering reshaping client_db code to work with shared
memory so that it can work correctly in SMP mode. I do not think that
would be very difficult to do and it would be immediately useful.

Once that is done, you can add a control message queue and use external
processes to manage the quota database as needed. This design would
minimize performance impact while allowing any external quota
management programs to co-exist with Squid.
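
To make that concrete, a minimal sketch of what a shared-memory 
client_db record might look like (names hypothetical; real code would 
build on Squid's Ipc::Mem facilities rather than raw structs):

#include <stdint.h>

// Hypothetical fixed-size per-client record living in a shared
// memory segment. Fixed sizes matter because shared segments
// cannot safely hold pointers into one process's heap.
struct ClientDbEntry {
    uint8_t  addr[16];        // client IP (v4-mapped or v6)
    uint64_t bytesUsed;       // traffic counted so far
    uint64_t quotaRemaining;  // decremented as traffic flows
    uint32_t requestCount;    // per-client request statistics
};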


HTH,

Alex.
Yip, seems to be the general feeling out there. With this redesign, 
would ICAP and eCAP not move into the same "hooks"? They might not be 
external programs, but the connections to the servers can be made at 
that time?


Alex, would it be ok if I pop a few emails over to you to assist with 
the SMP stuff, if needed ?


Cheers,

Pieter


Re: Storing of information

2012-02-17 Thread Pieter De Wit


The download, even if of known length, can be aborted at any time, and the 
backend system may change the quota at any time as well.
So IMO the best idea is to collapse the requests all down to a request 
asking for N bytes, passing along any parameters which the quota 
backend needs.
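
For illustration, such a collapsed exchange might look like this 
(format purely hypothetical):

  squid -> backend: user=z ip=a.b.c.d want=1048576
  backend -> squid: grant=1048576 recheck-at=65536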


The basic idea was started here:
  http://bugs.squid-cache.org/show_bug.cgi?id=1849

Looking back at the discussion thread, it was started by you in Feb 
2009; the model description is here: 
http://marc.info/?l=squid-dev&m=123570800116923&w=1. Although it seems 
I sent you something in private before that with more details. Sorry, 
that mail is gone now.



The Measurement Factory have since created the client_delay_pool part 
of it, but without any helper hooks. So the current implementation is 
only per-second capping. What remains is adding a helper API hook that 
sets the client DB quota field values and updates them when exhausted.


That is fully controllable already with per-request limitations and 
speeds.


The big case that is left is fixed-size quotas that run down. No 
need for lookups with details from particular headers or such at this 
point.





Yip - I remember this now. I am sure Alex and I also exchanged some 
emails regarding this, and then I ran out of time and left the idea 
there. I was also afraid of stepping on Alex's work (this happened in a 
previous project and cost me hours of coding), which, now that it is 
completed, won't be an issue anymore.


I'll get my BZR working and then spend some time on client_db.cc

Cheers,

Pieter


Re: Storing of information

2012-02-11 Thread Pieter De Wit


* the parsing bottleneck gets crunched several times: on first 
arrival, in the ICAP server, and on return to Squid,
* the ICAP server bypass optimization can't be used since quota needs 
to measure every byte,

* tunneled data does not get sent to ICAP services,

Not exactly perfect service, but it offers the most complete quota 
control without adding complexity to Squid.


eCAP might be slightly better. It still runs inside Squid and has 
some processing overhead, but it should reduce the parsing problems and 
network delays involved with ICAP.




Pointers to reading URLs are more than welcome; so are examples 
of libicapapi :)


Hopefully someone else knows some then, because I don't :(

Amos

Hi Amos,

You said that you proposed some work a while ago; would you mind sharing 
that? I gave the network thing some thought and I can see how the delay 
would hurt squid. I kept on comparing it to milters, but those don't 
mind a few ms of delay; email is a lot less interactive.


The thought process I am going with is something along the lines of a 
process that is "spoken" to, like eCAP perhaps, via pipes or a lib or 
some such. This process will be notified based on the following:


(* - Request, **-Reply)

* I would like to go to protocol://site
** Is there quota left to allow this? If the user has 0 quota left, 
block the request - no use continuing.

* The server said the object is X bytes long, can I continue to download it?
** Yes, there is quota. The problem comes in if the server didn't give a 
length; in that case, perhaps only allow 1024 bytes at a time until the 
quota runs out. There is also the problem of the server saying the object 
is bigger than it really is...

* Can I send the following 1024 bytes?
** Yes, there is quota.

At any given step, if the quota runs out, the connection is aborted. This 
will involve some tie-in with the FD struct that you guys have already. 
I do recall Alex and myself having a chat about this. I referred to it 
as "hooks" into the FD struct. I *think* the talk about "hooks" in the 
FD struct was dropped because it didn't add enough value at the time, or 
real life caught up to me or or or :)


Based on this, I would like to re-float the idea of "hooks" in the FD 
struct. Off the top of my head, one would have modules that expose 
certain functions/procedures:


OnClientConnect (source_ip,source_port,target_ip,target_port);
OnClientRequest (URL);
OnClientRequestContent (content,size,offset);
OnClientResponse (URL,size);
OnClientResponseContent (content,size,offset);
OnClientDisconnect ();

I will outright say, I have no clue how modules work (thinking about 
Apache etc.) and these are shamelessly based on my Delphi XP with Objects.
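
In C++ terms, a module-hook interface along those lines might look like 
the sketch below (purely illustrative; every name is made up and none 
of this reflects actual Squid internals):

#include <stddef.h>

// Hypothetical interface a loadable module would implement; Squid
// would call each hook at the matching point in the transaction.
class ClientHooks {
public:
    virtual ~ClientHooks() {}
    // Each hook returns false to abort (e.g. quota exhausted).
    virtual bool onConnect(const char *srcIp, int srcPort,
                           const char *dstIp, int dstPort) = 0;
    virtual bool onRequest(const char *url) = 0;
    virtual bool onRequestContent(const char *data, size_t size, size_t offset) = 0;
    virtual bool onResponse(const char *url, size_t size) = 0;
    virtual bool onResponseContent(const char *data, size_t size, size_t offset) = 0;
    virtual void onDisconnect() = 0;
};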


Cheers,

Pieter

P.S. Might be worth starting a new thread perhaps ?


Re: Storing of information

2012-02-10 Thread Pieter De Wit


I like the idea of pushing it off into ICAP. That does bring up the 
problem that things like auth loops, errors and related self-DoS 
events are omitted from the quota counts. Also, tunnels are adapted 
only for the HTTP headers portion; once they get to the blind-tunnel 
part it's direct byte shuffling between two TCP sockets.




In Squid it would have to be an AsyncJob to start with, since the 
memory spaces are still too much twisted together to add threading 
cleanly. When that is working, splitting into a process or thread may be 
an option for improving over the Job.



Scrap that idea then :)

From my limited playing with the ICAP protocol, what would be the 
shortcomings? I looked at RESPMOD mode ("response modification"), as 
in, modify the response. I can't see the shortfall here; it had the 
length and all that?


Pointers to reading URLs are more than welcome; so are examples of 
libicapapi :)


Cheers,

Pieter


BZR Errors

2012-02-10 Thread Pieter De Wit

Hi All,

Following the wiki instructions from:

http://wiki.squid-cache.org/BzrInstructions?action=show&redirect=Squid3VCS 



I am running into the following errors (clean BZR install on Ubuntu 
12.04 dev branch)


:~/source$ bzr init-repo --1.14 squid
Shared repository with trees (format: 1.14 or 1.9)
Location:
  shared repository: squid
:~/source/squid$ bzr branch --bind 
http://bzr.squid-cache.org/bzr/squid3/trunk

Branched 12041 revisions.
New branch bound to http://bzr.squid-cache.org/bzr/squid3/trunk
pieter@srv1:~/source/squid$ bzr branch trunk PDW
bzr: ERROR: Not a branch: "/home/pieter/source/squid/trunk/".

This is the step that creates a branch to hack on.

Any ideas ?

Thanks !

Pieter


Re: Storing of information

2012-02-10 Thread Pieter De Wit

On 11/02/2012 15:43, Amos Jeffries wrote:

On 11/02/2012 8:34 a.m., Pieter De Wit wrote:

Hi Guys,

So I saw on the mailing list the question about quotas came up again. 
I thought I would give it a shot (planning etc) and I was wondering, 
does "Squid" have a way to store data ?


In Quota you will have Bandwidth (bytes, not per second) that has to 
be checked/adjusted and updated to disk. Instead of writing my own 
routines to store this in my "own" format, it would be nice to have 
something that grows with Squid.


Discard the idea of disk formats for quota control. You are dealing 
with individual packet read/writes at this level. Everything of 
importance needs to be in RAM. Squid has a client_db memory cache 
which stores statistics and details about each unique client. Several 
transaction state controllers (ConnStateData, TunnelStateData, 
*ServerData) already interact with that for per-client bandwidth 
reporting.


At the more abstracted level, semi-accurate quotas would need the 
client_db to be backed up on disk or somewhere periodically. (oh yay, 
yet another event to cause "squid keeps pausing" bugs).


 Or alternatively, the design I worked out a year or so ago uses a 
helper process to query some database or system managing client 
quotas. There is a lot of interest in RADIUS and ActiveDirectory 
backends controlling this type of thing, and a helper interface is 
much more flexible than pre-determined config formats. You can query 
this helper on each new client to receive a quota value which gets 
used up then re-checked. We still get some slowdown from the lookups, 
but it is up to the admin to configure how many quota bytes the helper 
requests each cycle and thus how much overhead they add.


Amos


Hi Amos,

Thanks - I second the idea of a helper process; I like the "milter" type 
idea where this can run on another box, away from Squid. I have been 
messing around with ICAP to do this - what are your thoughts on that? 
IMHO, the time part of quotas should be handled by ACLs; it is 
impossible for Squid to know how long someone has been on the net.


Back to the "sync" of client_db. I agree that the working set of the DB 
should be in memory, but would a threaded approach slow squid down "that 
much"? I am thinking along the lines of a pthread that just sits there 
and lazy-writes dirty objects to disk: a quick lock of the global mutex, 
copy the object, unlock the mutex, write the object, rinse, repeat. There 
will also have to be a shutdown hook... the helper process is looking 
pretty good round about now :)
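
A minimal sketch of that lazy-writer loop, just to show the shape of it 
(plain pthreads and made-up names, not actual Squid code):

#include <pthread.h>
#include <time.h>
#include <cstdio>
#include <deque>
#include <string>

struct DirtyEntry { std::string key; long bytesUsed; };

static pthread_mutex_t dbMutex = PTHREAD_MUTEX_INITIALIZER;
static std::deque<DirtyEntry> dirtyQueue;  // dirty client_db entries
static bool shuttingDown = false;          // set by the shutdown hook

static void *lazyWriter(void *) {
    for (;;) {
        pthread_mutex_lock(&dbMutex);
        DirtyEntry copy;
        const bool have = !dirtyQueue.empty();
        if (have) { copy = dirtyQueue.front(); dirtyQueue.pop_front(); }
        const bool done = shuttingDown && !have;
        pthread_mutex_unlock(&dbMutex);
        if (done)
            break;
        if (have) {
            // write outside the lock so the main loop never blocks on disk
            if (FILE *f = fopen("client_db.dat", "a")) {
                fprintf(f, "%s %ld\n", copy.key.c_str(), copy.bytesUsed);
                fclose(f);
            }
        } else {
            // nothing dirty yet; nap briefly (a condvar would be cleaner)
            struct timespec ts = { 0, 50 * 1000 * 1000 };
            nanosleep(&ts, 0);
        }
    }
    return 0;
}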


For some reason, version control systems and I have never been friends. 
I can't get a copy of the source code using BZR (using the instructions 
on the Squid wiki), so I will build against the latest .tar.gz that I 
can find. I also suspect this will mostly be new files, with very little 
modification of current files.


Cheers,

Pieter


Storing of information

2012-02-10 Thread Pieter De Wit

Hi Guys,

So I saw on the mailing list the question about quotas came up again. I 
thought I would give it a shot (planning etc) and I was wondering, does 
"Squid" have a way to store data ?


With quotas you will have bandwidth (bytes, not bytes per second) that has 
to be checked, adjusted, and updated to disk. Instead of writing my own 
routines to store this in my "own" format, it would be nice to have 
something that grows with Squid.


I can't see that squid would have needed this up to now, so perhaps this 
is a good time to get something like that in?


I am thinking of a simple GET/SET/DEL system that takes two 
parameters (except DEL): a key and a value. If an object doesn't 
exist then GET will return NULL. SET will create an object if needed, 
otherwise it will update the value.
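
Interface-wise, something like this rough sketch (hypothetical names):

#include <map>
#include <string>

// Hypothetical in-memory GET/SET/DEL store; persistence and
// expiry would be layered on top.
class KvStore {
    std::map<std::string, std::string> data;
public:
    // GET: returns NULL when the key does not exist
    const std::string *get(const std::string &key) const {
        std::map<std::string, std::string>::const_iterator i = data.find(key);
        return i == data.end() ? 0 : &i->second;
    }
    // SET: creates the object, or updates it if it already exists
    void set(const std::string &key, const std::string &value) {
        data[key] = value;
    }
    // DEL: takes only the key
    void del(const std::string &key) {
        data.erase(key);
    }
};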


Right, time to grab the code :)

Cheers,

Pieter


Re: Squid and SSI

2012-01-30 Thread Pieter De Wit

Hi - Sorry for jumping in :)

On Tue, 31 Jan 2012, Amos Jeffries wrote:


On 28.01.2012 09:46, goro goro wrote:

Dear Sir(s)



Greetings and welcome.


    I am a system administrator at an ISP; Squid is at the heart of
it. Caching is important to me, as is throughput.

I have been trying for months now, fiddling with /proc any way I
can, and with squid.conf, also with limits.conf.

I finally was able to reach a throughput of 80Mb, which is not even close
to what I really need. One of the servers reached 120Mb, but I wasn't
able to find out why!


What hardware are you using ? (Memory/disks/CPU)


Depends on what criteria the load balancing is using.


I have used (and not written a wiki page on this yet) heartbeat/ldirectord 
without issues to load-share proxies; ICCP is your friend here. If you 
want, we can go through those setups.



3 : I am looking for a patch that can write the access log directly to
Oracle or MySQL. Unlike squid2Mysql, I don't want it to read the
file access.log and then write it; I need squid to write it to the DB
directly. Is that even possible?


I would ship the logging to another machine, then process it/move it into 
mysql. Syslog is your friend here


Cheers,

Pieter

Re: [RFC] cache architecture

2012-01-24 Thread Pieter De Wit


Hard to implement given the current "leg work" is already done? How 
well does the current version of squid handle multiple cores, and can 
this take advantage of them?


Should be easy. We have not exactly checked and documented the DiskIO 
library API. But 
the current AIO handles SMP exactly as well as the system AIO library 
can, same for the pthreads library behind DiskThreads.


Taken from iscsitarget - they have a "wthreads x" config option that 
spawns x threads, for writes only I believe; not sure about reads. You 
can't control this in the AIO lib (I think?), but perhaps 
something like this could be useful for pthreads.




The cache_dir can report this up to the top layer via their loading 
factor when they are not servicing requests. I was considering using it 
to prioritise CLEAN builds before DIRTY ones, or cache_dirs by the speed 
of their storage type and loading factor.


It seems we are heading towards naming cache_[dir|mem]; otherwise the 
options might become confusing? Almost the same as "cache_peer name=bla". 
(While writing the below I came up with another idea, storage weights, 
which might solve "my issue" with the double object store.)


cache_dir name=sata /var/dir1 128G 128 128
cache_io_lib sata AIO
cache_rebuild_weight sata 1
cache_request_weight sata 100
cache_state sata readonly
cache_weight sata 100

cache_dir name=ssd /var/dir2 32G 128 128
cache_io_lib ssd pthreads
cache_rebuild_weight ssd 100
cache_request_weight ssd 1
cache_state ssd all
cache_weight ssd 10

cache_mem name=mem1 1G
cache_state mem1 all
cache_weight mem1 1

(I feel the memory one is "out of place" but perhaps someone else has 
another idea/thought process - why would you need two cache_mem's ?)


What I wanted to show above was the use of "name=" in cache_dir; that 
led to another idea, "cache_weight". So we are happy that the options 
are now settable per cache_dir :)


cache_weight will allow an admin to specify the "transit cost" of 
objects in a cache. Squid starts up and wants to serve objects as 
quickly as it can. Memory can be used without issues right away for 
caching. Now we start to initialize the disk caches. In my example 
above, the "ssd" cache should init before "sata", giving us some storage 
space. If, during the init of the "sata" cache, the memory allocation is 
already filled up, squid starts expiring objects to the next cost tier, 
so an object would travel from memory to "ssd" (much the same as it does 
now?).


Now, "sata" is still busy initializing, but "ssd" has also filled up, so 
we are forced to retire an object in "ssd", much like we do now. Once 
"sata" is done, it will join the queue, so objects will expire like:


mem1->ssd->sata (ignoring the fact that it's set to read-only for now)

If we have an object already in the "sata" cache that is new in "ssd", we 
would expire that object as soon as the "sata" cache is done setting up. 
We do, however, now have the overhead of reading an object, writing it 
somewhere else (please please please admins - make it other spindles !!! 
:) ), freeing the original space, then writing the new object.


Another example:

"sata" init'ed before "ssd":

Before init: mem1->sata
After init: mem1->ssd->sata

Now we could have the problem of "sata" and "ssd" having the same object. 
We would expire the higher-cost one (the one in "sata"), since the object 
is required more than we "thought"? This is the *only* way an object can 
travel "up" the disk cost chain; otherwise we could be throwing objects 
between caches all day long.
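
To make the "transit cost" idea concrete, a toy sketch of picking the 
demotion target (names hypothetical; nothing like the real store code):

#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cache tier, ordered by cache_weight (transit cost).
struct Tier {
    std::string name;  // e.g. "mem1", "ssd", "sata"
    int weight;        // lower = preferred; objects expire toward higher
    bool ready;        // finished init?
    bool full;
};

// On eviction, demote to the cheapest ready tier with a higher
// weight than the current one; if none exists, drop the object.
Tier *nextCostTier(std::vector<Tier> &tiers, int currentWeight) {
    Tier *best = 0;
    for (std::size_t i = 0; i < tiers.size(); ++i) {
        Tier &t = tiers[i];
        if (!t.ready || t.full || t.weight <= currentWeight)
            continue;
        if (!best || t.weight < best->weight)
            best = &t;
    }
    return best;
}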


Let's stop there while I am ahead :)





For every x requests, action one "admin/clean up" request - unless 
"Queue 1" is empty, in which case drain "Queue 2".


I am also thinking of a "third" queue, something like:

Queue 1 - Write requests (depends on cache state, but has the most 
impact - writes are slow)

Queue 2 - Read requests (as above, but less of an impact)
Queue 3 - Admin/Clean up

The only problem I have so far is that Queue 1 is above Queue 2... they 
might be swapped, since you are reading more than writing? Perhaps 
another config option.


cache_dir /var/dir1 128G 128 128 Q1=read Q2=write (cache_dir syntax 
wrong)
cache_dir /var/dir2 32G 128 128 Q1=write Q2=read (as above, but this 
might be on ssd)


I think this might be going too far ?

Cheers,

Pieter



No comments on the three queues per cache "space" ?


Re: [RFC] cache architecture

2012-01-24 Thread Pieter De Wit
Sorry - my mail client messes with the layout, I will double space from 
now on :)


Spotted a few mistakes with my suggestion:

On 24/01/2012 23:02, Pieter De Wit wrote:


Perhaps a 9) Implement dual IO queues - I *think* the IO has been 
moved into its own thread; if not, the queuing can still be 
applied. Any form of checking the cache is going to affect squid, so 
how do we ensure we are idle? Dual queues :) Queue 1 holds the 
requests for squid, queue 2 holds the admin/clean up requests. The 
IO "thread" (if not threaded), before handling an admin/clean up 
request, checks Queue 1 for requests and empties it *totally* before 
heading into Queue 2. This will allow you to have the same caching 
as now, relieving the start-up problems? Might lead to the same 
double cache of objects as above (if you make the cache writable 
before the scan is done).


I wonder about priority queues every now and then. It is an 
interesting idea. The I/O is currently done with pluggable modules 
for various forms. DiskThreads and AIO sort of do this but are FIFO 
queued in N parallel queues. Prioritised queues could be an 
interesting additional DiskIO module.
Hard to implement given the current "leg work" is already done? How 
well does the current version of squid handle multiple cores, and can 
this take advantage of cores?


What I'm looking for is a little bit more abstracted towards the 
architecture level across cache type and implementation. At that 
scale we can't use any form of "totally empty" queue condition 
because on caches that receive much traffic the queue would be quite 
full, maybe never actually empty. Several of the problems we have now 
are waiting on the cache load completed (ie the load action queue 
empty) before a cache is even considered for use.


Amos
At that scale, no matter what you do, you will impact performance/your 
"wanted" outcome. It's about reaching an acceptable balance which I 
think you, as a dev, will have a hard time predicting for any real 
life usage out there. Perhaps "we" (in "" since I am yet to contribute 
a single line of code :) ) can make it "Weighted Priority" and, as such, 
have squid.conf options to tune it. The admin has to decide how 
aggressive squid must be at rebuilding the cache (makes me think of the 
RAID rebuild options in HP RAID controllers). I am thinking of:


Can't be zero since we won't rebuild then, but what if we want more 
than 1 per 1? Maybe we should have 2 options:


cache_rebuild_weight <1-max int>
cache_request_weight <1-max int>

?


For every x requests, action one "admin/clean up" request - unless 
"Queue 1" is empty, in which case drain "Queue 2".


I am also thinking of a "third" queue, something like:

Queue 1 - Write requests (depends on cache state, but has the most 
impact - writes are slow)

Queue 2 - Read requests (as above, but less of an impact)
Queue 3 - Admin/Clean up

The only problem I have so far is that Queue 1 is above Queue 2... they 
might be swapped, since you are reading more than writing? Perhaps 
another config option.


cache_dir /var/dir1 128G 128 128 Q1=read Q2=write (cache_dir syntax 
wrong)
cache_dir /var/dir2 32G 128 128 Q1=write Q2=read (as above, but this 
might be on ssd)


I think this might be going too far ?

Cheers,

Pieter

Also, if we have the "squid.state" loaded, what stops us from writing 
objects into free space, if there is any? We know how big the cache 
is/was and how big it's allowed to be. As before, this will lead to the 
double storage of objects, but those can be freed.


Cheers,

Pieter


Re: [RFC] cache architecture

2012-01-24 Thread Pieter De Wit


Perhaps a 9) Implement dual IO queues - I *think* the IO has been 
moved into its own thread; if not, the queuing can still be applied. 
Any form of checking the cache is going to affect squid, so how do we 
ensure we are idle? Dual queues :) Queue 1 holds the requests for 
squid, queue 2 holds the admin/clean up requests. The IO "thread" (if 
not threaded), before handling an admin/clean up request, checks Queue 
1 for requests and empties it *totally* before heading into Queue 2. 
This will allow you to have the same caching as now, relieving the 
start-up problems? Might lead to the same double cache of objects as 
above (if you make the cache writable before the scan is done).


I wonder about priority queues every now and then. It is an 
interesting idea. The I/O is currently done with pluggable modules for 
various forms. DiskThreads and AIO sort of do this but are FIFO queued 
in N parallel queues. Prioritised queues could be an interesting 
additional DiskIO module.
Hard to implement given the current "leg work" is already done? How 
well does the current version of squid handle multiple cores, and can 
this take advantage of cores?


What I'm looking for is a little bit more abstracted towards the 
architecture level across cache type and implementation. At that scale 
we can't use any form of "totally empty" queue condition because on 
caches that receive much traffic the queue would be quite full, maybe 
never actually empty. Several of the problems we have now are waiting 
on the cache load completed (ie the load action queue empty) before a 
cache is even considered for use.


Amos
At that scale, no matter what you do, you will impact performance/your 
"wanted" outcome. It's about reaching an acceptable balance which I 
think you, as a dev, will have a hard time predicting for any real life 
usage out there. Perhaps "we" (in "" since I am yet to contribute a 
single line of code :) ) can make it "Weighted Priority" and, as such, 
have squid.conf options to tune it. The admin has to decide how 
aggressive squid must be at rebuilding the cache (makes me think of the 
RAID rebuild options in HP RAID controllers). I am thinking of:

For every x requests, action one "admin/clean up" request - unless 
"Queue 1" is empty, in which case drain "Queue 2".


I am also thinking of a "third" queue, something like:

Queue 1 - Write requests (depends on cache state, but has the most 
impact - writes are slow)

Queue 2 - Read requests (as above, but less of an impact)
Queue 3 - Admin/Clean up

The only problem I have so far is that Queue 1 is above Queue 2... they 
might be swapped, since you are reading more than writing? Perhaps 
another config option.


cache_dir /var/dir1 128G 128 128 Q1=read Q2=write (cache_dir syntax 
wrong)
cache_dir /var/dir2 32G 128 128 Q1=write Q2=read (as above, but this 
might be on ssd)


I think this might be going too far ?

Cheers,

Pieter



Re: [RFC] cache architecture

2012-01-23 Thread Pieter De Wit

Hi Amos,

My 2c :)

I know you have noted it already, but please please don't change the 
start to something like another project does (start-up dirty scan - I 
have personally seen a 12-hour-plus start-up time). Why they're still on 
that, who knows... anyways, back to the email here :)


Alex, sorry if you covered these things; I only briefly skimmed over 
your email (I was in transit), so count them as a +1.


On 24/01/2012 15:24, Amos Jeffries wrote:
This is just a discussion at present for a checkup and possibly 
long-term re-design of the overall Architecture for store logics. So 
the list of SHOULD DO etc will contain things Squid already does.


This post is prompted by 
http://bugs.squid-cache.org/show_bug.cgi?id=3441 and other ongoing 
hints about user frustrations on the help lists and elsewhere.


Getting to the chase;

 Squid's existing methods of startup cache loading and error recovery 
are slow, with side-effects impacting bandwidth and end-user experience 
in various annoying ways. The swap.state mechanism speeds loading up 
enormously as compared to the DIRTY scan, but in some cases is still 
too slow.



Ideal Architecture;

Squid starts with assumption of not caching. Cache spaces are loaded 
as soon as possible with priority to the faster types. But loaded 
asynchronously to the startup in a plug-n-play design.


1) Requests are able to be processed at all times, but storage ability 
will vary independent of Squid operational status.

+ minimal downtime to first request accepted and responded
- lost all or some caching benefits at times
(+) The memory cache will be empty, "plenty" of space for 
objects... more on this later (not sure about squid -k reconfigure etc...)


2) cache_mem shall be enabled by default and first amongst all caches
+ reduces the bandwidth impact from (1) if it happens before first 
request
+ could also be setup async while Squid is already operating (pro from 
(1) while minimising the con)


3) possibly multiple cache_mem. A traditional non-shared cache_mem, a 
shared memory space, and an in-transit unstructured space.
+ non-shared cache_mem allows larger objects than possible with the 
shared memory.
+ separate in-transit area allows collapsed forwarding to occur for 
incomplete but cacheable objects
  note that private and otherwise non-shareable in-transit objects are 
a separate thing not mentioned here.
- maybe complex to implement and long-term plans to allow paging 
mem_node pieces of large files should obsolete the shared/non-shared 
split.
I guess the multiple cache_mems will "load share" like the cache_dirs 
atm? Not sure why I mention this.


4) config load/reload at some point enables a cache_dir
+ being async means we are not delaying first response waiting for 
potentially long slow disk processes to complete

- creates a high MISS ratio during the wait for these to be available
- adds CPU and async event queue load on top of active traffic loads, 
possibly slowing both traffic and cache availability


5) cache_dir maintains distinct (read,add,delete) states for itself
+ this allows read-only (1,0,0) caches, read-and-retain (1,1,0) caches
+ also allows old storage areas to be gracefully deprecated using 
(1,0,1) with object count decrease visibly reporting the progress of 
migration.
+2 on this. Perhaps add a config option to cache_mem and cache_dir 
called "state", e.g. state=ro|rw|"expire", so "ro,expire" will allow a 
cache to be expired, aka objects deleted (maybe Alex covered this?).


6) cache_dir structure maintains a "current" and a "max" available 
fileno setting.
current always starting at 0 and being up to max. max being at 
whatever swap.state, a hard-coded value or appropriate source tells 
Squid it should be.
+ allows scans to start with caches set to full access, but limit the 
area of access to a range of already scanned fileno between 0 and 
current.
+ allows any number of scan algorithms beyond CLEAN/DIRTY and while 
minimising user visible impact.

+ allows algorithms to be switched while processing
+ allows growing or shrinking cache spaces in real-time

7) cache_dir scan must account for corruption of individual 
files, the index entries, and any metadata construct like swap.state


8) cache_dir scan should account for externally added files. 
Regardless of CLEAN/DIRTY algorithm being used.
   by this I mean check for and handle (accept or erase) cache_dir 
entries not accounted for by the swap.state or equivalent meta data.
+ allows reporting what action was taken about the extra files. Be it 
erase or import and any related errors.



Anything else?

Amos


When you mentioned "cache spaces" I thought:

Why not have blocks allocated (or block devices) like this:

Block 0+ (of said block "thing") - This is a header/info block / swap 
state replacement

Block x/y/z - Allocated to objects

After reading block 0, you know which blocks are free and you can start 
caching in those; yes, this will cause double objects, but you g
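
For illustration, such a block-0 header might look something like this 
(purely hypothetical layout):

#include <stdint.h>

// Hypothetical header stored in block 0 of a cache "space",
// acting as the swap.state replacement: a bitmap of used blocks
// follows it on disk.
struct CacheSpaceHeader {
    uint32_t magic;        // identifies a squid cache space
    uint32_t version;      // on-disk layout version
    uint64_t blockSize;    // bytes per block
    uint64_t totalBlocks;  // capacity of this space
    // followed by totalBlocks bits: 1 = block holds an object
};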

Re: [RFC] Package download pages

2012-01-22 Thread Pieter De Wit

Hi Guys,

I might miss something here, but how about:

/mirrors/ <-- Mirror info here
/downloads/ <-- Packages, with a link to "mirrors"

Under /downloads/ we could have /major_version/version/files.bla e.g:

/downloads/3.1/3.1.7/squid-3.1.7.deb

/Version can still link to /downloads/

Cheers,

Pieter

On Mon, 23 Jan 2012, Amos Jeffries wrote:

The current website layout for Versions/ and Download/ is a little confusing 
and I would like to combine the two into a simpler form.


The main problem is the download links and section called Download/ is just 
documentation about downloading and mirrors. The actual download files 
themselves are under Versions/. The link between the two is an easily 
overlooked link on Download/ and there are several click layers to wade 
through before any actual package link becomes available.


I would like to make the Download/ index page a redirect to Versions/ index 
page, move the Download/ text to an intro for the Versions/ page, and add 
some automated links from the Versions/ page straight to the latest tarballs.
Eventually I would like to link to the OS distro package sources of the 
stable release(s) from here too and get rid of all those "where do I find 
binaries for distro X" FAQs.



Comments? Other ideas?


Amos




Re: Ip::Address performance

2010-11-19 Thread Pieter De Wit

Hey Guys,

I am joining the thread a bit late and I might miss a few things, but 
wouldn't this be a simple solution:


(From memory, this is pretty close to the "union" one Alex suggested)

NB: There is a very small chance that an IPv6 address can be 
converted into an IPv4 address; from memory it's the ::ffff:0:0/96 
mapped addresses, but we can google that to confirm.


* We have a struct that contains both the IPv4 and IPv6 struct
* When an IPv4 address is needed, it's stored in the IPv4 struct (same 
with IPv6)
* If we have an IPv4 address and we need to return an IPv6 address, the 
conversion is done once, then the result is stored in the struct (IMO, a 
simple 'if (struct.we_have_ipv6==1) ...' should do the trick)


The other side of this is that we have 3 methods for returning the address:

* Get_IPv6 ();
* Get_IPv4 ();
* Get_IP ();

The purpose of 'Get_IP ()' is so that we can retrieve what we need with 
a simple command and not really care about what is stored. As an example:

* if we only have an IPv6 address for someone, Get_IP will return that, 
otherwise IPv4, or an error if we have nothing.
* if we have both IPv4 and 6, and we can use both, then for the time 
being we "prefer" IPv4, and we return that.
* Get_IPv4 will return an error if the IPv6 address can't be converted 
(but really, I can't see the need for this, just covering bases).


IPv6 is gaining speed, but it's not as well connected as we would have 
liked to see. The point is that squid, IMHO, should still default to the 
IPv4 address. There can be factors that change this (config file / checks 
at startup - like no IPv4 address? / etc), but these will only affect 
'Get_IP', and the checks are 'light' or limited to startup/config 
re-read code.


As a side note, the two structs can be populated by the resolver.
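
A rough sketch of that idea (hypothetical names; the real class is 
Ip::Address and is organised quite differently):

#include <netinet/in.h>
#include <cstring>

// Hypothetical dual-family holder: store what we were given,
// convert lazily, and cache the conversion so it happens once.
struct DualAddress {
    sockaddr_in  v4;
    sockaddr_in6 v6;
    bool haveV4, haveV6;

    // Get_IPv6-style accessor: build the ::ffff:a.b.c.d mapped
    // form from v4 the first time, then reuse the stored result.
    const sockaddr_in6 &getV6() {
        if (!haveV6 && haveV4) {
            std::memset(&v6, 0, sizeof(v6));
            v6.sin6_family = AF_INET6;
            v6.sin6_port = v4.sin_port;
            unsigned char *b = v6.sin6_addr.s6_addr;
            b[10] = b[11] = 0xff;                  // ::ffff: prefix
            std::memcpy(b + 12, &v4.sin_addr, 4);  // embedded IPv4
            haveV6 = true;  // cached; conversion never repeats
        }
        return v6;
    }
};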

Hope I am not way off track here :)

Cheers,

Pieter

On 20/11/2010 06:31, Alex Rousskov wrote:

On 11/19/2010 12:06 AM, Amos Jeffries wrote:

On 19/11/10 09:27, Alex Rousskov wrote:

Hello,

This email summarizes my questions and suggestions related to 
optimizing

Ip::Address implementation.

Disclaimer: I am writing from general performance and API principles
point of view. I am not an IP expert and may miss some critical
portability concerns, among other things. I am counting on Amos and
others to audit any ideas worth considering.


Ip::Address is actually a socket address. Besides the IP address 
itself,

the class maintains the socket address family and port (at least).

Currently, Ip::Address is designed around a single sockaddr_in6 data
member that holds the address family, socket port, the IP address
itself, and some extra info. This is 28 bytes of storage, most of which
are used in case of an IPv6 address.

If we were to deal with IPv4 addresses only, the corresponding "native"
IPv4 sockaddr_in structure takes 16 bytes, and only the first 8 of 
which

are normally used.

Currently, Ip::Address implementation strategy can be summarized as
"convert any input into a full-blown IPv6 address as needed, store that
IPv6 address, and then convert that IPv6 address into whatever the code
actually needs". This is a simple, solid approach that probably helped
eliminate many early IPv6 implementation bugs.

Unfortunately, it is rather expensive performance-wise because the 
input

is usually not an IPv6 address and the output is usually not an IPv6
address either. While each single conversion (with the exception of
conversion from and to textual representation) is relatively cheap,
these conversions add up. Besides, the conversions themselves and even
the "what am I really?" tests are often written rather inefficiently.

For example, to go from an IPv4 socket address to an a.b.c.d textual
representation of that address, the current code may allocate,
initialize, use, and discard several sockaddr_in6 structures, do a dozen
sockaddr_in6 structure scans for certain values, and copy parts of
sockaddr_in and sockaddr_in6 structures a few times along the way to
the ntoa() call. After the ntoa() call, scan the resulting string to
find its end.

The old code equivalent of the above? Initialize one IPv4 sockaddr_in
structure on stack and pass it to ntoa()!


I see three ways to optimize this without breaking correctness:

* one-struct: Keep using a single sockaddr_in6 data member and convert
everything into an IPv6 socket address as we do now. Optimize each
conversion step, remove repeated "who am I?" checks during each step,
and optimize each check itself.


All of the storage methods require them to identify the type accessed.
The only way to completely avoid them is to revert to v4-only or v6-only
compile modes.

Yes they seriously need some optimization and reduction.


Yes, some checks are unavoidable, of course. What I was referring to 
is replacing the current


if (I am vX) ... do a little ...
if (I am vX) ... do a little ...
if (I am vX) ... do a little ...
if (I am vX) ... do a little ...

code with something closer to this
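(presumably something along these lines, with the family check hoisted 
out so it runs once - the exact shape is a guess):

if (I am vX) {
    ... do a little ...
    ... do a little ...
    ... do a little ...
    ... do a little ...
}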

Re: Squid Idea

2009-12-15 Thread Pieter De Wit
Hi Thomas,

You sure can - the option is called "cache_peer_access" - google it for some 
examples.

BTW - this is the squid-dev list, not the "general support" one - I think your 
query is more suited to the general support one. Off the top of my head, its 
email address is sq...@squid-cache.org

Cheers,

Pieter
  - Original Message - 
  From: Thomas Grant (Messagelabs) 
  To: squid-dev@squid-cache.org 
  Sent: Wednesday, December 16, 2009 03:03
  Subject: Squid Idea


  Hi There,

   

  Is it possible using Squid to say the following:

   

  If browsing all sites, go to PROXY1.MYPROXY.COM; however, if browsing 
BBC.CO.UK, GOOGLE.COM, HOTMAIL.CO.UK (examples only), go to PROXY2.MYPROXY.COM?

  Basically, the option to set a different web proxy address for a given list 
of IPs or domains that you are trying to reach?

   

  Thanks

  Tom

   

  Kind Regards

   

  Tom Grant


  Support Centre Analyst

  Symantec Hosted Services
  www.messagelabs.com  
  

   


Re: SourceLayout: where to put Delay*?

2009-04-09 Thread Pieter De Wit

I vote for:

trafficcontrol/
tcontrol/
tc/

?

- Original Message - 
From: "Kinkie" 

To: 
Sent: Friday, April 10, 2009 10:08 AM
Subject: Re: SourceLayout: where to put Delay*?


On Thu, Apr 9, 2009 at 11:42 PM, Alex Rousskov
 wrote:

Hello,

Where should we put the following traffic shaping-related stuff?

[...]

Here are a few options:

shaping/
tshaping/
tshape/
shape/
delay/
delays/
quota/
quotas/

Please keep in mind that DelayPools is just a mechanism for some traffic
shaping and quota control features so perhaps it is better not to use
delay*/ as the directory name.


I find quota/ is the most fitting.


--
   /kinkie



Re: Mysterious acts of 3.1

2009-04-06 Thread Pieter De Wit

Hi Amos,

I am going to take a swing at things here and say it's something to do with 
the thread that you create. Something (some global variable etc.) is not 
locked in a mutex, or you are using a non-thread-safe function/procedure. 
It's most likely in the first couple of lines of the function/procedure that 
starts the thread.

I did a bit of work on threads using C, not C++, so my advice is a bit 
shaken... if not stirred :)


Cheers,

Pieter

- Original Message - 
From: "Amos Jeffries" 

To: "squid-dev >> Squid Developers" 
Sent: Tuesday, April 07, 2009 1:19 AM
Subject: Mysterious acts of 3.1



Anyone able to say what this means?

Found while testing 3.1-BZR rev 9483

Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb7541b90 (LWP 29220)]
0xb7fce410 in __kernel_vsyscall ()
#0  0xb7fce410 in __kernel_vsyscall ()
#1  0xb7b73085 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7b74a01 in abort () from /lib/tls/i686/cmov/libc.so.6

#3  0x080f2cd5 in Debug::xassert (msg=0x82210f8 "CurrentDebug", 
file=0x82210a4 "../../SQUID_3_1/src/debug.cc", line=709)

at ../../SQUID_3_1/src/debug.cc:745

#4  0x080f3215 in Debug::getDebugOut () at 
../../SQUID_3_1/src/debug.cc:709


#5  0x0810e0fc in fd_open (fd=20, type=4, desc=0x823e144 "async-io 
completetion event: main") at ../../SQUID_3_1/src/fd.cc:187


#6  0x081b2ab5 in CommIO::Initialise () at 
../../SQUID_3_1/src/comm.cc:2302
#7  0x0820570f in squidaio_thread_loop (ptr=0x874ac84) at 
../../SQUID_3_1/src/CommIO.h:30

#8  0xb7fa64fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#9  0xb7c1ee5e in clone () from /lib/tls/i686/cmov/libc.so.6





Re: Feature: quota control

2009-03-30 Thread Pieter De Wit



Ok - so here is how I see this:

* We need to agree (maybe a strong word) on a traffic "limiting" standard.
(Think how the limits are implemented, rather than done - if the object
size is known, pass that to the "limiter"; if not, every time data is RX'ed,
using something like a session id or so.) If this has been done - sorry -
point me to it? :) Does this justify "public" input? I am pretty sure
the squid devs use squid "more" than the avg. Joe.

* How do these "tasks" talk to each other ? Are they modules etc ?

Maybe we should open a wiki page with some proposals ?




Re: Feature: quota control

2009-03-30 Thread Pieter De Wit

On Mon, 30 Mar 2009 11:38:31 -0600, Alex Rousskov
 wrote:
> On 03/30/2009 03:16 AM, Pieter De Wit wrote:
> 
>> I gave this some thought. Why don't we build a system close to
>> sendmail's milter system. An API where by clients can plug in and offer
>> services - one can be a traffic counter, traffic limiter (as what we
>> proposing here) and maybe a URL block, a virus scanner etc.
> 
> We already have such a plugin API for message- and content-related
> tasks. It is called eCAP, and it has been mentioned on this thread.
> 
> Compared to eCAP, traffic shaping and quotas are different in scope as
> they work on multiple messages and often do not care about the messages
> at all (just raw bytes traffic). So, we have a few options:
> 
> 1) Use eCAP nearly "as is" for traffic shaping and quotas, even though
> it is not a perfect fit for the task.
> 
> 2) Significantly enhance eCAP to offer traffic shaping-specific hooks
> (as a standard addition or as a Squid extension), even though it may
> lead to eCAP API bloat.
> 
> 3) Develop a different plugin API that specializes in aggregate traffic
> management and is unrelated to eCAP, even though it may lead to
> duplicating a lot of eCAP-related code.
> 
> I am not sure, but I think option #2 is the worst. What do you think?
> 
> Cheers,
> 
> Alex.

Hey Alex and List,

Sorry that I didn't read up on eCAP before starting this process, but I
believe that it won't be on the TODO list if eCAP "does it right". What I
see in my mind is something like this:

Client/Bandwidth limiting registers with squid
...
squid get a request to download object x from site y by user z at ip
a.b.c.d etc
squid sends that request to the client (just "text" not the object)
client replies "yes" or "no" - if yes the client needs to track how much
data is left for that user etc.

(Client here is the limiting software - couldn't think of a better
name...coffee)

I feel this should be a persistent connection, matched by an ACL, for
speed etc.
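
To illustrate, a purely hypothetical exchange over that connection:

  squid   -> limiter: request user=z ip=a.b.c.d url=http://site/object size=1048576
  limiter -> squid:   yes remaining=52428800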

Now to the workings of this. I am leaning towards option 3. I am not sure
how hard it is to maintain "two" APIs. To be honest, I am not even sure
how hard it is to maintain one API. I feel that this really only needs two
or three commands, so is it really going to bloat the API?

Broken down - what is the most "this API" would need ?

Cheers,

Pieter
 



Re: Feature: quota control

2009-03-30 Thread Pieter De Wit

Hi Guys,

I gave this some thought. Why don't we build a system close to sendmail's 
milter system: an API whereby clients can plug in and offer services - one 
can be a traffic counter, a traffic limiter (as we are proposing here), 
maybe a URL blocker, a virus scanner, etc.


I know that there are redirectors that do this kind of thing, but surely 
squid has reached a stage now where it's big enough to expand this way? 
Sendmail had to end up doing it?


It might even give a solution to the "single thread disk problem", since 
you could have your disk stores register in the same way. I am just 
throwing ideas around - maybe they should be targeted for Squid 4, who 
knows :)


What you guys think ?

Cheers,

Pieter

- Original Message - 
From: "Adrian Chadd" 

To: "Amos Jeffries" 
Cc: "Robert Collins" ; "Mark Nottingham" 
; "Pieter De Wit" ; 


Sent: Saturday, March 28, 2009 5:31 PM
Subject: Re: Feature: quota control


Just to add to this - implementing it as a delay pool inside Squid
flattens traffic into one byte pool. Various places may not do this at
all - there may be "free" versus "non-free" (which means one set of
ACLs inside Squid); there may be "cheap" versus "expensive" (again,
possibly requiring multiple delay pools and multiple ACLs to map it
all together; again all inside Squid) - things get very messy, very
quickly.

This is why my proposal (which I hope -finally- gets approved so I can
begin work on it ASAP! :) involves passing off the traffic assignment
to an external daemon that implements -all- of the traffic assignment
and accounting logic. Squid will then just send requests for traffic
and interim updates like you've said.

2c,


Adrian

2009/2/26 Amos Jeffries :

Robert Collins wrote:


On Fri, 2009-02-27 at 10:00 +1100, Mark Nottingham wrote:


Honestly, if I wanted to do byte-based quotas today, I'd have an
external ACL helper talking to an external logging helper; that way, you
can just log the response sizes to a daemon and then another daemon 
would

use that information to make a decision at access time. The only even
mildly hard part about this is sharing state between the daemons, but if
you don't need the decisions to be real-time, it's not that bad 
(especially

considering that in any serious deployment, you'll have state issues
between multiple boxes anyway).


Sure; I think that would fit with 'ensuring enough hooks' :P

-Rob


The brief description of what I gave Pieter to start with was:

A pool based on DelayPools in that Squid decrements live as traffic goes
through. With a helper/ACL hook to retrieve the initial pool size and to
call as needed to check for current quotas.

How the helper operates is not relevant to Squid. Thats important.

The key things being that; its always called for new visitors to assign 
the
start quota, and when the quota is nearing empty its called again to see 
if

they get more.

Helper would need to send back "UNITS AMOUNT MINIMUM", where UNITS is the
unit of quota (seconds, bytes, requests, misses?, other?), AMOUNT is an
integer count of units the client is allowed to use, and MINIMUM is the
level of units at which the helper is to be asked for an update.

0 remaining units results in an Error page 'quota exceeded' or somesuch.

Amos
--
Please be using
Current Stable Squid 2.7.STABLE6 or 3.0.STABLE13
Current Beta Squid 3.1.0.5






Feature: quota control

2009-02-21 Thread Pieter De Wit

Hi Guys,

I would like to offer my time in working on this feature - I have not 
done any squid dev, but since I would like to see this feature in Squid, 
I thought I would take it on.


I have briefly contacted Amos off-list and we agreed that there is no 
"set in stone" way of doing this. I would like to propose that we 
start throwing around some ideas and see if we can get this into 
squid :)


Some ideas that Amos quickly said :

   - "Based" on delay pools
   - Use of external helpers to track traffic


The way I see this happening is that a quota is like a pool that empties 
based on 2 classes - bytes and requests. Requests will be for things 
like the number of requests, i.e. a person is only allowed to download 5 
exes per day, or 5 requests of >1 MB, or something like that (it just 
popped into my head :) )


Bytes is a pretty straightforward one: the user is only allowed x 
amount of bytes per y amount of time.
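
As a rough sketch (hypothetical names), such a pool might boil down to:

#include <ctime>

// Hypothetical quota pool draining on two classes: bytes and requests.
struct QuotaPool {
    long   bytesLeft;
    int    requestsLeft;
    time_t periodStart;
    long   bytesPerPeriod;    // e.g. x bytes ...
    long   periodSeconds;     // ... per y seconds
    int    requestsPerPeriod;

    // charge() refuses once either class of quota is exhausted
    bool charge(long bytes) {
        const time_t now = time(0);
        if (now - periodStart >= periodSeconds) {  // new period: refill
            bytesLeft = bytesPerPeriod;
            requestsLeft = requestsPerPeriod;
            periodStart = now;
        }
        if (bytesLeft < bytes || requestsLeft < 1)
            return false;  // quota exceeded
        bytesLeft -= bytes;
        --requestsLeft;
        return true;
    }
};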


Anyways - let the ideas fly :)

Cheers,

Pieter