omit to loop-forever processing some regex acls

2008-11-26 Thread Matt Benjamin




--

Matt Benjamin

The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309

--- HttpHeaderTools.c.orig  2008-11-07 17:00:20.0 -0500
+++ HttpHeaderTools.c   2008-11-07 17:52:14.0 -0500
@@ -246,6 +246,7 @@
 	" ?,\t\r\n"
     };
     int quoted = 0;
+
     delim[0][1] = del;
     delim[2][1] = del;
     assert(str && item && pos);
@@ -258,6 +259,7 @@
     *pos += strspn(*pos, delim[2]);
 
     *item = *pos;	/* remember item's start */
+
     /* find next delimiter */
     do {
 	*pos += strcspn(*pos, delim[quoted]);
@@ -265,13 +267,15 @@
 	    break;
 	if (**pos == '"') {
 	    quoted = !quoted;
-	    *pos += 1;
+	    goto advance;
 	}
 	if (quoted && **pos == '\\') {
 	    *pos += 1;
-	    if (**pos)
-		*pos += 1;
+	    goto advance;
 	}
+advance:
+	if (**pos)
+	    (*pos)++;
     } while (**pos);
     len = *pos - *item;	/* *pos points to del or '\0' */
     /* rtrim */
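
(For illustration, not part of the patch: a standalone sketch of the loop
after the change, with the delimiter sets simplified and del hardwired to
';'. As I read the diff, the pre-patch loop could leave *pos unmoved when
strcspn() stopped on a delimiter-set character that no branch handled, e.g.
an unquoted ',' when del is ';', and would then spin on it forever; routing
every branch through one shared advance step guarantees a character of
progress per iteration.)

    #include <stdio.h>
    #include <string.h>

    /* simplified delimiter sets: [0] unquoted (del == ';'), [1] quoted */
    static const char *delims[2] = { "\";,", "\"\\" };

    static const char *
    scan_item_end(const char *pos, char del)
    {
        int quoted = 0;
        do {
            pos += strcspn(pos, delims[quoted]);
            if (*pos == del)
                break;
            if (*pos == '"')
                quoted = !quoted;
            else if (quoted && *pos == '\\')
                pos++;              /* skip the backslash itself */
            /* like the patched code, always advance here */
            if (*pos)
                pos++;
        } while (*pos);
        return pos;
    }

    int
    main(void)
    {
        /* the unquoted ',' is exactly the character the old loop
         * could spin on when del != ',' */
        const char *s = "q=1,level=2;next";
        const char *end = scan_item_end(s, ';');
        printf("item: %.*s\n", (int)(end - s), s);  /* item: q=1,level=2 */
        return 0;
    }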



fixup for URI decoration with port when not wanted

2008-11-26 Thread Matt Benjamin

Affects store keys and cache peering lookups.

Matt

--

Matt Benjamin

The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309

diff --git a/src/cf.data.pre b/src/cf.data.pre
index b9dc4c9..7d19ba4 100644
--- a/src/cf.data.pre
+++ b/src/cf.data.pre
@@ -3921,6 +3921,16 @@ DOC_START
 	sporadically hang or never complete requests set this to on.
 DOC_END
 
+NAME: httpd_accel_no_append_port
+COMMENT: on|off
+TYPE: onoff
+DEFAULT: off
+LOC: Config.onoff.accel_no_append_port
+DOC_START
+	Do not append the accelerator port to the request URI.  This
+	is intended for clustered accelerator setups.
+DOC_END
+
 COMMENT_START
  DELAY POOL PARAMETERS
 -----------------------------------------------------------------------------
diff --git a/src/client_side.c b/src/client_side.c
index 23c4274..09899c9 100644
--- a/src/client_side.c
+++ b/src/client_side.c
@@ -3842,9 +3842,13 @@ parseHttpRequest(ConnStateData * conn, HttpMsgBuf * hmsg, method_t * method_p, i
 	    if (strchr(host, ':'))
 		snprintf(http->uri, url_sz, "%s://%s%s",
 		    conn->port->protocol, host, url);
-	    else
+	    else if (Config.onoff.accel_no_append_port) {
+		snprintf(http->uri, url_sz, "%s://%s%s",
+			 conn->port->protocol, host, url);
+	    } else {
 		snprintf(http->uri, url_sz, "%s://%s:%d%s",
-		    conn->port->protocol, host, port, url);
+			 conn->port->protocol, host, port, url);
+	    }
 	    debug(33, 5) ("VHOST REWRITE: '%s'\n", http->uri);
 	} else if (internalCheck(url)) {
 	    goto internal;
diff --git a/src/structs.h b/src/structs.h
index 12652ab..33c7185 100644
--- a/src/structs.h
+++ b/src/structs.h
@@ -688,6 +688,7 @@ struct _SquidConfig {
 	int collapsed_forwarding;
 	int relaxed_header_parser;
 	int accel_no_pmtu_disc;
+	int accel_no_append_port;
 	int global_internal_static;
 	int httpd_suppress_version_string;
 	int via;
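
(For illustration: a hedged squid.conf sketch of how the new directive
would be used. The directive name is from the patch above; the http_port
line and the port number are assumptions, not from the mail.)

    # each cluster node accelerates the same site on its own local port;
    # with ports appended, every node builds distinct store keys such as
    # http://www.example.com:8080/... and peer cache lookups never match
    http_port 8080 accel vhost
    httpd_accel_no_append_port on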


Re: Rv: Why not BerkeleyDB based object store?

2008-11-26 Thread Mark Nottingham
Just a tangential thought; has there been any investigation into
reducing the amount of write traffic with the existing stores?


E.g., establishing a floor for the reference count: if an object
doesn't have n refs, don't write it to disk? This will impact hit
rate, of course, but may help in situations where disk caching is
desirable but writing is the bottleneck...
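
(For illustration: a minimal standalone sketch of such a reference-count
floor; every name in it is invented, none of this is Squid code.)

    #include <stdio.h>

    /* stand-ins for the real store entry and configuration */
    struct store_entry {
        unsigned refcount;          /* times the object has been requested */
        int swapped_out;
    };

    static const unsigned min_refs_for_swapout = 2;     /* the floor "n" */

    /* called wherever the store decides whether to write an object out */
    static void
    maybe_swapout(struct store_entry *e)
    {
        if (e->refcount < min_refs_for_swapout)
            return;                 /* unpopular: keep it memory-only */
        e->swapped_out = 1;         /* stand-in for the real disk write */
    }

    int
    main(void)
    {
        struct store_entry once = { 1, 0 }, twice = { 2, 0 };
        maybe_swapout(&once);
        maybe_swapout(&twice);
        printf("once: %d, twice: %d\n", once.swapped_out, twice.swapped_out);
        return 0;
    }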



On 26/11/2008, at 9:14 AM, Kinkie wrote:


> On Tue, Nov 25, 2008 at 10:23 PM, Pablo Rosatti
> [EMAIL PROTECTED] wrote:
>> Amazon uses BerkeleyDB for several critical parts of its website.
>> The Chicago Mercantile Exchange uses BerkeleyDB for backup and
>> recovery of its trading database. And Google uses BerkeleyDB to
>> process Gmail and Google user accounts. Are you sure BerkeleyDB is
>> not a good idea to replace the Squid filesystems, even COSS?
>
> Squid3 uses a modular storage backend system, so you're more than
> welcome to try to code it up and see how it compares.
> Generally speaking, the needs of a data cache such as squid are very
> different from those of a general-purpose backend storage.
> Among the other key differences:
> - the data in the cache has little or no value.
>   it's important to know whether a file was corrupted, but it can
>   always be thrown away and fetched from the origin server at a
>   relatively low cost
> - workload is mostly writes
>   a well-tuned forward proxy will have a hit-rate of roughly 30%,
>   which means 3 writes for every read on average
> - data is stored in incremental chunks
>
> Given these characteristics, a long list of the mechanisms that
> database-like systems have, such as journaling, transactions, etc.,
> is a waste of resources.
> COSS is explicitly designed to handle a workload of this kind. I would
> not trust any valuable data to it, but it's about as fast as it gets
> for a cache.
>
> IMHO BDB might be much more useful as a metadata storage engine, as
> metadata has a very different access pattern than a general-purpose
> cache store.
> But if I had any time to devote to this, my priority would be
> bringing 3.HEAD COSS up to speed with the work Adrian has done in 2.
>
> --
>    /kinkie


--
Mark Nottingham   [EMAIL PROTECTED]




Associating accesses with cache.log entries

2008-11-26 Thread Mark Nottingham
I've been playing around with associating specific requests with the  
debug output they generate, with a simple patch to _db_print along  
these lines:


    if (Config.Log.accesslogs && Config.Log.accesslogs->logfile) {
        seqnum = LOGFILE_SEQNO(Config.Log.accesslogs->logfile);
    }
    snprintf(f, BUFSIZ, "%s %i| %s",
        debugLogTime(squid_curtime),
        seqnum,
        format);

This leverages the sequence number that's available in custom access  
logs (%sn).
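
(For illustration, with the "%s %i| %s" format above: a cache.log line
would then carry the access-log sequence number between the timestamp and
the message. The timestamp and message text here are invented.)

    2008/11/26 14:03:22 1042| clientReadRequest: FD 12: reading request

so a plain grep on " 1042| " pulls out every line logged while request
1042 was being handled.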


It's really useful for debugging requests that are causing problems,
etc.; rather than having to correlate times and URLs, you can just
correlate sequence numbers. It also makes it possible to automate
debug output (which is the direction I want to take this in).


Beyond the obvious cleanup that needs to happen (e.g., outputting '-'
or a blank instead of 0 if there isn't an associated access log
line), a few questions:


* How do people feel about putting this in cache.log all the time? I
don't think it'll break any scripts (there aren't many, and those
that exist tend to grep for specific phrases rather than do an actual
parse, AFAICT). Is the placement above appropriate?


* The sequence number mechanism doesn't guarantee uniqueness in the
log file; if Squid is restarted between rotates, the counters reset.
Has fixing this been discussed?


* Is it reasonable to hardcode this to associate the numbers with the  
first configured access_log?


* To make this really useful, it would be necessary to be able to  
trigger debug_options (or just all debugging) based upon an ACL match.  
However, this looks like it would require changing how debug is  
#defined. Any comments on this?
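
(For illustration: one shape the #define change could take; a standalone
sketch with invented names, not Squid's actual macro. The only new part is
a per-request override flag that an ACL match would set before the request
is processed.)

    #include <stdarg.h>
    #include <stdio.h>

    static int debugLevels[256];    /* per-section thresholds */
    static int debug_forced = 0;    /* set when the request matches the acl */

    static void
    _db_print(const char *fmt, ...)
    {
        va_list args;
        va_start(args, fmt);
        vfprintf(stderr, fmt, args);
        va_end(args);
    }

    /* same call syntax as today's debug(SECTION, LEVEL) (...) macro */
    #define debug(SECTION, LEVEL) \
        if (!debug_forced && (LEVEL) > debugLevels[SECTION]) (void) 0; \
        else _db_print

    int
    main(void)
    {
        debug(33, 5) ("suppressed: level 5 over threshold 0\n");
        debug_forced = 1;           /* as if this request matched the acl */
        debug(33, 5) ("forced through by the acl override\n");
        return 0;
    }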


Cheers,

--
Mark Nottingham   [EMAIL PROTECTED]




Re: Associating accesses with cache.log entries

2008-11-26 Thread Kinkie
On Thu, Nov 27, 2008 at 4:21 AM, Mark Nottingham [EMAIL PROTECTED] wrote:
> I've been playing around with associating specific requests with the
> debug output they generate, with a simple patch to _db_print along
> these lines:
>
>    if (Config.Log.accesslogs && Config.Log.accesslogs->logfile) {
>        seqnum = LOGFILE_SEQNO(Config.Log.accesslogs->logfile);
>    }
>    snprintf(f, BUFSIZ, "%s %i| %s",
>        debugLogTime(squid_curtime),
>        seqnum,
>        format);
>
> This leverages the sequence number that's available in custom access
> logs (%sn).
>
> It's really useful for debugging requests that are causing problems,
> etc.; rather than having to correlate times and URLs, you can just
> correlate sequence numbers. It also makes it possible to automate
> debug output (which is the direction I want to take this in).

Looks interesting to me.

> Beyond the obvious cleanup that needs to happen (e.g., outputting '-'
> or a blank instead of 0 if there isn't an associated access log
> line), a few questions:
>
> * How do people feel about putting this in cache.log all the time? I
> don't think it'll break any scripts (there aren't many, and those
> that exist tend to grep for specific phrases rather than do an actual
> parse, AFAICT). Is the placement above appropriate?

I'd avoid the '|' character, but apart from that it makes sense to me.

> * The sequence number mechanism doesn't guarantee uniqueness in the
> log file; if Squid is restarted between rotates, the counters reset.
> Has fixing this been discussed?

I don't think that uniqueness has much value; correlating the seqnum
with the timestamp will address any uncertain cases.

> * Is it reasonable to hardcode this to associate the numbers with the
> first configured access_log?
>
> * To make this really useful, it would be necessary to be able to
> trigger debug_options (or just all debugging) based upon an ACL
> match. However, this looks like it would require changing how debug
> is #defined. Any comments on this?

YES! It's something I've been thinking about for some time.
Count me in.

-- 
/kinkie


Re: Rv: Why not BerkeleyDB based object store?

2008-11-26 Thread Adrian Chadd
I thought about it a while ago, but I'm just out of time, to be honest.
Writing objects to disk only if they're popular, or if you need the RAM
to handle concurrent accesses to large objects, would probably improve
disk performance enormously, as the amount of writing would drop
drastically.

Sponsorship for investigating and developing this is gladly accepted :)


Adrian


2008/11/26 Mark Nottingham [EMAIL PROTECTED]:
> Just a tangential thought; has there been any investigation into
> reducing the amount of write traffic with the existing stores?
>
> E.g., establishing a floor for the reference count: if an object
> doesn't have n refs, don't write it to disk? This will impact hit
> rate, of course, but may help in situations where disk caching is
> desirable but writing is the bottleneck...
>
>
> On 26/11/2008, at 9:14 AM, Kinkie wrote:
>
>> On Tue, Nov 25, 2008 at 10:23 PM, Pablo Rosatti
>> [EMAIL PROTECTED] wrote:
>>
>>> Amazon uses BerkeleyDB for several critical parts of its website.
>>> The Chicago Mercantile Exchange uses BerkeleyDB for backup and
>>> recovery of its trading database. And Google uses BerkeleyDB to
>>> process Gmail and Google user accounts. Are you sure BerkeleyDB is
>>> not a good idea to replace the Squid filesystems, even COSS?
>>
>> Squid3 uses a modular storage backend system, so you're more than
>> welcome to try to code it up and see how it compares.
>> Generally speaking, the needs of a data cache such as squid are very
>> different from those of a general-purpose backend storage.
>> Among the other key differences:
>> - the data in the cache has little or no value.
>>   it's important to know whether a file was corrupted, but it can
>>   always be thrown away and fetched from the origin server at a
>>   relatively low cost
>> - workload is mostly writes
>>   a well-tuned forward proxy will have a hit-rate of roughly 30%,
>>   which means 3 writes for every read on average
>> - data is stored in incremental chunks
>>
>> Given these characteristics, a long list of the mechanisms that
>> database-like systems have, such as journaling, transactions, etc.,
>> is a waste of resources.
>> COSS is explicitly designed to handle a workload of this kind. I would
>> not trust any valuable data to it, but it's about as fast as it gets
>> for a cache.
>>
>> IMHO BDB might be much more useful as a metadata storage engine, as
>> metadata has a very different access pattern than a general-purpose
>> cache store.
>> But if I had any time to devote to this, my priority would be
>> bringing 3.HEAD COSS up to speed with the work Adrian has done in 2.
>>
>> --
>>    /kinkie
>
> --
> Mark Nottingham   [EMAIL PROTECTED]





Re: omit to loop-forever processing some regex acls

2008-11-26 Thread Adrian Chadd
G'day!

If these are patches against Squid-2 then please put them into the
Squid bugzilla so we don't lose them.

There's a different process for Squid-3 submissions.

Thanks!


Adrian


2008/11/26 Matt Benjamin [EMAIL PROTECTED]: