[squid-users] If-Unmodified-Since implementation question

2011-03-10 Thread Guy Bashkansky
I'm trying to implement the If-Unmodified-Since (IUMS) capability in Squid 2.x.

When a client IUMS request arrives, I set a new request flag, iums,
store the IUMS time, and send an IMS request to the origin.  When the
origin response comes back, I modify the store entry memory object’s
reply status according to the IUMS time logic in both
httpProcessReplyHeader() and clientHandleIMSReply().

This works when the IUMS precondition is satisfied (I then send 200 to
the client, where it would typically be 304).  When the IUMS
precondition is unsatisfied, it works *only on a cache hit* (I then
send 412 to the client, where it would typically be 200).

However, when the IUMS precondition is unsatisfied upon cache *miss*,
the origin's 200 response is still being sent to client, even after I
modify the store entry memory object’s reply status to 412 in
httpProcessReplyHeader().  I cannot figure out how and when the status
is being restored back to 200.

I’ve set gdb data-watch breakpoints over the store entry memory
object’s reply status.  But then, later during the execution, gdb
reports watchpoint expression error: bad address.

I’m stuck now, and would really appreciate any help.  Thanks.
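
The IUMS time logic described above can be sketched as a small C function.
This is only an illustration under stated assumptions: iumsReplyStatus and
its parameters are hypothetical names, not Squid code, and both times are
assumed to be known.  It follows the RFC 2616 section 14.28 rule that the
header is ignored unless the request would otherwise yield 2xx or 412.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper (not part of Squid): pick the status to send to the
 * client once the origin's validation result is known.
 *   normal_status : status the request would get without the IUMS header
 *   last_modified : Last-Modified time of the current entity
 *   iums          : time taken from the client's If-Unmodified-Since header
 */
int
iumsReplyStatus(int normal_status, time_t last_modified, time_t iums)
{
    assert(normal_status >= 100 && normal_status <= 599);
    /* RFC 2616 14.28: ignore IUMS unless the result would be 2xx or 412. */
    if ((normal_status < 200 || normal_status > 299) && normal_status != 412)
        return normal_status;
    /* Entity modified after the given time: the precondition fails. */
    if (last_modified > iums)
        return 412;
    /* Precondition holds: serve the normal 200 or 206 reply. */
    return normal_status;
}
```

For example, iumsReplyStatus(200, lastmod, iums) yields 412 when lastmod is
later than iums, and 200 otherwise.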


On Tue, Jan 11, 2011 at 2:38 PM, Guy Bashkansky guy...@gmail.com wrote:

 I have to modify the behavior of a customized version of Squid 2.4
 STABLE6 code, either by configuration or by coding.  Currently I can
 not switch to any other Squid version, because of the customizations.


 Problem description:

 - When a client sends a byte-range request with an If-Unmodified-Since
 header AND the object in Squid's cache is stale, then this Squid
 version generates a request to origin with both IUMS and IMS headers,
 which is conflicting and undefined by RFC2616.  The origin throws an
 error.


 Proposed solution:

 - On an IMS check for content that was requested with an IUMS header,
 Squid should insert only the IMS header, not the IUMS header.  (If
 only the IUMS header were added, the origin would return its
 content unnecessarily, since it hasn't changed from the cached
 version.)

 - Once the origin check is complete, then Squid cache should compute
 IUMS calculations as defined in RFC 2616, possibly returning a 206
 Partial Content or 412 Precondition Failed.
 http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html


 Questions:

 - Is there any possibility to facilitate such behavior using Squid 2.4
 STABLE6 configuration?

 - If not, then where in the code should I start to look to make the
 necessary code change, and approximately how?

 - I could not find any notion of If-Unmodified-Since in the Squid 2.4
 STABLE6 code.  What's the best way to handle this?


[squid-users] Squid sends conflicting headers to origin when If-Unmodified-Since header is present from client

2011-01-11 Thread Guy Bashkansky
I have to modify the behavior of a customized version of Squid 2.4
STABLE6 code, either by configuration or by coding.  Currently I can
not switch to any other Squid version, because of the customizations.


Problem description:

- When a client sends a byte-range request with an If-Unmodified-Since
header AND the object in Squid's cache is stale, then this Squid
version generates a request to origin with both IUMS and IMS headers,
which is conflicting and undefined by RFC2616.  The origin throws an
error.


Proposed solution:

- On an IMS check for content that was requested with an IUMS header,
Squid should insert only the IMS header, not the IUMS header.  (If
only the IUMS header were added, the origin would return its
content unnecessarily, since it hasn't changed from the cached
version.)

- Once the origin check is complete, then Squid cache should compute
IUMS calculations as defined in RFC 2616, possibly returning a 206
Partial Content or 412 Precondition Failed.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
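
The first part of the proposed solution (forward only IMS on the
revalidation, never the client's IUMS) could look roughly like the filter
below.  The function names are hypothetical and this is not how Squid
builds its requests; the cache's own If-Modified-Since header is assumed to
be added elsewhere.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two header names; returns 1 when equal. */
int
headerNameEquals(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/*
 * Hypothetical filter: returns 1 if a client request header may be copied
 * into the revalidation request sent to the origin, 0 if it must be held
 * back and evaluated by the cache itself after the origin has answered.
 */
int
copyHeaderToRevalidation(const char *name)
{
    return !headerNameEquals(name, "If-Unmodified-Since");
}
```

With this filter the origin sees a plain IMS revalidation, and the IUMS
time stays with the cache for the later 206/412 decision.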


Questions:

- Is there any possibility to facilitate such behavior using Squid 2.4
STABLE6 configuration?

- If not, then where in the code should I start to look to make the
necessary code change, and approximately how?

- I could not find any notion of If-Unmodified-Since in the Squid 2.4
STABLE6 code.  What's the best way to handle this?


Re: [squid-users] Re: How to ignore query terms for store key?

2010-09-09 Thread Guy Bashkansky
Amos, Matus,

Some websites embed in query terms arbitrary redundant information
which is irrelevant to content distribution, but prevents effective
caching by giving same object different URLs each time.

For such websites (recognized by regex ACLs), stripping those
redundant cache-unfriendly query terms for storing provides a way of
effective caching without hurting the web functionality.

Guy


On Thu, Sep 9, 2010 at 7:28 AM, Matus UHLAR - fantomas
uh...@fantomas.sk wrote:

 are you sure that http://www.google.sk/search?q=one  should give the same
 result as http://www.google.sk/search?q=two?

 I think that you and your users will be very surprised...


On Fri, Sep 3, 2010 at 8:09 PM, Amos Jeffries squ...@treenet.co.nz wrote:

 First, please answer: Why? what possible problem could require you to do this 
 massive abuse of the web?


[squid-users] Failure recovery, reconfiguration/restart during rebuild (Was: How to ignore query terms for store key?)

2010-09-08 Thread Guy Bashkansky
Thanks, storeurl_rewrite works.  Of course, I shouldn't have chomped
the newline in the first place :)

Now, I'm trying to investigate failure and recovery properties of
Squid 2.7 STABLE9 relative to Squid 2.4:

Specifically, Squid 2.4 cbdata.c memory management sometimes crashed
under high load, apparently due to:
http://www.squid-cache.org/bugs/show_bug.cgi?id=761: assertion failed:
cbdata.c:249: "c->locks > 0" w/diskd

This bug appears on and off in Bugzilla, seems to be partially fixed in:
http://www.squid-cache.org/cgi-bin/cvsweb.cgi/squid/src/fs/diskd/store_io_diskd.c.diff?r1=1.37&r2=1.38
and
http://bugs.squid-cache.org/attachment.cgi?id=1604,
- then reappears differently.  Any insight on this issue?


Another issue with Squid 2.x as parent cache: after failure/restart,
while rebuilding, it opens edge sockets, but it does not serve edge
requests until rebuilding is finished.
This seems to be fixed in Squid 3:
http://bugs.squid-cache.org/show_bug.cgi?id=513
-- How about Squid 2.7?


And finally, any insight from the community on the failure/restart
recovery speed of 2.7 vs 2.4?
How long does it take to rebuild, say, an 80% full 16 TB drive?  More or less
than 15 minutes?

Regards,
Guy

2010/9/8 Henrik Nordström hen...@henriknordstrom.net:
 Tue 2010-09-07 at 18:59 -0700, Guy Bashkansky wrote:

 /usr/local/squid/bin/strip-query.pl
     #!/usr/local/bin/perl -Tw
     $| = 1; while (<>) { chomp; s/\?\S*//; print; } ### my strip query test

 If you chomp the newline then you need to add it back when printing the
 result.

 Regards
 Henrik




[squid-users] Re: How to ignore query terms for store key?

2010-09-07 Thread Guy Bashkansky
Thanks, storeurl_rewrite in Squid 2.7 looks like the right solution.

But when I try to use it to strip query, Squid does not respond:

/usr/local/squid/etc/squid.conf
storeurl_access allow all # just for the test, will narrow down later
storeurl_rewrite_program /usr/local/squid/bin/strip-query.pl

/usr/local/squid/bin/strip-query.pl
#!/usr/local/bin/perl -Tw
$| = 1; while (<>) { chomp; s/\?\S*//; print; } ### my strip query test

$ /usr/local/squid/sbin/squid -k reconfigure -f /usr/local/squid/etc/squid.conf
...2010/08/09 15:07:10| helperOpenServers: Starting 5
'strip-query.pl' processes...

$ wget ... 'origin_url'
...Proxy request sent, awaiting response...  ### gets stuck

(Without storeurl_rewrite wget works OK)

What am I missing?


2010/9/7 Henrik Nordström hen...@henriknordstrom.net

 Fri 2010-09-03 at 18:03 -0700, Guy Bashkansky wrote:
  Is there a way to ignore URI query terms when forming store keys?
  Maybe some rule or extension?

 http://wiki.squid-cache.org/Features/StoreUrlRewrite

 needs to be implemented for Squid-3 as well.. currently a Squid-2 only
 feature.

 Regards
 Henrik




On Fri, Sep 3, 2010 at 8:09 PM, Amos Jeffries squ...@treenet.co.nz wrote:
 Guy Bashkansky wrote:

 Is there a way to ignore URI query terms when forming store keys?
 Maybe some rule or extension?

 In the Squid code it could look something like:
 { char *p = strchr(uri, '?'); if (p) *p = '\0'; }

 But the only code like this deals with strip_query_terms,
 which only affects logging, not storing.

 First, please answer: Why? what possible problem could require you to do
 this massive abuse of the web?

 storeurl_rewrite

 Be EXTREMELY careful about what you re-write. A good understanding of how
 the re-written website operates is needed. Along with complete trust that
 the website will not alter its design (or layout) in any way for the
 lifetime of your config.

 Amos
 --
 Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.7
  Beta testers wanted for 3.2.0.2



[squid-users] How to ignore query terms for store key?

2010-09-03 Thread Guy Bashkansky
Is there a way to ignore URI query terms when forming store keys?
Maybe some rule or extension?

In the Squid code it could look something like:
{ char *p = strchr(uri, '?'); if (p) *p = '\0'; }

But the only code like this deals with strip_query_terms,
which only affects logging, not storing.
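
The one-liner above truncates the URI in place, which would also change
the URL if the same buffer is later used to build the request to the
origin.  A variant that leaves the original untouched and returns a
separate string for the key might look like this.  storeKeyUrl is a
hypothetical name, not an existing Squid function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of uri, cut at the
 * first '?'.  The caller frees the result.  Returns NULL if out of memory.
 */
char *
storeKeyUrl(const char *uri)
{
    size_t len;
    char *copy;
    assert(uri != NULL);
    len = strcspn(uri, "?");      /* length of the part before any query */
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, uri, len);
    copy[len] = '\0';
    return copy;
}
```

The request URI keeps its query terms for the origin, and only the returned
copy would be used when forming the store key.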


[squid-users] High CPU load; debug log shows tight loop: storeDiskdDirRebuildFromSwapLog ... new_StoreEntry ... storeDirSwapLog ... diskHandleWrite

2010-01-07 Thread Guy Bashkansky
I'm observing very high Squid CPU load, which negatively affects performance.

Enabling debug_options ALL,9 for a few seconds shows this extremely
frequent loop -- Squid is doing only this millions of times:

2010/01/07 12:29:23| storeDiskdDirRebuildFromSwapLog: SWAP_LOG_ADD
11627CB9B6##243670258F67 0015##1B
2010/01/07 12:29:23| storeGet: looking up 11627CB9B6##243670258F67
2010/01/07 12:29:23| storeDiskdAddDiskRestore:
11627CB9B6##243670258F67, fileno=0015##1B
2010/01/07 12:29:23| new_StoreEntry: returning 0x8ac##00
2010/01/07 12:29:23| storeHashInsert: Inserting Entry 0x8ac##00 key
'11627CB9B6##243670258F67'
2010/01/07 12:29:23| storeDiskdDirReplAdd: added node 0x8ac##00 to dir 0
2010/01/07 12:29:23| storeDirSwapLog: SWAP_LOG_ADD
11627CB9B6##243670258F67 0 0015##1B
2010/01/07 12:29:23| diskHandleWrite: FD 8
2010/01/07 12:29:23| diskHandleWrite: FD 8 writing 193 bytes
2010/01/07 12:29:23| diskHandleWrite: FD 8 len = 193

Any idea what might cause this?


Re: [squid-users] 'gprof squid squid.gmon' only shows the initial configuration functions

2009-12-09 Thread Guy Bashkansky
Is there an oprofile version for FreeBSD?  I thought it was limited to
Linux.  On FreeBSD I tried pmcstat, but it gives an initialization
error.

My version of Squid is old and customized (so I can't upgrade) and may
not have builtin timers.  Since what version did they appear?

As for gprof - even with the event loop on top, still the rest of the
table might give some idea why the CPU is overloaded.  The problem is
- I see only initial configuration functions:

                                  called/total       parents
index  %time    self descendents  called+self    name           index
                                  called/total       children
                                                     <spontaneous>
[1]     63.4    0.17        0.00                 _mcount [1]
-----------------------------------------------
                0.00        0.10       1/1           _start [3]
[2]     36.0    0.00        0.10       1         main [2]
                0.00        0.10       1/1           parseConfigFile [4]
...
-----------------------------------------------
                                                     <spontaneous>
[3]     36.0    0.00        0.10                 _start [3]
                0.00        0.10       1/1           main [2]
-----------------------------------------------
                0.00        0.10       1/1           main [2]
[4]     36.0    0.00        0.10       1         parseConfigFile [4]
                0.00        0.09       1/1           readConfigLines [5]
                0.00        0.00     169/6413        parse_line [6]
...


System info:

# uname -m -r -s
FreeBSD 6.2-RELEASE-p9 amd64

# gcc -v
Using built-in specs.
Configured with: FreeBSD/amd64 system compiler
Thread model: posix
gcc version 3.4.6 [FreeBSD] 20060305


There are 7 fork()s for unlinkd/diskd helpers.  Can these fork()s
affect profiling info?

On Wed, Dec 9, 2009 at 2:04 AM, Robert Collins
robe...@robertcollins.net wrote:
 On Tue, 2009-12-08 at 15:32 -0800, Guy Bashkansky wrote:
 I've built squid with the -pg flag and run it in the no-daemon mode
 (-N flag), without the initial fork().

 I send it the SIGTERM signal which is caught by the signal handler, to
 flag graceful exit from main().

 I expect to see meaningful squid.gmon, but 'gprof squid squid.gmon'
 only shows the initial configuration functions:

 gprof isn't terribly useful anyway - due to squids callback based model,
 it will see nearly all the time belonging to the event loop.

 oprofile and/or squids built in analytic timers will get much better
 info.

 -Rob



[squid-users] 'gprof squid squid.gmon' only shows the initial configuration functions

2009-12-08 Thread Guy Bashkansky
I've built squid with the -pg flag and run it in the no-daemon mode
(-N flag), without the initial fork().

I send it the SIGTERM signal which is caught by the signal handler, to
flag graceful exit from main().

I expect to see meaningful squid.gmon, but 'gprof squid squid.gmon'
only shows the initial configuration functions:

                                  called/total       parents
index  %time    self descendents  called+self    name           index
                                  called/total       children
                                                     <spontaneous>
[1]     63.4    0.17        0.00                 _mcount [1]
-----------------------------------------------
                0.00        0.10       1/1           _start [3]
[2]     36.0    0.00        0.10       1         main [2]
                0.00        0.10       1/1           parseConfigFile [4]
...
-----------------------------------------------
                                                     <spontaneous>
[3]     36.0    0.00        0.10                 _start [3]
                0.00        0.10       1/1           main [2]
-----------------------------------------------
                0.00        0.10       1/1           main [2]
[4]     36.0    0.00        0.10       1         parseConfigFile [4]
                0.00        0.09       1/1           readConfigLines [5]
                0.00        0.00     169/6413        parse_line [6]
...


System info:

# uname -m -r -s
FreeBSD 6.2-RELEASE-p9 amd64

# gcc -v
Using built-in specs.
Configured with: FreeBSD/amd64 system compiler
Thread model: posix
gcc version 3.4.6 [FreeBSD] 20060305


There are 7 fork()s for unlinkd/diskd helpers.  Can these fork()s
affect profiling info?
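
The graceful-exit arrangement described above (a SIGTERM handler that only
sets a flag, so that main() returns normally and the profile is written at
exit) can be sketched as follows.  All names are hypothetical, and runLoop
is a bounded stand-in for the real event loop, not Squid code.

```c
#include <assert.h>
#include <signal.h>

/* Set by the signal handler, polled by the main loop. */
volatile sig_atomic_t shutdown_requested = 0;

/* Signal handler: only record the request, do no real work here. */
void
onTerminate(int sig)
{
    (void) sig;
    shutdown_requested = 1;
}

/* Install the SIGTERM handler; returns 0 on success, -1 on failure. */
int
installTerminateHandler(void)
{
    return signal(SIGTERM, onTerminate) == SIG_ERR ? -1 : 0;
}

/*
 * Stand-in for the event loop: runs until SIGTERM has been seen, bounded by
 * max_iterations so that the sketch always terminates.  Returns the number
 * of iterations performed.
 */
int
runLoop(int max_iterations)
{
    int i = 0;
    assert(max_iterations >= 0);
    while (!shutdown_requested && i < max_iterations)
        i++;                    /* one unit of real work would go here */
    return i;
}
```

Once the loop has returned, main() can return too, which lets the
exit-time profiling hook write its output file.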


Re: [squid-users] Squid at 100% CPU with 10 minutes period

2009-12-05 Thread Guy Bashkansky
Amos, thanks for the links.  I've looked at the mailing list and the
open bugs, and could not find something similar to what I see, with
the 10 minutes period.

We're using a customized version of Squid 2.4 STABLE6, and it's not in
my power to upgrade it to any later version...  It runs on FreeBSD
6.2-RELEASE-p9 amd64 servers.

I know I need some profiling/debugging information to determine where
CPU spends its cycles, but on these servers most usual tools are
either absent or not working very well:
There's no 'oprofile' for FreeBSD, 'pmcstat' fails to run (no lib?),
'gprof' does not give info beyond parseConfigFile() even in my custom
profiling-enabled version with -N, 'gdb' does not recognize debug
info, and 'strace' is not installed.

I've found that the 'truss' command works, and used it to trace the
system calls made by the squid process, trying to recognize some
patterns.  I noticed that during a CPU load spike, write() sometimes
returns EPIPE, 'Broken pipe'.

Does my version (2.4 STABLE6) ring any bells?


On Sat, Dec 5, 2009 at 3:33 AM, Amos Jeffries squ...@treenet.co.nz wrote:

 Guy Bashkansky wrote:

 Amos,

 Where can I find a list of the solved "Squid uses 100% CPU" bugs?  It
 would help me figure out which one I may be experiencing.

 Sorry for being gruff.

 The mailing list queries:
  http://squid.markmail.org/search/?q=squid+uses+100%25+cpu

 The remaining open bugs mentioning it:
 http://bugs.squid-cache.org/buglist.cgi?quicksearch=100%25+CPU

 The closed ones are hard to find as they have been renamed to say what the
 actual problem was.


 Which release and version of squid is this?


 When I attach gdb to the running squid process, it does not find
 debugging info, despite compiling with -g flag (maybe a gdb setup
 problem, nm does show symbols).

 When I try to use gprof for profiling (squid compiled with -pg),
 squid.gmon contains only the initial configuration functions
 profiling.  I stop squid with -k shutdown signal so it exits normally
 (which is necessary to produce squid.gmon).

 Squid needs to be started with -N to prevent forking a child daemon that does 
 all the actual work. The main process can then be traced.

 Amos
 --
 Please be using
  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE20
  Current Beta Squid 3.1.0.15


[squid-users] Squid at 100% CPU with 10 minutes period

2009-12-04 Thread Guy Bashkansky
Hi,

The problem: with content from a certain origin, Squid reaches 100% CPU
periodically, every 10 minutes, so the cache service suffers.

Any clues where to look?  Maybe this problem and its solution are known?

The CPU load pattern is something like this, minute-by-minute:
40%, 40%, 60%, 80%, 99%, 99%, 99%, 80%, 60%, 40%, repeat.

Thanks,
Guy


Re: [squid-users] Squid at 100% CPU with 10 minutes period

2009-12-04 Thread Guy Bashkansky
Amos,

Where can I find a list of the solved "Squid uses 100% CPU" bugs?  It
would help me figure out which one I may be experiencing.

When I attach gdb to the running squid process, it does not find
debugging info, despite compiling with -g flag (maybe a gdb setup
problem, nm does show symbols).

When I try to use gprof for profiling (squid compiled with -pg),
squid.gmon contains only the initial configuration functions
profiling.  I stop squid with -k shutdown signal so it exits normally
(which is necessary to produce squid.gmon).

Guy

On Fri, Dec 4, 2009 at 3:47 PM, Amos Jeffries squ...@treenet.co.nz wrote:

 We have solved many "Squid uses 100% CPU" bugs. There are probably others
 still present. Which one are you talking about?

 http://wiki.squid-cache.org/SquidFaq/BugReporting

 Amos
 --
 Please be using
  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE20
  Current Beta Squid 3.1.0.15



[squid-users] How to restrict access to designated client IP address blocks in Squid configuration?

2009-09-21 Thread Guy Bashkansky
I'm using Squid as a reverse cache proxy and need to give access only to
clients whose IP addresses are in particular netblocks:

acl  service  dstdomain  .foo.com
acl  clients  src  123.45.67.89/255.255.255.128
http_access  deny  service  all
http_access  allow  service  clients

What may be the possible reason that clients with IP addresses not
from that netblock can still access the service?
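
As a side check on which clients the "clients" ACL covers, here is a small
sketch of address/netmask matching.  It assumes the configured address is
masked before comparison, in which case 123.45.67.89/255.255.255.128
describes 123.45.67.0 through 123.45.67.127.  parseIpv4 and inNetblock are
hypothetical helpers, not Squid functions.

```c
#include <assert.h>
#include <stdio.h>

/* Parse dotted-quad IPv4 text into a 32-bit value; returns 0 on success. */
int
parseIpv4(const char *text, unsigned long *out)
{
    unsigned int a, b, c, d;
    char extra;
    assert(text != NULL && out != NULL);
    /* Exactly four numbers and nothing after them. */
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    *out = ((unsigned long) a << 24) | ((unsigned long) b << 16) |
        ((unsigned long) c << 8) | (unsigned long) d;
    return 0;
}

/* 1 if client lies in the block addr/mask, 0 if not, -1 on a parse error. */
int
inNetblock(const char *client, const char *addr, const char *mask)
{
    unsigned long c, a, m;
    if (parseIpv4(client, &c) != 0 || parseIpv4(addr, &a) != 0 ||
        parseIpv4(mask, &m) != 0)
        return -1;
    return (c & m) == (a & m);
}
```

Under that assumption, 123.45.67.100 is inside the block while
123.45.67.200 is not.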


[squid-users] Re: If refresh_pattern only extends expiration, how to force time-to-live in Squid code?

2009-09-02 Thread Guy Bashkansky
(Resending, first time accidentally sent with HTML formatting, bounced)

Now I see the Expires header having a value in the past, which may
confuse clients and caches further down the chain.
Scenario: the origin returns max-age=900 (15 min) and refresh_pattern
overrides the expiry to 24 hours.  What do the headers sent to the client
look like?

On the first request (cache-miss), the Expires header is not added to
the response sent to client.
On subsequent cache-hits the Expires header is added to the response
sent to client. (Why this artifact?)

The Expires header is set to the time the object was received from the
origin plus the value of the max-age directive.
This results in the Expires header having a value in the past when the
cached object is older than the Max-age.
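
The arithmetic behind this observation, as a small sketch.  The function
names are hypothetical; the code only restates the behaviour described
above and is not taken from Squid.

```c
#include <assert.h>
#include <time.h>

/* Expires value as observed: time of receipt plus max-age (in seconds). */
time_t
derivedExpires(time_t received, long max_age)
{
    assert(max_age >= 0);
    return received + (time_t) max_age;
}

/* 1 when that Expires value already lies in the past at time now. */
int
expiresInPast(time_t received, long max_age, time_t now)
{
    return derivedExpires(received, max_age) < now;
}
```

With max-age=900, a hit served one hour after the object was received
carries an Expires value 45 minutes in the past.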

What is the best way to fix this (in my local version)?

Since my trouble is the Expires header having a value in the past, I
consider suppressing the Squid artifact of inserting an Expires
header.
Is there a Squid configuration option to do so?  If not, what would
be the right way to do it in my local code branch?


On Fri, Aug 28, 2009 at 6:12 PM, Guy Bashkansky guy...@gmail.com wrote:

 Henrik,
 Thanks, it works!
 Guy

 On Thu, Aug 27, 2009 at 2:00 AM, Henrik Nordstrom 
 hen...@henriknordstrom.net wrote:

 Wed 2009-08-26 at 18:17 -0700, Guy Bashkansky wrote:

  If indeed refresh_pattern only extends expiration, I would like to
  develop a feature that enforces an exact time-to-live (per URL) in my
  local branch of Squid code.

 See the refreshStaleness() function. Should be sufficient to move the
 max age check up above the expires check.

 Regards
 Henrik
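
 A rough illustration of the reordering Henrik suggests.  This is not the
 Squid refreshStaleness() code: isStaleMaxFirst and its parameters are
 hypothetical, and the min and percent rules of refresh_pattern are left
 out.  It only shows why checking the configured maximum age before the
 origin's expiry lets a shorter configured time-to-live win.

```c
#include <assert.h>

/*
 * Hypothetical staleness check with the max-age test placed first.
 *   age         : seconds since the object was received
 *   conf_max    : configured maximum age in seconds
 *   has_expires : non-zero if the origin supplied an expiry time
 *   expires_in  : seconds until that expiry (zero or negative once expired)
 * Returns 1 if the object is stale, 0 if it is still fresh.
 */
int
isStaleMaxFirst(long age, long conf_max, int has_expires, long expires_in)
{
    assert(age >= 0 && conf_max >= 0);
    if (age > conf_max)
        return 1;               /* configured ceiling reached: stale */
    if (has_expires)
        return expires_in <= 0; /* otherwise follow the origin's expiry */
    return 0;
}
```

 With a 15-minute ceiling, an object that is 20 minutes old is treated as
 stale even though the origin's 30-minute expiry has not yet passed.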



-- Forwarded message --
From: Guy Bashkansky guy...@gmail.com
Date: Wed, Aug 26, 2009 at 6:17 PM
Subject: If refresh_pattern only extends expiration, how to force
time-to-live in Squid code?
To: squid-...@squid-cache.org


I've tried to set an exact time-to-live (override origin cache
control) in Squid (2.4 STABLE6) configuration by refresh_pattern,
e.g.:

refresh_pattern   30_minutes_cache_control_url   15   0%   15
override-expire   ignore-max-age

Observed: the URL is matched (in log), but objects are still cached for 30
minutes, rather than for 15 minutes, as hoped.

If indeed refresh_pattern only extends expiration, I would like to
develop a feature that enforces an exact time-to-live (per URL) in my
local branch of Squid code.

What would be the most reasonable way to do this?  How can I force
objects to expire from cache after a given time?

Thanks.


[squid-users] Re: If refresh_pattern only extends expiration, how to force time-to-live in Squid code?

2009-09-02 Thread Guy Bashkansky
I'm using a customized version of Squid 2.4 STABLE6.  But nothing
seems to be customized in refresh.c, except for my own recent swap of
age and expires checks (as recommended).

Probably the expires header is added in some other place, it's just
difficult to figure out exactly where in the code and how it is
controlled.


On Wed, Sep 2, 2009 at 1:29 PM, Henrik
Nordstrom hen...@henriknordstrom.net wrote:
 Wed 2009-09-02 at 12:42 -0700, Guy Bashkansky wrote:

 Now I see the Expires header having a value in the past, which may
 confuse clients and caches further down the chain.
 Scenario: origin returns max-age=900 (15 min) and refresh_pattern
 overrides expire to 24 hours, what do the headers to the client look
 like?

 On the first request (cache-miss), the Expires header is not added to
 the response sent to client.
 On subsequent cache-hits the Expires header is added to the response
 sent to client. (Why this artifact?)

 Is it? Should not, at least not unless you run 2.7 with the
 act-as-origin option..

 The Expires header is set to time the object was received from the
 origin plus the value in the Max-age header.

 odd..

 Regards
 Henrik




[squid-users] refresh_pattern only extends expiration?

2009-08-26 Thread Guy Bashkansky
Please help:

Trying to set an exact time-to-live (override origin cache control) in
Squid (2.4 STABLE6) configuration:
___

refresh_pattern   30_minutes_cache_control_url   15   0%   15
override-expire   ignore-max-age

Observed: the URL is matched (in log), but objects are still cached for 30
minutes, rather than 15, as hoped.
___

refresh_pattern   30_minutes_cache_control_url   60   0%   60
override-expire   ignore-max-age

Observed: URL is matched (in log), and objects are cached for 50-70
minutes (not exactly 60).
___

Q1: Does refresh_pattern only extend expiration? Is there a way to
enforce an exact time-to-live (per URL) in Squid?

Q2: Does refresh_pattern operate at +-10 minute granularity?  Why does 60
minutes become 50-70?

Thanks.


[squid-users] Re: refresh_pattern only extends expiration?

2009-08-26 Thread Guy Bashkansky
Forgot to mention -- Squid is used as a _reverse_ cache proxy (server side).

On Wed, Aug 26, 2009 at 2:23 PM, Guy Bashkansky guy...@gmail.com wrote:
 Please help:

 Trying to set an exact time-to-live (override origin cache control) in
 Squid (2.4 STABLE6) configuration:
 ___

 refresh_pattern   30_minutes_cache_control_url   15   0%   15
 override-expire   ignore-max-age

 Observed: URL is matched (in log), but objects still cached for 30
 minutes, rather than 15, as hoped.
 ___

 refresh_pattern   30_minutes_cache_control_url   60   0%   60
 override-expire   ignore-max-age

 Observed: URL is matched (in log), and objects are cached for 50-70
 minutes (not exactly 60).
 ___

 Q1: Does refresh_pattern only extend expiration? Is there a way to
 enforce an exact time-to-live (per URL) in Squid?

 Q2: Does refresh_pattern operate in +-10 minutes granularity?  Why 60
 minutes become 50 - 70?

 Thanks.