Re: [PATCH] Do not send unretriable requests on reused pinned connections

2012-12-02 Thread Henrik Nordström
On Sat 2012-12-01 at 18:02 -0700, Alex Rousskov wrote:

 a) If we are talking about unretriable requests, the client already
 violated HTTP by sending such a request over the persistent client
 connection. Some of these clients may fail to handle resets properly.

End-user agents have much more freedom in deciding what is retriable or not.

 b) To implement 2+1 correctly, Squid will need to close the _client_
 connection when the origin server sends Connection: close. We do not
 do that now IIRC.

Yes. If we don't do that today then it's a bug.

Regards
Henrik



Build failed in Jenkins: 3.3-matrix » obsd-49-x86 #14

2012-12-02 Thread noc
See http://build.squid-cache.org/job/3.3-matrix/./label=obsd-49-x86/14/

--
Started by upstream project 3.3-matrix build number 14
originally caused by: 
Started by user Amos Jeffries
Building remotely on obsd-49-x86 in workspace http://build.squid-cache.org/job/3.3-matrix/./label=obsd-49-x86/ws/
java.io.IOException: Failed to mkdirs: http://build.squid-cache.org/job/3.3-matrix/./label=obsd-49-x86/ws/
    at hudson.FilePath.mkdirs(FilePath.java:973)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1306)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:679)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
    at hudson.model.Run.execute(Run.java:1516)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:236)
Retrying after 10 seconds
java.io.IOException: Failed to mkdirs: http://build.squid-cache.org/job/3.3-matrix/./label=obsd-49-x86/ws/
    at hudson.FilePath.mkdirs(FilePath.java:973)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1306)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:679)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
    at hudson.model.Run.execute(Run.java:1516)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:236)
Retrying after 10 seconds
java.io.IOException: Failed to mkdirs: http://build.squid-cache.org/job/3.3-matrix/./label=obsd-49-x86/ws/
    at hudson.FilePath.mkdirs(FilePath.java:973)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1306)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:679)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
    at hudson.model.Run.execute(Run.java:1516)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:236)


Build failed in Jenkins: 3.HEAD-amd64-CentOS-5.3 #2166

2012-12-02 Thread noc
See http://build.squid-cache.org/job/3.HEAD-amd64-CentOS-5.3/2166/changes

Changes:

[Amos Jeffries] Prep for 3.3.0.2, 3.2.4, 3.1.22

--
[...truncated 7836 lines...]
373b802000-373ba01000 ---p 2000 fd:00 130496 
/lib64/libcom_err.so.2.1
373ba01000-373ba02000 rw-p 1000 fd:00 130496 
/lib64/libcom_err.so.2.1
373bc0-373bc02000 r-xp  fd:00 130495 
/lib64/libkeyutils-1.2.so
373bc02000-373be01000 ---p 2000 fd:00 130495 
/lib64/libkeyutils-1.2.so
373be01000-373be02000 rw-p 1000 fd:00 130495 
/lib64/libkeyutils-1.2.so
373c00-373c02c000 r-xp  fd:00 244769 
/usr/lib64/libgssapi_krb5.so.2.2
373c02c000-373c22c000 ---p 0002c000 fd:00 244769 
/usr/lib64/libgssapi_krb5.so.2.2
373c22c000-373c22e000 rw-p 0002c000 fd:00 244769 
/usr/lib64/libgssapi_krb5.so.2.2
373c40-373c424000 r-xp  fd:00 244767 
/usr/lib64/libk5crypto.so.3.1
373c424000-373c623000 ---p 00024000 fd:00 244767 
/usr/lib64/libk5crypto.so.3.1
373c623000-373c625000 rw-p 00023000 fd:00 244767 
/usr/lib64/libk5crypto.so.3.1
373c80-373c808000 r-xp  fd:00 235112 
/usr/lib64/libkrb5support.so.0.1
373c808000-373ca07000 ---p 8000 fd:00 235112 
/usr/lib64/libkrb5support.so.0.1
373ca07000-373ca08000 rw-p 7000 fd:00 235112 
/usr/lib64/libkrb5support.so.0.1
373cc0-373cc91000 r-xp  fd:00 244768 
/usr/lib64/libkrb5.so.3.3
373cc91000-373ce91000 ---p 00091000 fd:00 244768 
/usr/lib64/libkrb5.so.3.3
373ce91000-373ce95000 rw-p 00091000 fd:00 244768 
/usr/lib64/libkrb5.so.3.3
2b3963992000-2b3963994000 rw-p 2b3963992000 00:00 0 
2b39639a6000-2b39639f r-xp  fd:00 240454 
/usr/lib64/libcppunit-1.12.so.1.0.0
2b39639f-2b3963bf ---p 0004a000 fd:00 240454 
/usr/lib64/libcppunit-1.12.so.1.0.0
2b3963bf-2b3963bf3000 rw-p 0004a000 fd:00 240454 
/usr/lib64/libcppunit-1.12.so.1.0.0
2b3963bf3000-2b3963c3d000 rw-p 2b3963bf3000 00:00 0 
7fff1675a000-7fff1677 rw-p 7ffe8000 00:00 0  [stack]
7fff167cc000-7fff167d r-xp 7fff167cc000 00:00 0  [vdso]
ff60-ffe0 ---p  00:00 0  
[vsyscall]
/bin/sh: line 4:  2847 Aborted ${dir}$tst
FAIL: tests/testHttpRequest
SKIP: cache_cf.cc operator void* (not implemented).
...
OK (11)
PASS: tests/testStore

OK (4)
PASS: tests/testString
stub time| persistent connection module initialized
..
OK (10)
PASS: tests/testURL
.
OK (1)
PASS: tests/testConfigParser
...
OK (3)
PASS: tests/testStatHist
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: StatHist.cc enumInit (not implemented).
SKIP: tools.cc UsingSmp (not implemented).
SKIP: tools.cc InDaemonMode (not implemented).
SKIP: tools.cc UsingSmp (not implemented).
SKIP: tools.cc InDaemonMode (not implemented).
SKIP: cache_cf.cc operator void* (not implemented).
SKIP: tools.cc UsingSmp (not implemented).
SKIP: tools.cc InDaemonMode (not implemented).
SKIP: tools.cc UsingSmp (not implemented).
SKIP: stub_store_rebuild.cc storeRebuildComplete (not implemented).
SKIP: tools.cc InDaemonMode (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: tools.cc InDaemonMode (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: tools.cc InDaemonMode (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc count (not implemented).
SKIP: StatHist.cc 

Re: [PATCH] stage 2 of store url as fake store_url helper

2012-12-02 Thread Alex Rousskov
On 12/02/2012 06:55 AM, Eliezer Croitoru wrote:

 Changed the storeUrl etc to storeId and store_id.


 +const char *HttpRequest::urlStoreId()
 +{
 +   if (store_id)
 + return store_id;
 +   else
 + return urlCanonical((HttpRequest*)this);
 +}

Please replace urlStoreId with storeId. The method does not return
a URL.

Please either make the method constant or remove the cast to (HttpRequest*).

If you leave the cast in, use C++ const_cast instead of C-style cast
to make the intent clear.


Please document all new variables, functions, and methods (including
this one) using brief doxygen comments. I believe Squid style guide on
Squid wiki has more detailed instructions and examples.
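
To illustrate, a minimal sketch that applies those review points (the storeId
name, the const qualifier, an explicit const_cast, and a brief doxygen comment)
to the patch excerpt above; treat it as an illustration rather than the final
code:

/// Returns the Store ID to use for this request: the value set by the
/// store ID helper, or the canonical URL when no store_id was set.
const char *HttpRequest::storeId() const
{
    if (store_id)
        return store_id;
    // urlCanonical() takes a non-const pointer in the quoted patch, so the
    // intent of dropping const is made explicit with const_cast.
    return urlCanonical(const_cast<HttpRequest*>(this));
}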


 Please search your patch for tabulation characters in sources and
 replace the ones you added with spaces.

 Will do. 

And please remove whitespace non-changes, such as those in the class
SquidConfig declaration.



 +NAME: store_id_program store_url_program
 +TYPE: wordlist
 +LOC: Config.Program.store_id
 +DEFAULT: none
 +DOC_START
 + Specify the location of the executable URL rewriter to use.

The new option does not rewrite URLs. It maps URLs to store entry IDs.
Please adjust documentation accordingly and check the text you have
copied for other discrepancies. BTW, I think it is better to just say
"same as directive_foo but applied to directive_bar" if the copied
description would be essentially the same as some existing directive
description.
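
For example, an illustrative rewording along those lines (not the final text):

NAME: store_id_program store_url_program
TYPE: wordlist
LOC: Config.Program.store_id
DEFAULT: none
DOC_START
        Specify the location of the executable helper that maps request
        URLs to store entry IDs. It does not rewrite the URL that Squid
        sends to the origin server.
DOC_END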


 +// IF it contained valid entry for the old URL-rewrite helper protocol
 +debugs(85, DBG_IMPORTANT, "ERROR: store-URL helper returned invalid result code. Wrong helper? " << reply);
 +break;
 +
 +case HelperReply::BrokenHelper:
 +debugs(85, DBG_IMPORTANT, "ERROR: store-URL helper: " << reply << ", attempt #" << (store_id_fail_count+1) << " of 2");

This code still mentions "store-URL helper" and "URL-rewrite".


 +storeAppendPrintf(sentry, "No StoreIds defined\n");

 +helperStats(sentry, storeIds, "StoreIds Statistics");

 +"  because all store_Ids were busy: %d\n", n_bypassed);

Please be consistent in how you name the store ID mapping helper in
user-visible messages (and documentation).
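
For instance (the exact wording here is only illustrative), the user-visible
strings could all settle on a single name such as "store ID helper":

debugs(85, DBG_IMPORTANT, "ERROR: store ID helper returned invalid result code. Wrong helper? " << reply);
storeAppendPrintf(sentry, "No store ID helpers defined\n");
helperStats(sentry, storeIds, "store ID helper statistics");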


Thank you,

Alex.



Re: [PATCH] stage 2 of store url as fake store_url helper

2012-12-02 Thread Eliezer Croitoru

On 12/2/2012 7:56 PM, Alex Rousskov wrote:

On 12/02/2012 06:55 AM, Eliezer Croitoru wrote:


Changed the storeUrl etc to storeId and store_id.




+const char *HttpRequest::urlStoreId()
+{
+   if (store_id)
+ return store_id;
+   else
+ return urlCanonical((HttpRequest*)this);
+}


Please replace urlStoreId with storeId. The method does not return
a URL.

OK will do.



Please either make the method constant or remove the cast to (HttpRequest*).

If you leave the cast in, use C++ const_cast instead of C-style cast
to make the intent clear.
I have used this cast without really knowing it was a C-style cast. I
used the cast since there was a small problem while using it without one.

Thanks.




Please document all new variables, functions, and methods (including
this one) using brief doxygen comments. I believe Squid style guide on
Squid wiki has more detailed instructions and examples.

Will do.





Please search your patch for tabulation characters in sources and
replace the ones you added with spaces.



Will do.





And please remove whitespace non-changes, such as those in the class
SquidConfig declaration.

This was meant for later, when I will write the documentation.
Now I have some problem starting the helper.
I need to learn the new helpers code to understand what can and cannot
be done; in any case it will be done later.


Amos, if you can, add some debugs to the helper reply handling/parsing etc.
It took me a while to understand that Squid didn't start the helper at
all and kind of bypassed it, based on the helper returning an Unknown
result.





+NAME: store_id_program store_url_program
+TYPE: wordlist
+LOC: Config.Program.store_id
+DEFAULT: none
+DOC_START
+   Specify the location of the executable URL rewriter to use.


The new option does not rewrite URLs. It maps URLs to store entry IDs.
Please adjust documentation accordingly and check the text you have
copied for other discrepancies. BTW, I think it is better to just say
"same as directive_foo but applied to directive_bar" if the copied
description would be essentially the same as some existing directive
description.



+// IF it contained valid entry for the old URL-rewrite helper protocol
+debugs(85, DBG_IMPORTANT, "ERROR: store-URL helper returned invalid result code. Wrong helper? " << reply);
+break;
+
+case HelperReply::BrokenHelper:
+debugs(85, DBG_IMPORTANT, "ERROR: store-URL helper: " << reply << ", attempt #" << (store_id_fail_count+1) << " of 2");


This code still mentions "store-URL helper" and "URL-rewrite".

Thanks.





+storeAppendPrintf(sentry, "No StoreIds defined\n");

+helperStats(sentry, storeIds, "StoreIds Statistics");

+"  because all store_Ids were busy: %d\n", n_bypassed);


Please be consistent in how you name the store ID mapping helper in
user-visible messages (and documentation).


Sorry about the docs.
It wasn't supposed to be part of the patch.
This was only for testing purposes and I forgot it was there.

I will report back later tomorrow with test results.

Eliezer

--
Eliezer Croitoru
https://www1.ngtech.co.il
sip:ngt...@sip2sip.info
IT consulting for Nonprofit organizations
eliezer at ngtech.co.il


can I run a self-contained build?

2012-12-02 Thread carteriii
My approach to building and installing software usually follows a common
pattern to have different release versions in different directories and then
either just launch locally from one of those directories or use a symbolic
link to point to the current version.  This allows me to quickly install or
rollback by simply changing the launch command or symbolic link.  By
example, think of the following structure:

/home/user/Documents/squid/mybuild_v1
/home/user/Documents/squid/mybuild_v2
/home/user/Documents/squid/current -> ./mybuild_v2

This allows me to use either mybuild_v1 or mybuild_v2 or I can also still
use the default package build on the machine which gets installed in various
standard locations such as /etc/squid3, /usr/lib/squid3, etc. etc.

I just realized that squid appears to maintain some absolute path references
which don't seem to like my approach of a self-contained release folder.  My
realization came from hitting a few errors like these when I moved the
top-level directory:

FATAL: MIME Config Table
/home/user/Documents/squid/mybuild_v1/etc/mime.conf: (2) No such file or
directory
FATAL: unlinkd_program
/home/user/Documents/squid/mybuild_v1/libexec/unlinkd: (2) No such file or
directory

I am now aware of the build configuration options for Ubuntu:

http://wiki.squid-cache.org/KnowledgeBase/Ubuntu#Compiling

One of those options sets --sysconfdir=/etc/squid and another sets
--libexecdir=${prefix}/lib/squid but those are absolute paths.  I'm trying
to scan the source files to figure out where and how these configuration
options get used, but I'm worried that I may miss something so I feel it's
safer to ask for confirmation.

Is it possible to run from a self-contained build?  Are there any run-time
options that mirror the functionality of the configuration options that can
be used when building?  Or can I specify a configuration option path which
is relative to either the binaries or the configuration file?

Thanks in advance.




--
View this message in context: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/can-I-run-a-self-contained-build-tp4657533.html
Sent from the Squid - Development mailing list archive at Nabble.com.


Jenkins build is back to normal : 3.2-matrix » master #283

2012-12-02 Thread noc
See http://build.squid-cache.org/job/3.2-matrix/./label=master/283/changes



Build failed in Jenkins: 3.2-matrix » obsd-49-x86 #283

2012-12-02 Thread noc
See http://build.squid-cache.org/job/3.2-matrix/./label=obsd-49-x86/283/

--
Started by upstream project 3.2-matrix build number 283
originally caused by: 
Started by an SCM change
Building remotely on obsd-49-x86 in workspace http://build.squid-cache.org/job/3.2-matrix/./label=obsd-49-x86/ws/
java.io.IOException: Failed to mkdirs: http://build.squid-cache.org/job/3.2-matrix/./label=obsd-49-x86/ws/
    at hudson.FilePath.mkdirs(FilePath.java:973)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1306)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:679)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
    at hudson.model.Run.execute(Run.java:1516)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:236)
Retrying after 10 seconds
java.io.IOException: Failed to mkdirs: http://build.squid-cache.org/job/3.2-matrix/./label=obsd-49-x86/ws/
    at hudson.FilePath.mkdirs(FilePath.java:973)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1306)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:679)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
    at hudson.model.Run.execute(Run.java:1516)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:236)
Retrying after 10 seconds
java.io.IOException: Failed to mkdirs: http://build.squid-cache.org/job/3.2-matrix/./label=obsd-49-x86/ws/
    at hudson.FilePath.mkdirs(FilePath.java:973)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1306)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:679)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
    at hudson.model.Run.execute(Run.java:1516)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:236)


Re: can I run a self-contained build?

2012-12-02 Thread Amos Jeffries

On 03.12.2012 13:05, carteriii wrote:
 My approach to building and installing software usually follows a common
 pattern to have different release versions in different directories and
 then either just launch locally from one of those directories or use a
 symbolic link to point to the current version.  This allows me to quickly
 install or rollback by simply changing the launch command or symbolic
 link.  By example, think of the following structure:

 /home/user/Documents/squid/mybuild_v1
 /home/user/Documents/squid/mybuild_v2
 /home/user/Documents/squid/current -> ./mybuild_v2

 This allows me to use either mybuild_v1 or mybuild_v2 or I can also still
 use the default package build on the machine which gets installed in
 various standard locations such as /etc/squid3, /usr/lib/squid3, etc. etc.

 I just realized that squid appears to maintain some absolute path
 references which don't seem to like my approach of a self-contained
 release folder.  My realization came from hitting a few errors like these
 when I moved the top-level directory:

 FATAL: MIME Config Table
 /home/user/Documents/squid/mybuild_v1/etc/mime.conf: (2) No such file or
 directory
 FATAL: unlinkd_program
 /home/user/Documents/squid/mybuild_v1/libexec/unlinkd: (2) No such file or
 directory

 I am now aware of the build configuration options for Ubuntu:

 http://wiki.squid-cache.org/KnowledgeBase/Ubuntu#Compiling

 One of those options sets --sysconfdir=/etc/squid and another sets
 --libexecdir=${prefix}/lib/squid but those are absolute paths.  I'm trying
 to scan the source files to figure out where and how these configuration
 options get used, but I'm worried that I may miss something so I feel it's
 safer to ask for confirmation.

 Is it possible to run from a self-contained build?


Of course. Use --prefix= and omit the other options which integrate it
with the Ubuntu-specific install locations.


Alternatively, --program-suffix= can be used to append a specific value to
the end of all the installed binaries. The configuration files and helpers
all have config directives to alter them, and will default to the suffixed
version of each file.



 Are there any run-time options that mirror the functionality of the
 configuration options that can be used when building?  Or can I specify a
 configuration option path which is relative to either the binaries or the
 configuration file?


http://www.squid-cache.org/Doc/config/mime_table/
http://www.squid-cache.org/Doc/config/unlinkd_program/
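
For example (the paths below are only illustrative, following the directory
layout described in the question), those directives can point the compiled-in
defaults at the self-contained tree:

mime_table /home/user/Documents/squid/current/etc/mime.conf
unlinkd_program /home/user/Documents/squid/current/libexec/unlinkd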

Amos



Re: Multiple issues in Squid-3.2.3 SMP + rock + aufs + a bit of load

2012-12-02 Thread Henrik Nordström
On Fri 2012-11-30 at 15:38 -0700, Alex Rousskov wrote:
 On 11/30/2012 02:25 PM, Henrik Nordström wrote:
 
  We should look into why it is at all needed. From what I can understand
  it should not be needed.
 
 Agreed. Please do that if you can.

Found it. The event loop gets saturated with timed events during the
rebuild and never executes comm events. There is a little premature
optimization there skipping the primary engine while checking all
others, causing comm to starve to death.

Fixed by this change. I think it's correct but review please.

diff -ru squid-3.2.3/src/EventLoop.cc squid-3.2.3-dpl/src/EventLoop.cc
--- squid-3.2.3/src/EventLoop.cc	2012-10-20 15:39:49.0 +0300
+++ squid-3.2.3-dpl/src/EventLoop.cc	2012-12-03 01:52:25.381523614 +0200
@@ -111,7 +111,6 @@
     // generate calls and events
     typedef engine_vector::iterator EVI;
     for (EVI i = engines.begin(); i != engines.end(); ++i) {
-        if (*i != waitingEngine)
-            checkEngine(*i, false);
+        checkEngine(*i, false);
     }


With this:
 * original timeouts work fine
 * rebuild is now done in the background as expected

Unfortunately it uncovered another minor bug resulting in 100% CPU
usage. Fixed by this:

diff -ru --exclude helpers --exclude tools squid-3.2.3/src/comm.cc squid-3.2.3-dpl/src/comm.cc
--- squid-3.2.3/src/comm.cc	2012-10-20 15:39:49.0 +0300
+++ squid-3.2.3-dpl/src/comm.cc	2012-12-03 02:59:34.713503938 +0200
@@ -2071,7 +2071,7 @@
     case COMM_OK:

    case COMM_TIMEOUT:
-        return 0;
+        return EVENT_IDLE;

    case COMM_IDLE:



Also, in testing this I observed that fast ACLs do not default to the inverse
of the last rule when nothing matches. An explicit allow/deny all rule is
needed. I don't think this is intentional.

Regards
Henrik






Jenkins build is back to normal : 3.HEAD-amd64-CentOS-5.3 #2167

2012-12-02 Thread noc
See http://build.squid-cache.org/job/3.HEAD-amd64-CentOS-5.3/2167/changes



Re: Multiple issues in Squid-3.2.3 SMP + rock + aufs + a bit of load

2012-12-02 Thread Alex Rousskov
On 12/02/2012 06:46 PM, Henrik Nordström wrote:
 On Fri 2012-11-30 at 15:38 -0700, Alex Rousskov wrote:
 On 11/30/2012 02:25 PM, Henrik Nordström wrote:

 We should look into why it is at all needed. From what I can understand
 it should not be needed.

 Agreed. Please do that if you can.
 
 Found it. The event loop gets saturated with timed events during the
 rebuild and never executes comm events.

My understanding is that the bug is in Squid store rebuilding(?) code
that assumes that Squid will have time to do some I/O before the next
timed event is fired, but actually there is a lot of non-I/O work to do,
and the next rebuild step event becomes ready before any I/O can
happen. That event schedules another event that, again, fires before we
can get to I/Os. In this context, I/O is regular select-based I/O
orchestrated by the waitingEngine and not direct disk reading or writing
that Store code might do during rebuild.

I doubt this should be fixed by allowing I/O into the "process current
non-I/O events" engines loop. The problem is not in the lack of I/Os,
but in unrealistic scheduling expectations by non-I/O code (or the lack
of an API to schedule those rebuild events on a realistic schedule).


 There is a little premature
 optimization there skipping the primary engine while checking all
 others, causing comm to starve to death.

It was not an optimization IIRC. The idea was that Squid should process
all non-I/O events that are ready now _before_ generating new events
triggered by new I/O. We also wanted the select loop to run once, with a
timeout as opposed to twice (per main loop iteration).


 Fixed by this change. I think it's correct but review please.

The change does not really fix the bug if I understand the bug
correctly. It just masks it. Squid would violate the intended "wait X
seconds before every rebuild step" schedule even more after this change.

More importantly, I am concerned that this change may inject new I/O
callbacks between old async calls if we still have some I/O callbacks
that are not done using async calls API. That is likely to create
hard-to-find bugs because the order of events (in a broad sense) will be
disrupted.

For example, the connection closing code is written under the assumption
that the very last async call scheduled by the closing code will indeed
be the last async call related to the corresponding descriptor. After
this change, this might no longer be true because some I/O processing
(for a different FD) may register new async calls for the same [closing]
FD. In the past, we struggled a lot to avoid such cases!


I can think of a couple of good approaches to solve this:

1) When entering the non-I/O engines loop, remember "now" and only
process timed events that are ready at that remembered time. Do not
advance that time value until we exit the loop and let the I/O engine run.
This does not change the structure of the engines loop. Some events will
still be late, but there will be no disruptive injections of new I/O
activity that might not use async calls. This is probably a
smaller/easier change, but it also masks the true problem (a small
sketch of this idea follows after option #2 below).

2) Allow [Store] to schedule events ASAP but after the next I/O loop.
This will fix the schedule as the scheduled events will no longer be
late. This solution preserves the engines loop structure but requires
an additional/new eventAdd API.

I think #2 is a cleaner solution to the problem at hand but perhaps #1
and #2 can be combined to prevent similar bugs caused by other code
(that this fix will not rewrite to use the new "after next I/O" API from
#2).
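
To make option #1 concrete, here is a minimal, self-contained sketch (it is
not Squid's EventLoop code; the names and the queue type are assumptions made
purely for illustration). The drain only fires events that were already due
at the remembered time, so anything maturing during the drain waits for the
next main-loop pass, after the I/O engine has run:

#include <functional>
#include <queue>
#include <vector>

struct TimedEvent {
    double due;                      // absolute time the event becomes ready
    std::function<void()> callback;
};

struct LaterFirst {
    bool operator()(const TimedEvent &a, const TimedEvent &b) const {
        return a.due > b.due;        // orders the heap so the earliest event is on top
    }
};

typedef std::priority_queue<TimedEvent, std::vector<TimedEvent>, LaterFirst> EventQueue;

// 'snapshot' is the remembered "now", captured once before entering the loop.
void drainReadyEvents(EventQueue &events, const double snapshot)
{
    while (!events.empty() && events.top().due <= snapshot) {
        TimedEvent ev = events.top();
        events.pop();
        ev.callback();   // may schedule new events; they run on a later pass
    }
}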

What do you think?


 diff -ru squid-3.2.3/src/EventLoop.cc squid-3.2.3-dpl/src/EventLoop.cc
 --- squid-3.2.3/src/EventLoop.cc	2012-10-20 15:39:49.0 +0300
 +++ squid-3.2.3-dpl/src/EventLoop.cc	2012-12-03 01:52:25.381523614 +0200
 @@ -111,7 +111,6 @@
      // generate calls and events
      typedef engine_vector::iterator EVI;
      for (EVI i = engines.begin(); i != engines.end(); ++i) {
 -        if (*i != waitingEngine)
 -            checkEngine(*i, false);
 +        checkEngine(*i, false);
      }
 


 With this:
  * original timeouts work fine
  * rebuild is now done in the background as expected

Yes, but other things will probably break in subtle ways unless we are
certain that the waitingEngine (i.e., select loop) no longer triggers
callbacks that do not use async calls and, hence, cut in front of the
async call queue. This change also results in waitingEngine being called
twice in a row (once without a timeout and once with a timeout IIRC),
which feels like a bad idea from an efficiency point of view.


 Also, in testing this I observed that fast ACLs do not default to the inverse
 of the last rule when nothing matches. An explicit allow/deny all rule is
 needed. I don't think this is intentional.

Definitely not! There are some serious known ACL processing bugs, but
this is not one of them.


Thank you,

Alex.