Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout
On 01/17/2013 10:56 AM, Rainer Weikusat wrote:
> Alex Rousskov writes:
>> On 01/16/2013 09:58 PM, Amos Jeffries wrote:
>>> * timeout() happens first and runs doneConnecting() while connect() is
>>> queued.
>>
>> Yes (timeout wins and ends ConnOpener job).
>
> It is possible to 'do better' here fairly easily, see more detailed
> explanation below.

No, it is not. It is possible to spend a lot of time writing, reviewing, and fixing complicated code that optimizes a case that virtually never happens. Correctly detecting this race and properly handling it is not easy and is not useful.

>>> - we have no handle to the connect() AsyncCall stored so we can't
>>> cancel it directly (bug?).
>>
>> We do not need to cancel it (just like with the timeout call discussed
>> above). The async call will not be delivered to the destroyed ConnOpener
>> job.
>
> I'm rather in favour of cancelling the calls, at least where easily
> possible, because this makes the debugging/diagnostic output easier
> to understand.

We should, of course, clear the stored timeout (or I/O) callback after that callback has been fired. Without that, swanSong() would try to clear a fired callback, and that would be confusing.

I do not have the guts to object to us calling the cancel() method for a non-fired callback that is no longer relevant (as long as it is done consistently). However, I do not see how that improves debugging in the ConnOpener case because the callback will know that the job is gone without our explicit cancellation. And I am sure somebody will eventually think that calling cancel() has some useful side effects, such as closing file descriptors or freeing some memory (it does not).

>> Ending the job is enough to guarantee that it will not be called via an
>> async job call:
>
> Taking this into account and after having spent some more time
> thinking on this, I think what the timeout handler should do is:
>
> 1. Clear calls_.timeout_.

Yes.
calls.timeout must reflect whether there is a timeout notification scheduled (from our job's point of view).

> 2. Check if temporaryFd_ is still valid. At the moment, this is always
> the case, but it would be prudent wrt general 'defensive programming' and
> the check will become necessary for change 3.

I do not know what "valid" means in this context, but temporaryFd_ and earlyAbort must reflect whether there is a Comm I/O scheduled (from our job's point of view).

> 2a. If this was the case, check if the write handler of the
> corresponding fde is still set.

If I/O is scheduled, we should stop it, close the descriptor, and free memory using Comm APIs. This condition should probably be detected only in swanSong(), to avoid code duplication. The job should exit after a timeout, of course. Accessing Comm write handler pointers in some low-level table should be avoided if at all possible. If Comm APIs need to be adjusted to handle this better, we can adjust them.

Amos, do you remember what "partially open FD" in the following comment meant, exactly? Why cannot we just call comm_close() to cleanup?

> // it never reached fully open, so cleanup the FD handlers
> // Note that comm_close() sequence does not happen for partially open FD

In your latest patch, please move the fd cleanup from timeout() and from doneConnecting() into a dedicated method and call that method from swanSong() if temporaryFd_ is still set there. Do not call it from doneConnecting() or anywhere else. Remember that a job may end for reasons other than some timeout or I/O. swanSong() is the cleanup method.

Please do not forget to set calls.earlyAbort to NULL after canceling it in the fd cleanup method.

> + ptr = static_cast(F->write_data);
> + delete ptr;

This is not going to work well: F->write_data now points to deleted memory. If you absolutely have to do this, please set it to NULL after the delete.

Thank you,

Alex.
Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout
On 01/17/2013 02:53 PM, Rainer Weikusat wrote:
> Alex Rousskov writes:
>> On 01/17/2013 10:56 AM, Rainer Weikusat wrote:
>
> Then, the ::connect method (or some code running on behalf of that) would
> close temporaryFd_ before scheduling a 'delayed connect retry' which would
> open a new socket. A ::timeout running then won't need to free write_data
> and won't have to cancel the early_abort call.

Agreed.

> If the write handler was scheduled by the time timeout runs, this
> happened because 'something definite' is known about the state of the
> connection attempt: either it succeeded or the OS returned an error.
> Throwing this 'real information' away in order to save a line of code
> seems not right to me even if it is a fringe case of a fringe case.

As I said, one would not be just saving lines of code. One would most likely be introducing bugs (that is exactly what happened in the past with this code). We have at most two "real information" pieces: a timeout and an I/O status. The first to get to ConnOpener wins.

HTH,

Alex.
Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout
On 01/17/2013 06:35 PM, Amos Jeffries wrote:
>> Amos, do you remember what "partially open FD" in the following comment
>> meant, exactly? Why cannot we just call comm_close() to cleanup?
>
> It means we have issued socket() and connect() OS calls. fd_table[]
> state probably exists. But we are still waiting on either TCP response,
> or the In-Progress event to run and the FD to be confirmed as open.
> This is signalled by (temporaryFd_ >= 0).

Why cannot we just call comm_close() to cleanup Comm state?

Alex.
Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout
On 01/17/2013 10:28 PM, Amos Jeffries wrote:
> On 18/01/2013 5:37 p.m., Alex Rousskov wrote:
>> On 01/17/2013 06:35 PM, Amos Jeffries wrote:
>>>> Amos, do you remember what "partially open FD" in the following comment
>>>> meant, exactly? Why cannot we just call comm_close() to cleanup?
>>> It means we have issued socket() and connect() OS calls. fd_table[]
>>> state probably exists. But we are still waiting on either TCP response,
>>> or the In-Progress event to run and the FD to be confirmed as open.
>>> This is signalled by (temporaryFd_ >= 0).
>> Why cannot we just call comm_close() to cleanup Comm state?
>
> Because comm_openex() and comm_close() are not "symmetrical" in what
> they do. comm_close() involves a lot of handlers cleanup and close code
> for state objects which simply do not exist yet.

Why are non-existing handlers and objects a problem? We have a write handler and a close handler registered with Comm, right? Why not call comm_close() to clean them up (from the Comm point of view) and close the descriptor? We do not need those handlers to be actually called (we may even cancel them before calling comm_close()). We just need to clear Comm state and close the descriptor.

Will comm_close() break if we call it instead of reaching into Comm guts and manipulating its tables? If yes, what is Comm missing? Perhaps we can give that missing piece to Comm so that we can comm_close()?

> The FD has only had socket() and connect() done on it and a few specific
> things done to the fd_table. One of which is registering the
> calls_.earlyAbort_ handler as what gets run on comm_close() in case the
> Squid shutdown/restart sequence or TCP close happens during this Job's
> operations. If we use comm_close() most of the required cleanup will
> never happen.

Please note that I am _not_ suggesting to call comm_close() to clean up our job state (like a lot of Squid code does), even though I believe it would be possible if doneAll() is adjusted accordingly.
I am just trying to find a way to call comm_close() to clean up Comm state without violating Comm APIs. We will clean the job's state in addition to calling comm_close().

Thank you,

Alex.
Re: [PATCH] Propagate pinned connection persistency and closures to the client.
On 01/18/2013 05:28 AM, Amos Jeffries wrote:
> In src/client_side_reply.cc:
>
> * "not sending to a closing on pinned zero reply " does not make any sense.
> Did you mean "not sending more data after a pinned zero reply "?
> although even that is a bit unclear. Since this is public log text I
> think we better clarify it a lot before this patch goes in.

"not sending more data after a pinned zero reply " is a good fix IMO. This text is printed at level 3, so only developers need to understand it. If you prefer, we can do "not sending more data after a pinned ERR_ZERO_SIZE_OBJECT", but "zero reply" has a better chance of remaining valid as errors change.

FWIW, this check is needed because Squid produces an ERR_ZERO_SIZE_OBJECT response (as it should!), and we need to stop that response from being sent to the client while an async notification is being delivered to the client-side connection closing code. This race happens because Store still delivers responses synchronously while many other parts of the code have already been converted to use async calls.

Thank you,

Alex.
Re: [PATCH] Consumption of POST body
On 01/18/2013 08:14 AM, Steve Hill wrote:
> The attached patch enables autoconsumption of the BodyPipe if access
> isn't allowed in order to discard the request body rather than letting
> it fill the buffers.

Please note that the patch enables auto-consumption for all callout errors, not just access denials. Also, the trunk code is even more complex in this area. I believe your patch, when carefully ported to trunk, may enable auto-consumption for all callout errors *that have readNextRequest set*. That makes sense to me: if we are going to read the next request, we should consume the current one without closing the client connection, so we should not call expectNoForwarding(), which may have that side effect.

The overall intent sounds correct to me. Even if we do not want to keep the connection open after responding with an error, it may be impossible to deliver the error message to a half-broken client that does not read while writing to us.

However, this brings us back to trunk r11920 and bug 3420. While that fix was triggered by an assertion, it is clear from the code comments and the commit message that we were worried about a client tying up a Squid client connection "forever" (with or without sending data). Your changes will re-enable that behavior in some cases, I guess. More on that below.

> I'm not entirely sure this is the best fix - calling
> expectNoForwarding() would seem to be neater, but this also causes the
> client-side socket to close, which would break authentication mechanisms
> that require the connection to be kept alive. Hopefully someone with a
> better understanding of the Squid internals can review the patch and
> tell me if it is the right approach?

Thank you very much for treading carefully and disclosing potential problems! Indeed, having two methods with approximately the same purpose and a hidden critical distinction is asking for trouble. A lot of existing expectNoForwarding-path code comments describe a situation that matches yours.
Moreover, as far as I can see, we may call the old expectNoForwarding() first and then call your forwardToBitbucket() later, all for the same error!

As far as I can tell, we have several cases to take care of (or ignore) here:

1. Authentication: Must read the next request on the same connection, so readNextRequest must be set (if not, this is probably a bug). Have to live with the possibility of the client sending forever.

2. Adaptation errors: The code may have been written to abort the client connection, but it also sets readMore or readNextRequest. This sounds inconsistent. We should review this, especially since this is the only place where we call expectNoForwarding() today!

3. Other errors: These may or may not have readNextRequest set. Let's assume that they set (or do not set) that flag correctly.

Suggestions:

* Use expectNoForwarding() instead of adding forwardToBitbucket().

* In ConnStateData::noteBodyConsumerAborted(), do not stop reading if flags.readMore is set. Just keep auto-consuming. This may prevent authenticating connection closures even though expectNoForwarding() is used. Needs to be tested: will we correctly stop and switch to the new request when the body finally ends?

* Discuss and possibly remove the readNextRequest setting from ClientHttpRequest::handleAdaptationFailure(), to remove the inconsistency discussed in #2 above.

* Discuss whether we are OK with clients tying up Squid after an [authentication] error. Perhaps we need another timeout there, perhaps there are already mechanisms to address that, or perhaps we do not really care.

What do you think?

Alex.
Re: [PATCH] Avoid abandoned client connections after url_rewriter redirects CONNECT
On 01/18/2013 05:32 AM, Amos Jeffries wrote:
> On 17/01/2013 6:47 a.m., Alex Rousskov wrote:
>> Hello,
>>
>> clientProcessRequest() assumes that a CONNECT request is always
>> tunneled and sets flags.readMore to false. However, if url_rewriter
>> redirects the CONNECTing user, Squid responds with a redirect message
>> and does not tunnel. In that case, we do want to read more. Otherwise,
>> keepaliveNextRequest() will declare the connection abandoned and the
>> connection descriptor will "leak" until the connection lifetime expires,
>> even if the client disconnects immediately after receiving the redirect
>> response.
>>
>> The fix delays setting flags.readMore to false until we are about to
>> call tunnelStart().
>>
>> The effect on CONNECT authentication (another case where CONNECT is not
>> tunneled) is untested, but I hope that code continues to work because it
>> should be OK with reading more requests on the [being] authenticated
>> connection.
>>
>> These changes may also fix other similar not-tunneled CONNECT cases.
>> They are not related to SslBump.
>>
>> One of the attached patches is for trunk. The other one is for v3.2. I
>> did not check whether the problem exists in v3.1.
>>
>> HTH,
>>
>> Alex.
>
> +1. It looks okay, although I would like confirmation from somebody
> using NTLM authentication that things still work afterwards.

My patches broke SslBump -- Squid started reading from the client connection before switching to SSL :-(.

The attached patches were tested with SslBump (and without). They are more conservative/less invasive: I now leave readMore set to false for CONNECT until it is clear that Squid is not going to tunnel or bump. This should break fewer cases.

Alex.

Avoid abandoned client connections after url_rewriter redirects CONNECT.

clientProcessRequest() assumes that a CONNECT request is always tunneled and sets flags.readMore to false.
However, if url_rewriter redirects the CONNECTing user, Squid responds with a redirect message and does not tunnel. In that case, we do want to read more. Otherwise, keepaliveNextRequest() will declare the connection abandoned and the connection descriptor will "leak" until the connection lifetime expires, even if the client disconnects immediately after receiving the redirect response.

The fix resets flags.readMore back to true if we end up calling httpStart() instead of tunnelStart().

My earlier solution was to keep flags.readMore as true until tunnelStart(), but that breaks SslBump because Squid starts reading the [encrypted] client connection (trying to find the next HTTP request there) while obtaining the server certificate. This happens because ConnStateData::clientAfterReadingRequests() calls readSomeData() if flags.readMore is true, even if context->mayUseConnection() is also true.

The current solution is better because, besides working, it keeps flags.readMore and context->mayUseConnection() consistent.

There should be no effect on CONNECT authentication (another case where CONNECT is not tunneled) because the changes should not touch authentication code paths, but that assertion has not been tested.
=== modified file 'src/client_side.cc'
--- src/client_side.cc 2013-01-18 05:57:22 +
+++ src/client_side.cc 2013-01-18 21:47:00 +
@@ -2762,40 +2762,45 @@
         const bool supportedExpect = (expect.caseCmp("100-continue") == 0);
         if (!supportedExpect) {
             clientStreamNode *node = context->getClientReplyContext();
             clientReplyContext *repContext = dynamic_cast(node->data.getRaw());
             assert (repContext);
             conn->quitAfterError(request);
             repContext->setReplyToError(ERR_INVALID_REQ, HTTP_EXPECTATION_FAILED,
                 request->method, http->uri,
                 conn->clientConnection->remote, request, NULL, NULL);
             assert(context->http->out.offset == 0);
             context->pullData();
             goto finish;
         }
     }

     http->request = HTTPMSGLOCK(request);
     clientSetKeepaliveFlag(http);

     // Let tunneling code be fully responsible for CONNECT requests
     if (http->request->method == Http::METHOD_CONNECT) {
         context->mayUseConnection(true);
+        // "may" in "mayUse" is meaningful here: no tunnel if Squid redirects,
+        // needs to authenticate, or responds with an error. For now, pause so
+        // that we do not read and try to parse tunneled and/or encrypted data.
+        // We resume reading if we get to ClientHttpRequest::httpStart() or
+        // ConnStateData::switchToHttps().
         conn->flags.readMore = false;
     }

 #if USE_SSL
Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout
On 01/18/2013 04:32 PM, Rainer Weikusat wrote:
> Alex Rousskov writes:
>> On 01/17/2013 02:53 PM, Rainer Weikusat wrote:
>>> If the write handler was scheduled by the time timeout runs, this
>>> happened because 'something definite' is known about the state of the
>>> connection attempt: either it succeeded or the OS returned an error.
>>> Throwing this 'real information' away in order to save a line of code
>>> seems not right to me even if it is a fringe case of a fringe case.
>>
>> As I said, one would not be just saving lines of code. One would most
>> likely be introducing bugs (that is exactly what happened in the past
>> with this code). We have at most two "real information" pieces: a
>> timeout and an I/O status. The first to get to ConnOpener wins.
>
> I think you're right and not checking for this is the better choice.

Glad we agree on this one! Please do not waste your time on recovering, or even thinking about, the just-connected state when the timeout won the race.

Thank you,

Alex.
Re: [PATCH] Consumption of POST body
On 01/21/2013 07:04 AM, Steve Hill wrote:
> I have changed to using expectNoForwarding() and removed the
> stopReceiving() call from noteBodyConsumerAborted() and this appears to
> work correctly:

Good, thank you for checking this! We just need to decide whether reading the entire request is what we want in all cases. The earlier changes were done to stop Squid reading that request (because reading the whole request wastes a lot of resources if the request is huge or, worse, malicious).

> I don't really understand the purpose of the readMore flag, can you
> explain it?

It is one of the factors that the Squid connection context uses to read a new request from the client connection. ConnStateData::clientParseRequests() is key here. Please note that the Squid connection context is not the only object that may do reading and that, in theory, that context does not deal with reading request bodies. See also: mayUseConnection(). The whole thing is complex, messy, and, hence, buggy.

>> * Discuss and possibly remove readNextRequest setting from
>> ClientHttpRequest::handleAdaptationFailure(), to remove inconsistency
>> discussed in #2 above.
>
> As mentioned above, I don't understand what readMore is actually for.

It is difficult (for me) to describe all its current [side] effects, which is a part of the problem, but it is close to "the Squid client side needs to read more incoming data, either for the current request or to get the next request".

> However, as I mentioned in my previous email, the only time I can see us
> wanting to blindly close the connection for adaptation failures is when
> we've already started sending a response to the client and the
> adaptation service blows up in the middle of the response body. It
> doesn't _look_ like handleAdaptationFailure() handles that event though,
> unless I'm mistaken?

Yes, handleAdaptationFailure() may be called when adaptation fails in the middle of the response.
However, handleAdaptationFailure() is for REQMOD adaptations (which may overlap with RESPMOD adaptations in some cases!).

>> * Discuss whether we are OK with clients tying up Squid after an
>> [authentication] error. Perhaps we need another timeout there, perhaps
>> there are already mechanisms to address that, or perhaps we do not
>> really care.
>
> 4. The client makes a request that generates an error, uses chunked
> encoding and sends an infinitely long body.
> I haven't tested (4), but I have no reason to believe there's anything
> that currently stops it from happening. I'm not sure whether or not we
> should do something to prevent it - that's one for discussion.

Yes, (4) is the interesting case here. We do prevent (4) on certain adaptation errors. This is what prompted the original changes in this area! It feels wrong to just undo those changes without a discussion (they were deemed useful because they were solving a real problem for some users at the time). However, I agree that we do not prevent (4) in many (all?) other cases.

So, should we:

a) undo the existing limited protection (because it is limited and conflicts with the cases where we do want to read the whole thing, at least as far as the code is concerned);

b) expand the existing protection to all cases (because it is better than allowing case #4);

c) adjust behavior based on existing and/or new configuration options, to make the choice between (a) and (b) depend on the current circumstances, and expand the configurable protection to all cases; or

d) polish and apply your changes, resulting in protection for the previously covered case(s) and no added protection for other cases. Wait for somebody to care enough about a more comprehensive and consistent solution to make it happen. This is essentially what my suggestions were meant to accomplish if the "discuss" items lead nowhere.
> On 01/21/2013 04:19 PM, Amos Jeffries wrote:
>> If the data is being dropped on arrival by Squid we should be watching
>> for these cases and aborting client connections which go on too long. I
>> suspect the read_timeout timer should be kept ticking on these reads and
>> not reset to 0. That would cap the abuse time at 15 minutes per client
>> rather than infinite.

This idea is for option (c) above. I like it a lot because the proposed behavior is meaningful, flexible, and does not require new options.

Steve, do you think you have the cycles to implement Amos' suggestion? It is not going to be easy, but it would be the best outcome of this discussion (that I can think of).

Thank you,

Alex.
Re: Bug 3111:
On 01/20/2013 04:09 PM, Amos Jeffries wrote:
> Hi Alex,
> could you clarify the situation you mention in
> http://bugs.squid-cache.org/show_bug.cgi?id=3111 please?
>
> You said there that ConnOpener leaves events scheduled on successful
> exit. But, I was of the understanding that we should NOT explicitly
> clear and cancel any Calls in ConnOpener()::connected() due to
> duplication with the swanSong() actions run after that successful
> connect Call ended?

The current ConnOpener leaves fd_table[].timeout set under some conditions. That field is not a callback or an event; it is a time value. It has to be cleared because it is not associated with ConnOpener (and, I suspect, it may not even be strongly associated with the connection descriptor, because we manipulate fd_table[] directly).

The changes discussed now on squid-dev will resolve this because we will no longer rely on manipulating raw fd_table[] for timeouts and/or because we will have a single place where the timeout state is correctly cleared.

HTH,

Alex.
Re: [PATCH] Move timeout from ConnOpener::connect to ConnOpener::timeout
On 01/21/2013 09:53 AM, Rainer Weikusat wrote:
> +// XXX: connect method installed a write handler
> +// after a successful non-blocking connect.
> +// Since this handler is not going to be called
> +// and Comm::SetSelect is not (yet) 'smart pointer
> +// aware', the Pointer object needs to be
> +// destroyed explicitly here to prevent the ConnOpener
> +// object from being leaked.

I do not think the above is 100% accurate, on two counts: We are deleting ptr because we are about to remove our callback that deletes the ptr. This is not directly related to SetSelect not being aware of smart pointers (whatever that means) -- ptr is not smart. Also, it is the ptr that leaks. The ConnOpener object leak is just a side effect of the ptr leak.

I suggest the following phrasing:

  XXX: We are about to remove write_handler, which was responsible for
  deleting write_data, so we have to delete write_data ourselves. Comm
  currently calls SetSelect handlers synchronously, so if write_handler
  is set, we know it has not been called yet. ConnOpener converts that
  sync call into an async one, but only after deleting ptr, so that is
  not a problem.

If you agree that this phrasing is better, it can be changed when the patch is applied; no need to resubmit.

The tail starting with "Comm currently ..." explains why our async conversion is not a problem. That question bothers me every time I look at this code, so I wanted to add an answer, but you may strip that answer if you do not like it.

I do not see any other problems with this patch, but it is difficult for me to grasp all the interactions because of the other problems remaining in the code. For example, the timeout is still not cleared correctly on a successful connect. I am OK with this going in since Rainer prefers to fix one problem at a time. I just hope we will remember/see all of the remaining problems as this work progresses :-).
Personally, I would prefer a single commit fixing all known problems, but that is not a strong preference.

Thanks a lot,

Alex.
Re: Squid 3.3 countdown
On 01/20/2013 03:40 AM, Amos Jeffries wrote:
> As I mentioned in the last release announcement for 3.3, this month I am
> ticking off the requirements for 3.3 to go stable.
>
> As it stands we have four bugs which appear to be major enough to block
> that release:
>
> * http://bugs.squid-cache.org/show_bug.cgi?id=3740
> - "ssl_bump none ..." apparently not working.
>
> You on this one Alex?

Currently, I am waiting for the requested debugging information. I cannot tell whether I would be able to work on this until I know what the problem is.

> * http://bugs.squid-cache.org/show_bug.cgi?id=3743
> - ssl_crtd not sending fatal startup errors to stderr
>
> Any taker? Christos?

Yes, we will take care of that, but it is hardly a blocker!

Thank you,

Alex.
Logging commands: TCP logger vs Logging daemon
Hello,

I noticed that the "tcp://host:port" logger does not get "rotate logs" and other commands that the "daemon:foo" helper gets. The TCP logger just gets the logging line, without any prefix to accommodate future commands or extensions.

This seems rather unfortunate because it prevents Squid from reporting significant logging-related events to the remote TCP logger. Log rotation is one example. Another example would be reporting the number of lost (due to connectivity problems) access.log records to a remote logger (if we start supporting such a feature).

Is it too late to adjust documentation so that all logging helpers can get logging commands? If it is too late, how about adding an "understands commands" option to access_log so that loggers that can understand commands can receive them?

Thank you,

Alex.
Re: Wiki Abuse
On 01/19/2013 04:16 AM, Kinkie wrote:
> I've now performed the cleanup. There are now 360 registered user
> accounts in the wiki.

Did the cleanup solve the wiki performance problem?

Thank you,

Alex.
Re: [PATCH] Avoid abandoned client connections after url_rewriter redirects CONNECT
On 01/18/2013 08:43 PM, Amos Jeffries wrote:
> On 19/01/2013 11:53 a.m., Alex Rousskov wrote:
>> On 01/18/2013 05:32 AM, Amos Jeffries wrote:
>>> On 17/01/2013 6:47 a.m., Alex Rousskov wrote:
>>>> Hello,
>>>>
>>>> clientProcessRequest() assumes that a CONNECT request is always
>>>> tunneled and sets flags.readMore to false. However, if url_rewriter
>>>> redirects the CONNECTing user, Squid responds with a redirect message
>>>> and does not tunnel. In that case, we do want to read more. Otherwise,
>>>> keepaliveNextRequest() will declare the connection abandoned and the
>>>> connection descriptor will "leak" until the connection lifetime
>>>> expires, even if the client disconnects immediately after receiving
>>>> the redirect response.
>>>>
>>>> The fix delays setting flags.readMore to false until we are about to
>>>> call tunnelStart().
>>>>
>>>> The effect on CONNECT authentication (another case where CONNECT is
>>>> not tunneled) is untested, but I hope that code continues to work
>>>> because it should be OK with reading more requests on the [being]
>>>> authenticated connection.
>>>>
>>>> These changes may also fix other similar not-tunneled CONNECT cases.
>>>> They are not related to SslBump.
>>>>
>>>> One of the attached patches is for trunk. The other one is for v3.2.
>>>> I did not check whether the problem exists in v3.1.
>>>>
>>>> HTH,
>>>>
>>>> Alex.
>>> +1. It looks okay, although I would like confirmation from somebody
>>> using NTLM authentication that things still work afterwards.
>>
>> My patches broke SslBump -- Squid started reading from the client
>> connection before switching to SSL :-(.
>>
>> The attached patches were tested with SslBump (and without). They are
>> more conservative/less invasive: I now leave readMore set to false for
>> CONNECT until it is clear that Squid is not going to tunnel or bump.
>> This should break fewer cases.
> I think the correct way will be to take your first patch, but shuffle
> the use of stopReading() above tunnelStart() to above both tunnelStart()
> and sslBumpStart(), then add a readMore=true when sslBump setup is
> completed.

I do not think that is a good idea, for two reasons:

1) As you know, both tunnelStart() and sslBumpStart() are called from processRequest(). A lot of things can happen _before_ we get to that method if we allow readMore to remain true while we are getting there! Some of those things will probably lead to bugs. For example, Squid may read [encrypted] data from the client socket and try to interpret it as the beginning of the next request (or the body of the current one?). This is what I may have seen in the logs from take1 tests (I did not investigate, but it sounds quite possible based on the code).

The key here is that ClientSocketContext::keepaliveNextRequest() and/or clientAfterReadingRequests() may start reading and interpreting again if readMore is true while doCallouts() for the previous request is still "waiting" for some async call to return.

Granted, it may be possible that the same kind of bad things already happen without my patches, but I do not want to add more of them (because I would end up being responsible for fixing all of them, as it would appear that I broke something that worked).

2) The problem with take1 of the patch was not just that we started reading from the client while SslBump was being set up. We also started reading from the client when responding with an error (including authentication?) or a redirect. I am seriously worried that if we do not disable client reading, we will hit assertions elsewhere on that path, because the Squid client side is not very good at handling a new request while the old one is still being processed, especially in corner/error cases like the ones I just mentioned.

I think it is much safer to remain conservative here and disable reading until it is known to be safe.
Did the above arguments convince you that the take2 version of the patch is better than what you are proposing?

> If this works it confirms a suspicion of mine that sslBumpStart() should
> be moved inside tunnelStart() and that function should be the place to
> select between handling CONNECT as ssl-bump / Upgrade: / or blind tunnel.

It would be wrong to start a non-tunneling path in a function called tunnelStart() IMO. Currently, ClientHttpRequest::processRequest() is where various "start" methods are called and where the decision to tunnel or bump is made for CONNECT requests. I think that is a reasonable design (even though the processRequest() method name is too general and should be changed). We can move CONNECT handling to a CONNECT-dedicated method, but the current code is too simple to _require_ such a method IMO.

Thank you,

Alex.
Re: [PATCH] schedule connect timeouts via eventAdd
On 01/23/2013 04:39 PM, Rainer Weikusat wrote:
> Rainer Weikusat writes:
> [...]
>> The downside of this approach is that this means adding
>> future connect timeouts will become proportionally more expensive as
>> more and more connections are being established
>
> This is an understatement. Based on a somewhat simplified model of the
> event list (initially empty, only connect timeouts), adding n timeouts
> in a row would have a cost of 2 * (n - 1) + 1 (assuming 'insertion
> cost/empty list' 1, same for removal cost) when adding and removing
> the events but would be (n * (n + 1)) / 2 for the other case (sum of
> 1 + 2 + ... + n).

Side note: I am very glad you noticed that problem with event-based
timeouts. This tells me somebody is thinking about the code beyond the
itch to squash a known bug. :-)

To simplify your analysis and make it more meaningful, please consider a
typical steady state, where you have R new connections per second, t
second connection establishment delay, T second timeout, and a
negligible number of actual timeouts. In that state, one event-based
approach gives you R*t timeouts and the second event-based approach
gives you R*T timeouts registered with Squid at any time.

What are the costs of handling one event-based timeout in the first (add
and forget) and second (add and remove) event-based approaches? Since we
start the search from the oldest event and all timeouts will go to the
very end of the list, I think the costs are:

  add-and-forget: R*T
  add-and-remove: 2*R*t

(*) The 2 multiplier for add-and-remove is kind of the worst case -- in
general, the event we want to cancel will be in the beginning of the
queue, not the end of it!

If the above model is correct, the best choice should be clear by now
because a typical 2*t is a lot less than a typical T, but let's try
R = 1000 new connections per second, T = 60 seconds (default), and
t = 0.1 second (conservative!).
  add-and-forget: 1000 * 60 = 60'000 operations
  add-and-remove: 2 * 1000 * 0.1 = 200 operations

Still, "200" or even "100" (*) is a significant cost for this
more-or-less realistic example. We can reduce that cost if we add an
"add by searching from the end of the queue" eventAdd() optimization
(either automated or explicit). In that case, the costs will become:

  add-and-forget-o: 1
  add-and-remove-o: R*t

Or we could go back to fd-based timeouts, but we would need to be extra
careful with maintaining them using the same raw Comm manipulations that
have screwed us in the past. That would be especially difficult across
changing temporary FDs...

Cost summary:

                     CPU    RAM
  current fd-based:  1      R*t
  add-and-forget-o:  1      R*T
  add-and-remove-o:  R*t    R*t

Are my estimates correct? Is storing 60'000 events instead of 100
acceptable? I kind of doubt it is... :-(

HTH,

Alex.
boolean bit fields
On 01/23/2013 07:05 PM, Amos Jeffries wrote:
> On 24/01/2013 7:20 a.m., Kinkie wrote:
>> the attached patch turns the unsigned int:1 flags in CachePeer to
>> bools.
>
> Please retain the :1 bitmasking. My microbench is showing a consistent
> ~50ms speed gain on bitmasks over full bool, particularly when there are
> multiple bools in the structure. We also get some useful object size gains.

Hello,

FYI: With g++ -O3, there is no measurable performance difference between
bool and bool:1 in my primitive tests (sources attached). I do see that
non-bool bit fields are consistently slower though ("foo:0" below means
type "foo" without bit fields; bool tests are repeated to show result
variance):

host1:
> bool:0     size: 12 final: -1085972333.443588956 iterations: 100M time: 1.206s
> bool:1     size: 12 final: -1085972333.443588956 iterations: 100M time: 1.203s
> uint32_t:0 size: 20 final: -1085972333.443588956 iterations: 100M time: 1.191s
> uint32_t:1 size: 12 final: -1085972333.443588956 iterations: 100M time: 1.525s
> uint8_t:0  size: 12 final: -1085972333.443588956 iterations: 100M time: 1.204s
> uint8_t:1  size: 12 final: -1085972333.443588956 iterations: 100M time: 1.527s
> bool:0     size: 12 final: -1085972333.443588956 iterations: 100M time: 1.204s
> bool:1     size: 12 final: -1085972333.443588956 iterations: 100M time: 1.204s

host2:
> bool:0     size: 12 final: -1085972333.443588956 iterations: 100M time: 0.851s
> bool:1     size: 12 final: -1085972333.443588956 iterations: 100M time: 0.848s
> uint32_t:0 size: 20 final: -1085972333.443588956 iterations: 100M time: 0.863s
> uint32_t:1 size: 12 final: -1085972333.443588956 iterations: 100M time: 1.150s
> uint8_t:0  size: 12 final: -1085972333.443588956 iterations: 100M time: 0.849s
> uint8_t:1  size: 12 final: -1085972333.443588956 iterations: 100M time: 1.150s
> bool:0     size: 12 final: -1085972333.443588956 iterations: 100M time: 0.848s
> bool:1     size: 12 final: -1085972333.443588956 iterations: 100M time: 0.849s
host3:
> bool:0     size: 12 final: -1085972333.443588956 iterations: 100M time: 0.615s
> bool:1     size: 12 final: -1085972333.443588956 iterations: 100M time: 0.615s
> uint32_t:0 size: 20 final: -1085972333.443588956 iterations: 100M time: 0.696s
> uint32_t:1 size: 12 final: -1085972333.443588956 iterations: 100M time: 0.928s
> uint8_t:0  size: 12 final: -1085972333.443588956 iterations: 100M time: 0.615s
> uint8_t:1  size: 12 final: -1085972333.443588956 iterations: 100M time: 0.928s
> bool:0     size: 12 final: -1085972333.443588956 iterations: 100M time: 0.615s
> bool:1     size: 12 final: -1085972333.443588956 iterations: 100M time: 0.615s

With g++ -O0, boolean bit fields become slower than plain bool as well:

> bool:0     size: 12 final: -1085972333.443588956 iterations: 100M time: 1.347s
> bool:1     size: 12 final: -1085972333.443588956 iterations: 100M time: 2.023s
> uint32_t:0 size: 20 final: -1085972333.443588956 iterations: 100M time: 1.448s
> uint32_t:1 size: 12 final: -1085972333.443588956 iterations: 100M time: 2.002s
> uint8_t:0  size: 12 final: -1085972333.443588956 iterations: 100M time: 1.371s
> uint8_t:1  size: 12 final: -1085972333.443588956 iterations: 100M time: 2.034s
> bool:0     size: 12 final: -1085972333.443588956 iterations: 100M time: 1.348s
> bool:1     size: 12 final: -1085972333.443588956 iterations: 100M time: 2.095s

The same is actually true for -O3 with an older g++ v3.4.6 on a
different host:

> bool:0     size: 12 final: -1085972333.443588956 iterations: 100M time: 4.468s
> bool:1     size: 12 final: -1085972333.443588956 iterations: 100M time: 6.238s
> uint32_t:0 size: 20 final: -1085972333.443588956 iterations: 100M time: 4.876s
> uint32_t:1 size: 12 final: -1085972333.443588956 iterations: 100M time: 6.209s
> uint8_t:0  size: 12 final: -1085972333.443588956 iterations: 100M time: 4.470s
> uint8_t:1  size: 12 final: -1085972333.443588956 iterations: 100M time: 6.208s
> bool:0     size: 12 final: -1085972333.443588956 iterations: 100M time: 4.471s
> bool:1     size: 12 final: -1085972333.443588956 iterations: 100M time: 6.231s

To me, it looks like bit fields in general may hurt performance where
memory composition is not important (as expected, I guess), and that
some compilers remove any difference between full and bit boolean with
-O3 (that surprised me).

G++ assembly source comparison seems to confirm that -- boolean-based
full and bit assembly sources are virtually identical with -O3 and newer
g++ versions, while bit fields show a lot more assembly operations with
-O0 (both diffs attached). Assembly is well beyond my expertise though.

Am I testing this wrong or is it a case of YMMV? If it is "YMMV", should
we err on the side of simplicity and use simple bool where memory
savings are not important or not existent?
Re: [PATCH] schedule connect timeouts via eventAdd
On 01/23/2013 08:45 PM, Amos Jeffries wrote:
> On 24/01/2013 1:50 p.m., Alex Rousskov wrote:
>> To simplify your analysis and make it more meaningful, please consider a
>> typical steady state, where you have R new connections per second, t
>> second connection establishment delay, T second timeout, and negligible
>> number of actual timeouts. In that state, one event-based approach gives
>> you R*t timeouts and the second event-based approach gives you R*T
>> timeouts registered with Squid at any time.
>>
>> What are the cost of handling one event-based timeout in the first (add
>> and forget) and second (add and remove) event-based approaches? Since we
>> start search from the oldest event and all timeouts will go to the very
>> end of the list, I think the costs are:
>>
>> add-and-forget: R*T
>> add-and-remove: 2*R*t
>>
>> (*) The 2 multiplier for add-and-remove is kind of the worst case -- in
>> general, the event we want to cancel will be in the beginning of the
>> queue, not the end of it!
>
> I think there is a ^ calculation missing in the add-and-forget formula.
> Because the list growth is exponential the add() cost rises
> exponentially for each R+1.

I do not think the list grows in a steady state that I am considering
here: What we add to the list (at rate R per second) is then removed
from that list (at the same rate). The length of the list is "constant"
and determined by the time an event spends inside the list (T or t).

If the list grows, Squid will run out of RAM so there is no steady state
(an overload case that is not relevant to this discussion).

> The different length of timer values on each event type above means we
> cannot easily convert the list type to dlink_list. At least a full
> analysis of the queue usage. Possibly implementation of both front and
> back push ops switched by a check for which end is best to inject from
> (more cycles spent doing the check etc).
I think it would be easy to make the list work in both directions, and
the decision which direction to use would be either up to the caller or
based on a simple comparison with the first and last event timestamps
(which one is closer to the new event?).

However, that does not address the RAM usage concern that both of us
share (among other things).

Thank you,

Alex.
Re: boolean bit fields
On 01/24/2013 02:43 AM, Amos Jeffries wrote:
> On 24/01/2013 7:51 p.m., Alex Rousskov wrote:
>> On 01/23/2013 07:05 PM, Amos Jeffries wrote:
>>> On 24/01/2013 7:20 a.m., Kinkie wrote:
>>>> the attached patch turns the unsigned int:1 flags in CachePeer to
>>>> bools.
>>
>>> Please retain the :1 bitmasking. My microbench is showing a consistent
>>> ~50ms speed gain on bitmasks over full bool, particularly when there are
>>> multiple bools in the structure. We also get some useful object size
>>> gains.
>>
>> Hello,
>>
>> FYI: With g++ -O3, there is no measureable performance difference
>> between bool and bool:1 in my primitive tests (sources attached). I do
>> see that non-bool bit fields are consistently slower though ("foo:0"
>> below means type "foo" without bit fields; bool tests are repeated to
>> show result variance):
...
>> To me, it looks like bit fields in general may hurt performance where
>> memory composition is not important (as expected, I guess), and that
>> some compilers remove any difference between full and bit boolean with
>> -O3 (that surprised me).
>>
>> G++ assembly source comparison seem to confirm that -- boolean-based
>> full and bit assembly sources are virtually identical with -O3 and newer
>> g++ versions, while bit fields show a lot more assembly operations with
>> -O0 (both diffs attached). Assembly is well beyond my expertise though.
>
> At -O3 G++ is optimizing for speed at expense of code size.
> -O2 is probably a better comparision level and AFAIK the preferred level
> for high-performance and small code size build.

Recent g++ versions optimize bitfield booleans to work as fast as full
booleans starting with -O1. There is no .asm difference among -O1, -O2,
and -O3 optimization levels. -O0 is the default.

This means that your test results contradict mine -- all my tests show
consistently worse bitfield performance when optimization is disabled
(-O0). Squid defaults to -O2 in most cases, right?
If my test code is representative, there will be no difference in
compiled full and bitfield code at that optimization level.

>> Am I testing this wrong or is it a case of YMMV? If it is "YMMV", should
>> we err on the side of simplicity and use simple bool where memory
>> savings are not important or not existent?
>
> I think YMMV with the run-time measurement. I had to run the tests many
> times to get an average variance range on the speed even at 100M loops.
> Some runs the speed was 100ms out in the other direction, but only some,
> most were 50ms towards bool:1. And the results differed between flag
> position and struct with 1-byte length and struct with enough flags for
> 2-bytes.

FWIW, my test structure includes both int and bitfields. I have not
experimented with bitfield-only structures because they are not common
in Squid.

I cannot explain your results (especially since I do not know what your
test code is). The only thing that seems to be clear is that you are
using -O0 (default), which is not that relevant if we are comparing
performance of a typical Squid installation, since g++ applies bool
optimizations starting with -O1 and most Squids are built with -O2.

> I did not have time to look at the ASM, thank you for the details there.
> If -O2 shows the same level of cycles reduction I think I will change my
> tack...

It does in my tests. I do not know how representative my test code is
though.

> we should be letting it handle the bitfields. BUT, we should still take
> care to arrange the flags and members such that -O has an easy job
> reducing them.

If we leave any :1 flags, we should also make sure that all of them are
using unsigned integers (or bool) as the base type.
Using signed integers leads to bugs, as the difference in "final"
checksum below demonstrates (the final value should not change when
bitfields are enabled, but does change for signed integer types):

> int32_t:0 size: 20 final: -1085972333.443588956 iterations: 100M time: 0.784s
> int32_t:1 size: 12 final: 140980519.610307440   iterations: 100M time: 2.111s
> int8_t:0  size: 12 final: -1085972333.443588956 iterations: 100M time: 0.892s
> int8_t:1  size: 12 final: 140980519.610307440   iterations: 100M time: 2.111s
> bool:0    size: 12 final: -1085972333.443588956 iterations: 100M time: 0.848s
> bool:1    size: 12 final: -1085972333.443588956 iterations: 100M time: 0.848s

Thank you,

Alex.
Re: boolean bit fields
On 01/24/2013 09:27 AM, Kinkie wrote: > From what I understand, the current consensus is to use :1'ed bools, right? I hope the consensus is to use full bools (because they are simpler and often faster) except where memory footprint of the structure is both affected _and_ important (a rare case). In those rare cases, use bool:1. HTH, Alex.
Re: [PATCH] schedule connect timeouts via eventAdd
On 01/24/2013 05:57 AM, Rainer Weikusat wrote:
> Amos Jeffries writes:
>> On 24/01/2013 1:50 p.m., Alex Rousskov wrote:
>>> Is storing 60'000 events instead of 100 acceptable? I kind of doubt it
>>> is... :-(
>> I agree.
>
> As a somewhat more general remark to that: There are two obvious
> alternative implementations, namely
>
> - use a timing wheel, ie, a 'circular' hash table where new events are
>   linked onto the list at (head + interval) % wheel_size. That would
>   be the traditional kernel implementation of this facility (That's
>   from what I remember from reading about that. I've never used it
>   myself)
>
> - use a differently implemented priority queue. The usual 'obvious
>   choice' would be a binary heap stored in an array

Since

* there are well-justified concerns about RAM or performance overhead of
  event-based timeouts with the current event queue,

* replacing the legacy event queue with a better one is a serious
  stand-alone project, with no "obviously best" replacement available in
  STL, and

* we need to fix ConnOpener ASAP,

I suggest that we reverse the course and focus on an fd-based
implementation of timeouts instead. It will be efficient, and we can
make it correct. I can help with that.

Any better ideas for moving forward here?

Thank you,

Alex.
Re: [PATCH] schedule connect timeouts via eventAdd
On 01/24/2013 11:16 AM, Rainer Weikusat wrote:
>> * there are well-justified concerns about RAM or performance overhead of
>> event-based timeouts with the current event queue,
>
> The concern I originally wrote about was whether it would be better to
> let the event-based timeout expire or to cancel it instead. The 'RAM
> overhead' of using the event queue would be the memory needed for an
> ev_entry object. That's 48 bytes, compared to the size of the
> ConnOpener object figuring at 112 bytes.

For add-and-forget, the RAM overhead is not a few bytes per ConnOpener
object, because those events stay in the queue a lot longer than
ConnOpener objects stay alive. With 60'000 alive events, the RAM
overhead concern is justified.

For add-and-delete, there is justified CPU overhead (and no RAM
overhead) IMO.

>> * replacing legacy event queue with a better one is a serious
>> stand-alone project, with no "obviously best" replacement available in
>> STL, and
>
> That's easily doable in an afternoon and completely straight-forward

I bet the authors of the original event queue code had similar thoughts.
And we are still fixing occasional bugs in that code or adding bugs
while trying to interface with it. The theoretical concept is indeed
straightforward, but things usually get complex when it clashes with
Squid realities.

And since you do not necessarily have a comprehensive view on how the
event queue is used by the rest of Squid code, I am not sure you can
select the right algorithm alone. This means more discussions,
iterations, etc.

There have been many innocent-looking projects that were later
discovered to significantly drop Squid performance. And here we are
talking about very performance-sensitive code in the very middle of the
Squid main loop! So please forgive my skepticism regarding your
estimates of the complexity and implementation speed of the event queue
optimization project...
> And the fd-based timeout code does linear searches of > the FD table in order to determine when a timeout occurred which is > also not exactly a 'scalable' approach. If you are talking about the checkTimeouts() loop, it will not become longer or shorter after this project is completed, so it does not seem to be relevant? That loop has its own problems, and it can be optimized or even removed, but that would be a separate project. Please give me a few hours to implement initial fd-based ConnOpener changes so that we can discuss a specific alternative to event queue optimizations. Thank you, Alex.
Re: [PATCH] schedule connect timeouts via eventAdd
On 01/24/2013 12:51 PM, Rainer Weikusat wrote: > linked-list implementation will be good enough because [...] > I don't think it should be discarded as > 'not good enough' based on assumptions alone. Why not? If a simple model shows significant overhead, and there are no known reasons to doubt that model, why is that not a sufficient justification to try alternatives that have lower overhead? Alex.
Re: [PATCH] schedule connect timeouts via eventAdd
On 01/24/2013 02:46 PM, Rainer Weikusat wrote:
> In my opinion, using
> a more sensibly priority queue algorithm (still both extremely simple
> to implement and 'readily available' in a canned form), insofar the
> deficiencies of the simple one become actually relevant, makes a lot more
> sense than devising special-case 'workarounds' for these deficiencies
> because it is conjectured that they *might* otherwise become
> relevant.

I am all for using better queuing algorithms. However, I am against
replacing one central queue implementation with another in the middle of
a different and urgent project just because it is conjectured that the
other implementation is going to be simple and fast enough, especially
when there are reasons to believe that it cannot be fast enough (because
of the fundamental properties of the problem).

> Also in my opinion (and apparently according the opinion of
> someone who wrote a couple of comments a la
>
> /* TODO: remove these fd_table accesses. But old code still depends on fd_table flags to
>  * indicate the state of a raw fd object being passed around.
>  * Also, legacy code still depends on comm_local_port() with no access to Comm::Connection
>  * when those are done comm_local_port can become one of our member functions to do the below.
>  */
>
> ) 'these fd_table accesses' are a really bad idea from a design
> standpoint.

Yes, they are. However, removing them would take a lot more than a
different queue implementation. Moreover, implementing ConnOpener
timeouts using events when the rest of the code is using Comm is also
bad design.

Alex.
Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout
On 01/18/2013 04:54 AM, Amos Jeffries wrote:
> On 18/01/2013 6:47 p.m., Alex Rousskov wrote:
>> Will comm_close() break if we call it instead of reaching into Comm guts
>> and manipulating its tables? If yes, what is Comm missing? Perhaps we
>> can give that missing piece to Comm so that we can comm_close()?

Amos,

First of all, thank you for taking the time to answer my questions. I
believe this issue is an important part of ConnOpener cleanup.

> * Notice that until ConnOpener has finished the FD is not open, as in
> "if (isOpen(fd)) return;" produces false, so comm_close() will exit
> before it gets to any cleanup.

AFAICT, Comm marks the FD open upon successful comm_openex() exit.
ConnOpener even asserts that the temporary FD is flagged as open:

> // Update the fd_table directly because conn_ is not yet storing the FD
> assert(temporaryFd_ < Squid_MaxFD);
> assert(fd_table[temporaryFd_].flags.open);

Connection::isOpen() will still return false, but we do not care about
that. We just want to call comm_close() as everybody else does, and
comm_close() does not require a Connection.

> * The registered close handler is earlyAbort which flags the FD as error
> instead of timeout or success.

Yes. We would have to remove our close handler before closing if we do
not want it to be called when we close, just like all other code using
Comm I/O does. The close handler is there for ConnOpener to be notified
if some _other_ code closes our FD (e.g., shutdown).

> * fd_close(temporaryFd_), which we use already, cleans up all the
> non-ConnOpener state which is associated with the FD.

AFAICT, here are some of the other things we are missing by poorly
duplicating the comm_close() sequence:

> ++ statCounter.syscalls.sock.closes;
>
> /* When one connection closes, give accept() a chance, if need be */
> Comm::AcceptLimiter::Instance().kick();

> fdd_table[fd].close_file = file;
> fdd_table[fd].close_line = line;

> comm_empty_os_read_buffers(fd);

Are these critical?
Probably not, but they do invalidate the point that our cleanup code is
equivalent to comm_close(). Sooner or later, some of these will come
back to haunt us.

> It is simpler just to prune out the two bits of ConnOpener state in
> ConnOpener itself rather than rely on the generic close sequence, which
> has to do a lot of checking for irrelevant things.

ConnOpener state should be pruned in ConnOpener, of course, but I think
it is clear that we are not doing a good job cleaning Comm state. We
duplicate some comm_close() code but miss some details. I agree that the
comm_close() sequence has many irrelevant things, but it is also used
only in a rare case when ConnOpener fails. We do not need to optimize
that path!

>> I am just trying to find a way to call comm_close() to clean up Comm
>> state without violating Comm APIs. We will clean the job's state in
>> addition to calling comm_close().
>
> ConnOpener lives in the limbo area between fd_open()/fd_close() which
> sets up the TCP state on an FD and comm_open()/comm_close() which setup
> and tear down the mountain of comm transaction I/O state on an FD.

I do not see what our temporary FD lacks that the rest of Squid code
using comm_open() and comm_close() has. I do not see why it is in that
limbo area. Perhaps it was at some point, but not anymore?

> On the other error/timeout exit points from ConnOpener fd_close()+unwinding
> is the more appropriate way to cleanup [because Comm failed to setup the
> FD]

It did not fail. Comm_openex() was successful.

> You may indeed be able to re-write comm_close() such that it will work
> on any FD, including one which failed to reach success in ConnOpener.
> But I think that is a much more difficult/larger project outside the
> scope of the priority bug-fixes Rainer is working on here.

Perhaps I am missing something, but I do not see why we cannot call
comm_close() without any rewrite.
We would still need to solve several ConnOpener-specific problems, of course, the ones that are orthogonal to us not properly cleaning up Comm state on failures. Thank you, Alex.
Re: [PATCH] schedule connect timeouts via eventAdd
On 01/24/2013 04:38 PM, Amos Jeffries wrote: > I was under the impression that we had > proven the usefulness of add-and-remove model. I do not think so. I provided an estimate for add-and-remove costs and demonstrated how add-and-remove can be optimized, but those costs were high and that optimization was not sufficient IMO. Also, using eventAdd for I/O timeouts when the rest of Squid is using Comm for I/O timeouts will move us in the wrong direction. > So, with maintainer hat on please do the changes necessary to perform > add-and-remove cleanly for timeout events. *If* possible using the > event*() API instead of raw FD table access. I will submit a patch that includes add-and-remove for timeouts. If somebody wants to write a patch with clean removal of timeouts using eventDelete(), please be warned that eventDelete() code currently does not work in ConnOpener context. I will not waste your time further by describing the boring details of why it does not. Thank you, Alex.
[PREVIEW] ConnOpener fixes
Hello,

The attached patch fixes several ConnOpener problems by relying on
AsyncJob protections while maintaining a tighter grip on various I/O and
sleep states.

It is in PREVIEW state because I would like to do more testing, but it
did pass basic tests, and I am not currently aware of serious problems
with the patch code.

I started with Rainer Weikusat's timeout polishing patch posted
yesterday, but all bugs are mine.

Here are some of the addressed problems:

* Connection descriptor was not closed when attempting to reconnect
  after failures. We now properly close on failures, sleep with the
  descriptor closed, and then reopen.

* Timeout handler was not cleaned up properly in some cases, causing
  memory leaks (for the handler Pointer) and possibly timeouts that were
  fired (for the then-active handler) after the connection was passed to
  the initiator.

* Comm close handler was not cleaned up properly.

* Connection timeout was enforced for each connection attempt instead of
  all attempts together.

and possibly other problems. The full extent of all side-effects of
mishandled race conditions and state conflicts is probably unknown.

TODO:
  Needs more testing, especially around corner cases.
  Does somebody need more specific callback cancellation reasons?
  Consider calling comm_close instead of direct write_data cleanup.
  Make connect_timeout documentation in squid.conf less ambiguous.
  Move prevalent conn_ debugging to the status() method?
  Polish Comm timeout handling to always reset .timeout on callback?
  Consider revising eventDelete() to delete between-I/O sleep timeout.

Feedback welcomed.

Thank you,

Alex.


Author: Alex Rousskov
Author: Rainer Weikusat

Fixed several ConnOpener problems by relying on AsyncJob protections
while maintaining a tighter grip on various I/O and sleep states.

* Connection descriptor was not closed when attempting to reconnect
  after failures. We now properly close on failures, sleep with the
  descriptor closed, and then reopen.
* Timeout handler was not cleaned up properly in some cases, causing
  memory leaks (for the handler Pointer) and possibly timeouts that were
  fired (for the then-active handler) after the connection was passed to
  the initiator.

* Comm close handler was not cleaned up properly.

* Connection timeout was enforced for each connection attempt instead of
  all attempts together.

and possibly other problems. The full extent of all side-effects of
mishandled race conditions and state conflicts is probably unknown.

TODO:
  Needs more testing, especially around corner cases.
  Does somebody need more specific callback cancellation reasons?
  Consider calling comm_close instead of direct write_data cleanup.
  Make connect_timeout documentation in squid.conf less ambiguous.
  Move prevalent conn_ debugging to the status() method?
  Polish Comm timeout handling to always reset .timeout on callback?
  Consider revising eventDelete() to delete between-I/O sleep timeout.

=== modified file 'src/comm/ConnOpener.cc'
--- src/comm/ConnOpener.cc	2013-01-16 10:35:54 +
+++ src/comm/ConnOpener.cc	2013-01-25 01:42:09 +
@@ -16,343 +16,422 @@
 #include "ip/tools.h"
 #include "SquidConfig.h"
 #include "SquidTime.h"

 #if HAVE_ERRNO_H
 #include <errno.h>
 #endif

 class CachePeer;

 CBDATA_NAMESPACED_CLASS_INIT(Comm, ConnOpener);

 Comm::ConnOpener::ConnOpener(Comm::ConnectionPointer &c, AsyncCall::Pointer &handler, time_t ctimeout) :
     AsyncJob("Comm::ConnOpener"),
     host_(NULL),
     temporaryFd_(-1),
     conn_(c),
     callback_(handler),
     totalTries_(0),
     failRetries_(0),
-    connectTimeout_(ctimeout),
-    connectStart_(0)
+    deadline_(squid_curtime + static_cast<time_t>(ctimeout))
 {}

 Comm::ConnOpener::~ConnOpener()
 {
     safe_free(host_);
 }

 bool
 Comm::ConnOpener::doneAll() const
 {
     // is the conn_ to be opened still waiting?
     if (conn_ == NULL) {
         return AsyncJob::doneAll();
     }

     // is the callback still to be called?
     if (callback_ == NULL || callback_->canceled()) {
         return AsyncJob::doneAll();
     }

+    // otherwise, we must be waiting for something
+    Must(temporaryFd_ >= 0 || calls_.sleep_);
     return false;
 }

 void
 Comm::ConnOpener::swanSong()
 {
-    // cancel any event watchers
-    // done here to get the "swanSong" mention in cancel debugging.
-    if (calls_.earlyAbort_ != NULL) {
-        calls_.earlyAbort_->cancel("Comm::ConnOpener::swanSong");
-        calls_.earlyAbort_ = NULL;
-    }
-    if (calls_.timeout_ != NULL) {
-        calls_.timeout_->cancel("Comm::ConnOpener::swanSong");
-        calls_.timeout_ = NULL;
-    }
-
     if (callback_ != NULL) {
-        if (callback_->canceled())
-            callback_ = NULL;
-        else
-            // inform the still-waiting caller we are dying
-            doneConnecting(COMM_ERR_CONNECT, 0);
+        // inform the still-waiting caller w
Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout
On 01/24/2013 06:37 PM, Amos Jeffries wrote:
>>> /* When one connection closes, give accept() a chance, if need be */
>>> Comm::AcceptLimiter::Instance().kick();
>
> This one might kick us. (sorry bad pun).

Heh! AFAICT, this kick() might deprive us of an FD slot later if we were
using the last FD available (and now closing it because of the connect
failures that we are hoping to overcome on a retry). I do not think this
is a problem because ConnOpener will handle the corresponding
comm_openex() "FD limit" error as any other. After all, ConnOpener may
run out of FDs even if we did not kick any pending accepts.

Did I miss any bad side effects of this kick()?

> IIRC it was just the isOpen() and F->closing() logics that were the
> problem. If that has gone away like it seems to I have no issue with
> using comm_close() other than a token hand waving in the direction of
> performance.

Sounds good. Since this is a rare failure case, I do not think we need
to optimize it at the expense of code quality.

If no other side effects are spotted, and I have the time, I will try
comm_close(), which will remove one more instance of direct fd_table
manipulation, and a nasty/complex one!

Thank you,

Alex.
Re: [PREVIEW] ConnOpener fixes
On 01/24/2013 09:35 PM, Amos Jeffries wrote:
> On 25/01/2013 3:06 p.m., Alex Rousskov wrote:
> NP: This is way beyond the simple fixes Ranier was working on. The
> changes here relys on code behaviour which will limit the patch to trunk
> or 3.3. I was a bit borderline on the earlier on the size of Raniers
> patches, but this is going over the change amount I'm comfortable
> porting to the stable branch with a beta cycle coming to an end.

I have not looked at v3.2 specifically during this project, so I do not
know which behaviors make my patch incompatible with that version. I
certainly do not insist that you port it to v3.2, but I am happy to help
with that if you decide that it is the best way forward.

> * You are still making comments about what "Comm" should do (XXX: Comm
> should!). ConnOpener *is* "Comm" at this point in the transaction.

Fixed comments by using specific Comm API names.

> * It was probably done beforehand, but it much clearer that it happens
> now that the sleep()/DelayedRetry mechanism leaks Pointer()

Where do you see a leak? Please note that our static DelayedRetry event
callback is always called, regardless of the job state/existence, so it
will always delete the Pointer. I clarified that point by adjusting the
comment in cancelSleep().

> * Looking at your comment in there about comm_close() I see why we are
> at such loggerheads about it.
> +++ comm_close() does *not* replace the write_data cleanup.

Fixed comments that were missing two facts:

1) ConnOpener's bypass of the Comm::Write() API makes
COMMIO_FD_WRITECB(fd)->active() false, which means comm_close() will not
call and clean up the write handler properly. Sorry for missing that
point. We should consider using Comm::Write eventually so that
comm_close can do this cleanup for us instead (and so that we can use
regular non-leaking job callbacks).

2) comm_close() is probably slightly broken itself because its
COMMIO_FD_WRITECB cleanup condition is probably wrong.
It should not matter whether there is an active callback when it comes to the SetSelect cleanup call. That call should be done unconditionally, and some SetSelect call implementations may optimize to clean only when there _was_ a write handler at some point. We have fixed a related bug #3048 (r10880) by adding those cleanup calls, but it looks like we should have been even more aggressive and made the cleanup unconditional. This may explain why I see many "FD ready but there is no handler" debug lines in some cache.logs! If I am correct, fixing this may even speed things up a little as we would be polling fewer FDs :-). > * apart from the timeout unset the relevant parts of the comm_close() > sequence are all in comm_close_complete(). Not exactly. There is more potentially useful code in the current _comm_close(): source code address setting and the arguably less useful comm_empty_os_read_buffers() call. And who knows what else will be added or changed there! We should call comm_close() instead of trying to take it apart and use what we think is currently relevant. IMO this is basic API sanity policy: if we call comm_open, we must call comm_close. > * the naming of "ConnOpener::open()" is ambiguous. Since it is not the > active open() operation in this Job. It is the getSocket() operation and > should be named as such. Fixed by renaming open() to createSocket(). I wanted to emphasize that this method creates something rather than just extracting something that was there before. Adjusted patch attached. Besides method renaming and comment changes discussed above, this patch * Calls comm_close() as discussed. * No longer resets COMM_SELECT_WRITE when keeping FD for the job initiator, to mimic Comm::DoSelect() optimization. We still clear our write handler, of course. Thank you, Alex. 
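The Pointer handling described above (the static DelayedRetry event callback always runs and always deletes the Pointer; only the final job-protected call may be skipped when the job is gone) can be modeled in miniature as follows. All names here (Job, JobPointer, delayedRetryWrapper) are simplified stand-ins for illustration, not the actual AsyncCall or event-queue code:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Simplified model of the two calls behind one ConnOpener sleep event.
struct Job {
    bool gone = false;        // stands in for cbdata/AsyncJob validity
    bool retried = false;
    void retrySleep() { retried = true; }
};

typedef std::shared_ptr<Job> JobPointer;

// First call: a "dumb" callback that always fires. It deletes the
// heap-allocated Pointer and schedules the second, job-protected call.
static void delayedRetryWrapper(JobPointer *heapPtr,
                                std::vector<std::function<void()> > &asyncQueue)
{
    JobPointer job = *heapPtr;
    delete heapPtr;           // the Pointer is always cleaned up here
    asyncQueue.push_back([job]() {
        if (!job->gone)       // second call fires only if the job lives
            job->retrySleep();
    });
}
```

In this model, the first callback is the one the event queue dials unconditionally (so there is no leak); the second is the job callback that squelches itself if the job has already ended.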
Author: Alex Rousskov Author: Rainer Weikusat Fixed several ConnOpener problems by relying on AsyncJob protections and comm_close(), while maintaining a tighter grip on various I/O and sleep states. * Connection descriptor was not closed when attempting to reconnect after failures. We now properly close on failures, sleep with descriptor closed, and then reopen. * Timeout handler was not cleaned up properly in some cases, causing memory leaks (for the handler Pointer) and possibly timeouts that were fired (for then-active handler) after the connection was passed to the initiator. * Comm close handler was not cleaned up properly. * statCounter.syscalls.sock.closes counter was not updated on FD closure. * Waiting pending accepts were not kicked on FD closure. * Connection timeout was enforced for each connection attempt instead of applying to all attempts taken together. and possibly other problems. The full extent of all side-effects of mishandled race conditions and state conflicts is probably unknown. TODO: Needs more testing, especially around corner cases.
Re: doZeroOnPush
On 01/24/2013 07:27 PM, Amos Jeffries wrote: > On 24/01/2013 11:36 p.m., Kinkie wrote: >> On Thu, Jan 24, 2013 at 11:28 AM, Amos Jeffries wrote: >>> On 24/01/2013 5:56 a.m., Kinkie wrote: Hi Amos, what is the best way to signal in a mempools client that the constructor is complete and that thus zeroing is not necessary? It's not immediately evident from looking at the source.. It seems that this is only done in mem.cc for string pools. >>> >>> memDataInit() has an optional bool flag. Set it to false. >> .. I still can't fully understand how it relates to the usual >> MEMPROXY_CLASS/MEMPROXY_CLASS_INLINE, and I can't find any good >> example. >> >> >> -- >> /kinkie > > I think I found the answer. And it's no good... > > The objects defined with MEMPROXY_CLASS() macro are all using > MemAllocatorProxy::getAllocator() in lib/MemPools.cc to late-initialize > their pools. There is no zero flag being passed around in any of those > macros or their code paths. > > So far as I can see to make it optional we will need to make either a > template or a new macro which accepts the flag and saves it for use when > MemAllocatorProxy::getAllocator().create(...) is done. The doZero() > setter needs to be called from the same if() condition right after > MemAllocatorProxy::getAllocator().create(). > > So: > 1) > - update the macro and roll it out in one step defaulting as (true) > everywhere > - then set it to false as things get verified. > - then eventually remove it again when we have a fully no-zero state > for all classes. I think this is the best option, especially if you trust Coverity to eventually find all initialization problems. It comes with virtually no initial risk and results in very few code changes. > 2) > - add a second macro which sets it to true. > - roll out a conversion to the new macro OK, but I think #1 is better because #1 results in fewer code changes. 
I may change my opinion if Kinkie's tests prove that fewer doZero buffers result in significantly better performance, justifying per-class, incremental changes. > 3) > - make a template which can replace MEMPROXY_CLASS() etc. with zero > optional flag > - rollout the template as we verify classes are constructed properly Doable, but too complex IMO, without much added benefit that I can see. Thank you, Alex.
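For what it is worth, option #1 might look roughly like the sketch below. MiniPool, MINI_MEMPROXY, and the doZero() setter are invented names for illustration only, not the actual MemPools/MemAllocatorProxy API:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Simplified stand-in for a MemPools allocator; the real
// MemAllocatorProxy::getAllocator() machinery is more involved.
class MiniPool {
public:
    explicit MiniPool(size_t objSize): objSize_(objSize), zero_(true) {}
    void doZero(bool z) { zero_ = z; }  // hypothetical setter
    void *alloc() {
        void *p = malloc(objSize_);
        if (zero_)
            memset(p, 0, objSize_);     // skipped once the class is
        return p;                       // verified to construct fully
    }
private:
    size_t objSize_;
    bool zero_;
};

// A macro in the spirit of option #1: roll it out with the flag
// defaulting to true everywhere, then flip verified classes to false,
// and eventually remove the flag again.
#define MINI_MEMPROXY(CLASS, ZERO) \
    static MiniPool &CLASS##Pool() { \
        static MiniPool pool(sizeof(CLASS)); \
        pool.doZero(ZERO); \
        return pool; \
    }

struct Widget { int x; };
MINI_MEMPROXY(Widget, true)
```

The single boolean parameter keeps the rollout to a one-line change per class, which is what makes option #1 low-risk.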
Re: Logging commands: TCP logger vs Logging daemon
On 01/24/2013 10:00 PM, Amos Jeffries wrote: > On 24/01/2013 8:52 a.m., Alex Rousskov wrote: >> Hello, >> >> I noticed that "tcp://host:port" logger does not get "rotate logs" >> and other commands that the "daemon:foo" helper gets. The TCP logger >> just gets the logging line, without any prefix to accommodate future >> commands or extensions. This seems rather unfortunate because it >> prevents Squid from reporting significant logging-related events to the >> remote TCP logger. >> >> Log rotation is one example. Another example would be reporting the >> number of lost (due to connectivity problems) access.log records to a >> remote logger (if we start supporting such a feature). >> >> >> Is it too late to adjust documentation so that all logging helpers can >> get logging commands? >> >> If it is too late, how about adding an "understands commands" option to >> access_log so that loggers that can understand commands can receive them? >> >> >> Thank you, >> >> Alex. > > There are at least two users making use of the TCP output method that I > am aware of. > > At this point it would be a little messy but still possibly worth doing > as a concept. > > However, this tcp: module is designed as a stream replacement of > access.log stdio output all the tools are designed to operate with. Right, but I assume that none of those stdio tools can deal with a TCP stream "as is" and if a tool is adjusted to deal with the TCP stream, it can be also adjusted to filter commands out. Even in the trivial case of a script using "nc" or similar to convert a TCP stream into a file on a remote host, that script can add a grep step to get rid of commands. I agree that we will inconvenience the existing TCP loggers. If that is a significant problem, we would have to add an "understand commands" option (if there is consensus that such logging commands are useful, of course). 
> Details such as when rotation is requested by the (now external) admin > or when Squid starts/stops/reconfigures have never been logged to the > access.log files before (udp: and syslog: do not get them either) and > cache.log is not sent over the tcp: module. That design seems reasonable to me as far as syslog and files are concerned: Neither a file nor syslog can handle commands. On the other hand, a custom program receiving our entries can. Please note that I am not trying to argue with your points (which are all valid) or defend a particular solution at this point. I am just discussing and evaluating available options... > As for lines lost. That might be worth recording. I am thinking a log > line " ... lost NNN lines during TCP connection outage ..." would be > sufficient ?? but logged where? * Logging the above line to cache.log would be easy, but it would be difficult for automation tools to collect that information and correlate it with actual log entries, especially if cache.log is not sent over the tcp: module to the remote processing daemon. * Logging the above line to access.log (or equivalent stream) as if it were a log entry, will violate the log format (which would be worse than adding command support IMO). Thank you, Alex.
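As a concrete illustration of the "filter commands out" idea, a remote tool adjusted for the TCP stream could split commands from log entries with a few lines of code. The "@command:" prefix below is purely hypothetical, since no command syntax has been agreed upon:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a received TCP log stream into log entries and commands.
// The "@command:" prefix is an invented convention for illustration.
static void splitStream(std::istream &in,
                        std::vector<std::string> &entries,
                        std::vector<std::string> &commands)
{
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 9, "@command:") == 0)
            commands.push_back(line.substr(9));  // e.g., "rotate"
        else
            entries.push_back(line);             // ordinary log record
    }
}
```

A tool that does not care about commands would simply ignore the second vector, which is why the inconvenience to existing loggers should be small.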
Re: [PREVIEW] ConnOpener fixes
On 01/25/2013 09:42 PM, Amos Jeffries wrote: > On 26/01/2013 6:35 a.m., Alex Rousskov wrote: >> On 01/24/2013 09:35 PM, Amos Jeffries wrote: >>> * It was probably done beforehand, but it is much clearer that it happens >>> now that the sleep()/DelayedRetry mechanism leaks Pointer() >> Where do you see a leak? Please note that our static DelayedRetry event >> callback is always called, regardless of the job state/existence, so it >> will always delete the Pointer. I clarified that point by adjusting the >> comment in cancelSleep(). > You commented earlier that the events were scheduled as AsyncCall queue. Yes, they are (after spending time in the event queue). > You commented earlier that AsyncCall queue would not run a Call if the > Job were gone. Yes, if it is a job call. (The protection is offered by the call dialer, not the caller or the callee. This is why we can have one async queue for many different types of async calls.) Calls created and placed in the async queue by the event queue are not job calls. The current event queue code converts a C-style event callback into a basic async call. It does not have enough information to create a job call or to realize that the event is storing such a job callback already. > I connected those two statements to imply the DelayRetry event would not > happen. Same as is happening for InProgress. Yes, I see where the confusion was coming from. Sorry for not being clear. This is especially messy because DelayedRetry/event queue and SetSelect/InProgress are using different callback mechanisms (SetSelect vs event queue). > Uhm, were you meaning these events are *always* run and the scheduling > happens in our wrapper which deletes the Pointer() ? Yes, ConnOpener event queue events always run (there is an optional cbdata layer that offers call protection for some events but we are not using that). The async calls created from those events always fire. At that point, our handler wrapper is called. 
It deletes the Pointer and schedules an async job call to our job. Only that final call will not fire if the job is gone. In summary, there are two async calls per event here. The first one is a dumb callback that will always fire. That one deletes Pointer. The second one is a job callback and may not fire, but that is OK. Needless to say, all these layers will disappear when the event queue is converted to AsyncCall API, just like upper Comm levels were. This layering is very annoying IMO. Fortunately, it does not represent a serious performance problem because most performance-sensitive paths do not use events these days IIRC. > If so there is no leak. Otherwise there is. No leak AFAICT. I still need to run valgrind and other leak tests though. > On the whole I am kind of against calling comm_close() for the sheer > number of conditionals it requires to effectively do a NOP operation. It > is nicer from the code readability and completeness viewpoint, but not > performance. Even a few dozen micro-seconds in the ConnOpener and > related stages can translate to a significant proportion of a MEM_HIT > transaction time. Fortunately, comm_close() in ConnOpener is not called for 99.99% of [memory] hits in a typical environment. In fact, it is probably not called for 99% of requests in most environments. It is only called when we needed to connect to the origin server _and_ we failed to do so. Still, we can optimize it to be more efficient. Most conditionals are probably too fast to be a serious problem, but we can get rid of the final "really close" async call when it is not needed (because no async callbacks were called). That would be a nice small optimization project if somebody wants to volunteer! Another performance optimization target could be that FD buffer cleaning loop. Perhaps there are cases where we do not need to run it at all. I have not researched it though. Another project! > Okay. 
However I am trying to work towards an overall goal of reducing > the things comm_close() does, by moving the cleanup of these things into > the Jobs which are setting them. If this is done as an optimization, with comm_close still guaranteeing proper and full Comm state cleanup, then it may help (see above for two related optimization ideas). Otherwise, it sounds too risky IMO. >> Adjusted patch attached. Besides method renaming and comment changes >> discussed above, this patch >> >> * Calls comm_close() as discussed. >> >> * No longer resets COMM_SELECT_WRITE when keeping FD for the job >> initiator, to mimic Comm::DoSelect() optimization. We still clear our >> write handler, of course. > > You still only delete and unset write_data, without clearing the rest of > the write state at the same time via Comm::SetSelect(). [ See the next bullet first! The bug fixed there may have caused a c
[PATCH] ConnOpener fixes
Hello, The attached patch fixes several ConnOpener problems by relying on AsyncJob protections while maintaining a tighter grip on various I/O and sleep states. This patch is identical to the last ConnOpener patch posted in PREVIEW state. I am now done with valgrind testing. No leaks were detected, although I doubt my tests were comprehensive enough to trigger all possible conditions where the unpatched code leaks. I started with Rainer Weikusat's timeout polishing patch posted a few days ago, but all bugs are mine. Here are some of the addressed problems: * Connection descriptor was not closed when attempting to reconnect after failures. We now properly close on failures, sleep with descriptor closed, and then reopen. * Timeout handler was not cleaned up properly in some cases, causing memory leaks (for the handler Pointer) and possibly timeouts that were fired (for then-active handler) after the connection was passed to the initiator. * Comm close handler was not cleaned up properly. * statCounter.syscalls.sock.closes counter was not updated on FD closure. * Waiting pending accepts were not kicked on FD closure. * Connection timeout was enforced for each connection attempt instead of applying to all attempts taken together. and possibly other problems. The full extent of all side-effects of mishandled race conditions and state conflicts is probably unknown. TODO (outside this project scope): Polish comm_close() to always reset "Select" state. Make connect_timeout documentation in squid.conf less ambiguous. Move prevalent conn_ debugging to the status() method? Polish Comm timeout handling to always reset .timeout on callback? Revise eventDelete() to delete between-I/O sleep timeout? HTH, Alex. Author: Alex Rousskov Author: Rainer Weikusat Fixed several ConnOpener problems by relying on AsyncJob protections and comm_close(), while maintaining a tighter grip on various I/O and sleep states. 
* Connection descriptor was not closed when attempting to reconnect after failures. We now properly close on failures, sleep with descriptor closed, and then reopen. * Timeout handler was not cleaned up properly in some cases, causing memory leaks (for the handler Pointer) and possibly timeouts that were fired (for then-active handler) after the connection was passed to the initiator. * Comm close handler was not cleaned up properly. * statCounter.syscalls.sock.closes counter was not updated on FD closure. * Waiting pending accepts were not kicked on FD closure. * Connection timeout was enforced for each connection attempt instead of applying to all attempts taken together. and possibly other problems. The full extent of all side-effects of mishandled race conditions and state conflicts is probably unknown. TODO: Needs more testing, especially around corner cases. Does somebody need more specific callback cancellation reasons? Make connect_timeout documentation in squid.conf less ambiguous. Move prevalent conn_ debugging to the status() method? Polish Comm timeout handling to always reset .timeout on callback? Polish comm_close() to always reset "Select" state? Consider revising eventDelete() to delete between-I/O sleep timeout. 
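The timeout fix in the last bullet boils down to computing one deadline for all attempts instead of restarting a timer per attempt, roughly as follows (a simplified model, not the patch code itself):

```cpp
#include <cassert>
#include <ctime>

// Per-attempt timeout (old behavior): each retry gets a fresh budget,
// so N retries could take up to N * ctimeout seconds in total.
// Single deadline (new behavior): all attempts share one budget.
struct DeadlineTimer {
    time_t deadline_;
    DeadlineTimer(time_t now, time_t ctimeout): deadline_(now + ctimeout) {}
    bool timeLeft(time_t now) const { return now < deadline_; }
};
```

With this approach, a retry that starts late simply inherits whatever budget remains, matching the intended meaning of connect_timeout applying to the connection attempt as a whole.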
=== modified file 'src/comm/ConnOpener.cc'
--- src/comm/ConnOpener.cc	2013-01-16 10:35:54 +0000
+++ src/comm/ConnOpener.cc	2013-01-26 07:26:36 +0000
@@ -16,343 +16,435 @@
 #include "ip/tools.h"
 #include "SquidConfig.h"
 #include "SquidTime.h"

 #if HAVE_ERRNO_H
 #include <errno.h>
 #endif

 class CachePeer;

 CBDATA_NAMESPACED_CLASS_INIT(Comm, ConnOpener);

 Comm::ConnOpener::ConnOpener(Comm::ConnectionPointer &c, AsyncCall::Pointer &handler, time_t ctimeout) :
     AsyncJob("Comm::ConnOpener"),
     host_(NULL),
     temporaryFd_(-1),
     conn_(c),
     callback_(handler),
     totalTries_(0),
     failRetries_(0),
-    connectTimeout_(ctimeout),
-    connectStart_(0)
+    deadline_(squid_curtime + static_cast<time_t>(ctimeout))
 {}

 Comm::ConnOpener::~ConnOpener()
 {
     safe_free(host_);
 }

 bool
 Comm::ConnOpener::doneAll() const
 {
     // is the conn_ to be opened still waiting?
     if (conn_ == NULL) {
         return AsyncJob::doneAll();
     }

     // is the callback still to be called?
     if (callback_ == NULL || callback_->canceled()) {
         return AsyncJob::doneAll();
     }

+    // otherwise, we must be waiting for something
+    Must(temporaryFd_ >= 0 || calls_.sleep_);
     return false;
 }

 void
 Comm::ConnOpener::swanSong()
 {
-    // cancel any event watchers
-    // done here to get the "swanSong" mention in cancel debugging.
-    if (calls_.earlyAbort_ != NULL) {
-        calls_.earlyAbort_->cancel("Comm::ConnOpener::swanSong");
-        calls_.earlyAbort_ = NULL;
-    }
-    if (calls_.timeout_ != NULL) {
-        calls_.timeout_->cancel("Comm::ConnOpener::swanSong");
-        calls_.timeout_ = NULL;
-    }
-
     if (callback_ != NULL) {
-
Re: [PATCH] ConnOpener fixes
On 01/28/2013 11:33 AM, Rainer Weikusat wrote: > Alex Rousskov writes: > > [...] > >> Author: Rainer Weikusat > > Since this has absolutely no relation to any code I ever wrote, I > don't want my name in here. If the patch is accepted, I will remove that line during commit. Thank you, Alex.
Re: Sometimes, old principles might still be good.
On 01/29/2013 03:50 AM, Reiner Karlsberg wrote: > So, after spending a > few days starting to dig into rather new source code regarding Rock, the > beginning of an old fairy tale came to my mind: > - Use a lot of comments. > - Explicitly define every variable, and comment. > - Data structures had a pre-defined layout, > Similar naming conventions were used for executable code. Squid coding guidelines and code reviews should and, for the most part, do follow those principles IMO. Not all code has been written and accepted using those principles though. Sometimes, we miss problems. Sometimes, we have to choose between the lesser of two evils, primarily due to lack of resources. Personally, I agree with and try to follow the spirit of those principles, including in my Rock work. Some of them have to be adjusted as you move from assembly to C++ (e.g., "a lot of comments" is not really the goal, "understandable code", for the lack of a better term, is). Does Rock comply with Squid coding guidelines? It tries, but there are many exceptions. Many of those exceptions stem from the fact that the APIs and code structure that Rock had to use do not comply themselves. For example, the fs/ and DiskIO/ directory structure and naming conventions are ugly; the Store API has serious flaws (polishing Store API is on my to-do list). Some exceptions were just due to shortcuts we have taken, especially when copying ugly but-known-to-work code from another Squid branch. One important rule that you have not mentioned above is "be consistent". That includes "be consistent with the code around you", which often contradicts other rules. Balancing consistency and gradual code improvement is very difficult, especially when dealing with a large body of ugly code. BTW, if you are interested in Rock, consider working with the Large Rock branch on Launchpad. 
I hope that code will replace the current one soon so if you want to work on significant improvements, it is best to base them on Large Rock branch IMO. If you have questions about Rock or Store in general, please do not hesitate to ask here. Thank you, Alex.
Re: [RFC] HTTP problems caused by broken ICAP servers
On 01/29/2013 03:50 AM, Amos Jeffries wrote: > The issue outlined very clearly in this squid-users mail: > http://www.squid-cache.org/mail-archive/squid-users/201301/0358.html > is that ICAP server was broken and producing a stupid response. For the record, I do not see anything broken in that ICAP server behavior, but I am biased since I wrote that server. The ICAP response in question appears to be valid from the ICAP point of view. As for the "stupidity" of the encapsulated HTTP response constructed by the ICAP service adapter, it is difficult for me to assess it since "stupid" is not an HTTP term. RFC 2616 allows GET requests with bodies. Needless to say that using such requests is not a good idea in general, especially when the body is actually empty! Based on the squid-users email, it sounds like the addition of the empty body was not intentional in this particular case. The ICAP service adapter probably added an empty body but should not have. > Can we add some better protocol idiocy detection on the ICAP responses > please? Personally, I am not a big fan of general-purpose proxies policing HTTP messages. As you probably know, I am more annoyed by thousands of "OMG, invalid HTTP request character detected! Wear a sweater on Tuesday!" warnings in cache.log than by thousands of those invalid characters :-). ICAP is a somewhat special case because ICAP responses are often generated by software under Squid admin control, but we should tread carefully because most ICAP responses are just an echo of outside HTTP messages (where ICAP 204 was not possible) and because some ICAP services are not under admin controls. > I know Squid is faithfully replicating what ICAP told it to produce. But > we need something to flag when ICAP is telling Squid to produce a risky > HTTP syntax combination. Or something the Squid admin has configured (or > defaulted) to not be possible. 
I agree that it may be useful to warn or even err when the from-ICAP HTTP message violates Squid's own message admission policies. Based on your discussion with Michael, clientIsContentLengthValid() needs to be polished and moved outside client side code so that ICAP can reuse it. I suspect there are more similar validity checks that should be aggregated and placed in src/http/ and used by both Squid sides and ICAP/eCAP. Can you compile a list of checks in the existing code that you think should be shared that way? I cannot promise speedy delivery of these changes, but I can accept your request to implement those changes as time permits. Is that what you are asking for? Thank you, Alex.
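A shared validity checker of the kind discussed might start as small as the sketch below. It is hypothetical: clientIsContentLengthValid() itself performs different, more detailed checks, and the specific rules here are just examples grounded in this thread:

```cpp
#include <cassert>
#include <string>

// Hypothetical message-admission check, shared by the client side and
// ICAP/eCAP. Returns an empty string when the message looks sane, or a
// warning describing the risky HTTP syntax combination.
static std::string admissionWarning(const std::string &method,
                                    bool hasBody, long contentLength)
{
    if (contentLength < -1)
        return "malformed Content-Length";
    if (method == "GET" && hasBody)
        return "GET request with a body"; // legal per RFC 2616, but risky
    if (hasBody && contentLength == 0)
        return "empty body announced";    // like the adapter mishap above
    return "";
}
```

Housing such checks in src/http/ would let both Squid sides and the adaptation code warn (or err) consistently, instead of each side policing messages its own way.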
Re: Propose to polish the consistency of case sensitivity in configurations
On 01/29/2013 01:49 AM, Tianyin Xu wrote: > 3. For configuration like A B=C, A and B should be sensitive while C > should be insensitive. The case sensitivity of squid.conf directive parameters (i.e., all the stuff that follows the directive name) depends on the directive. Many C values are case-sensitive (e.g., various locations). However, you should not see a lot of str*cmp() code applied to those values -- if that is how you plan locating code for planned consistency fixes. We can probably declare all known names such as directive names and directive parameter names case-insensitive. That would make their handling consistent and backward compatible. However, that "permissive" direction feels inconsistent with recent changes that, among other things, restricted the variation of "positive" keywords such as "on", "enabled", and "yes". Other than consistency with some existing options, I do not see value in making configuration names case-insensitive (unless required so by some communication protocol, of course). Variations in case just make configs messier IMO. If we were to start from scratch, I would argue that all known configuration names should be case sensitive. I find it interesting that squid.conf.documented does not document default case sensitivity. Perhaps we can use that to justify making everything sensitive by default? :-) HTH, Alex.
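The difference at stake, reduced to its essentials (cache_dir is just an example directive name; strcasecmp() is POSIX, declared in strings.h rather than standard C++):

```cpp
#include <cassert>
#include <cstring>
#include <strings.h>   // strcasecmp() is POSIX, not standard C++

// A directive-name match the way str*casecmp-based parsers do it...
static bool matchesInsensitive(const char *token, const char *name) {
    return strcasecmp(token, name) == 0;
}

// ...versus the strict match an "everything sensitive" policy implies.
static bool matchesSensitive(const char *token, const char *name) {
    return strcmp(token, name) == 0;
}
```

Under the strict policy, configs with case variations like "Cache_Dir" would stop parsing, which is exactly the backward-compatibility cost that makes the permissive direction tempting despite the messier configs it allows.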
Re: Propose to polish the consistency of case sensitivity in configurations
On 01/29/2013 01:35 PM, Tianyin Xu wrote: > As long as the rule of case sensitivity is consistent, I > think it's good and less confusing. Anyway, identifiers like file > paths, urls, hostnames are case sensitive. Well, host names are usually case insensitive, but the exact answer (and many other similar caveats) depend on the context. > So, I should make everything case sensitive by replacing the > strcasecmp-like string comparison to strcmp. Am I right? You would have to carefully evaluate each context rather than quickly replacing everything, I am afraid. Also, I recommend waiting for other opinions before proceeding with this project: Others may prefer a different sensitivity default or even vote against any changes in this area! Thank you, Alex. > On Tue, Jan 29, 2013 at 12:21 PM, Alex Rousskov wrote: >> On 01/29/2013 01:49 AM, Tianyin Xu wrote: >> >>> 3. For configuration like A B=C, A and B should be sensitive while C >>> should be insensitive. >> >> The case sensitivity of squid.conf directive parameters (i.e., all the >> stuff that follows the directive name) depends on the directive. Many C >> values are case-sensitive (e.g., various locations). However, you should >> not see a lot of str*cmp() code applied to those values -- if that is >> how you plan locating code for planned consistency fixes. >> >> >> We can probably declare all known names such as directive names and >> directive parameter names case-insensitive. That would make their >> handling consistent and backward compatible. However, that "permissive" >> direction feels inconsistent with recent changes that, among other >> things, restricted the variation of "positive" keywords such as "on", >> "enabled", and "yes". >> >> Other than consistency with some existing options, I do not see value in >> making configuration names case-insensitive (unless required so by some >> communication protocol, of course). Variations in case just make configs >> messier IMO. 
If we were to start from scratch, I would argue that all >> known configuration names should be case sensitive. >> >> I find it interesting that squid.conf.documented does not document >> default case sensitivity. Perhaps we can use that to justify making >> everything sensitive by default? :-) >> >> >> HTH, >> >> Alex. >> > > >
Re: scalable event scheduling
On 01/28/2013 03:29 PM, Rainer Weikusat wrote: > so easy that even using the STL implementation wouldn't be worth the > effort Can you clarify why we should not just use one of the STL queues instead? If two pieces of code attempt to provide essentially the same functionality, why not use the standard piece? I think this high-level concern is better resolved before low-level polishing begins. The next question is about new functionality added by the patch so it remains valid even if we use a standard queue (or even leave the current queue implementation in place): > the event scheduler will store a value at the > pointed-to location which can be passed to eventDelete to cancel a pending > event 'safely' in logarithmic time, meaning, even if the event was > possibly already put onto the async queue. I cannot find the patch code which would make canceling an async queued event work. I see that eventDelete deletes ev_tag, but I do not understand how that will cancel the event if it is already in the async queue. Did I misunderstand what you meant by "cancel a pending event already put onto the async queue"? If you did mean that events can be reliably canceled until fired, can you point me to the right place in the patch or explain how the code will prevent an async queued event from firing if that event is canceled while waiting in the async queue? Thank you, Alex.
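For context, the "standard piece" alternative is roughly a std::priority_queue keyed on the event's due time. This sketch deliberately ignores cancellation, which is exactly the part under discussion in this thread:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

struct MiniEvent {
    double when;                       // absolute due time
    std::function<void()> callback;
};

// Order the heap so the earliest event is on top (min-heap).
struct Later {
    bool operator()(const MiniEvent &a, const MiniEvent &b) const {
        return a.when > b.when;
    }
};

typedef std::priority_queue<MiniEvent, std::vector<MiniEvent>, Later>
    MiniEventQueue;

// Run everything due at or before `now`; O(log n) per pop.
static int runDue(MiniEventQueue &q, double now) {
    int fired = 0;
    while (!q.empty() && q.top().when <= now) {
        MiniEvent ev = q.top();
        q.pop();
        ev.callback();
        ++fired;
    }
    return fired;
}
```

std::priority_queue gives logarithmic insertion and removal of the top element, but it offers no handle for removing an arbitrary queued element, which is why efficient cancellation needs extra bookkeeping either way.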
Re: Partial content, 206
On 01/30/2013 06:11 AM, Eliezer Croitoru wrote: > Is there any plans for collapse forwarding? > I couldn't see any written one in 3.3 or 3.4 . Yes, please see bug #3495 which is dedicated to that feature: http://bugs.squid-cache.org/show_bug.cgi?id=3495 Alex.
Re: scalable event scheduling
On 01/30/2013 09:30 AM, Rainer Weikusat wrote: > Alex Rousskov writes: >> On 01/28/2013 03:29 PM, Rainer Weikusat wrote: >>> so easy that even using the STL implementation wouldn't be worth the >>> effort >> >> Can you clarify why we should not just use one of the STL queues >> instead? > > I agree with your opinion that "the STL doesn't contain an 'obviously > good' choice for a priority queue implementation in the given > context". I am glad we agree on this, but I think that STL must be seriously considered before we add something custom. The overhead of making that decision was one of the reasons why I considered optimizing the event queue a stand-alone, non-trivial project. I see no reason to optimize the Squid event queue now, but if the consensus is that it should be optimized, I think we should evaluate the usability of standard STL queues as the first step. >>> the event scheduler will store a value at the >>> pointed-to location which can be passed to eventDelete to cancel a pending >>> event 'safely' in logarithmic time, meaning, even if the event was >>> possibly already put onto the async queue. > The purpose of the eventDelete routine is not to cancel a call the > event scheduler already put onto the async call queue but to prevent > such a call from being put onto this queue if this is still > possible. And I didn't intend change that. Noted, but I am still missing something: Why add a new event tag API if the existing eventDelete() code can already cancel events outside the async queue? [ If the new tag API is needed, and if the cancellation API is documented, please clarify that only events currently outside the async queue can be canceled.] I do not think such "occasionally working" cancellation is a good idea though. The caller code would have to take different actions depending on whether the event was actually canceled or not. And, in some cases, the correct "not canceled" action would be difficult to implement as the ConnOpener use case shows. 
I also think that the need for explicit destruction of the event "tag" by the caller is a serious downside because it increases the risk of memory leaks. FWIW, here is how I think the event cancellation should be supported instead: 1. Another form of eventAdd() should be added to accept the caller's async call. New code, especially async jobs, should use that form (jobs will avoid call wrappers that way). Old code that wants to cancel events should use this new form as well. 2. The caller may store and cancel that async call at any time prior to its firing, just like any other async callback. 3. When the event is ready for the async queue, the event scheduler should put the same async call from #1 into the async queue. The current event scheduler already creates an async call object for nearly all events anyway, so this does not add CPU overhead. The only overhead is that we will consume the same [small] amount of RAM sooner. In exchange, we get guaranteed call cancellation, job protection, and improved debugging. Alex. P.S. I have also considered the following alternative, but rejected it because, to use event cancellation, new caller code would have to be written anyway, so we do not need to preserve the current eventAdd() parameters for that new code: 1. eventAdd() should create, store, and return an async call. 2. The caller may store and cancel the returned async call at any time prior to its firing, just like any other async callback. 3. When the event is ready for the async queue, the event scheduler should put the stored async call into the async queue.
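Steps #1-#3 above can be sketched with minimal stand-ins (MiniCall and MiniScheduler are illustrative toys, not the real AsyncCall API):

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Minimal stand-in for AsyncCall: callers keep a Pointer and may
// cancel the call at any time before it is dialed.
class MiniCall {
public:
    explicit MiniCall(std::function<void()> f): f_(f), canceled_(false) {}
    void cancel() { canceled_ = true; }
    void dial() { if (!canceled_) f_(); }
    bool canceled() const { return canceled_; }
private:
    std::function<void()> f_;
    bool canceled_;
};

typedef std::shared_ptr<MiniCall> MiniCallPointer;

// Step #1: an eventAdd() form that accepts the caller's call.
// Step #3: when due, the scheduler moves the *same* call object to the
// async queue, so cancellation (step #2) works until the call fires.
struct MiniScheduler {
    std::vector<MiniCallPointer> eventQueue;
    std::vector<MiniCallPointer> asyncQueue;
    void eventAdd(const MiniCallPointer &call) { eventQueue.push_back(call); }
    void tick() {                       // pretend every event is now due
        for (const auto &c : eventQueue)
            asyncQueue.push_back(c);
        eventQueue.clear();
    }
    void dialAll() {
        for (const auto &c : asyncQueue)
            c->dial();                  // canceled calls are skipped here
        asyncQueue.clear();
    }
};
```

Because the caller and the scheduler share one call object, the cancellation guarantee holds in both queues, which is the key difference from a tag that only works while the event is still outside the async queue.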
Re: [PATCH] Remove ErrorState::flags
On 01/30/2013 10:30 AM, Kinkie wrote:
> Hi,
> the attached patch removes the data member ErrorState::flags. It
> contains a single flag which is set once and never read.
>
> (the patch looks huge because I've added lots of context; it's only a
> few lines).

+1

Alex.
Re: scalable event scheduling
On 01/30/2013 07:12 PM, Rainer Weikusat wrote:
> Alex Rousskov writes:
>> FWIW, here is how I think the event cancellation should be supported
>> instead:
>> 1. Another form of eventAdd() should be added to accept the caller's async
>> call. New code, especially async jobs, should use that form (jobs will
>> avoid call wrappers that way). Old code that wants to cancel events
>> should use this new form as well.
>
> This would imply keeping cancelled events on the event scheduler queue
> until they expire 'on their own' because the metainformation necessary
> to remove them efficiently from this queue is not available in this
> way.

The API I sketched does not preclude supporting efficient removal of canceled events from the queue. Those are two separate issues! For example, the event queue code could efficiently map the supplied async call pointer to the event when deleteEvent() is called. Using that event pointer, the event can be deleted efficiently.

> A better idea would be to add an interface which takes a call instead
> of the function pointer + argument pointer as argument

That is exactly what step #1 in my sketch (quoted above) describes.

> so that all
> options would be available to be used whenever their use happens to
> make sense (an even better idea would be to integrate these two
> separate things in some 'closure-like' object and to relieve users
> from the burden of worrying about two pointers instead of one but
> that's sort of a digression).

I am not sure what you mean by "options", but async calls are closure-like objects, essentially.

>> The current event scheduler already creates an async call object for
>> nearly all events anyway, so this does not add CPU overhead. The only
>> overhead is that we will consume the same [small] amount of RAM
>> sooner.
>
> It actually does this for all events.

No, it does not do that for events removed from the event queue.
While that rarely happens, my sketch will add an overhead in those cases (but so will any scheme that allocates another handle for the event).

> This kind of double-indirection
> is what makes getting rid of an event which is possibly still on the
> event-scheduler queue so difficult

I do not see the difficulty: the current code already supports removal from the event queue (it just needs some polishing to carefully relax "must be here" assertions). And if the caller has the async call, then dealing with the events already in the async queue stops being a problem.

> Unless there's (again) something I'm missing, running event
> scheduler events directly from the EventScheduler::checkEvents routine
> should be considered as 'general case'. 'Running async calls' could be
> provided on top of that using a suitable, general 'event running
> routine'.

I am not sure I understand the above, so please do not interpret the comments below as criticism of some sort, just an explanation of why events are supported using the async queue and not the other way around:

* Early async calls were implemented via events. We consciously went away from that model.

* Most events are rare and deal with performance-insensitive code. They are for the code that wants (and can afford) to wait. Nearly all events with zero timeouts have been or should be replaced with async calls(**).

* The async queue is the opposite of the event queue as far as performance is concerned: It is in the middle of most performance-sensitive processing and, hence, needs to be as fast as possible.

What is currently missing is async call support in the event queue, adding job protection and reliable event cancellation. It is quite easy to add as sketched in the previous email. Implementing that is a nice (a little challenging but compact and generally useful) project looking for a volunteer.

HTH, Alex.

(**) With an important (but rare) exception of so-called "heavy events".
Those heavy events can afford to wait a little as well though.
Re: scalable event scheduling
On 01/30/2013 06:30 PM, Amos Jeffries wrote:
> BUT this is not the same project Rainer is working on (replacing
> the engine, not the API).

Please note that my sketch does not replace the existing event API, it only adds to it (just like Rainer's patch does).

There were two primary changes in the patch that Rainer has posted:

A) An event queue engine replacement.
B) An additional event cancellation API.

In my response, I addressed both (because the patch contained both changes), but I do consider them separate. For example, it is possible to replace the event queue engine without providing an additional event cancellation API. It is also possible to provide an additional event cancellation API without replacing the engine. And both can be done at the same time, of course!

Cheers, Alex.
Re: scalable event scheduling
On 01/30/2013 06:30 PM, Amos Jeffries wrote:
> I've kind of gone off the idea of leaving old code using a wrapper API
> with the old behaviour. All that we have got out of that approach is
> added technical debt for others who have to come along afterwards and
> learn two API interfaces, then decide which one to remove.

I agree that adding a conflicting API is bad, so I would like to add a fourth step to my sketch:

4. Upgrade all users of the eventDelete(f,a) API that currently supply a non-NULL second argument to use a new eventDelete() that supplies an async call (from step #1). If possible, convert all other eventDelete() users to the new eventDelete(). If not possible, rename the old two-argument eventDelete(f,NULL) to a single-argument eventDeleteAll(f) call.

This fourth step will eliminate or, in the worst case, minimize doubts and conflicts for future API users.

> It is far better, easier, simpler, safer to have the one person who
> understands the change properly replace all the existing API functions
> in one polish patch. This will also teach them if there is some weird
> caller hidden somewhere that breaks under the new design. AND, it will
> assist with portage by causing build errors when the API from newer code
> is not converted to the older version's required API.

Agreed. Unfortunately, it is often the case that the "one person who understands the change properly" is nowhere to be found, too busy, is viewed as disrupting attempts to fix something by others, and/or multiple people think they understand the change properly, but their understanding differs :-).

If we had more long-term contributors, we could probably solve this by officially designating people to be permanently responsible for the code areas they know best. For now, we all have to modify and review code we do not fully understand and deal with the consequences. I hope we can continue to do that in a civil way.

Cheers, Alex.
Re: [PATCH] ConnOpener fixes
On 01/30/2013 08:18 PM, Amos Jeffries wrote:
> One performance optimization I can see:
>
> test for the
> callback_->cancelled() case and debugs() report it instead of scheduling.

Will add.

> The result of omitting this check saves one if() test per connection by
> adding the full overheads of memory, scheduling, and CPU for one
> AsyncCall queue operation per connection. ConnOpener operates on the
> server-side where multiple client-side Jobs may result in TIMEOUT or
> ABORTED. This cancellation case is a small but common case.

FWIW, I consider the fact that Squid schedules dead-end async calls a performance bug. IIRC, I made ScheduleCall() return bool specifically to return false when there is no reason to schedule the call (because it cannot be dialed). AFAIK, it is a common enough case to be worth an extra check -- I see a lot of dead calls scheduled in production debugging logs, especially around job termination time. It would be nice to investigate whether that check was removed (why?) or never written. It would not prevent AsyncCall memory allocation, but it would still save a lot of cycles _and_ improve code quality and debugging.

At the same time, somebody should investigate whether asyncCall() can return nil when the dialer cannot dial -- that would prevent even the memory allocation for the stillborn async call!

Thank you, Alex.
Re: scalable event scheduling
On 01/31/2013 07:21 AM, Rainer Weikusat wrote:
> Alex Rousskov writes:
>> There were two primary changes in the patch that Rainer has posted:
>>
>> A) An event queue engine replacement.
>> B) An additional event cancellation API.
>
> These are not two 'separate, primary changes': One of the features of
> the 'other' data structure is that it enables efficient removal of
> events

Motivation and development plans are often important in understanding changes, of course, but when deciding what to do with the code, we have to step back a little from the specific problem the code was attempting to solve and look at what the code does. When we do that with your patch, the existence of two distinct changes that can (and IMO should) be evaluated separately becomes apparent IMO. This view in no way damages or diminishes your contribution. It just makes it easier to correctly evaluate it. The same patch can also be evaluated "as a whole" if the reviewer prefers that approach, of course.

> [*] Don't assume that everyone you don't yet know must be more-or-less
> a student warily trying his first 'real-world' programming steps.

I have seen no evidence that somebody has assumed the above on these threads. I certainly have not. Just because somebody thinks there is a better way to do something (or that something should not be done) does not mean they think badly of others! And surely, as a seasoned developer, you know that even programming gurus make mistakes, especially when dealing with a new-to-them code base.

Setbacks are unavoidable, but you may find the whole process more rewarding and efficient by accepting that smart, honest, and well-meaning people can disagree. A disagreement on this list is virtually never a sign of some kind of ill intent or hidden agenda. Assuming that (and especially acting on that assumption) only hurts the process. We are all here to make Squid better, even though that means different things to different people.
And while forking is an important right, Squid would not exist if it were forked every time an idea or patch was modified or shot down.

HTH, Alex.
Re: [RFC] sketch of a 'unified cancellation' approach
On 01/31/2013 10:08 AM, Rainer Weikusat wrote:
> All of Alex' objections to the cancelling are - of course - completely
> well-founded and this should really work in the way he described
> it. The problem is just "that's not how the existing code works".

Yes, but the four steps I sketched out recently are based on the existing code and do _not_ require significant modifications of that code: We already have cancellable async calls, and we already have an event queue that creates and uses them internally. We just need to adjust the queue so that it accepts an async call from outside instead of secretly creating it.

What you describe below goes a step further (more modifications, farther from the existing code). This is not a critique of what you have described -- just a note that it would be necessary to do something like those four simpler steps anyway, and it is best to do them first, before adding more/higher layers.

> I
> spent some time thinking how a 'nicer' cancellation subsystem with all
> desirable properties (both 'simple to use' and 'efficient to
> implement') could look like. Below is a sketch of that, feedback very
> much appreciated (May overlap with existing facilities. ATM, this is
> supposed to be abstract).
>
> - create abstract base class 'call queue' with virtual methods
> 'schedule' and 'cancel'
>
> - create another abstract base class 'call' with virtual methods
> 'take' and 'cancel'
>
> Something desiring to schedule a call would then create a call object
> and pass that as argument to the schedule method of a 'call queue' object.
> This object would do whatever is necessary to put this call onto its
> queue and invoke the 'take' method in order to take ownership of this
> call, passing a pointer to itself and 'some data item' as
> arguments.
> Assuming the original creator of the call wants to cancel
> it, it would invoke the cancel method of the call which would - in
> turn - invoke the cancel method of its current owner, passing a
> pointer to itself and the data item handed over when the 'call queue'
> took ownership. The 'call queue object' cancel method would then use this
> information in order to 'get rid of the call', whatever may be
> necessary to do that.

This is the right design IMO. We can indeed add the notion of "being queued" to the existing async calls so that when the AsyncCall::cancel() method is called, the call object knows which "queue" to notify about the cancellation. Some queues may not do anything about that notification (e.g., the async queue); some will remove the call (e.g., the event queue). If implemented on top of the four steps discussed earlier, it will simplify callers that use events: Those callers will be able to do call->cancel() instead of remembering to call eventDelete().

If you decide to work on this, please use the following plan:

1. Implement the four steps sketched earlier. They (or something like them) are required to implement your design above anyway. Guide that code through squid-dev review to the official commit.

2. Implement CallQueue, the abstract queue interface you described above. Add the corresponding AsyncCall::enqueued(CallQueue *) and AsyncCall::dequeued(CallQueue *) methods to AsyncCall, to maintain "current queue" information inside async calls. The dequeued() call does not really need a parameter, but I think it would be nice to assert that the queue has not changed. The AsyncCall::cancel() method should call queue->dequeue(*this) if the queue is set. Post as PREVIEW.

3. Make the existing async queue (AsyncCallQueue) and the existing event queue (EventScheduler) children of CallQueue. Implement the pure virtual methods (enqueue and dequeue) in each child. Post as PATCH.
The above plan will probably need some adjustments, but I hope it has enough specifics to point you in the right direction as far as existing code modifications are concerned. I think this plan will allow us to improve event handling without getting bogged down in discussions about the need and the means for a more efficient internal event queue implementation. And without precluding that more efficient implementation in the future. It also has two intermediate steps to minimize losses if there is a disagreement on the specifics and something needs to be redone. Thank you, Alex.
Re: [PATCH] ConnOpener fixes
On 01/30/2013 08:18 PM, Amos Jeffries wrote:
> One performance optimization I can see:
>
> * if the callback_ is != NULL but has been cancelled by the receiving
> Job. Your code now calls sendAnswer() and schedules the already
> cancelled Call.
> Previously this was tested for in swanSong() to simply NULL the
> callback_ pointer instead of scheduling a dead Call.
> NP: it would probably be a good idea to test for the
> callback_->cancelled() case and debugs() report it instead of scheduling.
>
> +1. Other than the above performance tweak this looks fine for commit.

Added the extra check and committed to trunk as r12629. Please let me know if you need my help with porting this to v3.2.

Thank you, Alex.
[RFC] Peek and Splice
Hello,

Many SslBump deployments try to minimize potential damage by _not_ bumping sites unless the local policy demands it. Unfortunately, this decision must currently be made based on very limited information: A typical HTTP CONNECT request does not contain many details, and intercepted TCP connections are even worse. We would like to give admins a way to make the bumping decision later in the process, when the SSL server certificate is available (or when it becomes clear that we are not dealing with an SSL connection at all!).

The project is called Peek and Splice. The idea is to peek at the SSL client Hello message (if any), send a similar (to the extent possible) Hello message to the SSL server, peek at the SSL server Hello message, and then decide whether to bump. If the decision is _not_ to bump, the server Hello message is forwarded to the client and the two TCP connections are spliced at the TCP level, with Squid shoveling TCP bytes back and forth without any decryption.

If we succeed, the project will also pave the way for SSL SNI support because Squid will be able to send client SNI info to the SSL server, something that cannot be done today without modifying OpenSSL.

I will not bore you with low-level details, but we think there is a good chance that Peek and Splice is possible to implement without OpenSSL modifications. In short, we plan on using the OpenSSL BIO level to prevent OpenSSL from prematurely negotiating secure connections on behalf of Squid (before Squid decides whether to bump or splice). We have started writing BIO code, and basic pieces appear to work, but the major challenges are still ahead of us, so the whole effort might still fail.

There are a few high-level things in this project that are not clear to me. I hope you can help find the best solutions:

1. Should other bumping modes switch to using the SSL BIO that is required for Peek and Splice?

Pros: Supporting one low-level SSL I/O model keeps code simpler.
Cons: Compared to the OpenSSL native implementation, our BIO code will probably add overheads (not to mention bugs). Is overall code simplification worth adding those overheads and dangers?

2. How to configure two ssl_bump decisions per transaction?

When Peek and Splice is known to cause problems, the admin should be able to disable peeking using CONNECT/TCP level info alone. Thus, we probably have to keep the current ssl_bump option. We can add a "peek" action that will tell Squid to enable Peek and Splice: Peek at the certificates without immediately bumping the client or server connection (the current code does bump one or the other immediately).

However, many (most?) bumping decisions should be done when the server certificate is known -- the whole point behind Peek and Splice. We can add ssl_bump2 or ssl_bump_peeked that will be applied to peeked transactions only:

  ssl_bump peek safeToPeek
  ssl_bump none all

  ssl_bump_peeked server-first safeToBump
  ssl_bump_peeked splice all

Is that the best configuration approach, or am I missing a more elegant solution?

If there are any other Peek and Splice suggestions or concerns, please let me know.

Thank you, Alex.
Re: [RFC] Peek and Splice
On 02/01/2013 06:47 AM, Marcus Kool wrote:
> This Peek&Splice feature will make ssl_bump a useful feature since
> without Peek&Splice ssl_bump aborts all non-SSL CONNECTS from Skype
> and other applications, so the user community will certainly welcome this.

Well, SslBump is already useful in environments where non-SSL CONNECTs are either prohibited or can be detected and bypassed using CONNECT or TCP-level information. Peek and Splice will allow bypass of non-SSL tunnels without building complicated white lists.

While not in this project's scope, Peek and Splice would probably make it possible (with some additional work) for Squid to detect and block non-SSL tunnels without bumping SSL tunnels. That could be useful in environments where HTTPS is allowed (and does not need to be bumped) but other tunnels are prohibited.

> Currently Squid only sends to the ICAP server a
>   REQMOD CONNECT www.example.com:443 (without content)
> and there is never a RESPMOD.
> I, as author of ufdbGuard and the (yet unpublished) new ICAP content
> filter, would welcome very much if the data of the peeks (client and server)
> is encapsulated into ICAP requests for the obvious purpose of
> content filtering.

Squid already sends bumped (i.e., decrypted) HTTP messages to ICAP and eCAP. If that does not happen in your SslBump tests, it is a bug or misconfiguration. Squid cannot send encrypted HTTP messages to ICAP or eCAP -- you must use SslBump if you want to filter encrypted traffic. There is no way around that.

Or are you thinking about sending SSL Hello messages to ICAP and eCAP services? If Peek and Splice succeeds, that will be technically possible as well, but will require more work and would be a separate project.

Thank you, Alex.
Re: [RFC] Peek and Splice
On 01/31/2013 11:24 PM, Amos Jeffries wrote:
> On 1/02/2013 5:17 p.m., Alex Rousskov wrote:
>> 2. How to configure two ssl_bump decisions per transaction?
>>
>> When Peek and Splice is known to cause problems,
>
> What problems and how would this be known?

I will answer the second part first: Problems will be detected because Squid will respond with error messages and/or users will complain that "something does not work". No new magic here. This is how current SslBump (and pretty much any other Squid feature) problems are detected.

One example of a problem that will probably happen sooner or later is SSL cypher incompatibility: the client supports cyphers A and B, Squid supports cyphers B and C, the server supports cyphers C and D. Regular SslBump is possible, but Peek and Splice will fail (if Squid mimics the client Hello when talking to the server) because client and server have no common cyphers. In this particular example, the non-proxied connection would probably have failed as well, so it is not a big deal, but one can imagine an SSL server that responds differently to different clients, making Squid look bad.

Another example is non-SSL traffic that does not start with some kind of Hello message from the client. In this case, Squid will wait for an SSL Hello but nothing will come. Squid will time out. The initial Peek and Splice implementation will probably handle that as a regular error. Eventually, we will probably have to make behavior on timeouts and other soft errors configurable (e.g., on_peek_timeout=tunnel).

>> ssl_bump peek safeToPeek
>> ssl_bump none all
>>
>> ssl_bump_peeked server-first safeToBump
>> ssl_bump_peeked splice all
>>
>> Is that the best configuration approach, or am I missing a more elegant
>> solution?
>
> What about making "peek" a flag that can optionally prefix any of the
> ssl_bump ACL actions...
>
> ssl_bump client-first foo !safeToPeek
> ssl_bump peek client-first foo
> ssl_bump server-first !safeToPeek
> ssl_bump peek server-first safeToBump
> ssl_bump none all

I like that, but I do not think it will work because the ACLs you want to use for the "to peek or not to peek?" decision are going to be different from the ACLs you want to use for the "to bump or not to bump _after_ we peeked" decision. The former ACLs would use CONNECT/TCP-level information. The latter ACLs would use SSL certificate information. Thus, you do not want to use the same set of ACLs for one ssl_bump rule.

In other words, the decision what to do after we peek cannot be made before we peek. That is why we want to peek in the first place... In your specific example, the "safeToBump" ACL cannot be evaluated until we peek.

> Overall the whole thing is getting complex and risks making a huge mess.

I agree that this will increase code complexity. I am afraid this is a necessary price to pay for a safer, more precise SslBump. As for the mess, I am not sure what you mean exactly, but I do not think Peek and Splice should be allowed to stay in Squid unless it works well in some environments and can be disabled in others.

> Time to reduce the options.

I do not think it is realistic to expect us to be able to reduce the number of configuration options when adding a new optional feature. I would very much prefer to keep the number of options the same, but we would need to find a way to do that. Currently, I cannot offer anything better than adding ssl_bump_peeked or similar, but I hope others can think of something.

> This *is* a quasi-legal feature after all.

I do not like using scary but undefined terms like that. Virtually any feature is going to be illegal somewhere and useful elsewhere. After all, there are situations where unfiltered internet access is illegal and situations where unfiltered internet access is legal but lethal.
I think that we should continue to allow features that significant portions of the community find useful, especially when those features do not violate internet protocols. Let our users decide what is legal and/or moral in their environment. Thank you, Alex.
Re: [RFC] Peek and Splice
On 02/01/2013 09:08 AM, Marcus Kool wrote:
> On 02/01/2013 01:48 PM, Alex Rousskov wrote:
>> Or are you thinking about sending SSL Hello messages to ICAP and eCAP
>> services? If Peek and Splice succeeds, that will be technically possible
>> as well, but will require more work and would be a separate project.
>
> I was thinking about this: when Squid peeks at the data and finds that it
> is non-SSL, send it to the ICAP server to ask its opinion.
> This is obviously more work, but also extremely useful, since a
> content filter is only useful if it is able to inspect _all_ content,
> and consequently the feature of Squid to connect to content filters
> is only useful if Squid sends _all_ data to the content filter for
> analysis.
>
> Perhaps needless to say: viruses like to communicate in non-standard
> ways, so Squid would be considered much more secure if it sends _all_ data
> to an ICAP server for analysis.

I agree with the general "everything we proxy should be available for analysis" principle. Getting to that point would be difficult because protocols and APIs such as ICAP, eCAP, external ACL helper, and url_rewriter were not designed to deal with "everything". They need to be tweaked or extended to work with non-HTTP traffic. We already do that in some cases (e.g., FTP), but more is needed to handle "everything".

Cheers, Alex.
Re: int to intXX_t conversion
On 02/01/2013 04:01 AM, Kinkie wrote:
> Is there any objection for me to entertain myself doing "int" ->
> "intXX_t" and "unsigned int" -> "uintXX_t" conversions in the meanwhile?
> This seems to be in line with c++11 changes, and should have no
> adverse effects.

The immediate adverse effect would be conflicts with other pending features, of course.

What is the motivation behind this? Why is intXX_t better than int when we do not really care about the exact variable size?

Thank you, Alex.
Re: int to intXX_t conversion
On 02/01/2013 10:20 AM, Kinkie wrote:
> On Fri, Feb 1, 2013 at 6:08 PM, Alex Rousskov wrote:
>> On 02/01/2013 04:01 AM, Kinkie wrote:
>>> Is there any objection for me to entertain myself doing "int" ->
>>> "intXX_t" and "unsigned int" -> "uintXX_t" conversions in the meanwhile?
>>> This seems to be in line with c++11 changes, and should have no
>>> adverse effects.
>>
>> What is the motivation behind this? Why is intXX_t better than int when
>> we do not really care about the exact variable size?
>
> It adds clarity, and a predictable cross-platform, cross-os and
> cross-architecture memory layout.

I am not sure int32_t adds "clarity" compared to "int", but it is difficult to argue about such vague concepts, and I will not in this case. Renaming "int" certainly has some pros. However, you have not listed any "cons". Are there really none? I think I can come up with a few:

- There are currently no criteria for which explicit int size to use. I think you have to develop them and get them reviewed _before_ making these widespread changes. And we all would have to obey the final rules in the new code. This does add some overhead now and long-term.

- Saying "I did not think about the number of bits" is better than implying "I did think about the number of bits" when, in fact, I did not. A typical "int i = func();" code is a good example: If I see int32_t there, I would think that somebody carefully checked that func() can never return something bigger than that. Of course, you can carefully check every int in Squid, but that would take a lot of time that is better spent elsewhere.

- Is it possible that on some platforms, int is not int32_t and is faster than int32_t? Now or in the foreseeable future? We know that on some platforms int_fast32_t is not int, so perhaps the "opposite" can also be true? And if int is and will always be int32_t (e.g., for backward compatibility reasons), then perhaps spelling it differently is pointless?

- The visual distinction between signed and unsigned becomes less pronounced when you use [u]intXX_t names.

- More ink for the more common "int" case.

> It is of course a matter of opinion in the end. It seems that I am
> being more aggressive than the average with legacy code, especially
> for "search and replace" things which don't require much brain power
> such as this one.

Is "int" really a "legacy", outdated concept not recommended for new code by C++ gurus? From my non-guru point of view, "int" is preferred when we do not want to think about exact bit sizes (whether that "not to think" decision is correct or not is a separate question, but the type accurately reflects that decision).

> Besides StringNG, I'm trying to apply myself to side things such as
> cleanup, c++-ification, STL-ification (in place of the plethora of
> linked list implementations we have in place now), etc.

Nothing wrong with that, of course, but I hope I can convince you to add bug fixing to the above list :-). I believe Amos mentioned 60+ v3.3 bugs waiting for a volunteer...

Thank you, Alex.
Re: int to intXX_t conversion
On 02/01/2013 12:06 PM, Kinkie wrote:
> BTW: c++11 defines intXX_t as part of the core language; I believe
> that to be an indication that the language will evolve in that
> direction.

I sure hope C++ does not evolve in the direction of uint64_t! :-) Just because something very useful is added does not mean it should be used by default instead of something old (but also very useful).

>> - Saying "I did not think about the number of bits" is better than
>> implying "I did think about the number of bits" when, in fact, I did
>> not.
>
> In fact, blindly search & replacing and only thereafter think about
> the opportunity of up- or downsizing some variables is probably the
> most effective strategy.

If the goal is to replace "int" with "int32_t", then yes. If the goal is to get something out of those changes, then I think such blind search-and-replace would do a lot more harm than good, especially since internally "int" is "int32_t" on all platforms we know about. In other words, if the plan is:

1. Blindly replace all int with int32_t.
2. Think and replace int32_t with something better where needed.

Then I strongly recommend starting with step #2 and replacing "int" with something better where needed and only where needed.

> One further pro I can think of, it provides further clues to automated
> code checking tools about what we expect the needed size of a variable
> to be,

To be more precise, it lies that we expect something specific. After step #1, we do not magically start expecting something. We still "did not think about it", but now we claim to others that we have.

>>> It is of course a matter of opinion in the end. It seems that I am
>>> being more aggressive than the average with legacy code, especially
>>> for "search and replace" things which don't require much brain power
>>> such as this one.
>>
>> Is "int" really a "legacy", outdated concept not recommended for new
>> code by C++ gurus?
>> From my non-guru point of view, "int" is preferred
>> when we do not want to think about exact bit sizes (whether that "not to
>> think" decision is correct or not is a separate question, but the type
>> accurately reflects that decision).
>
> legacy and outdated, that's a bit of an extreme statement.
> But (according to http://en.cppreference.com/w/cpp/types/integer )
> c++11 defines 24 new integer types of various widths.

I think we all agree that new int types are useful. I do not yet agree that all "int" uses should be replaced with one of the new types.

>>> Besides StringNG, I'm trying to apply myself to side things such as
>>> cleanup, c++-ification, STL-ification (in place of the plethora of
>>> linked list implementations we have in place now), etc.
>>
>> Nothing wrong with that, of course, but I hope I can convince you to add
>> bug fixing to the above list :-). I believe Amos mentioned 60+ v3.3 bugs
>> waiting for a volunteer...
>
> I'm still not good enough to produce meaningful results in these
> areas, unfortunately.

I think replacing many "int"s (with good results) is much harder than fixing an isolated bug. I am sure you are capable of both though!

Thank you, Alex.
Re: [RFC] Peek and Splice
On 02/03/2013 01:19 PM, Marcus Kool wrote: > On 02/01/2013 03:00 PM, Alex Rousskov wrote: >> I agree with the general "everything we proxy should be available for >> analysis" principle. Getting to that point would be difficult because >> protocols and APIs such as ICAP, eCAP, external ACL helper, and >> url_rewriter were not designed to deal with "everything". They need to >> be tweaked or extended to work with non-HTTP traffic. We already do that >> in some cases (e.g., FTP) but more is needed to handle "everything". > > And that is exactly why I try to encourage you to implement about it now > since doing this together with the planned change is less work than > moving it to a future project. The overlap between "peek and splice" and "analyze everything" projects is tiny. The "analyze everything" project is very big on its own, with serious coordination overheads to get ICAP and eCAP folks participating if you want to do it right. I would not recommend merging the two projects together (and my team would certainly not be able to do that). > As a bonus it will make Squid one of the very few proxies which > takes virus scanning and content filtering really seriously. If the demand is there, I am sure somebody will eventually make "analyze everything" happen, possibly in multiple small steps. And if several virus scanning and content filtering players come together for that project, we may even pull something comprehensive and compatible with most popular analysis software in that area. Cheers, Alex.
Re: [squid-users] what should squid -z do
On 02/03/2013 08:31 PM, Amos Jeffries wrote: >> On 2/3/2013 8:29 PM, Alex Rousskov wrote: >>> To be consistent with ufs, we should probably change rock behavior from >>> initializing the database to doing nothing if the configured database >>> directory already exists. Like ufs, rock will rely on directory >>> existence for that test (regardless of what may be inside that >>> configured directory). In other words, squid-z is not "validate and >>> initialize" but "initialize if the configured directory is not there". >>> >>> Any objections to that rock change? > Absolutely none. Start by making both perform an existence check before > doing *anything*. > * initialize if it does not exist yet Will do. > * optionally perform validation + repair if something does. This should not be the default behavior, IMO. Since validation may take a long time, and since many startup scripts run squid-z today simply because they look for wrong directories, we should not add an hour-long operation to all those scripts. > UFS seems to scan the directory structure and create missing > sub-directories. Omiting the file corruption tests. We should probably > add these now that they are possible. > > Rock should probably scan the potentially corrupt database and try to > drop/erase individual corrupt entries before deciding whether to > completely re-initialize the DB. Perhaps this validation can be triggered by squid -z -S? I am not volunteering for this project at this time, but I think patches implementing the cache_dir validation in -z mode should be welcomed, especially if motivated by deployment use cases. Thank you, Alex.
Re: /bzr/squid3/trunk/ r12649: Shuffle the traffic mode flags into their own structure
On 02/04/2013 04:12 AM, Amos Jeffries wrote: > +TrafficMode(const TrafficMode &rhs) { operator =(rhs); } > +TrafficMode &operator =(const TrafficMode &rhs) { memcpy(this, &rhs, > sizeof(TrafficMode)); return *this; } Please remove these -- the compiler-generated ones will work just fine (and possibly better). And if you add these, then you have to add an explicit destructor as well. > === added file 'src/anyp/TrafficMode.h' > --- a/src/anyp/TrafficMode.h 1970-01-01 00:00:00 + > +++ b/src/anyp/TrafficMode.h 2013-02-04 09:47:50 + > @@ -0,0 +1,70 @@ > +#ifndef SQUID_ANYP_TRAFFIC_MODE_H > +#define SQUID_ANYP_TRAFFIC_MODE_H > + > +namespace AnyP IMO, this should not be placed in AnyP because it is not some basic code common to many protocols (e.g., URL or base64 encoding). Traffic mode is higher-level code not directly related to any protocol support. If we place this code in AnyP, most of Squid will have to be moved there :-). The "P[rotocol]" in "AnyP" is important. Thank you, Alex.
[PATCH] Make squid -z for cache_dir rock work like UFS, not COSS
On 02/03/2013 10:06 PM, Alex Rousskov wrote: > On 02/03/2013 08:31 PM, Amos Jeffries wrote: >>> On 2/3/2013 8:29 PM, Alex Rousskov wrote: >>>> To be consistent with ufs, we should probably change rock behavior from >>>> initializing the database to doing nothing if the configured database >>>> directory already exists. Like ufs, rock will rely on directory >>>> existence for that test (regardless of what may be inside that >>>> configured directory). In other words, squid-z is not "validate and >>>> initialize" but "initialize if the configured directory is not there". >>>> >>>> Any objections to that rock change? >> Absolutely none. Start by making both perform an existence check before >> doing *anything*. >> * initialize if it does not exist yet > Will do. The attached trunk patch makes squid -z for cache_dir rock work like UFS instead of like COSS. When a startup script runs squid -z by mistake against a cache_dir that is already initialized and full of cached entries, some admins prefer that nothing happens. Rock store now skips reinitialization if both the cache_dir directory and the db file in that directory exist. If one or both are missing, the missing pieces are created. UFS does something similar because it creates missing L1 and L2 directories but does not erase any entries already present in the cache_dir path. COSS, OTOH, re-initializes the existing db (but nobody noticed or complained loud enough, apparently). Rock behavior will now be closer to UFS. To clean a corrupted cache_dir, the admin must remove it before running squid-z. Thank you, Alex. Make squid -z for cache_dir rock work like UFS instead of like COSS. When a startup script runs squid -z by mistake against a cache_dir that is already initialized and full of cached entries, some admins prefer that nothing happens. Rock store now skips reinitialization if both the cache_dir directory and the db file in that directory exist. If one or both are missing, the missing pieces are created. 
UFS does something similar because it creates missing L1 and L2 directories but does not erase any entries already present in the cache_dir path. COSS, OTOH, re-initializes the existing db (but nobody noticed or complained loud enough, apparently). Rock behavior will now be closer to UFS. To clean a corrupted cache_dir, the admin must remove it before running squid-z.

=== modified file 'src/fs/rock/RockSwapDir.cc'
--- src/fs/rock/RockSwapDir.cc 2013-01-21 07:15:09 +
+++ src/fs/rock/RockSwapDir.cc 2013-02-04 19:36:56 +
@@ -143,51 +143,63 @@
 {
     const int64_t eLimitLo = map ? map->entryLimit() : 0; // dynamic shrinking unsupported
     const int64_t eWanted = (maxSize() - HeaderSize)/maxObjectSize();
     return min(max(eLimitLo, eWanted), entryLimitHigh());
 }
 // TODO: encapsulate as a tool; identical to CossSwapDir::create()
 void
 Rock::SwapDir::create()
 {
     assert(path);
     assert(filePath);
     if (UsingSmp() && !IamDiskProcess()) {
         debugs (47,3, HERE << "disker will create in " << path);
         return;
     }
     debugs (47,3, HERE << "creating in " << path);
-    struct stat swap_sb;
-    if (::stat(path, &swap_sb) < 0) {
+    struct stat dir_sb;
+    if (::stat(path, &dir_sb) == 0) {
+        struct stat file_sb;
+        if (::stat(filePath, &file_sb) == 0) {
+            debugs (47, DBG_IMPORTANT, "Skipping existing Rock db: " << filePath);
+            return;
+        }
+        // else the db file is not there or is not accessible, and we will try
+        // to create it later below, generating a detailed error on failures.
+    } else { // path does not exist or is inaccessible
+        // If path exists but is not accessible, mkdir() below will fail, and
+        // the admin should see the error and act accordingly, so there is
+        // no need to distinguish ENOENT from other possible stat() errors.
         debugs (47, DBG_IMPORTANT, "Creating Rock db directory: " << path);
         const int res = mkdir(path, 0700);
         if (res != 0) {
             debugs(47, DBG_CRITICAL, "Failed to create Rock db dir " << path << ": " << xstrerror());
             fatal("Rock Store db creation error");
         }
     }
+    debugs (47, DBG_IMPORTANT, "Creating Rock db: " << filePath);
 #if SLOWLY_FILL_WITH_ZEROS
     char block[1024];
     Must(maxSize() % sizeof(block) == 0);
     memset(block, '\0', sizeof(block));
     const int swap = open(filePath, O_WRONLY|O_CREAT|O_TRUNC|O_BINARY, 0600);
     for (off_t offset = 0; offset < maxSize(); offset += sizeof(block)) {
         if (write(swap, block, sizeof(block)) != sizeof(bl
Re: [PATCH] Bug 3686 - refactor the config options controlling max object size
On 02/05/2013 03:36 AM, Amos Jeffries wrote: > http://bugs.squid-cache.org/show_bug.cgi?id=3686 > +if (static_cast(newMax) > maxSize()) { > +debugs(47, DBG_PARSE_NOTE(2), "WARNING: object 'max-size' option for > " << path << " cannot exceed " << maxSize()); > +max_objsize = maxSize(); > +return; > +} Please adjust this to say "cannot exceed total cache_dir size " or similar to clarify where the limit is coming from. I would also say "ignoring max-size for because it exceeds cache_dir " instead of the "max-size cannot exceed ..." phrase, to tell the admin what Squid is doing instead of just telling them what Squid cannot do. This will make it clear that their max-size option has no effect. In summary: debugs(47, DBG_PARSE_NOTE(2), "WARNING: ignoring max-size for " << path << " because it exceeds total cache_dir size " << maxSize()); > +/// the minimum size of singel object which may be stored here "single" typo I would rephrase as "smaller objects will not be added and may be purged". > -virtual int64_t maxObjectSize() const { return max_objsize; } > +/// The maximum size of single object which may be stored here. > +int64_t maxObjectSize() const; Same rephrasing suggestion as the above. Please do not remove "virtual". It is a useful reminder that this is a part of the Store API. Also, I think it would be appropriate (and useful) to document maxSize() method in this patch because it looks symmetrical to now-documented minSize() but has a very different meaning. > === modified file 'src/peer_digest.cc' > --- src/peer_digest.cc2013-01-30 15:39:37 + > +++ src/peer_digest.cc2013-02-03 13:38:13 + > @@ -347,6 +347,7 @@ > > req->header.putStr(HDR_ACCEPT, "text/html"); > > +// XXX: this is broken when login=PASS, login=PASSTHRU, login=PROXYPASS, > login=NEGOTIATE, and login=*:... > if (p->login) > xstrncpy(req->login, p->login, MAX_LOGIN_SZ); > An unrelated change? 
> StoreEntry *const pe = addEntry(i); > > +printf("pe->swap_status == %d (SWAPOUT_WRITING=%d)\n", > pe->swap_status, SWAPOUT_WRITING); > + > CPPUNIT_ASSERT(pe->swap_status == SWAPOUT_WRITING); > CPPUNIT_ASSERT(pe->swap_dirn == 0); > CPPUNIT_ASSERT(pe->swap_filen >= 0); Perhaps remove the added debugging to reduce "make check" noise? Or make it conditional on swap_status being not SWAPOUT_WRITING? > +rep->setHeaders(HTTP_OK, "dummy test object", "x-squid-internal/test", > 0, -1, squid_curtime + 10); You may want to add a line to the commit message to note why this change was necessary (and that test cases now have weak/indirect checks for max-size code). All of the above can be done during commit, without another review cycle IMO. Thank you, Alex.
Re: [PATCH] StoreId with couple cosmetics in sync with trunk 1639
On 02/03/2013 01:13 AM, Eliezer Croitoru wrote: > Alex I am sorry but as for now I cannot unify the redirectStateData > objects into one method for both redirect and storeid helpers. Sigh. > +/** > + * The method returns the current storeID of the request. > + * returns canonicalUrl of the request in a case StoreID not present. > + * Always returns null terminated char* . > + */ > +const char * > +HttpRequest::storeId() Please remove duplicate description. You already described this method in the header file. > + * dosn't return NULL since this became the glue between most of > + * store MD5 hash and other important code that a cache object based on. > + */ > +const char *storeId(); s/dosn't/does not/ and please check the rest of the patch for this typo. I also recommend removing everything after "NULL" because the reason is not really important and may change (as the users of this method change). Besides, "other important code" is too broad to be useful anyway. > +/** > + * The ID string used internally by Squid to uniquely de-duplicate this > requests > + * URL with other URLs stored objects. > + */ > +String store_id; I find "uniquely de-duplicate" too puzzling. If you want to briefly define what store ID is here, how about this: If defined, store_id_program mapped requested URL to this ID. Store uses this ID (and not the URL) to find and store entries, avoiding caching duplicate entries when different URLs point to "essentially the same" cachable resource. If you do not want to define the concept itself here, then the first line would suffice IMO. If you do not want to change wording, please s/this requests/this request/. > + In the future, the interface protocol will be extended with > + key=value pairs ("kv-pairs" shown above). Helper programs > + should be prepared to receive and possibly ignore additional > + whitespace-separated tokens on each input line. I think this description is stale. We already document what helpers should receive. 
If you want to say something about optional kv-pairs, say "helpers must ignore any kv-pair with a key they do not support". > + WARNING: > + StoreID forces squid to use a specific String to identify and verify a > + object\resource. The default is the url of the resource and If not > + planned carefully a bad StoreID response from the helper can result > + in a mismatch between the user request to the reply. > + Note that when using StoreID the refresh_patterns will apply to the > + StoreID returned from the helper and not the original URL. I suggest to simplify and polish this: -- When using StoreID, the refresh_patterns will apply to the StoreID returned by the helper and not to the request URL. WARNING: Wrong StoreID value returned by a careless helper may result in wrong cached response returned to the user. --- > +if (strcmp(e->mem_obj->url, http->request->storeId()) != 0) { > +debugs(33, DBG_IMPORTANT, "clientProcessHit: URL mismatch, '" << > e->mem_obj->url << "' != '" << r->storeId() << "'"); > processMiss(); > return; > } Please use the same storeId() expression in the condition and in the debugging statement following that condition. > +/** > + * Here is the place where the snowball of an object > + * creation starts. Here we can see if the object was > + * created using URL or alternative StoreID from helper. > + */ > +debugs(88, 3, "What mem-obj->url contains ?: "<< > http->storeEntry()->mem_obj->url ); > } Please remove the first sentence of that comment. I am not sure which "object" it refers to, but the mem_obj object is created before this comment. In the debugs() line, please replace "What mem-obj->url contains ?: "<< with "mem_obj->url: " << > +const char *storeId() { return (http->store_id.size() > 0 ? > http->store_id.termedBuf() : http->uri); } Please make this method "const". > +String store_id; /* Store ID for transactions where the request member > is nil */ The rest of the code is using StoreID. 
Please s/Store ID/StoreID/ for consistency sake. Consistency helps when searching sources for keywords. > + * method will be added for each and one of them. s/each and one/each/ > -} else > +} else{ > +debugs(20, 3,"used with StoreID: " << mem_obj->url); > newkey = storeKeyPublic(mem_obj->url, mem_obj->method); > - > +} Please remove this change. It is not consistent with other storeKey*() calls and you already added similar debugging to storeKeyPublic(). > const cache_key * > storeKeyPublic(const char *url, const HttpRequestMethod& method) > { > +debugs(20, 3, "using: " << RequestMethodStr(method) << " " << url); > static cache_key digest[SQUID_MD5_DIGEST_LENGTH]; > unsigned char m = (unsigned char) method.id
Re: [PATCH] StoreId with couple cosmetics in sync with trunk 1639
On 02/06/2013 08:45 PM, Amos Jeffries wrote: > This audit round all appear to be documentation an debugs polish. Any > objection to me applying those changes and comitting? None from me, especially if you can remove code duplication that Eliezer cannot (the still unaddressed issue from the previous review). Thank you, Alex.
Re: [PATCH] Make squid -z for cache_dir rock work like UFS, not COSS
On 02/04/2013 09:15 PM, Amos Jeffries wrote: > On 5/02/2013 8:48 a.m., Alex Rousskov wrote: >> The attached trunk patch makes squid -z for cache_dir rock work like UFS >> instead of like COSS. > +1. Although when comitting can you add a few words to the -z > description in main.cc usage() function and src/squid.8.in man file, so > that it is clear this only a create-if-missing action. Committed to trunk as r12654 with the requested documentation updates. >> -debugs(0, DBG_CRITICAL, "Creating Swap Directories"); >> +debugs(0, DBG_CRITICAL, "Creating missing swap directories"); I also changed the cache.log message to expose the "missing" logic. Please undo the above change if you think that too many scripts rely on matching the old phrase. The manual page now also documents that squid -z runs in daemon mode unless -N is also given. HTH, Alex.
Re: [PATCH] Bug 3686 - refactor the config options controlling max object size
On 02/08/2013 05:54 PM, Amos Jeffries wrote:
>>> -virtual int64_t maxObjectSize() const { return max_objsize; }
>>> +int64_t maxObjectSize() const;
>> Please do not remove "virtual". It is a useful reminder that this is a
>> part of the Store API.
> Why add virtual table entries? this is not a replaceable function any
> more. This is an accessor method for the SwapDir base object.

Store, SwapDir's parent class, has the "virtual" keyword for this method. Thus, the method will be virtual in all child classes and will be in the virtual table regardless of whether we say "virtual" explicitly:

> Store.h:virtual int64_t maxObjectSize() const = 0;

The above Store class API gets implemented by these Store children:

> MemStore.h: virtual int64_t maxObjectSize() const;
> StoreHashIndex.h: virtual int64_t maxObjectSize() const;
> SwapDir.h:virtual int64_t maxObjectSize() const;
> SwapDir.h:int64_t maxObjectSize() const;

The reason I believe it is better to say "virtual" explicitly in this case is to remind us that maxObjectSize() is a part of the [semi-broken] Store API and not some SwapDir "own" method. This will help us when we polish the API to remove the common Store root. Again, being explicit here does not add or remove any virtual table entries. Hope this clarifies, Alex.
Re: [PATCH] StoreId with couple cosmetics in sync with trunk 12639
On 02/11/2013 04:30 AM, Eliezer Croitoru wrote: >>> I noticed another issue regarding squid with kids process: >>> when used with two cache_dirs(rock+aufs) it uses two kids, is it >>> suppose to be like that? >> Yes Squid splits into a Disker kid which manages all the disk I/O >> processes and a Worker which handles all the HTTP proxying and most >> other tasks. For the record, to avoid any confusion if somebody reads this without reading documentation, it is one disker kid per Rock cache_dir, not one disker kid for all disk I/O. There are more caveats discussed below. > Which I think should be mentioned in a way at the docs of cache_dir > directive which I couldn't understand from. > It's not mentioned that multi cache_dir directives make squid to a SMP > service rather then one process service. I think this depends on what you mean by "SMP service". Even without Rock, Squid creates other processes and/or threads, especially in daemon mode, so you must give disker more weight than those other processes/threads if you think the situation with Rock is special. Note that disker kids do not accept HTTP requests (they are not Workers). The issue is further complicated by the fact that when run in non-daemon mode or when forced to use a non-Ipc disk I/O module, Rock cache_dir may not create diskers at all. This is why "squid -N" still "works" even if you use Rock cache_dir. Here is the corresponding code: > bool > Rock::SwapDir::needsDiskStrand() const > { > const bool wontEvenWorkWithoutDisker = Config.workers > 1; > const bool wouldWorkBetterWithDisker = DiskIOModule::Find("IpcIo"); > return InDaemonMode() && (wontEvenWorkWithoutDisker || > wouldWorkBetterWithDisker); > } I agree that squid.conf.documented can be improved in this area. I added the following paragraph to cache_dir rock description (trunk r12671): > If possible, Squid using Rock Store creates a dedicated kid > process called "disker" to avoid blocking Squid worker(s) on disk > I/O. 
One disker kid is created for each rock cache_dir. Diskers > are created only when Squid, running in daemon mode, has support > for the IpcIo disk I/O module. As you can tell, the above does not tell the whole story (told by the actual code quoted above), but I hope it is better than nothing. Further improvements welcome! However, if you respond to this, please start a new email thread. Thank you, Alex.
NA - token = fatalf
Hello,

I got a complaint from an authentication helper author because his helper crashes Squid with a "NA NT_STATUS_NO_SUCH_USER *\n" response that lacks a "token" field or kv-pair. The corresponding fatalf() is in auth/negotiate/UserRequest.cc:

> case HelperReply::Error: {
>     Note::Pointer messageNote = reply.notes.find("message");
>     Note::Pointer tokenNote = reply.notes.find("token");
>     if (tokenNote == NULL) {
>         /* protocol error */
>         fatalf("authenticateNegotiateHandleReply: *** Unsupported helper response ***, '%s'\n", reply.other().content());
>         break;
>     }

I tried to understand whether this fatalf() is intentional but failed. As far as I can tell, the helper documentation on wiki says these self-contradictory things:

NA: Deprecated by ERR result from Squid-3.4 onwards.
token: Negotiate authenticator interface requires it on NA responses.
token: This field must not be sent on ERR responses.
token=...: This field is only used on OK responses.

The trunk code appears to say these things:

* ERR and NA results are treated the same way
* ERR and NA negotiate results require a token kv pair

Could somebody with better authentication and helper knowledge clarify whether the token field is indeed required for Negotiate ERR and NA responses? If not, can we just remove the above quoted fatalf() blob and make the following line conditional on the token presence?

> lm_request->server_blob = xstrdup(tokenNote->firstValue());

Thank you,

Alex.
[PATCH] NA - token = fatalf
On 02/12/2013 03:33 PM, Henrik Nordström wrote:
> tis 2013-02-12 klockan 14:41 -0700 skrev Alex Rousskov:
>> Could somebody with better authentication and helper knowledge clarify
>> whether the token field is indeed required for Negotiate ERR and NA
>> responses? If not, can we just remove the above quoted fatalf() blob and
>> make the following line conditional on the token presence?
> Squid-2 negotiate expects
>
> NA blob message
>
> but do not require any of them to be present.

Is the attached fix on the right track? It makes the "token" part of the helper response optional and, hence, removes the fatalf() message. No other changes were intended, but this trunk patch is untested.

Thank you,

Alex.

Do not FATAL and quit when handling an NA or ERR negotiate helper response without a challenge token.

=== modified file 'src/auth/negotiate/UserRequest.cc'
--- src/auth/negotiate/UserRequest.cc 2013-01-28 16:56:05 +
+++ src/auth/negotiate/UserRequest.cc 2013-02-13 00:30:07 +
@@ -310,53 +310,51 @@
      * challenge-response nature of the protocol.
      * Just free the temporary auth_user after merging as
      * much of it new state into the existing one as possible */
     usernamehash->user()->absorb(local_auth_user);
     /* from here on we are working with the original cached credentials. */
     local_auth_user = usernamehash->user();
     auth_user_request->user(local_auth_user);
 } else {
     /* store user in hash's */
     local_auth_user->addToNameCache();
 }
 /* set these to now because this is either a new login from an
  * existing user or a new user */
 local_auth_user->expiretime = current_time.tv_sec;
 auth_user_request->user()->credentials(Auth::Ok);
 debugs(29, 4, HERE << "Successfully validated user via Negotiate. Username '" << auth_user_request->user()->username() << "'");
 }
 break;
 case HelperReply::Error: {
-    Note::Pointer messageNote = reply.notes.find("message");
-    Note::Pointer tokenNote = reply.notes.find("token");
-    if (tokenNote == NULL) {
-        /* protocol error */
-        fatalf("authenticateNegotiateHandleReply: *** Unsupported helper response ***, '%s'\n", reply.other().content());
-        break;
-    }
-
     /* authentication failure (wrong password, etc.) */
+
+    Note::Pointer messageNote = reply.notes.find("message");
     auth_user_request->denyMessage(messageNote->firstValue());
     auth_user_request->user()->credentials(Auth::Failed);
+
     safe_free(lm_request->server_blob);
-    lm_request->server_blob = xstrdup(tokenNote->firstValue());
+    Note::Pointer tokenNote = reply.notes.find("token");
+    if (tokenNote != NULL)
+        lm_request->server_blob = xstrdup(tokenNote->firstValue());
+
     lm_request->releaseAuthServer();
     debugs(29, 4, HERE << "Failed validating user via Negotiate. Error returned '" << reply << "'");
 }
 break;
 case HelperReply::Unknown:
     debugs(29, DBG_IMPORTANT, "ERROR: Negotiate Authentication Helper '" << reply.whichServer << "' crashed!.");
     /* continue to the next case */
 case HelperReply::BrokenHelper: {
     /* TODO kick off a refresh process. This can occur after a YR or after
      * a KK. If after a YR release the helper and resubmit the request via
      * Authenticate Negotiate start.
      * If after a KK deny the user's request w/ 407 and mark the helper as
      * Needing YR. */
     Note::Pointer errNote = reply.notes.find("message");
     if (reply.result == HelperReply::Unknown)
         auth_user_request->denyMessage("Internal Error");
     else if (errNote != NULL)
         auth_user_request->denyMessage(errNote->firstValue());
Re: NA - token = fatalf
On 02/12/2013 06:41 PM, Amos Jeffries wrote:
> On 13/02/2013 11:33 a.m., Henrik Nordström wrote:
>> tis 2013-02-12 klockan 14:41 -0700 skrev Alex Rousskov:
>>> Hello,
>>> Could somebody with better authentication and helper knowledge clarify
>>> whether the token field is indeed required for Negotiate ERR and NA
>>> responses? If not, can we just remove the above quoted fatalf() blob and
>>> make the following line conditional on the token presence?
>> Squid-2 negotiate expects
>>
>> NA blob message
>>
>> but do not require any of them to be present. It accepts
>>
>> NA
>>
>> as valid response.
>>
>> NTLM is slightly different and do not expect a blob. Simply
>>
>> NA message
>>
>> where message may be blank.
>>
>> Regards
>> Henrik
>
> Squid-3 should be identical. The token is required for Authenticate-Info
> to supply client with keytab identification in the reponse headers.

Can you clarify (for those who do not know enough about authentication) whether the token is required only if the admin wants some optional functionality to work? Like some optional header to be sent by Squid to the user? Or is it required for the error page to be reliably displayed to the authenticating user at all? My understanding is that the admin does not want authentication to continue in this NA/ERR case.

> A missing token= on the Negotiate response normally indicates that an NTLM
> helper has been wrongly configured on the Negotiate auth interface.
> Markus' negotiate_wrapper helper presents a dummy token when mapping
> NTLM responses to Squid.

I do not know whether this particular helper is misconfigured. And I cannot tell by reading the wiki page (for the reasons discussed earlier). I believe the helper works OK otherwise. In the NA case, the helper returns something like

NA NT_STATUS_NO_SUCH_USER *\n

Is that a valid NA response? If the token is required for NA/ERR responses, then the helper is broken because it does not send one.
If the token is not required, then Squid is broken because it requires one. Henrik says the token is not required [in Squid2].

> Yes you can remove these fatal() if you want, but it needs to result in
> authentication failure

Will removal of fatal(), as in the posted patch, result in authentication failure? I assume it will, but, again, I am not an authentication expert.

> and squid.conf ERROR messages if you do so.

Do you mean a cache.log ERROR message? At level 1? Does that imply that token is required? I doubt the admin would want tons of ERROR messages in the cache.log so they would probably have to fix their helper instead. But we should not force them to fix the helper if there is nothing wrong with it.

> The code for triggering all that side-effect is in the BrokenHelper use case
> which might need to become a separate error handling method. This also
> goes for the other identical fatal("Unsupported helper response") in
> Negotiate auth which would be worth removing in the same way.

This seems to imply that the helper is broken because the token should be required in all NA/ERR responses. Is that correct?

Thank you,

Alex.
Re: [PATCH] Refactor mime.cc
On 02/10/2013 10:02 AM, Kinkie wrote: > the attached patch aims to c++-ify and standardize a bit more mime.cc > and the auxiliary classes it contains. > +MimeIcon(const char *aName); Missing "explicit" keyword to prevent accidental conversions of c-strings to MimeIcons. > private: > +MimeIcon(); > +private: > +MimeEntry(); Both not needed: If you remove both declarations, it would still be impossible to create MimeIcon or MimeEntry using a default constructor because there will be no default constructor generated by the compiler since you provide explicit [parameterized] constructors for both classes. Thank you, Alex.
Re: [RFC] deprecate hierarchy_stoplist
On 02/12/2013 06:17 PM, Amos Jeffries wrote: > The hierarchy_stoplist directive has been a rarely needed and redundant > directive for some time now. > > It's useful operation is also performed by this following config: > > acl stoplist url_regex cgi-bin \? > always_direct allow stoplist > > Any objections? None from me if there is indeed an equivalent (or better) option. Matching url_regex would be slower than current strstr(), but hopefully the difference is minor. Thank you, Alex.
Re: NA - token = fatalf
On 02/12/2013 08:48 PM, Amos Jeffries wrote: > On 13/02/2013 3:00 p.m., Alex Rousskov wrote: >> On 02/12/2013 06:41 PM, Amos Jeffries wrote: >>> On 13/02/2013 11:33 a.m., Henrik Nordström wrote: >>>> Squid-2 negotiate expects >>>> >>>> NAblobmessage >>>> >>>> but do not require any of them to be present. >>>> NTLM is slightly different and do not expect a blob. Simply >>>> >>>> NAmessage >>>> >>>> where message may be blank. >> In the NA case, the >> helper returns something like NA NT_STATUS_NO_SUCH_USER *\n >> Is that a valid NA response? > > Looks like it should be correct. > > Yes ERR token="NT_STATUS_NO_SUCH_USER" message="*" The admin does not want to emit the token AFAICT. The admin wants NT_STATUS_NO_SUCH_USER to be the message. My understanding is that if they change the response to new ERR message="NT_STATUS_NO_SUCH_USER" then a patched Squid will work, but I do not yet know whether they can change the helper. A bigger question, I think, is whether the quoted tokenless NA response used to work. If it used to work, we may have to preserve the old behavior even if this particular admin can change the helper script. > I spy the HelperReply.cc parse() "NA " case is not converting the > responses. AF is doing so, but not NA. Neither is the Negotiate switch > case. This would seem to be the main bug. You may be right, but I cannot propose a fix because I do not know what the correct/supported format(s) are. Thus, I am focusing on preventing crashes and hoping that somebody who knows what old helpers produced can fix the parser to accommodate those cases and fix the wiki page. > IIRC that was left because there was confusion over how to handle > multi-word messages and optional token prefix. It feels like that confusion still exists, but I do not know enough to help, unfortunately. > Questions are: > do we assume that there is no token on NA and send the whole string as > visible error message? The proposed patch will do that. 
I think that is a reasonable assumption going forward, but I do not know whether it is compatible with old helpers. > Do we try to check the first word and see if it is possibly a token and > use it as one when it is found? How can we distinguish a token from a message when kv-pairs are not used? > Or, always assume the first word is a token and send it as one? that > would probably break NTLM so we would have to do it in the Negotiate > switch case. But would allow us to send the catenated token+message for > error display in case we got it wrong and a FUD token in Negotiate > should not hurt. > > ... and yes these are blind guess ideas. I have not investigated any > side effects. I think there are at least two distinct areas here: 1) Old helpers (presumably not modifiable): We should do our best to support them. We need to know (and document) what their assumptions were to support them. I do not know that. 2) New helpers (presumably still modifiable): We should require kv-pair based format for tokens and messages for all new result codes. We may add new result codes to resolve ambiguous cases in (1), if any. My understanding is ERR is the only new result code right now. >> If the token is required for NA/ERR responses, then the helper is broken >> because it does not send one. If the token is not required, then Squid >> is broken because it requires one. Henrik says the token is not required >> [in Squid2]. > > It seems to be sending token "NT_STATUS_NO_SUCH_USER". It is trying to send NT_STATUS_NO_SUCH_USER as the message. > Henrik only said these two were valid: > NA \n > NA token message\n Henrik also said "do not require any of them to be present" but it is not clear to me whether "NA foo\n" means that foo is a token or a message. > Although I would go so far as to add this one is probably valid also: > NA token\n or NA message\n ? Thank you, Alex.
Re: NA - token = fatalf
On 02/12/2013 03:33 PM, Henrik Nordström wrote: > Squid-2 negotiate expects > > NAblobmessage > > but do not require any of them to be present. It accepts > > NA > > as valid response. How will Squid-2 negotiate interpret the following? NAsomething-without-spaces Will it assume that something-without-spaces is a blob? A message? Thank you, Alex.
Re: /bzr/squid3/trunk/ r12683: mime.cc post-merge cleanup
On 02/13/2013 03:27 PM, Francesco Chemolli wrote: > > revno: 12683 > +explicit MimeEntry(const char *aPattern, const regex_t &compiledPattern, Hi Kinkie, You overdid it :-). The "explicit" keyword only makes sense with single-parameter constructors. All other constructors cannot lead to implicit type conversions. HTH, Alex.
Re: [PATCH] support parameters for no-cache and private
On 02/16/2013 06:51 PM, Amos Jeffries wrote: > Since we now support HTTP/1.1 storage and revalidation of > Cache-Control:no-cache it is important that we at least detect the cases > where no-cache= and private= contain parameters. The patch appears to do more than just detection -- Squid starts to ignore CC:no-cache and, in some cases, CC:private (with or without parameters, depending on circumstances). > AFAIK these are still rare occurrences due to the historic lack of > support. So for now Squid just detects and exempts these responses from > the caching performed. The basic framework for adding future support of > these HTTP/1.1 features is made available > +} else if (/* p &&*/ !httpHeaderParseQuotedString(p, > (ilen-nlen-1), &private_)) { > +debugs(65, 2, "cc: invalid private= specs near '" << item << > "'"); > +clearPrivate(); > +} else { It feels wrong to clear and ignore a private CC directive that we know is there, even though we cannot parse its value. Should we be conservative and treat this as private without parameters instead of possibly violating HTTP by ignoring CC:private (e.g., if our parser is wrong)? Also, if this code stays, please reshape it to handle the successful parsing case (with the side effect of setting private_) first, for clarity:

    if (!p) {
        private_.clean(); // last CC:private wins
        setMask(true, true);
    } else if (parse(private_)) {
        setMask(true, true); // the above [re]sets private_
    } else {
        clearPrivate(); // ignore this and previous CC:private
    }

Same concerns for no-cache, of course. > +void Private(String &v) {setMask(CC_PRIVATE,true); private_.append(v);} > // uses append for multi-line headers Would not append() merge two header names together without delimiters if the directive is split between two header fields? For example, would not the private_ value become "foobar" in the example below?
Foo: 1 Cache-Control: private="foo", private="bar" Bar: 2 For the future code to work, the values from multiple directives should be concatenated using ",". However, it may be better to store them in a list or map of some kind instead (since that future code will have to check individual header names for a match anyway). Thus, the best structure for this may be rather different. This doubles my concern that we are adding code that should be essentially unused (IMO), that adds overhead, but that may have to be fixed/rewritten further when it is used. Also, I think the code will mishandle this case: Foo: 1 Cache-Control: private Cache-Control: private="bar" Bar: 2 because it will treat the above as the following less restrictive case: Foo: 1 Cache-Control: private="bar" Bar: 2 instead of treating it as Foo: 1 Cache-Control: private Bar: 2 Does HTTP say that [later] less restrictive CC directives overwrite [earlier] more restrictive ones? > +// CC:no-cache (not only if there are no parameters) > +const bool CcNoCacheNoParams = (rep->cache_control->hasNoCache() > && rep->cache_control->noCache().undefined()); The comment says "not only if there are no parameters" but the boolean condition says "only if there are no parameters". > +// CC:private (if stored) > +const bool CcPrivate = rep->cache_control->hasPrivate(); Does "if stored" apply to all other CC directives in this context as well? Please start local variable names such as CcPrivate with a small letter. > // NP: request CC:no-cache only means cache READ is forbidden. STORE > is permitted. > +if (rep->cache_control->hasNoCache() && > rep->cache_control->noCache().defined()) { > +/* TODO: we are allowed to cache when no-cache= has parameters. > + * Provided we strip away any of the listed headers unless they > are revalidated > + * successfully (ie, must revalidate AND these headers are > prohibited on stale replies). > + * That is a bit tricky for squid right now so we avoid caching > entirely. 
> + */ > +debugs(22, 3, HERE << "NO because server reply > Cache-Control:no-cache has parameters"); > +return 0; > +} The new comment and behavior seem inconsistent, especially in light of the old "NP" comment that is left intact: it seems like we should store (or not) regardless of the presence of header names. Why does the presence of header names make a difference in this case? > // NP: given the must-revalidate exception we should also be > able to exempt no-cache. > // HTTPbis WG verdict on this is that it is omitted from the > spec due to being 'unexpected' by > // some. The caching+revalidate is not exactly unsafe though > with Squids interpretation of no-cache > -// as equivalent to must-revalidate in the reply. > -} else if (rep->cache_control->noCache() && > !REFRESH_OV
Re: [PATCH] Prevent external_acl.cc "inBackground" assertion on queue overloads
On 01/11/2013 05:39 PM, Alex Rousskov wrote: > Hello, > > I think there is an off-by-one error in the external_acl.cc code > that kills Squid: > >> 2013/01/11 15:13:18.037| external_acl.cc(843) aclMatchExternal: >> "http://example-3.com/req=debug,uniq4,abort_at=AfterReqHs/resp=debug/": >> queueing a call. >> 2013/01/11 15:13:18.037| external_acl.cc(845) aclMatchExternal: >> "http://example-3.com/req=debug,uniq4,abort_at=AfterReqHs/resp=debug/": >> return -1. >> 2013/01/11 15:13:18.037| Acl.cc(321) checklistMatches: >> ACL::ChecklistMatches: result for 'dummyfilter' is -1 >> 2013/01/11 15:13:18.037| Acl.cc(339) matches: ACLList::matches: result is >> false >> 2013/01/11 15:13:18.037| Checklist.cc(275) matchNode: 0x99fe8a8 matched=0 >> async=1 finished=0 >> 2013/01/11 15:13:18.037| Checklist.cc(312) matchNode: 0x99fe8a8 going async >> 2013/01/11 15:13:18.037| Acl.cc(61) FindByName: ACL::FindByName 'dummyfilter' >> 2013/01/11 15:13:18.037| Checklist.cc(131) asyncInProgress: >> ACLChecklist::asyncInProgress: 0x99fe8a8 async set to 1 >> 2013/01/11 15:13:18.037| external_acl.cc(1407) Start: fg lookup in >> 'dummyfilter' for >> 'http://example-3.com/req=debug,uniq4,abort_at=AfterReqHs/resp=debug/' >> 2013/01/11 15:13:18.037| external_acl.cc(1450) Start: 'dummyfilter' queue is >> too long >> 2013/01/11 15:13:18.037| assertion failed: external_acl.cc:1451: >> "inBackground" > > > The enqueue check for external ACL lookups is inconsistent with the > final queue length check in ExternalACLLookup::Start(). The former > allows adding to the already full (but not yet overflowing) queue while > the latter rightfully(?) asserts that the queue should not overflow. Amos, We discussed future steps to remove the whole queuing mess, and it looks like we were in agreement on what needs to be done there. Meanwhile, any objections to committing this specific bug fixing patch? Thank you, Alex.
Re: [PATCH] debug prints
On 02/18/2013 04:39 AM, Amos Jeffries wrote: > On 10/08/2012 1:57 p.m., Amos Jeffries wrote: >> On 9/08/2012 12:12 a.m., Alexander Komyagin wrote: >>> Following patch fixes nested debug() calls problem. >>> Since debugs() is a macros, it should not change static Debugs::level >>> before putting debug message to the internal stream. Otherwise we >>> encounter problems when debug message itself contains calls to debugs(). >>> >>> --- src/Debug.h2012-03-07 05:42:55.0 +0300 >>> +++ src/Debug.h2012-08-08 14:49:20.0 +0400 >>> @@ -106,8 +106,9 @@ >>> /* Debug stream */ >>> #define debugs(SECTION, LEVEL, CONTENT) \ >>> do { \ >>> -if ((Debug::level = (LEVEL)) <= Debug::Levels[SECTION]) { \ >>> +if ((LEVEL) <= Debug::Levels[SECTION]) { \ >>> Debug::getDebugOut() << CONTENT; \ >>> +Debug::level = (LEVEL); \ >>> Debug::finishDebug(); \ >>> } \ >>> } while (/*CONSTCOND*/ 0) >>> >> >> Looks okay. How much testing has this had? >> >> Amos > > Applied as trunk rev.12698. Sorry this took so long to go in. Please note that debugs() is not the only macro that changes Debug::level before doing anything else. The old do_debug() does that as well. debugs() still changes Debug::sectionLevel before we do anything else so there is inconsistency now with regard to level and sectionLevel. Finally, this change makes current debugging level unavailable to CONTENT. I know some CONTENT uses Debug::sectionLevel. I do not see any CONTENT using Debug::level, so this may not be a problem today. I cannot tell exactly what problems this change is fixing (the commit message does not explain), so the fix may very well be worth the side-effects, but it looks like the debugging code became messier after this change. I wonder if the fix is worth it and whether there is a better way to fix that problem? Thank you, Alex.
Re: [PATCH] support parameters for no-cache and private
On 02/18/2013 08:20 PM, Amos Jeffries wrote: > The HTTP semantic meaning of these three cases is very different between > themselves but the same syntax<->meaning pattern for both private and > no-cache. > > CC:private ... means *whole* response is private, MUST NOT cache. > > CC:private="foo" ... means the specific listed *headers* are private, > response itself MAY be cached. Agreed. And until Squid learns to delete private headers, the whole response is private in both cases (otherwise we will be "leaking" private headers and violating HTTP). > CC:private=^%%$ ... (or other breakage) means invalid options and SHOULD > be ignored. Debatable. If the breakage is in our code, then we should not ignore CC:private. If the breakage is in the sender's code, the correct action cannot be determined, so it is best to be on the safe side and assume private. Thus, in both cases, we should act as if CC:private was received. Yes, RFC 2616 talks about ignoring CC directives, but only in the context of _extensions_ (Section 14.9.6), not in the context of a standard directive that we recognize but somehow cannot fully parse. > CC:private, private="foo" ... is treated in HTTP as an explicit hack by > the sender - such that caches coping with private="foo" will obey > CC:private="foo" and older 1.0 caches will do "CC:private" handling. Is that an HTTPbis change? AFAICT, RFC 2616 says CC:private applies to the whole response and does not have any language (that I can find) which would imply that CC:private="foo" overwrites that CC:private meaning. I am looking at the following RFC 2616 text: >When a directive appears without any 1#field-name parameter, the >directive applies to the entire request or response. When such a >directive appears with a 1#field-name parameter, it applies only to >the named field or fields, and not to the rest of the request or >response.
The text is sloppy, but I think the above considers two cases (just CC:private and just CC:private="foo"), without explicitly covering the combination case. My interpretation of the combination case does not contradict the above rules. If overwriting was the intent in the combination case, it should have been documented explicitly. Does HTTPbis do a better job at documenting this? Note that the above applies to CC:no-cache as well. There is another RFC 2616 paragraph that shows how an _extension_ directive can be combined with CC:private to alter CC:private meaning for the recipients in-the-know, but that example is different from what we are discussing here because RFC never says that CC:private="foo" alters another CC:private meaning. I wonder whether RFC 2616 authors even considered a parameter+parameterless combination for the same CC directive. > This patch is supposed to pull the parameters into a string for later > re-use and treat it as CC:private (MUST NOT cache). I think the patch does the above for CC:private (except for corner cases discussed elsewhere) but not for CC:no-cache. Here are a few places where CC parameters change the behavior in various ways even though we do not support them yet: > -} else if (rep->cache_control->noCache() && ... > +} else if (rep->cache_control->hasNoCache() && > !rep->cache_control->noCache().defined() && ... > -rep->cache_control->noCache() || > +const bool CcNoCacheNoParams = (rep->cache_control->hasNoCache() > && rep->cache_control->noCache().undefined()); >> Also, I think the code will mishandle this case: >> >>Foo: 1 >>Cache-Control: private >>Cache-Control: private="bar" >>Bar: 2 >> >> because it will treat the above as the following less restrictive case: >> >>Foo: 1 >>Cache-Control: private="bar" >>Bar: 2 >> >> instead of treating it as >> >>Foo: 1 >>Cache-Control: private >>Bar: 2 >> >> Does HTTP say that [later] less restrictive CC directives overwrite >> [earlier] more restrictive ones? > > Yes. 
> These two directives directly contradict each other. I do not think they do. One is simply "larger" than the other. Nowhere does HTTP say that CC:private="foo" means that the request can be cached [no matter what other headers it may have], so there is no contradiction, just duplication of information. >>> // NP: request CC:no-cache only means cache READ is >>> forbidden. STORE is permitted. >>> +if (rep->cache_control->hasNoCache() && >>> rep->cache_control->noCache().defined()) { >>> +/* TODO: we are allowed to cache when no-cache= has >>> parameters. >>> + * Provided we strip away any of the listed headers >>> unless they are revalidated >>> + * successfully (ie, must revalidate AND these headers >>> are prohibited on stale replies). >>> + * That is a bit tricky for squid right now so we avoid >>> caching entirely. >>> + */ >>> +debugs(22, 3, HERE << "NO because server reply >>> Cache-Control:no-cach
Re: [PATCH] support parameters for no-cache and private
On 02/16/2013 06:51 PM, Amos Jeffries wrote: > Please run this past Co-Advisor to confirm the private="..." and > no-cache="..." cases are now all "Precondition Failed". FYI: Co-Advisor crashes current Squid trunk ("Received Segment Violation...dying.") so we will need to triage and fix that before we can test the CC changes. This may take a few days. Meanwhile, if you have an updated patch for testing, please post. Thank you, Alex.
Re: [RFC] DNS system upgrades
On 02/19/2013 05:08 PM, Amos Jeffries wrote: > A few things are on my radar that need improving in the Squid DNS > components. I am proposing these as a batch, any which we can agree on > will be scheduled for fixing in 3.4. > > > 1) renaming all DNS options to begin with dns_* prefix. > > This will allow the cfgman documentation to collate these options > together as a batch in future. > Also clarify what component in Squid some of the more obscure options > relate to. > Also, allow for some future upgrades compacting these types of options > into a single component directive / command line interpreter simplicity. > > Options affected that I'm aware of: > append_domain > ignore_unknown_nameservers > hosts_file > cache_dns_program ... (stripping the cache_ part.) > > > NP: at this point I am on the fence about adding the prefix to fqdncache > and ipcache options and *not* proposing a change to them. > > > > 2) adapting append_domains from a string type to a custom type > > This will allow us to do additional configuration validation. Such as > identifying whether resolv.conf or squid.conf was used as data source. > > > * auto-enabling dns_defnames search list > > /etc/resolv.conf contains two similar but different directives for > labelling the local domain name. > > The "search" directive in particular signals DNS searching of multiple > domains to the default host resolver. But Squid still requires explicit > "dns_defnames on" in squid.conf to enable that behaviour. As a result we > have administrators seeing a 'bad' difference between internal and > dnsserver when they try upgrading to internal DNS. > > I propose using the resolv.conf hint to enable dns_defnames searching > automatically in the absence of explicit squid.conf configuration. By "explicit squid.conf configuration", you mean the dns_nameservers option, right? In other words, if dns_nameservers is given, the entire /etc/resolv.conf is ignored.
Otherwise, both "nameserver" and "search" directives in /etc/resolv.conf are honored, correct? > Administrators who do not want it are supposed to be using the > alternative "domain" directive, when they use tools like host or > nslookup to debug they should see consistent behaviour (diverting them > away from Squid to the actual DNS issues in resolv.conf settings), and > this will prevent future confusion such as appears to be happening in > bug 3585. > 3) removal of dnsserver and related code. > > IIRC the argument for keeping it previously was NetBIOS or WINS name > lookup still being needed (though I am suspicious the dns_defnames issue > was the actual cause of this). > > - NetBIOS was deprecated in 2000, to be replaced by DNS > - WINS was deprecated in 2012, to be replaced by IPv6 > auto-configuration / adhoc services. > > > I am okay to see this delayed a few more squid series to give WINS > longer to die, but it is time we looked again at why it is still there. > > Since the last round of discussion we adjusted the internal engine to > fix most of the remaining issues. The above dns_defnames behaviour > change is the last bit I am aware of that is needed for a seamless > upgrade, short of full-blown NetBIOS support integration which is not > very feasible. All three items above sound good to me, but this is not my area of expertise so I hope others will chime in. The WINS-related decision may be worth discussing on squid-users as well. Thank you, Alex.
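For reference, the two /etc/resolv.conf directives under discussion look like this (the values are illustrative):

# /etc/resolv.conf (illustrative values)
nameserver 192.0.2.53
nameserver 192.0.2.54

# "search": try these suffixes when resolving unqualified names;
# this is the hint the proposal would use to auto-enable dns_defnames.
search example.net corp.example.net

# Alternatively, "domain" names just the local domain; this is the
# directive admins who do not want search-list behaviour would use.
# (resolv.conf treats "domain" and "search" as mutually exclusive:
# whichever appears last wins.)
# domain example.net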
Re: [PATCH] support parameters for no-cache and private
On 02/19/2013 06:12 PM, Alex Rousskov wrote: > On 02/16/2013 06:51 PM, Amos Jeffries wrote: >> Please run this past Co-Advisor to confirm the private="..." and >> no-cache="..." cases are now all "Precondition Failed". > > FYI: Co-Advisor crashes current Squid trunk ("Received Segment > Violation...dying.") so we will need to triage and fix that before we > can test the CC changes. This may take a few days. Hi Amos, I think I know where the problem is. For all responses without a CC header, the cache_control member is nil in the following patch code dereferencing it: > === modified file 'src/http.cc' > --- src/http.cc 2013-02-16 11:42:53 + > +++ src/http.cc 2013-02-17 01:41:03 + > @@ -362,6 +362,16 @@ > } > > // NP: request CC:no-cache only means cache READ is forbidden. STORE > is permitted. > +if (rep->cache_control->hasNoCache() && > rep->cache_control->noCache().defined()) { Please fix. Thank you, Alex.
Re: [RFC] DNS system upgrades
On 02/20/2013 01:46 AM, Amos Jeffries wrote: > On 20/02/2013 8:45 p.m., Alex Rousskov wrote: >> On 02/19/2013 05:08 PM, Amos Jeffries wrote: >>> 2) adapting append_domains from a string type to a custom type >>> >>> This will allow us to do additional configuration validation. Such as >>> identifying whether resolv.conf or squid.conf was used as data source. >>> >>> >>> * auto-enabling dns_defnames search list >>> >>> /etc/resolv.conf contains two similar but different directives for >>> labelling the local domain name. >>> >>> The "search" directive in particular signals DNS searching of multiple >>> domains to the default host resolver. But Squid still requires explicit >>> "dns_defnames on" in squid.conf to enable that behaviour. As a result we >>> have administrators seeing a 'bad' difference between internal and >>> dnsserver when they try upgrading to internal DNS. >>> >>> I propose using the resolv.conf hint to enable dns_defnames searching >>> automatically in the absence of explicit squid.conf configuration. >> By "explicit squid.conf configuration", you mean the dns_nameservers >> option, right? In other words, if dns_nameservers is given, the entire >> /etc/resolv.conf is ignored. Otherwise, both "nameserver" and "search" >> directives in /etc/resolv.conf are honored, correct? > > No I meant "dns_defnames on" is needed explicitly in squid.conf to > enable DNS search behaviour the search settings loaded from resolv.conf > should have meant in the first place. The default squid.conf does not > honour the search part of the setting. Will enabling dns_nameservers disable any use of /etc/resolv.conf? I think there should be an easy way for an admin to disable all /etc/resolv.conf use and rely on squid.conf settings exclusively. Use of dns_nameservers may be a good trigger for that. In other words, I do not think Squid should pick up search clues from /etc/resolv.conf when the admin explicitly configured dns_nameservers.
It feels like that would lead to messy configurations where the admin will not know for sure where the information is coming from. We should warn when options contradict each other. If there is a conflict, I think our preference should be towards "works as expected" rather than "external DNS works as internal DNS (and so it is easy to switch from former to the latter)". However, again, this is not my area so perhaps a hodgepodge of /etc/resolv.conf info and dns_nameservers info is better. Thank you, Alex. > What you point at is another behaviour I had forgotten. > > 5) We should make etc/resolv.conf provide the defaults for dns_defnames, > dns_nameservers, append_domain squid.conf options and have them override > resolvconf. > > The way append_domain and dns_nameservers work that appears to have been > the intention originally. > > >> >>> Administrators who do not want it are supposed to be using the >>> alternative "domain" directive, when they use tools like host or >>> nslookup to debug they should see consistent behaviour (diverting them >>> away from Squid to the actual DNS issues in resolv.conf settings), and >>> this will prevent future confusion such as appears to be happening in >>> bug 3585. >> >>> 3) removal of dnsserver and related code. >>> >>> IIRC the argument for keeping it previously was NetBIOS or WINS name >>> lookup still being needed (though I am suspicious the dns_defnames issue >>> was the actual cause of this). >>> >>> - NetBIOS was deprecated in 2000, to be replaced by DNS >>> - WINS was deprecated in 2012, to be replaced by IPv6 >>> auto-configuration / adhoc services. >>> >>> >>> I am okay to see this delayed a few more squid series to give WINS >>> longer to die, but it is time we looked again at why it is still there. >>> >>> Since the last round of discussion we adjusted the internal engine to >>> fix most of the remaining issues.
The above dns_defnames behaviour >>> change is the last bit I am aware of that is needed for a seamless >>> upgrade, short of full-blown NetBIOS support integration which is not >>> very feasible. >> >> All three items above sound good to me, but this is not my area of >> expertise so I hope others will chime in. >> >> The WINS-related decision may be worth discussing on squid-users as well. > > If we can't find any reasons against, that is the next step. Henrik was > quick enough to provide reasons to keep it last time without bothering > many people. > > Amos
[PATCH] Preserve bare backslashes in AF and TT
Hello, It looks like NTLM and possibly Negotiate authentication is broken in trunk because Squid eats the bare backslash that AF responses use to separate authentication domain and user names. With the backslash removed, the merged domainuser name never matches, of course. The attached patch fixes the problem in my tests. Is there a better way to do this? Thank you, Alex. Preserve bare backslashes in AF and TT NTLM/Negotiate helper responses. Bare backslashes separate authentication domain and user name in AF. When authentication helper protocol was enhanced to support annotations, we started to use strwordtok() to parse AF and TT responses, but strwordtok() eats bare backslashes (and does other decoding that might not be safe in AF and AT context?). We now use a yet another custom parser to extract and skip words. I started with reusing strtok() for that, but was too worried that some other code (e.g., annotations parser) might use it at the same time, resulting in conflicts. Besides, strtok() requires hacks when we do not want to tokenize the whole thing. === modified file 'src/HelperReply.cc' --- src/HelperReply.cc 2012-11-28 01:13:21 + +++ src/HelperReply.cc 2013-02-21 01:26:00 + @@ -1,92 +1,122 @@ /* * DEBUG: section 84Helper process maintenance * AUTHOR: Amos Jeffries */ #include "squid.h" #include "ConfigParser.h" #include "HelperReply.h" #include "helper.h" #include "rfc1738.h" #include "SquidString.h" HelperReply::HelperReply(char *buf, size_t len) : result(HelperReply::Unknown), whichServer(NULL) { parse(buf,len); } +/** Like strtok(start, " ") but does not use a static area that makes strtok() + * unsafe when multiple callers might call strtok() in one context. Also + * advances its argument to the next word, so that the strtok(NULL, "") hack + * is not required at the end when we do not want to extract all words. + * Does NOT decode unlike strwordtok() and so is safe for bare backslashes. + * TODO: Make w_space a parameter? Move elsewhere? 
+ */ +static char * +SplitWordAndAdvance(char *&start) +{ +if (!start) +return NULL; + +start += strspn(start, w_space); // skip leading whitespace + +if (!*start) +return NULL; // empty string or just whitespace; no words + +char *word = start; + +start += strcspn(start, w_space); // skip first word + +if (*start) { // there is whitespace after the first word (at least) +*start = '\0'; // terminate the first word +++start; // advance beyond the termination character we just added +} + +return word; +} + void HelperReply::parse(char *buf, size_t len) { // check we have something to parse if (!buf || len < 1) { // for now ensure that legacy handlers are not presented with NULL strings. other_.init(1,1); other_.terminate(); return; } char *p = buf; // optimization: do not consider parsing result code if the response is short. // URL-rewriter may return relative URLs or empty response for a large portion // of its replies. if (len >= 2) { // some helper formats (digest auth, URL-rewriter) just send a data string // we must also check for the ' ' character after the response token (if anything) if (!strncmp(p,"OK",2) && (len == 2 || p[2] == ' ')) { result = HelperReply::Okay; p+=2; } else if (!strncmp(p,"ERR",3) && (len == 3 || p[3] == ' ')) { result = HelperReply::Error; p+=3; } else if (!strncmp(p,"BH",2) && (len == 2 || p[2] == ' ')) { result = HelperReply::BrokenHelper; p+=2; } else if (!strncmp(p,"TT ",3)) { // NTLM challenge token result = HelperReply::TT; p+=3; // followed by an auth token -char *w1 = strwordtok(NULL, &p); +char *w1 = SplitWordAndAdvance(p); if (w1 != NULL) { MemBuf authToken; authToken.init(); authToken.append(w1, strlen(w1)); notes.add("token",authToken.content()); } else { // token field is mandatory on this response code result = HelperReply::BrokenHelper; notes.add("message","Missing 'token' data"); } } else if (!strncmp(p,"AF ",3)) { // NTLM/Negotate OK response result = HelperReply::Okay; p+=3; // followed by: // an optional auth token 
and user field // or, an optional username field -char *w1 = strwordtok(NULL, &p); -char *w2 = strwordtok(NULL, &p); +char *w1 = SplitWordAndAdvance(p); +char *w2 = SplitWordAndAdvance(p); if (w2 != NULL) { // Negotiate "token user" MemBuf authT
Re: [RFC] DNS system upgrades
On 02/20/2013 06:56 PM, Amos Jeffries wrote: > I'm only proposing for now that dns_defnames directive be enabled *if* > resolv.conf is loaded containing search directive and nothing is in > squid.conf setting it explicitly. Yes, that would make sense to me: If the admin wants to use resolv.conf, we should use all of it by default. >> I >> think there should be an easy way for an admin to disable all >> /etc/resolv.conf use and rely on squid.conf settings exclusively. Use of >> dns_nameservers may be a good trigger for that. >> >> In other words, I do not think Squid should pick up search clues from >> /etc/resolv.conf when the admin explicitly configured dns_nameservers. >> It feels like that would lead to messy configurations where the admin >> will not know for sure where the information is coming from. We should >> warn when options contradict each other. >> >> If there is a conflict, I think our preference should be towards "works >> as expected" rather than "external DNS works as internal DNS (and so it >> is easy to switch from former to the latter)". > ... which to me means Squid always loading resolv.conf first and obeying > its instructions, then overriding particular aspects of that behaviour > with squid.conf settings. I see your point. However, it may be better to simplify expectations when the admin starts tweaking things by using an "either resolv.conf or squid.conf" principle. It would be unusual and unnatural, IMHO, to specify domain names manually in squid.conf but then rely on resolv.conf for "search" patterns so I am guessing most admins would not expect that kind of combination. One possible solution to clarify choices and minimize complex dependencies is to group these options. The admin would have to pick one of the following two approaches:

    # either use these resolvers, with these options:
    dns_resolution \
        resolvers=ip1,ip2,ip3 \
        search=tld1,tld2 \
        ...

    # or use resolv.conf, possibly overwriting some of its settings:
    dns_resolution \
        config=/etc/resolv.conf \
        search=... \
        ...

with the following being the default (no overwriting):

    dns_resolution \
        config=/etc/resolv.conf

I may be missing some details here, but I hope the above illustrates the overall approach. This is just an idea/suggestion for your consideration. It is not meant to block your proposal in any way. Thank you, Alex.
Re: [PATCH] Preserve bare backslashes in AF and TT
On 02/20/2013 07:26 PM, Amos Jeffries wrote: > On 21/02/2013 2:47 p.m., Alex Rousskov wrote: >> Hello, >> >> It looks like NTLM and possibly Negotiate authentication is broken >> in trunk because Squid eats the bare backslash that AF responses use to >> separate authentication domain and user names. With the backslash >> removed, the merged domainuser name never matches, of course. >> >> The attached patch fixes the problem in my tests. Is there a better way >> to do this? > strwordtok() is a function of our own creation so it is probably best > long term to adjust its API to include a flag for skipping the > slashDecode behaviour instead of creating duplicate code. Please note that the function I added does not duplicate strwordtok() behavior. If anything, it duplicates strtok(), but for good reasons. When two functions cover very different use cases and have rather different semantics (not to mention implementation), I think they both deserve to live. > Example patch attached. Re-formatting is avoided to show clearly the > small amount of change needed. Are we sure that valid user names cannot contain quotes or other symbols that strwordtok() may want to process specially? Today and tomorrow? It seems reasonable to have two tokenizing APIs, one that handles shell-like escaping/quoting (and possibly variable substitution in the future) and another that just extracts words based on whitespace. Their use cases are very different. If you disagree that two functions are better than one here, how about applying the "slashDecode" flag that you want to add to _all_ special strwordtok() powers, naming it simply "decode"? After all, if backslashes are not treated specially, not everything can be successfully quoted anyway. In other words, if input format does not support an escape mechanism such as backslashes it cannot fully support quotes either. > I like the extra documentation you added about statics and use cases. > Can you write something similar about strwordtok() ? 
Sure, will do (regardless of whether we add a second tokenizing function or not). Thank you, Alex.
Re: [PATCH] support parameters for no-cache and private
On 02/21/2013 06:26 PM, Amos Jeffries wrote: > Adjusted patch to drop the odd NP, rework CC:private operation on broken > parameters, and fix the segfault. Hi Amos, Thank you for addressing most of my concerns. > +} else if (/* p &&*/ !httpHeaderParseQuotedString(p, > (ilen-nlen-1), &private_)) { > +debugs(65, 2, "cc: invalid private= specs near '" << item << > "'"); > +clearPrivate(); > +} This still does not work correctly when there are multiple valid CC:private=".." header fields in a message header because the httpHeaderParseQuotedString() clears the previous private_ contents, right? You can fix this by parsing into a local String variable and then appending that variable contents to the previously stored headers using the now-fixed Private() method. This approach will also eliminate the need to call clearPrivate() on failures and then immediately undo that with setMask(true,true). Same concern for no-cache="..." parsing. > +} else { > +debugs(65, 2, "cc: invalid no-cache= specs near '" << item > << "'"); > +clearNoCache(); > +} I still do not think we should ignore the whole CC:no-cache just because we cannot parse something, especially when there was a perfectly valid parameterless CC:no-cache before we failed to parse something. Instead, we should conservatively treat the whole thing as CC:no-cache. In other words, Cache-control: no-cache Foo: bar Cache-control: no-cache="foo should be treated as Cache-control: no-cache Foo: bar rather than Foo: bar You obviously disagree. Hopefully somebody will break this tie. > // NP: request CC:no-cache only means cache READ is forbidden. STORE > is permitted. > +if (rep->cache_control && rep->cache_control->hasNoCache() && > rep->cache_control->noCache().defined()) { > +/* TODO: we are allowed to cache when no-cache= has parameters. > + * Provided we strip away any of the listed headers unless they > are revalidated > + * successfully (ie, must revalidate AND these headers are > prohibited on stale replies). 
> + * That is a bit tricky for squid right now so we avoid caching > entirely. > + */ > +debugs(22, 3, HERE << "NO because server reply > Cache-Control:no-cache has parameters"); > +return 0; > +} > + > // NP: request CC:private is undefined. We ignore. > // NP: other request CC flags are limiters on HIT/MISS. We don't > care about here. Please move the new code below all three NPs. As rendered now, the first NP seems to apply to the if-statement, which confused me a lot. It is also good to keep request CC code/comments together so that "other request CC flags" comment makes more sense. On a separate note, the code became more confusing (IMO) because patched Squid now handles CC:no-cache in two places: Here, we handle the parameter case. The caller handles both parameter and parameterless cases by setting ENTRY_REVALIDATE. Fixing that is difficult, but we should at least clarify what is going on. I suggest using the following comment instead of the above TODO: /* We are allowed to store CC:no-cache. When CC:no-cache has no parameters, we must revalidate. The caller does that by setting ENTRY_REVALIDATE. TODO: When CC:no-cache has parameters, strip away the listed headers instead of revalidating (or when revalidation fails). For now, avoid caching entirely because we do not strip on failed revalidations. */ This will point to the other critical piece of code AND explain why parameter case is treated "worse" than the parameterless one despite being "better" from protocol intent point of view. If my "because we do not strip on failed revalidations" explanation is not correct, please let me know. >> Since we now support HTTP/1.1 storage and revalidation of >> Cache-Control:no-cache it is important that we at least detect the >> cases where no-cache= and private= contain parameters and handle them >> properly whenever possible. >> >> AFAIK these are still rare occurances due to the historic lack of >> support. 
So for now Squid just detects and exempts these responses >> from the caching performed. The basic framework for adding future >> support of these HTTP/1.1 features is made available >> > > Please run this past Co-Advisor to confirm the private="..." and > no-cache="..." cases are now all "Precondition Failed". Confirmed. Please note that CC:private="..." cases were failing precondition before the patch as well. Apparently, somebody fixed CC:private handling some time after r12394 and before your patch. HTH, Alex.
Re: Squid SMP on MacOS
On 02/24/2013 10:02 PM, Amos Jeffries wrote: > I'm trying to get the MacOS builds of Squid going again but having some > problems with shm_open() in the Rock storage unit-tests. > > 1) MacOS defines the max name length we can pass to shm_open() at 30 > bytes. "/squid-testRock__testRockSearch" being 35 or so bytes. > Cutting the definition in testRock.cc down so it becomes > "/squid-testRock_Search" resolves that, but then we hit (2). That TESTDIR name is wrong because it is used for more than just "search" testing. I bet the Rock name mimicked the UFS test name, but the UFS name is wrong too, for the same reason. We should use "cppUnitTestRock" and "cppUnitTestUfs" or something similarly unique and short, I guess. > 2) With the short string above and the current settings sent to > shm_open() in src/ipc/mem/Segment.cc line 73 MacOS shm_open() starts > responding with EINVAL. > theFD = shm_open(theName.termedBuf(), O_CREAT | O_RDWR | O_TRUNC, > S_IRUSR | S_IWUSR); Sounds like some of the five shm_open() flags we are using successfully elsewhere do not work on MacOS. I do not know which flag(s) do not work, and we have no MacOS boxes in the lab, so we cannot experiment or read documentation. I assume shared segment opening fails with similar symptoms when used outside of unit tests (e.g., with a shared memory cache)? If so, please feel free to disable shared memory support on MacOS (do not define HAVE_SHM?) until somebody who needs it can find the right combination of flags. Thank you, Alex.
Re: [squid-users] Store object length
On 02/25/2013 04:45 AM, Amos Jeffries wrote: > This is a code question. Please followup on squid-dev mailing list. The > developers can help you more with this than the Squid users can. Responding on squid-dev only. > On 25/02/2013 8:00 p.m., anita wrote: >> Hi All, >> >> I am trying to write an API that will fetch objects from the squid cache >> given the url and http method. >> When I analyzed the code, I found that this can be achieved with a >> storeGetPublic(url, http_method) to see if the url has an entry in the >> cache. >> This followed by storeClientCopy() call. >> However, the StoreIOBuffer used here has been initialized with a >> lenght of HTTP_REQBUF_SZ (4096 or 4 k) by default. > > Err. Yes storeGetPublic() is the API to do this with. > > Why are you writing a new one? what are you trying to do with Squid? I agree with Amos that these questions are important, for many reasons, ranging from compatibility with official code to being able to get support on squid-dev. To answer your buffer size question: The API does not restrict what buffer sizes the caller should use. Some client side code uses HTTP_REQBUF_SZ, but your code may use a different size. The store_client subscriber will receive callbacks when more data is available. It is a complex, iterative process; not a single "give me the whole response data" call. In theory, even a 1-byte buffer should work, but, besides being too slow, I suspect some Store code assumes that HTTP response headers will fit into the first buffer given to Store by store_client. Thus, it would be unwise to use smallish buffers without fixing Store first. Also, please note that storeGetPublic() returns an entry that is not necessarily fully cached. In the extreme case of collapsed forwarding, it returns an entry that is only expected to be cached when received from the origin server. >> I would like to know how larger objects are stored and retrieved from >> cache? 
How can I determine the length of the object read in that case? In general, you should not rely on knowing the response size a priori because some store modules may allow storage of objects where the size is not known when swap out begins (that is a good thing) and do not update the recorded size when swap out ends (that would be a performance optimization). I doubt Squid fully supports that today, but eventually it should, so it is better not to make a priori size knowledge a critical precondition in your code (if possible). Please describe what problem you are trying to solve and why you think you need a new API for what is already supported. With some luck, squid-dev folks will be able to help you along the way. Thank you, Alex. >> Version used: Squid ver 3.1.16 >> squid.conf : default configuration file with cache_dir directive >> uncommented. >> >> Any pointers are appreciated. > > When playing with the Squid code please always upgrade to the most > recent you can before starting. Preferrably that would be 3.HEAD which > is our rolling development version. > 3.1 Squid series has been officially deprecated for about five months > now, and even 3.2 is nearing end of life. > > The Squid-3 series is one very long process of converting the code from > C code constructs to C++ objects APIs and pluggable mechanisms. So you > may find it much easier to code later releases, or that your fix has > already been done. > > Amos
[PATCH] Generated SBuf::find() test cases
Hello,

The attached patch adds about 700'000 test cases for eight SBuf::find() methods. Needless to say, they are generated rather than manually written. They take a couple of seconds to run. Configuration options allow us to increase or decrease the number of test cases if needed.

The code attempts to avoid reporting the same kind of find() bugs over and over again by suppressing reports about similar failures. The definition of "similar" is imperfect and the corresponding categorization code can probably be significantly improved, but it already results in reporting just 28 out of 22949 failures.

Here are a few reported test case failures (stringng r9612):

> case196: SBuf("a").rfind("a", 0) returns npos instead of 0
>          std::string("a").rfind("a", 0) returns 0
>          category: hay1.rfind(needle1@B,posB)
> case5796: SBuf("2").rfind("", 0) returns 1 instead of 0
>          std::string("2").rfind("", 0) returns 0
>          category: hay1.rfind(needle0@E,posL)
> case5799: SBuf("2").find("", 1) returns 0 instead of 1
>          std::string("2").find("", 1) returns 1
>          category: hay1.find(needle0@B,posP)
> case6236: SBuf("2Q").rfind("Q", 1) returns npos instead of 1
>          std::string("2Q").rfind("Q", 1) returns 1
>          category: hayN.rfind(needle1@E,posB)
> case22048: SBuf("CQGGR4").rfind('G', 2) returns npos instead of 2
>          std::string("CQGGR4").rfind('G', 2) returns 2
>          category: hayN.rfind(char@E,posL)
> case30524: SBuf("UtRGRR").rfind("RR", 3) returns 4 instead of npos
>          std::string("UtRGRR").rfind("RR", 3) returns npos
>          category: hayN.rfind(needleN@E,posL)
...
> Generated SBuf test cases: 716672
> failed cases: 22949
> reported cases: 28
> Asserting because some cases failed...

Needless to say, all of the above could be coming from the same bug or two in SBuf code. 28 test case failures do not imply that there are 28 different bugs (our categorization code is not that clever!).
The large number of generated test cases, the logic to suppress similar reports, and lack of CppUnit knowledge resulted in very limited CppUnit integration. All the generated test cases are treated as one from CppUnit point of view. Perhaps this can be improved in the future.

New generated test cases for SBuf methods other than find() and rfind() should be added as well, of course. Class and file names should be polished. There are a few other TODOs in the patch.

As you know, SBuf in the StringNG branch has been under Squid Project review for a very long time. While each review iteration did find some bugs, the overall progress has been very slow. I suspect one of the reasons behind these delays is that it takes a long time to find the time for review and then to find the time to fix bugs. By adding these test cases to the branch, I hope we can eliminate some of the delays because at least find() bugs can be found automatically, without yet another review iteration. Similar generated test cases for other areas should be added as well, IMO.

Kinkie, would you be interested in taking over this code, polishing it as needed, committing it to stringng, and using it for your work?

Thank you,

Alex.

Author: Valentin Rakush
Author: Alex Rousskov

Generate a [configurable] large number of SBuf::*find() test cases using random strings (and std::string as the source of correct outcomes).

This is a Measurement Factory project.
=== modified file 'src/Makefile.am'
--- src/Makefile.am 2013-02-24 18:39:40 +
+++ src/Makefile.am 2013-02-25 17:13:45 +
@@ -3801,40 +3801,42 @@
 	$(SSL_LIBS) \
 	$(top_builddir)/lib/libmisccontainers.la \
 	$(top_builddir)/lib/libmiscencoding.la \
 	$(top_builddir)/lib/libmiscutil.la \
 	$(COMPAT_LIB) \
 	$(SQUID_CPPUNIT_LIBS) \
 	$(SQUID_CPPUNIT_LA) \
 	$(SSLLIB) \
 	$(KRB5LIBS) \
 	$(COMPAT_LIB) \
 	$(XTRA_LIBS)
 tests_testURL_LDFLAGS = $(LIBADD_DL)
 tests_testURL_DEPENDENCIES = \
 	$(REPL_OBJS) \
 	$(SQUID_CPPUNIT_LA)
 tests_testSBuf_SOURCES= \
 	tests/testSBuf.h \
 	tests/testSBuf.cc \
 	tests/testMain.cc \
+	tests/SBufFindTest.h \
+	tests/SBufFindTest.cc \
 	$(SBUF_SOURCE) \
 	SBufStream.h \
 	time.cc \
 	mem.cc \
 	tests/stub_debug.cc \
 	tests/stub_fatal.cc \
 	tests/stub_HelperChildConfig.cc \
 	tests/stub_cache_cf.cc \
 	tests/stub_cache_manager.cc \
 	tests/stub_store.cc \
 	tests/stub_store_stats.cc \
 	tests/stub_tools.cc \
 	SquidString.h \
 	String.cc \
 	wordlist.cc \
 	MemBuf.cc
 nodist_tests_testSBuf_SOURCES=$(TESTSOURCES)
 tests_testSBuf_LDFLAGS = $(LIBADD_DL)
 tests_testSBuf_LDADD=\
 	$(SQUID_CPPUNIT_LIBS) \

=== added file 'src/tests/SBufFindTest.cc'
--- src/tests/SBufFindTest.cc 1970-01-01 00:00:00 +
Re: [PATCH] Generated SBuf::find() test cases
On 02/25/2013 03:45 PM, Amos Jeffries wrote: > On 26/02/2013 11:06 a.m., Alex Rousskov wrote: >> The attached patch adds about 700'000 test cases for eight >> SBuf::find() methods. Needless to say, they are generated rather than >> manually written. They take a couple of seconds to run. Configuration >> options allow us to increase or decrease the number of test cases if >> needed.

> Why do we need so many tests?

I do not know how many tests are needed. Somebody can study that (if they feel that reducing the number of generated tests is important). More on that below.

> Surely a selection of lengths from the middle, and specific scans of all > the edge-boundary tests would be enough?

That is what the random tests are doing. One could massage them into using shorter strings (already a parameter), selecting fewer lengths from the middle (a simple code modification), and/or doing fewer edge-boundary tests (a simple code modification), of course.

> That can't be more than a few hundred - which is few enough to run each > as a separate CppUnit case.

I have not studied whether doing so will reduce the number of bugs found. I only studied that going in the opposite direction (more test cases) does not seem to help much. Please note that it is difficult to know for sure because while many test cases find bugs, it is difficult to isolate independent/different bugs without actually fixing SBuf.

Why is "a few hundred" OK as a separate CppUnit case but "a few hundred thousand" not OK? I am _not_ trying to argue either way, just curious what the criteria for sizing CppUnit cases are, since I know so little about CppUnit...

If more CppUnit cases are needed, I am sure it would be possible to parameterize the current code to generate more specific test cases. After repeating such "focused" generation inside each appropriate CppUnit case, one would end up with multiple CppUnit test cases covering approximately the same input space.
> Also, using fuzzing with a random input leaves us open to the rare > occasions when a single byte octet value at a specific location is the > cause of a bug. Think \0 at octet 5 when doing 32-bit chunk comparisions > in some buggy implementation checking against !*p. That type of failure > appears intermittently and inspires confusion rather than a fix. Bugs notwithstanding, all random input is 100% reproducible so there should not be any intermittent behavior originating from the test cases themselves (at least as far as the same execution environment is concerned -- I do not know whether random() implementations yield different sequences on different platforms). HTH, Alex.
Re: [PATCH] Generated SBuf::find() test cases
On 02/25/2013 07:31 PM, Amos Jeffries wrote: > I like what this does already just because it finds bugs, and I'm happy > with generating the actual strings randomly to see if we can fuzz up > some failure somewhere as an on-top kind of check. But I don't see the > need for nearly so many checks of the basic border conditions. I think > we could auto-generate a smaller more deterministic/targeted set of > checks for better results and clarity about what the coverage is exactly.

I am sure it is possible to generate fewer checks (and the code already has parameters to control some of that). As for coverage clarity, I would not be surprised if it is more a problem of documenting what is covered rather than the sheer number of test cases.

> 1) the weak argument: > 700K '.' characters is a lot to output if we want to make each test a > CPPUNIT_ASSERT*() - assuming it passes the tests. 700K failures would be > a few GB or TB of log. > Whereas a few hundred tests is only a few hundred bytes more build log.

I think this is an argument against making each generated test case a CPPUNIT_ASSERT rather than an argument for fewer test cases :-). If CPPUNIT cannot handle a large number of test cases, let's aggregate (like the proposed patch does) instead of testing CPPUNIT limits.

> 2) the slightly stronger reason: > Because the SBuf content can be broken down into "fields" of octet > value ranges that should all have consistent behaviour relating to how > they break string implementations: ASCII visible characters, extended > 127+ octets, and ASCII control codes (\1 to \31 and \126), and nil octet > \0. That is only 16 (4^2) sets of permutations needed to cover all the > necessary byte code handling failures of whatever internal > implementation is used (possibly less if we tailor the unit-tests to the > specific implementation). > ** I understand std::string cannot handle most of these octect value > ranges,

Why not? AFAIK, std::string can handle any octet.
It can even handle wide characters if needed. The proposed patch uses digits and letters, but that is because it makes output easier to read and understand. The only special character I can think of in this context is \0, and only if SBuf accidentally uses some function call that treats it specially. Adding \0 support to generated test cases is a TODO (just requires handling it properly when printing AFAIK). > so this test can only be performed using the printabe > characters. So for now char permutations: 1. > > eg. a search for "b" in "acd" is no better than searching for "b" in > "efg". The important "unit" is that you are searching for a missing > pattern of length 1. Yes, that is how generated test cases are already structured. Search for Placement in the patch to see what needle placements are supported. Of course, one of the primary reasons we are generating test cases is because we are worried that humans will miss bugs by grouping too many slightly different things together. For example, the extension of the above logic would be to declare searches for missing string of any length identical. In theory, they are. In practice, buggy code treats too-long or too-short strings differently. Yes, it is theoretically possible to think of all possible valuable combinations and create a hundred or so corresponding test cases. However, nobody has done that so far, and I would discourage folks from trying because the number of valuable permutations is too large in this case IMO. While the following does not give you the absolute minimum of carefully handcrafted possibilities, I think it is a useful estimate of the problem complexity: SBuf methods * character groups * hay lengths * needle lengths * needle placements * number of needles * search position restrictions. That's already 700+ test cases if each multiplier has just 3 values (and most of them are more than 3 values even if we carefully limit them to the essential ones). 
> It is only worthwhile testing what you label "placements" once for each > permutation in other factors.

Yes, that is what the patch does if I understand you correctly: Placements are tested once for each permutation in other factors. The same is also true for any factor other than placement, of course, so I am not sure I understand you correctly.

> To my knowledge the pattern units / placements we need to check for each
> of the above permutations is:
> - pattern out of range (off end of SBuf)
> - exact match
> - absent pattern
> - pattern too long
> - present pattern
> - present needle prefix
> - present needle suffix
> - present and fully out of range prefix
> - present ending in-range prefix
> - present and fully out of range suffix
> - present starting in-range suffix
>
> ** Placement pattern permutations: 11

I do not understand the difference between some of these (e.g., pattern vs needle). As you probably know, the patch covers these placements: needle not in hay, needle in th
Re: [PATCH] ACL to control TPROXY spoofing
On 02/26/2013 05:17 AM, Steve Hill wrote: >> Code simplicity. An "if(flags.spoof)" test is far faster than even >> constructing a checklist and processing "allow all" in fast-ACL pathway. >> So if the ACL flexibility does not actually have a clear need the speed >> would be better. > Ok. Well I'm a bit on the fence here too. > > I can see some use for the flexibility - the situation I mentioned would > require spoofing to be disabled for requests from the branch offices but > it would probably be desirable to leave spoofing on for the main office. ... > I tend to think that since the ACL isn't constructed and tested in the > default case (and therefore for most people there is no performance > hit), I would err towards increased functionality rather than increased > performance. It sounds like Steve has a reasonable use case where ACLs would help. And he is right that the default should be "no acl" (with appropriate effect) rather than "allow all" ACL so that the feature performance impact on Squid that does not care about these things will be negligible and equivalent to the "if (flags.spoof)" test overheads. If you need a tie breaker, and there is no expert to chime in, I am happy to vote for the ACL control path, with a "no ACL" default :-). Thank you, Alex.
Re: Store Object length
Quoting earlier responses to your question in case you missed them: >> On 25/02/2013 8:00 p.m., anita wrote: >>> I am trying to write an API that will fetch objects from the squid cache >>> given the url and http method. >>> When I analyzed the code, I found that this can be achieved with a >>> storeGetPublic(url, http_method) to see if the url has an entry in the >>> cache. >>> This followed by storeClientCopy() call. >>> However, the StoreIOBuffer used here has been initialized with a >>> lenght of HTTP_REQBUF_SZ (4096 or 4 k) by default. > On 02/25/2013 04:45 AM, Amos Jeffries wrote: >> Err. Yes storeGetPublic() is the API to do this with. >> >> Why are you writing a new one? what are you trying to do with Squid? On 02/25/2013, Alex Rousskov wrote: > I agree with Amos that these questions are important, for many reasons, > ranging from compatibility with official code to being able to get > support on squid-dev. > > > To answer your buffer size question: The API does not restrict what > buffer sizes the caller should use. Some client side code uses > HTTP_REQBUF_SZ, but your code may use a different size. The store_client > subscriber will receive callbacks when more data is available. It is a > complex, iterative process; not a single "give me the whole response > data" call. > > In theory, even a 1-byte buffer should work, but, besides being too > slow, I suspect some Store code assumes that HTTP response headers will > fit into the first buffer given to Store by store_client. Thus, it would > be unwise to use smallish buffers without fixing Store first. > > > Also, please note that storeGetPublic() returns an entry that is not > necessarily fully cached. In the extreme case of collapsed forwarding, > it returns an entry that is only expected to be cached when received > from the origin server. >>> I would like to know how larger objects are stored and retrieved from >>> cache? How can I determine the length of the object read in that case? 
> In general, you should not rely on knowing the response size a priori > because some store modules may allow storage of objects where the size > is not known when swap out begins (that is a good thing) and do not > update the recorded size when swap out ends (that would be a performance > optimization). I doubt Squid fully supports that today, but eventually > it should, so it is better not to make a priori size knowledge a > critical precondition in your code (if possible). > > Please describe what problem you are trying to solve and why you think > you need a new API for what is already supported. With some luck, > squid-dev folks will be able to help you along the way. > > > Thank you, > > Alex. >>> Version used: Squid ver 3.1.16 >>> squid.conf : default configuration file with cache_dir directive >>> uncommented. >>> >>> Any pointers are appreciated. >> >> When playing with the Squid code please always upgrade to the most >> recent you can before starting. Preferrably that would be 3.HEAD which >> is our rolling development version. >> 3.1 Squid series has been officially deprecated for about five months >> now, and even 3.2 is nearing end of life. >> >> The Squid-3 series is one very long process of converting the code from >> C code constructs to C++ objects APIs and pluggable mechanisms. So you >> may find it much easier to code later releases, or that your fix has >> already been done. >> >> Amos
Re: [PATCH] Generated SBuf::find() test cases
On 02/26/2013 03:56 PM, Amos Jeffries wrote: > I've applied it to me checkout of stringng and am hitting build errors. > Will see what I can do about that and post an update hopefully later > today.. > > > ... unless you are able to post one Alex? I am not getting build errors (or I would not post the patch in the first place, at least not without a warning!). The patch built fine and was tested, which is the default for all the patches I post. It did fail "make check" because of SBuf problems, as reported. I was using stringng r9612. If you need help fixing build errors (which do not surprise me because, in part, the code has to deal with lots of signed/unsigned interactions), please share them. Thank you, Alex.
Re: [PATCH] Generated SBuf::find() test cases
On 02/27/2013 01:41 AM, Amos Jeffries wrote: > ../.././test-suite/../src/DiskIO/../.././test-suite/../src/tests/SBufFindTest.cc:43: > error: `Placement' is not a class or namespace > *** Error code 1 Please remove "Placement::" from that line. Alex.
Re: Store Object length
On 03/01/2013 04:04 AM, anita.sivaku...@wipro.com wrote: > I am storing the list of all the prefetched urls and > go through them one by one

OK.

> and retrieve the object from the store/cache

That part is puzzling. See below.

> On the whole, I will need to send the objects serially one after the > other. So it would not be following a normal http request parse -> > object fetch -> object sent back with http reply.

I am not sure I understand the "send serially" part of this. My understanding is that your prefetching Squid will:

1. Find a URL to prefetch in the response.
2. Create a fake HTTP request for that URL.
3. Fetch that URL using regular server-side code, which may also cache the response.

The above should work without many Store changes, similar to quick_abort handling. Your prefetching code should not need to get anything from the Store (until the client requests the prefetched object, but that transaction will just use the existing code).

> Instead my API will directly fetch from the cache given the url and send it > across.

That part I do not understand. It sounds like you might be trying to do two separate things:

1. Prefetch into the parent cache.
2. Push prefetched objects into the child cache.

If that is the case, can you explain why not just make the child cache (or both caches) prefetch? Why the push?

> We are already into thick of our development that we do not want to > move to higher versions as it will be a lot of work for us though we > may consider for a version change in future.

Are you planning on submitting your changes for the inclusion into the official sources?

Thank you,

Alex.
> -Original Message- > From: Alex Rousskov [mailto:rouss...@measurement-factory.com] > Sent: 27 February 2013 00:34 > To: Anita Sivakumar (WT01 - GMT-Telecom Equipment) > Cc: squid-dev@squid-cache.org > Subject: Re: Store Object length > > Quoting earlier responses to your question in case you missed them: > >>> On 25/02/2013 8:00 p.m., anita wrote: >>>> I am trying to write an API that will fetch objects from the squid cache >>>> given the url and http method. >>>> When I analyzed the code, I found that this can be achieved with a >>>> storeGetPublic(url, http_method) to see if the url has an entry in the >>>> cache. >>>> This followed by storeClientCopy() call. >>>> However, the StoreIOBuffer used here has been initialized with a >>>> lenght of HTTP_REQBUF_SZ (4096 or 4 k) by default. > >> On 02/25/2013 04:45 AM, Amos Jeffries wrote: >>> Err. Yes storeGetPublic() is the API to do this with. >>> >>> Why are you writing a new one? what are you trying to do with Squid? > > On 02/25/2013, Alex Rousskov wrote: >> I agree with Amos that these questions are important, for many reasons, >> ranging from compatibility with official code to being able to get >> support on squid-dev. >> >> >> To answer your buffer size question: The API does not restrict what >> buffer sizes the caller should use. Some client side code uses >> HTTP_REQBUF_SZ, but your code may use a different size. The store_client >> subscriber will receive callbacks when more data is available. It is a >> complex, iterative process; not a single "give me the whole response >> data" call. >> >> In theory, even a 1-byte buffer should work, but, besides being too >> slow, I suspect some Store code assumes that HTTP response headers will >> fit into the first buffer given to Store by store_client. Thus, it would >> be unwise to use smallish buffers without fixing Store first. >> >> >> Also, please note that storeGetPublic() returns an entry that is not >> necessarily fully cached. 
In the extreme case of collapsed forwarding, >> it returns an entry that is only expected to be cached when received >> from the origin server. > >>>> I would like to know how larger objects are stored and retrieved from >>>> cache? How can I determine the length of the object read in that case? > >> In general, you should not rely on knowing the response size a priori >> because some store modules may allow storage of objects where the size >> is not known when swap out begins (that is a good thing) and do not >> update the recorded size when swap out ends (that would be a performance >> optimization). I doubt Squid fully supports that today, but eventually >> it should, so it is better not to make a priori size knowledge a >> critical precondition in your code (if possible). >> >> Please describe what problem you are trying to solve and why you think >> you need a new
Re: StringNG SBuf fixes for rfind()
On 03/03/2013 10:45 PM, Amos Jeffries wrote: > http://master.squid-cache.org/~amosjeffries/patches/SBuf_fixes_mk3.patch * SBuf::rfind(const SBuf &needle, SBuf::size_type endPos) const > +// on empty hay std::string returns npos > +if (length() < needle.length()) > +return SBuf::npos; The comment is inaccurate: empty std::string returns npos only when the needle is not empty. Besides, the if-statement does not check for an empty hay at all. The above code is correct -- if the needle does not fit we can only return npos, of course -- but I do not think we need a reference to std::string in this case. > // on empty needle std::string returns the position the search starts > if (needle.length() == 0) > return endPos; The comment is a little misleading: std::string actually returns its length when the search start position exceeds that length: > std::string("a2").rfind("", 0) = 0 > std::string("a2").rfind("", 1) = 1 > std::string("a2").rfind("", 2) = 2 > std::string("a2").rfind("", 3) = 2 The code itself is correct because you adjust endPos before getting to that if-statement. Instead of trying hard to describe what std::string does correctly, I suggest just saying something like this: // match std::string::rfind() behavior on empty needles * SBuf::rfind(char c, SBuf::size_type endPos) const > +// on empty hay std::string returns size of hay > +if (length() < 1) > return SBuf::npos; The comment is out of sync with the correct code. Here we do not need to refer to std::string IMO. > There is a rather strange problem with using memrchr(). If you pass it > the accurate count of bytes to check it fails to match start/end of hay > properly in sub-string cases. 
It does not in my tests: > memrchr("a2", '2', 0) = 0 > memrchr("a2", '2', 1) = 0 > memrchr("a2", '2', 2) = 0x8048ba1 > memrchr("a2", '2', 3) = 0x8048ba1 > memrchr("2a", '2', 0) = 0 > memrchr("2a", '2', 1) = 0x8048c61 > memrchr("2a", '2', 2) = 0x8048c61 > memrchr("2a", '2', 3) = 0x8048c61 > memrchr("a2a", '2', 0) = 0 > memrchr("a2a", '2', 1) = 0 > memrchr("a2a", '2', 2) = 0x8048c02 > memrchr("a2a", '2', 3) = 0x8048c02 All of the above seem to work as expected when the "accurate count of bytes to check" is passed. > I have had to use endPos+1 and artificially manipulate the situations > where endPos == length(). Your mk3 code seems to be correct, with no strange problems in this area: You are correctly converting character position to the number of bytes to scan. I recommend replacing the "weirdness" comment with something like "convert character position to the number of characters to scan". HTH, Alex.
[RFC] Native FTP proxying
On 09/04/2012 03:04 PM, Henrik Nordström wrote: >>> A FTP client-side might be accepted, allowing Squid to act as an >>> ftp->ftp gateway using HTTP semantics internally, even accepting >>> ftp->http gatewaying if you want. But not sure it makes sense to add >>> native FTP proxy support. >> Can you clarify the difference between "ftp->ftp gateway" and "native >> FTP proxy support"? Please assume that we are "using HTTP semantics >> internally" in both cases, which I interpret as not altering the >> forward.cc and Store code much -- the new FTP code will convert FTP >> commands into HTTP messages for internal processing and the code outside >> FTP client and server modules will remain pretty much the same. > An ftp->ftp gateway using HTTP semantics internally is a FTP server > component (client_side..) that accepts FTP commands from FTP clients and > maps them to HTTP semantics which is then proxied by Squid. > > Native FTP proxy support means acting as an FTP proxy relaying FTP > commands between client and requested server without going via an > internal HTTP semantics mapping. > > > The first is very feasible and maps nicely to Squid. I started documenting that approach at http://wiki.squid-cache.org/Features/NativeFtp I am not sure the name of that wiki URL is correct because it sounds like "native FTP" in Squid context means more than "listening for native FTP commands" to Henrik and possibly others. To me, Squid internals (i.e., how the FTP commands and replies are forwarded inside Squid) should not matter when naming this feature, but I can rename if others think that "FtpGateway" is better than "NativeFtp". The initial implementation will not allow any caching or filtering by file names with full paths because tracking true "current directory" within an FTP session is difficult (as we discussed a little on this thread). 
I think there are a few ways it can be done with a high probability of success, but those ways should be finalized and discussed later, once the basic FTP mapping and forwarding code is working IMO. The current wiki page documents overall feature functionality and FTP<->HTTP mapping basics. Comments and suggestions are welcomed. Thank you, Alex.
Re: [RFC] Native FTP proxying
On 03/11/2013 06:21 PM, Amos Jeffries wrote: > On 12/03/2013 1:04 p.m., Alex Rousskov wrote: >> I started documenting that approach at >> http://wiki.squid-cache.org/Features/NativeFtp > Yes "Gateway" in the name please. Kinkie, I do not have permissions to rename wiki pages. Could you please rename http://wiki.squid-cache.org/Features/NativeFtp to http://wiki.squid-cache.org/Features/FtpGateway Thank you, Alex.
Re: [RFC] SourceLayout for client-side and server-side components
On 03/13/2013 05:10 AM, Amos Jeffries wrote: > 1) we retain the client-side/server-side naming and place the > components in a sub-directory off the main protocol directory. > > src/http/ - HTTP protocol definitions > src/http/client - client-side HTTP receiving component > src/http/server - server-side HTTP sending component > > Pros: > * code is all in a clear bundled subdirectory. > > Cons: > * when gatewaying between protocols it is not clear where to place the > code. We will need to define squid-internal protocol structures > separately from any particular client/server-side protocol definition. In addition to that, this layout is upside down IMO: Protocol is less important than the "side" from Squid architecture point of view. Squid has only a few "sides". There can be ten or more protocols working on each side. And some protocols work only on one side. I would recommend starting with this: src/http - HTTP-specific classes used by both sides src/CCC - classes dealing with clients (all protocols) src/SSS - classes dealing with servers (all protocols) CCC can be "client", "client_side", "ports", "incoming", "in", "accept", or some other misleading term. I cannot think of a good one so I used a CCC placeholder. I hope others can come up with something. Perhaps we can do what Ubuntu did -- does "client side" sound pretty in some popular ancient language? :-) SSS can be "server", "server_side", "outgoing", "out", "connect", or some other misleading term. Client arguments apply here as well. The CCC/SSS naming question is a separate one though. I think we all understand what "dealing with proxy clients" means and we have current client_side* code as an illustration. Same for the server side. Needless to say, it would be possible to split protocol-specific code inside each side: src/http - HTTP-specific classes used by both sides src/CCC - Mostly protocol-agnostic classes dealing with clients src/CCC/http - Very HTTP-specific classes dealing with clients ... 
but that is secondary and may not be needed. > 2) following the upcoming HTTPbis terminology clarifications as a basis > for these components layout. The new specs are emphasising the terms > _sender_ and _receiver_ as more precise than client/server. We can use > these terms to name our directory structure like so: Sender and receiver are single message handling roles. They are _not_ clients and servers. You have both senders and receivers on the client side. You have both senders and receivers on the server side. I agree that we could split senders and receivers, but the following is not how they would have to be split: > src/receiver/X/ = client-side component X. > > src/sender/X/ = server-side component X. It would have to be: src/CCC/senders/ src/CCC/receivers/ src/SSS/senders/ src/SSS/receivers/ or, if my arguments that "side" is a higher-level concept than protocol or message handling role are overruled, then: src/senders/CCC/ src/senders/SSS/ src/receivers/CCC/ src/receivers/SSS/ HTH, Alex.