Re: client_side and comm_close
> On Sun, 20 Apr 2008 22:44:22 -0600, Alex Rousskov wrote:
>
> > What if we let the close callback determine how it should be called?
> > Old code gets an "immediate" close callback (the default in v3.1). The
> > code that was written to handle async calls uses asynchronous
> > callbacks. This approach may allow for a smoother migration to 100%
> > asynchronous calls without introducing new complex inter-call
> > dependencies. I believe this approach is also easy to implement
> > without changing much of the existing code (old or new).
>
> That's fine for me. In fact a very good migration plan.

I have implemented something very similar to the above and,
unfortunately, it does not work well. The plan would have worked
beautifully if all comm_close users (old and new) were independent of
each other. In the real world, there are some natural dependencies.

What happens is that an asynchronous write handler call for "new" code is
scheduled first but may be dialed _after_ the synchronous close handler
call for the "old" code. When "old" and "new" code manage related state
(e.g., client-side HTTP and REQMOD transactions), the out-of-order
execution of callbacks creates problems.

The good news is that I think I found another simple workaround, thanks
to Christos' earlier work on bug #2253. Instead of mixing sync and async
calls, Comm now makes sure that _queued_ Comm callbacks that were
scheduled before comm_close get their flags reset to COMM_ERR_CLOSING
before being dialed.

The above hack preserves callback order and prevents situations where a
Comm user receiving a successful I/O callback starts modifying
Comm-related state and gets an assertion because the descriptor is
already in a "closing" state.

Even if this hack solves a few current problems, it is only a temporary
solution because modifying queued callbacks is ugly, not possible with
threads, and should not be necessary in well-written code.
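The in-queue flag reset described above can be modeled with a short
sketch. This is illustrative C++ only, not Squid source: `MiniComm`,
`PendingCall`, `schedule`, and `dialAll` are invented names, and the real
Comm queues AsyncCall objects rather than `std::function` values.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

enum CommError { COMM_OK = 0, COMM_ERR_CLOSING = -1 };

struct PendingCall {
    int fd;
    CommError errFlag;
    std::function<void(CommError)> handler;
};

struct MiniComm {
    std::queue<PendingCall> queued;   // calls scheduled but not yet dialed
    std::vector<bool> closing = std::vector<bool>(16, false);

    void schedule(int fd, std::function<void(CommError)> h) {
        queued.push(PendingCall{fd, COMM_OK, std::move(h)});
    }

    // comm_close: mark the fd and retroactively flag already-queued calls
    // so they are dialed, in their original order, with COMM_ERR_CLOSING.
    void close(int fd) {
        closing[fd] = true;
        std::queue<PendingCall> fixed;
        while (!queued.empty()) {
            PendingCall c = queued.front();
            queued.pop();
            if (c.fd == fd)
                c.errFlag = COMM_ERR_CLOSING; // the "ugly" in-queue edit
            fixed.push(c);
        }
        queued.swap(fixed);
    }

    // Dial everything in order, recording the flag each handler received.
    void dialAll(std::vector<CommError> &log) {
        while (!queued.empty()) {
            PendingCall c = queued.front();
            queued.pop();
            log.push_back(c.errFlag);
            c.handler(c.errFlag);
        }
    }
};
```

Because the queue order is untouched, the handler for the closed fd still
runs in its scheduled slot, but it now sees COMM_ERR_CLOSING instead of a
misleading success flag.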
The Comm notes I posted earlier today document the plans to remove the COMM_ERR_CLOSING flag and, eventually, this hack. It is possible that ongoing testing will reveal problems with this approach and we will have to go back to the drawing board, of course. Stay tuned, Alex.
Re: client_side and comm_close
On Mon, 2008-04-21 at 23:45 +1200, Amos Jeffries wrote:

> >> IMO, the proposed API _very_ first two lines of code in comm_close
> >> are to register a special Comm callback to perform the fclose() call,
> >> and then to immediately set the fd_table flag closed for the rest of
> >> the comm_close process.
> >
> > Agreed on the flag, disagreed on the call. The special/internal Comm
> > call (to self) should be scheduled last (after all close callbacks)
> > and not first because the close handler might need access to some
> > FD-related info. That info should be preserved until all close
> > handlers have been called.
>
> >> With that condition at the start we can guarantee that any
> >> registrations made during the close sequence are either
> >> non-FD-relevant or caught.
> >
> > Yes, the flag is sufficient for that. The internal "close for good"
> > call can still be last.
>
> I was thinking the close-for-good call would get caught as an fd
> operation on a closed fd by the Async stuff if set after the flag.

Actually, the internal close-for-good call handler asserts that the flag
is set! This internal handler does not have the structure or the
restrictions of the public comm_* functions...

> >> The special Comm callback is only special in that it is not required
> >> to check flags open before fclose()'ing the system-level FD, which
> >> will allow new connections to open on the FD.
> >
> > It is special because it is calling an internal comm method, not some
> > external close handler. Its profile and parameters are different. I
> > would not call it a callback because of that, but the label is not
> > important. It is not a close callback in terms of
> > comm_add_close_handler.
>
> Yet it seems to me it needs to be run as an async call after the other
> async close-handlers are run, due to the non-determinism of the async
> timing.

Yes, it must be called asynchronously. The call is scheduled after all
close callbacks are scheduled so that it will be dialed last.

Alex.
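The ordering Alex and Amos converge on above can be sketched as follows.
This is a hedged model, not Squid code: the names (`MiniComm`,
`comm_close`, `runQueue`) are invented, and the single queue stands in
for Squid's async call scheduler. The point is only the sequence: flag
first, close handlers next, internal close-for-good call last.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct MiniComm {
    bool closingFlag = false;
    bool fdReallyClosed = false;
    std::vector<std::function<void()>> closeHandlers; // via comm_add_close_handler
    std::vector<std::string> dialed;                  // records dial order
    std::vector<std::function<void()>> queue;         // the async call queue

    void comm_close() {
        closingFlag = true;                 // flag set immediately
        for (auto &h : closeHandlers)       // close handlers scheduled first...
            queue.push_back(h);
        queue.push_back([this] {            // ...internal close-for-good last
            assert(closingFlag);            // it asserts the flag is set
            fdReallyClosed = true;          // only now is the OS fd released
            dialed.push_back("close-for-good");
        });
        closeHandlers.clear();
    }

    void runQueue() {                       // stands in for the event loop
        for (auto &call : queue)
            call();
        queue.clear();
    }
};
```

Because the close-for-good step is queued after every close handler,
handlers can still inspect FD-related info, yet nothing runs on the
descriptor once it is really closed.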
Re: client_side and comm_close
Alex Rousskov wrote:
> On Mon, 2008-04-21 at 16:02 +1200, Amos Jeffries wrote:
>
> > > On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote:
> > > > Maybe it can be easier:
> > >
> > > The ease of implementation is a separate question. We still have to
> > > agree what we are implementing, and the items below attempt to
> > > define that. Ambiguity and lack of clear APIs is quite dangerous
> > > here (as the related bugs illustrate!).
> > >
> > > > Just keeping the comm_close as is and in AsyncCallsQueue just
> > > > cancels/not execute asyncCall's for which the
> > > > fd_table[fd].flags.closing is set.
> > >
> > > Implementing #1 below can indeed be as simple as you sketch above.
> > > We need to make sure that the final call that closes the FD and
> > > makes the fd value available to other connections is placed last
> > > (that may already be true and is easy to fix if not).
> >
> > Not true.
> >
> > IMO, the proposed API _very_ first two lines of code in comm_close
> > are to register a special Comm callback to perform the fclose() call,
> > and then to immediately set the fd_table flag closed for the rest of
> > the comm_close process.
>
> Agreed on the flag, disagreed on the call. The special/internal Comm
> call (to self) should be scheduled last (after all close callbacks) and
> not first because the close handler might need access to some
> FD-related info. That info should be preserved until all close handlers
> have been called.
>
> > With that condition at the start we can guarantee that any
> > registrations made during the close sequence are either
> > non-FD-relevant or caught.
>
> Yes, the flag is sufficient for that. The internal "close for good"
> call can still be last.

I was thinking the close-for-good call would get caught as an fd
operation on a closed fd by the Async stuff if set after the flag.

> > The special Comm callback is only special in that it is not required
> > to check flags open before fclose()'ing the system-level FD, which
> > will allow new connections to open on the FD.
>
> It is special because it is calling an internal comm method, not some
> external close handler. Its profile and parameters are different. I
> would not call it a callback because of that, but the label is not
> important. It is not a close callback in terms of
> comm_add_close_handler.

Yet it seems to me it needs to be run as an async call after the other
async close-handlers are run, due to the non-determinism of the async
timing.

> > Between the initial comm_close() call and the special Comm callback,
> > we don't need to care if callbacks write their shutdown statements to
> > the fd (it's still technically open), but the closed flag prevents
> > comm accepting any new delayed event registrations or reads.
>
> Exactly. Our only problem is with code that calls comm with a stale fd,
> after that fd has been really closed and a new connection was opened
> with the same fd. That's not a new problem and we will solve it in v3.2
> using comm handlers.
>
> I hope the above puts us on the same page about the implementation
> sketch for the comm_close API, but please yell if something still seems
> broken.

Okay. It looks the way I would implement it from scratch.

Amos
--
Please use Squid 2.6.STABLE19 or 3.0.STABLE4
Re: client_side and comm_close
Alex Rousskov wrote:
> On Mon, 2008-04-21 at 15:48 +1200, Amos Jeffries wrote:
>
> > > comm_close(fd) API:
> > >
> > > 1) No I/O callbacks will be dialed after comm_close is called
> > > (including the time when comm_close is running).
> >
> > Sounds good.
> >
> > > 2) All close callbacks registered before comm_close was called will
> > > be called asynchronously, sometime after comm_close exits.
> >
> > Sounds good.
> >
> > > 3) The code can ask Comm whether a FD associated with a close
> > > callback has been closed. The answer will be "yes" after comm_close
> > > is called and "no" before that. This interface needs to be added.
> > > Direct access to fd_table[fd].flags will not work because the same
> > > FD could have been assigned to another connection already. The code
> > > will have to use its close callback to determine FD status.
> >
> > Sounds good, BUT, direct access to fd_table pieces may need to be
> > blocked entirely (private:) so code is forced to go through the Comm
> > API properly.
>
> Yes, except if we want to avoid modifying old code that can still
> access those flags directly because it gets immediate close callbacks
> (more on that below).
>
> > (2) states that the higher-level close callbacks may be run at any
> > time, i.e. after the callback (3) refers to is run. This leaves a
> > window open for disaster, unless the closing callbacks are made
> > immediate, and back we go to recursion...
>
> Those are the same close callbacks! There are no low-level and
> high-level close callbacks here. The external code can use its stored
> callback pointer to get FD status even after that close callback has
> been called. There is no problem with that. The callback will not look
> at fd_table in that case, it will just say "yes, the fd is closed as
> far as you should be concerned".
>
> And, per recent suggestions, old code will get immediate close
> callbacks so it does not need to be modified to use the close callback
> pointer to ask about FD status.

Alright. I got to this branch of the thread before the other, which
makes things clearer. Same for the below.

> > > 4) Registering any callback after comm_close has been called is
> > > wrong. Comm will assert if it can detect such late registrations.
> > > Not all late registrations will be detected because the same FD can
> > > be already assigned to a different (new) connection[*].
> >
> > That non-detection seems to me to be a worry. The same problem in (3)
> > occurs here. (4) can guarantee that the closing callbacks don't play
> > nasty re-registrations. But only if they are called immediate instead
> > of scheduled.
>
> Sorry, I do not understand. The "late" registration of a close callback
> can come from anywhere. The old code relies on fd only. There is no way
> for comm to distinguish whether the caller is using a FD from a closed
> connection or a new one. This problem exists in the old code as well,
> so there is no new danger here! This problem can only be solved when we
> have comm handlers of one sort or another.
>
> > > The above comm_close API is easy to implement without massive code
> > > changes. Do you think it is the right API? If not, what should be
> > > changed?
> >
> > Apart from the worry with immediate vs delayed closing callbacks.
> >
> > To reduce that worry somewhat I think the callbacks which actually
> > use ERR_COMM_CLOSING for anything other than immediate abort will
> > need to be replaced with two: a normal callback that checks the FD is
> > open, and a simple closing callback.
>
> I am confused by the "normal callback that checks the FD is open" part.
> What is that for? Are you talking about some internal comm calls to
> close the FD? I believe that handler code that currently does not
> ignore ERR_COMM_CLOSING notifications will need to be moved into the
> close handler code (because only the close handler will be called if we
> eliminate ERR_COMM_CLOSING).

Nevermind. I was saying it wrong, but meaning what you have re-stated.

Amos
--
Please use Squid 2.6.STABLE19 or 3.0.STABLE4
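Rule (3) above — ask the stored close callback, not fd_table — can be
sketched like this. The sketch is illustrative only: `CloseCallback`,
`fdClosed`, and `addCloseHandler` are invented names, and the dialing is
done synchronously here where Squid would schedule it asynchronously.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// External code keeps a pointer to its registered close callback and asks
// *it* whether the descriptor is gone, instead of peeking at
// fd_table[fd].flags (the fd number may already belong to a new connection).

struct CloseCallback {
    bool closedSeen = false;          // set when comm_close fires for our fd
    void dial() { closedSeen = true; }
    bool fdClosed() const { return closedSeen; }
};

struct MiniComm {
    std::vector<std::shared_ptr<CloseCallback>> closeCallbacks;

    std::shared_ptr<CloseCallback> addCloseHandler() {
        auto cb = std::make_shared<CloseCallback>();
        closeCallbacks.push_back(cb);
        return cb;
    }

    void close() {
        // Dial every registered close callback (asynchronously in Squid;
        // synchronously here for brevity).
        for (auto &cb : closeCallbacks)
            cb->dial();
        closeCallbacks.clear();
        // The fd number may now be reused by a brand-new connection, so
        // fd_table[fd] no longer says anything about *our* connection.
    }
};
```

The shared pointer is the key design point: the callback object outlives
the descriptor, so querying it is safe even after the fd number has been
recycled.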
Re: client_side and comm_close
On Mon, 2008-04-21 at 16:02 +1200, Amos Jeffries wrote:

> > On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote:
> >> Maybe it can be easier:
> >
> > The ease of implementation is a separate question. We still have to
> > agree what we are implementing, and the items below attempt to define
> > that. Ambiguity and lack of clear APIs is quite dangerous here (as the
> > related bugs illustrate!).
> >
> >> Just keeping the comm_close as is and in AsyncCallsQueue just
> >> cancels/not execute asyncCall's for which the
> >> fd_table[fd].flags.closing is set.
> >
> > Implementing #1 below can indeed be as simple as you sketch above. We
> > need to make sure that the final call that closes the FD and makes the
> > fd value available to other connections is placed last (that may
> > already be true and is easy to fix if not).
>
> Not true.
>
> IMO, the proposed API _very_ first two lines of code in comm_close are
> to register a special Comm callback to perform the fclose() call, and
> then to immediately set the fd_table flag closed for the rest of the
> comm_close process.

Agreed on the flag, disagreed on the call. The special/internal Comm call
(to self) should be scheduled last (after all close callbacks) and not
first because the close handler might need access to some FD-related
info. That info should be preserved until all close handlers have been
called.

> With that condition at the start we can guarantee that any
> registrations made during the close sequence are either non-FD-relevant
> or caught.

Yes, the flag is sufficient for that. The internal "close for good" call
can still be last.

> The special Comm callback is only special in that it is not required to
> check flags open before fclose()'ing the system-level FD, which will
> allow new connections to open on the FD.

It is special because it is calling an internal comm method, not some
external close handler. Its profile and parameters are different. I would
not call it a callback because of that, but the label is not important.
It is not a close callback in terms of comm_add_close_handler.

> Between the initial comm_close() call and the special Comm callback, we
> don't need to care if callbacks write their shutdown statements to the
> fd (it's still technically open), but the closed flag prevents comm
> accepting any new delayed event registrations or reads.

Exactly. Our only problem is with code that calls comm with a stale fd,
after that fd has been really closed and a new connection was opened with
the same fd. That's not a new problem and we will solve it in v3.2 using
comm handlers.

I hope the above puts us on the same page about the implementation sketch
for the comm_close API, but please yell if something still seems broken.

Thank you,

Alex.
Re: client_side and comm_close
On Mon, 2008-04-21 at 15:48 +1200, Amos Jeffries wrote:

> > comm_close(fd) API:
> >
> > 1) No I/O callbacks will be dialed after comm_close is called
> > (including the time when comm_close is running).
>
> Sounds good.
>
> > 2) All close callbacks registered before comm_close was called will
> > be called asynchronously, sometime after comm_close exits.
>
> Sounds good.
>
> > 3) The code can ask Comm whether a FD associated with a close
> > callback has been closed. The answer will be "yes" after comm_close
> > is called and "no" before that. This interface needs to be added.
> > Direct access to fd_table[fd].flags will not work because the same FD
> > could have been assigned to another connection already. The code will
> > have to use its close callback to determine FD status.
>
> Sounds good, BUT, direct access to fd_table pieces may need to be
> blocked entirely (private:) so code is forced to go through the Comm
> API properly.

Yes, except if we want to avoid modifying old code that can still access
those flags directly because it gets immediate close callbacks (more on
that below).

> (2) states that the higher-level close callbacks may be run at any
> time, i.e. after the callback (3) refers to is run. This leaves a
> window open for disaster, unless the closing callbacks are made
> immediate, and back we go to recursion...

Those are the same close callbacks! There are no low-level and high-level
close callbacks here. The external code can use its stored callback
pointer to get FD status even after that close callback has been called.
There is no problem with that. The callback will not look at fd_table in
that case, it will just say "yes, the fd is closed as far as you should
be concerned".

And, per recent suggestions, old code will get immediate close callbacks
so it does not need to be modified to use the close callback pointer to
ask about FD status.

> > 4) Registering any callback after comm_close has been called is
> > wrong. Comm will assert if it can detect such late registrations. Not
> > all late registrations will be detected because the same FD can be
> > already assigned to a different (new) connection[*].
>
> That non-detection seems to me to be a worry. The same problem in (3)
> occurs here. (4) can guarantee that the closing callbacks don't play
> nasty re-registrations. But only if they are called immediate instead
> of scheduled.

Sorry, I do not understand. The "late" registration of a close callback
can come from anywhere. The old code relies on fd only. There is no way
for comm to distinguish whether the caller is using a FD from a closed
connection or a new one. This problem exists in the old code as well, so
there is no new danger here! This problem can only be solved when we have
comm handlers of one sort or another.

> > The above comm_close API is easy to implement without massive code
> > changes. Do you think it is the right API? If not, what should be
> > changed?
>
> Apart from the worry with immediate vs delayed closing callbacks.
>
> To reduce that worry somewhat I think the callbacks which actually use
> ERR_COMM_CLOSING for anything other than immediate abort will need to
> be replaced with two: a normal callback that checks the FD is open, and
> a simple closing callback.

I am confused by the "normal callback that checks the FD is open" part.
What is that for? Are you talking about some internal comm calls to close
the FD? I believe that handler code that currently does not ignore
ERR_COMM_CLOSING notifications will need to be moved into the close
handler code (because only the close handler will be called if we
eliminate ERR_COMM_CLOSING).

Thanks,

Alex.
Re: client_side and comm_close
On Sun, 2008-04-20 at 22:56 -0600, Alex Rousskov wrote:

> - I hope we can avoid similar problems with comm_cancel_read because
> the two or three places using that call can be adjusted to cancel the
> read callback directly. We have already talked about that elsewhere.

It's not only comm_cancel_read but also commSetTimeout.

Regards
Henrik
Re: client_side and comm_close
On Mon, 2008-04-21 at 01:11 +0200, Henrik Nordstrom wrote:

> On Sun, 2008-04-20 at 22:29 +0300, Tsantilas Christos wrote:
>
> > I am not saying that all are OK with the comm_close. Looks like there
> > is a problem with this function, and this is the second problem.
> >
> > This is why I am saying that there are two different problems here.
>
> There are actually three distinct problems discussed...
>
> a) client_side* is a big tangled mess beyond any repair, broken by
> design and patched around too many times, making it very fragile each
> time touched.
>
> b) comm is unable to cancel pending calls from comm_ calls using
> callback pointers (callers not using AsyncCall) when asked to (either
> explicit, such as comm_read_cancel, or implicit by comm_close).
> Basically an AsyncCall migration API issue with no caller/receiver
> boundary for the AsyncCall. I haven't looked into how things work when
> the caller of comm_* gives the AsyncCall object, but at least then it's
> possible for that comm_* caller to cancel the call.
>
> c) comm_close is now async, while some of the code really expects it to
> be synchronous, allowing for immediate teardown of everything related.
>
> The first is just what it is. A big ugly mess of code which has been
> around for too long. Somewhat cleaned up in Squid-3, but still battling
> with the fundamental design issues of the environment where the code
> operates...
>
> The second is a plain bug imho. The open question is whether the bug is
> to be fixed, or whether it can be eliminated by converting everything
> over to AsyncCall...
>
> The last is a reality in the current code, but probably not a desired
> property of the code.

Agreed. It looks like we are zeroing in on a v3.1 solution for (b) and,
in part, (c):

- We are almost done defining the comm_close API. We may be done if you
  agree with my suggestion to make the close callback decide how it
  should be called (with the old code-friendly default).

- We already know how to cancel pending I/O calls after comm_close.

- I hope we can avoid similar problems with comm_cancel_read because the
  two or three places using that call can be adjusted to cancel the read
  callback directly. We have already talked about that elsewhere.

Thank you,

Alex.
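The "cancel pending I/O calls" point can be modeled with a minimal
sketch. This is not Squid's AsyncCall class — `MiniCall`, `MiniQueue`,
and `dialIfAllowed` are invented — but it shows the shape of the fix:
a queued call carries a cancellation reason, and the queue skips dialing
any call that was canceled before it fired.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct MiniCall {
    std::function<void()> fn;
    std::string cancelReason;                    // empty means "not canceled"

    void cancel(const std::string &reason) { cancelReason = reason; }
    void dialIfAllowed() {
        if (cancelReason.empty())
            fn();                                // canceled calls are skipped
    }
};

struct MiniQueue {
    std::vector<std::shared_ptr<MiniCall>> calls;

    // Scheduling returns a handle so the caller (or comm_close) can
    // cancel the call while it is still queued.
    std::shared_ptr<MiniCall> schedule(std::function<void()> fn) {
        auto call = std::make_shared<MiniCall>();
        call->fn = std::move(fn);
        calls.push_back(call);
        return call;
    }

    void fire() {
        for (auto &c : calls)
            c->dialIfAllowed();
        calls.clear();
    }
};
```

The shared handle is what the pointer-based (non-AsyncCall) callers in
problem (b) lack: without it, there is nothing to call cancel() on.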
Re: client_side and comm_close
On Mon, 2008-04-21 at 00:54 +0200, Henrik Nordstrom wrote:

> On Sun, 2008-04-20 at 11:21 -0600, Alex Rousskov wrote:
>
> > I agree regarding the first initiation point, but disagree that it is
> > sufficient to guarantee non-recursive behavior: In addition to
> > callbacks, we have direct function calls to comm and other APIs. The
> > combination of immediate callbacks and those function calls will
> > cause "recursion" as control will reenter the jobs multiple times.
> > For example:
> >
> > event loop -> immediate callback -> handler -> comm_* call ->
> > immediate callback -> handler -> ...
>
> No, there is already an AsyncCall in progress so the call from comm_*
> will always be async.

Ah, I think I see what you mean now. I would very much like to avoid or
at least minimize this mixture of asynchronous and immediate calls,
especially if one call is going to determine the other call's "timing".
Let's see what we really need once the comm_close API is finalized.

> > comm_close(fd) API:
> >
> > 1) No I/O callbacks will be dialed after comm_close is called
> > (including the time when comm_close is running).
>
> Yes.
>
> > 2) All close callbacks registered before comm_close was called will
> > be called asynchronously, sometime after comm_close exits.
>
> Yes.. or... no. See below.
>
> > 3) The code can ask Comm whether a FD associated with a close
> > callback has been closed. The answer will be "yes" after comm_close
> > is called and "no" before that. This interface needs to be added.
> > Direct access to fd_table[fd].flags will not work because the same FD
> > could have been assigned to another connection already. The code will
> > have to use its close callback to determine FD status.
>
> Right. There is a trouble here in that we only have the fd, not a "Comm
> handle". So we need to defer the actual closing of the fd until all
> close handlers have executed, but make comm behave as if it was closed.
>
> Or the comm_close callbacks should most likely be synchronous to let
> each user of the fd know the fd is now going away and that they MUST
> NOT take any other actions on that fd.
>
> The second is more in line with what the current code expects,

Yes, we are on the same page here and on #4 below.

> and what should be done for 3.1 I think.

What if we let the close callback determine how it should be called? Old
code gets an "immediate" close callback (the default in v3.1). The code
that was written to handle async calls uses asynchronous callbacks. This
approach may allow for a smoother migration to 100% asynchronous calls
without introducing new complex inter-call dependencies. I believe this
approach is also easy to implement without changing much of the existing
code (old or new).

> For 3.2 we may rethink the comm layer, eliminating this.

Thank you,

Alex.

> > 4) Registering any callback after comm_close has been called is
> > wrong.
>
> Yes. The fd is semantically gone at that point.
>
> > [*] We can reliably detect #4 violations by replacing "fd" integers
> > outside of comm with a unique integer ID or, better, a handler object
> > containing metadata. That kind of change may be too big for v3.1
> > though.
>
> Yes. Long term we should have a "Comm handle" separate from the fd,
> with the low-level OS fd only known to comm.
>
> And additionally it should be possible to layer things on that which
> further abstracts I/O from the fd, such as ssl... but that's another
> topic.
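The migration idea above — each close callback decides how it is
delivered — can be sketched like so. This is a hedged model with invented
names (`DialMode`, `CloseHandler`, `MiniComm`), not Squid's actual
AsyncCall interface: legacy handlers ask for immediate dispatch (the
proposed v3.1 default), while async-aware handlers are queued and dialed
later from the event loop.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class DialMode { Immediate, Async };

struct CloseHandler {
    DialMode mode;                 // Immediate is the old-code-friendly default
    std::function<void()> fn;
};

struct MiniComm {
    std::vector<CloseHandler> handlers;
    std::vector<std::function<void()>> asyncQueue;

    void comm_close() {
        for (auto &h : handlers) {
            if (h.mode == DialMode::Immediate)
                h.fn();                     // old code: called right now
            else
                asyncQueue.push_back(h.fn); // new code: dialed from the event loop
        }
        handlers.clear();
    }

    void runAsyncQueue() {                  // stands in for the event loop
        for (auto &call : asyncQueue)
            call();
        asyncQueue.clear();
    }
};
```

The trade-off Alex notes later in the thread applies here too: mixing the
two modes can reorder notifications between dependent handlers, which is
why the per-callback choice is framed as a migration step, not an end
state.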
Re: client_side and comm_close
On Sun, 2008-04-20 at 22:58 +0300, Tsantilas Christos wrote:

> Alex Rousskov wrote:
> > On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote:
> >
> > > Maybe it can be easier:
> >
> > The ease of implementation is a separate question. We still have to
> > agree what we are implementing, and the items below attempt to define
> > that. Ambiguity and lack of clear APIs is quite dangerous here (as
> > the related bugs illustrate!).
>
> In practice I am proposing to keep the current design but just not
> allow any AsyncCall to be executed after comm_close is called (now we
> are allowing it).

One of the problems is that the "current design" is partially undefined
and, hence, is being interpreted or broken differently by different code.
That's why I want to polish and document the API while we are fixing
related problems. I know you prefer to talk about specific "code" but
"keeping the current design" does not mean much to me in this context,
unfortunately (for the reasons mentioned above).

> In your design, just do not close the socket with comm_close but
> schedule a call to close it; this will not allow the fd to be used by a
> new connection.

Actually, I do not think the proposed rules mentioned closing the socket!
Who cares about such a minor internal detail :-? The FD must be closed
sometime after the last close callback returns, I guess. I will add that
to the comm_close rules we are hashing out in this thread (I want to
refresh my memory on overlapping I/Os per Adrian's suggestion first).

Thank you,

Alex.
Re: client_side and comm_close
> On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote:
>
>> Maybe it can be easier:
>
> The ease of implementation is a separate question. We still have to
> agree what we are implementing, and the items below attempt to define
> that. Ambiguity and lack of clear APIs is quite dangerous here (as the
> related bugs illustrate!).
>
>> Just keeping the comm_close as is and in AsyncCallsQueue just
>> cancels/not execute asyncCall's for which the
>> fd_table[fd].flags.closing is set.
>
> Implementing #1 below can indeed be as simple as you sketch above. We
> need to make sure that the final call that closes the FD and makes the
> fd value available to other connections is placed last (that may
> already be true and is easy to fix if not).

Not true.

IMO, the proposed API _very_ first two lines of code in comm_close are to
register a special Comm callback to perform the fclose() call, and then
to immediately set the fd_table flag closed for the rest of the
comm_close process.

With that condition at the start we can guarantee that any registrations
made during the close sequence are either non-FD-relevant or caught.

The special Comm callback is only special in that it is not required to
check flags open before fclose()'ing the system-level FD, which will
allow new connections to open on the FD.

Between the initial comm_close() call and the special Comm callback, we
don't need to care if callbacks write their shutdown statements to the fd
(it's still technically open), but the closed flag prevents comm
accepting any new delayed event registrations or reads.

> We also need to make sure that the user-level code does not access
> fd_table based on a possibly stale fd value. That would require more
> work, but probably still nothing major.
>
>> We only need a way to identify comm_close handlers and the asyncCall
>> which will do the actual close of the socket.
>
> I am not sure what you mean, but the general-purpose AsyncCallQueue
> will not do these comm-specific checks. We already have custom comm
> callbacks and a general "Should This Call Be Dialed?" interface. The
> callbacks will check the flags and will not dial if they should not.
> Please clarify if I am talking about a different problem here.
>
> Thank you,
>
> Alex.
>
>> Alex Rousskov wrote:
>> > comm_close(fd) API:
>> >
>> > 1) No I/O callbacks will be dialed after comm_close is called
>> > (including the time when comm_close is running).
>> >
>> > 2) All close callbacks registered before comm_close was called will
>> > be called asynchronously, sometime after comm_close exits.
>> >
>> > 3) The code can ask Comm whether a FD associated with a close
>> > callback has been closed. The answer will be "yes" after comm_close
>> > is called and "no" before that. This interface needs to be added.
>> > Direct access to fd_table[fd].flags will not work because the same
>> > FD could have been assigned to another connection already. The code
>> > will have to use its close callback to determine FD status.
>> >
>> > 4) Registering any callback after comm_close has been called is
>> > wrong. Comm will assert if it can detect such late registrations.
>> > Not all late registrations will be detected because the same FD can
>> > be already assigned to a different (new) connection[*].
>> >
>> > The above comm_close API is easy to implement without massive code
>> > changes. Do you think it is the right API? If not, what should be
>> > changed?
>> >
>> > [*] We can reliably detect #4 violations by replacing "fd" integers
>> > outside of comm with a unique integer ID or, better, a handler
>> > object containing metadata. That kind of change may be too big for
>> > v3.1 though.
Re: client_side and comm_close
> On Sun, 2008-04-20 at 09:22 +0200, Henrik Nordstrom wrote: >> lör 2008-04-19 klockan 23:05 -0600 skrev Alex Rousskov: >> > It is easy to implement two kinds of call scheduling, but I am not >> sure >> > I understand the merits of doing so. What do you mean by the "event >> > loops" in this context? Are you talking about eventAdd() queue? Or I/O >> > event (i.e., comm) callbacks? I am guessing it is the latter. >> >> Both. The event checks (I/O, timed events etc) run from the main loop >> and are always the first AsyncCall callers for anything going on. >> >> > The immediate calls from comm were not really "safe" (because of the >> > recursive/reentrant behavior) so I am not sure what we would achieve >> > here. >> >> As everything going on in Squid is initiated by the event loop you don't >> get recursion from this change.. >> >> main loop -> event loop -> immediate async call -> handler -> other I/O >> actions etc but from here always async.. > > I agree regarding the first initiation point, but disagree that it is > sufficient to guarantee non-recursive behavior: In addition to > callbacks, we have direct function calls to comm and other APIs. The > combination of immediate callbacks and those function calls will cause > "recursion" as control will reenter the jobs multiple times. For > example: > > event loop -> immediate callback -> handler -> comm_* call -> immediate > callback -> handler -> ... > >> > Please clarify which of the following comm_close implementation would >> > solve the problems you want to solve in this context: >> > >> > A) Immediately call all not-yet-dialed callbacks for the affected FD. >> > All relevant I/O callbacks are called with ERR_COMM_CLOSING flag set >> (if >> > we still have that). The closing callbacks, if any, are called last. >> > >> > B) Asynchronously call all not-yet-dialed callbacks for the affected >> FD. >> > All relevant I/O callbacks are called with ERR_COMM_CLOSING flag set >> (if >> > we still have that). 
The closing callbacks, if any, are called last. >> > >> > C) Other. >> >> Don't call them. See ERR_COMM_CLOSING discussion. >> >> The I/O calls are no longer relevant at the moment the fd has been >> semantically closed (comm_close called by the code). Only the close >> callbacks matter at that time. >> >> Note: The close callbacks as such need to be async, just not the closing >> of the fd (or at least at a semantical level). > > OK, I think you picked option B then :-). Let me polish it to better > match the current level of understanding and the "dropping > ERR_COMM_CLOSING" agreement: > > comm_close(fd) API: > > 1) No I/O callbacks will be dialed after comm_close is called (including > the time when comm_close is running). > Sounds good. > 2) All close callbacks registered before comm_close was called will be > called asynchronously, sometime after comm_close exits. > Sound good. > 3) The code can ask Comm whether a FD associated with a close callback > has been closed. The answer will be "yes" after comm_close is called and > "no" before that. This interface needs to be added. Direct access to > fd_table[fd].flags will not work because the same FD could have been > assigned to another connection already. The code will have to use its > close callback to determine FD status. Sounds good, BUT, direct access to fd_table pieces may need to be blocked entirely (private:) so code is forced to go through the Comm API properly. (2) states that the higher-level close callbacks may be run at any time. ie after the callback (3) refers to is run. This leaves a window open for disaster, unless the closing callbacks are made immediate, and back we go to recursion... > 4) Registering any callback after comm_close has been called is wrong. > Comm will assert if it can detect such late registrations. Not all late > registrations will be detected because the same FD can be already > assigned to a different (new) connection[*]. That non-detection seems to me to be a worry. 
The same problem in (3) occurs here. (4) can guarantee that the closing callbacks don't play nasty re-registrations, but only if they are called immediately instead of scheduled. > > The above comm_close API is easy to implement without massive code > changes. Do you think it is the right API? If not, what should be > changed? Apart from the worry with immediate vs delayed closing callbacks. To reduce that worry somewhat, I think the callbacks which actually use ERR_COMM_CLOSING for anything other than immediate abort will need to be replaced with two: a normal callback that checks the FD is open, and a simple closing callback. The rest can be left to StackableIO/abstract-handles cleanup. I have some design ideas there, but that's another topic. > > [*] We can reliably detect #4 violations by replacing "fd" integers > outside of comm with a unique integer ID or, better, a handler object > containing metadata. That kind of change may be too big for v3.1 though. Amos
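Amos's two-callback split can be sketched as a toy C++ model (MiniComm, ioHandler, and closeHandler are invented names for illustration, not Squid APIs): a normal I/O callback that is only dialed while the FD is open, plus a simple closing callback, with no ERR_COMM_CLOSING flag needed at all.

```cpp
#include <cassert>
#include <functional>

// Toy model of the proposed split: two distinct callbacks instead of one
// handler that must inspect an ERR_COMM_CLOSING flag.
struct MiniComm {
    bool open = true;
    std::function<void(int)> ioHandler;  // normal callback: FD must be open
    std::function<void()> closeHandler;  // simple closing callback

    void readReady(int nread) {
        if (open && ioHandler)           // a closed fd never sees I/O callbacks
            ioHandler(nread);
    }
    void close() {
        if (!open)
            return;
        open = false;                    // fd is semantically gone at once
        if (closeHandler)
            closeHandler();              // pure notification, no status flag
    }
};
```

The I/O handler never has to distinguish "real I/O" from "closing" because it is simply never dialed after close; teardown logic lives only in the closing callback.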
Re: client_side and comm_close
> Sun 2008-04-20 at 00:38 -0600, Alex Rousskov wrote: > >> This thread is (was) dedicated to the discussion whether _adjusting_ >> v3.1 roadmap to add client-side cleanup is a good idea. > > And I say no on that. The current code works, kind of, and the small > problems we have can be patched up for 3.1.STABLE. > > What is needed for the server component (client_side*) is a major > cleanup of the server code + store + how it interacts with forwarding. > It's a fairly large change and beyond what we should aim for in 3.1. 3.2 > is more likely, maybe 3.3. Quite likely it needs a complete rebuild of > the client_side* code (and rename to http_server), only reusing the > parts that make sense. It's not really that difficult a job that the code > needs to do, but the way it's currently being done makes a big mess of > it. > > client_side interactions with comm_close is just a tiny tiny part of > that, and looking into that before we have defined how comm_close really > should behave isn't meaningful imho. > > There is already a sufficient amount of large changes in 3.2. Let's try not > to make it a significantly bigger step in the evolution of Squid. Focus > for 3.1 should now be to prune out bugs and look into what we need to do > for 3.2. +1. In case I haven't voted yet. Amos
Re: client_side and comm_close
Sun 2008-04-20 at 22:29 +0300, Tsantilas Christos wrote: > I am not saying that all are OK with the comm_close. It looks like there is > a problem with this function, and this is the second problem > > This is why I am saying that there are two different problems here There are actually three distinct problems discussed... a) client_side* is a big tangled mess beyond any repair, broken by design and patched around too many times, making it very fragile each time touched. b) comm is unable to cancel pending calls from comm_ calls using callback pointers (callers not using AsyncCall) when asked to (either explicitly, such as comm_read_cancel, or implicitly by comm_close). Basically an AsyncCall migration API issue with no caller/receiver boundary for the AsyncCall. I haven't looked into how things work out when the caller of comm_* gives the AsyncCall object, but at least then it's possible for that comm_* caller to cancel the call.. c) comm_close is now async, while some of the code really expects it to be synchronous, allowing for immediate teardown of everything related.. The first is just what it is. A big ugly mess of code which has been around for too long. Somewhat cleaned up in Squid-3, but still battling with the fundamental design issues of the environment where the code operates... The second is a plain bug imho. The open question is whether the bug is to be fixed, or whether it can be eliminated by converting everything over to AsyncCall... The last is a reality in the current code, but probably not a desired property of the code.. Regards Henrik
Re: client_side and comm_close
Sun 2008-04-20 at 22:19 +0300, Christos Tsantilas wrote: > The important files are the CommCalls.*, AsyncCalls.* and ICAP/AsyncJob.* > But I believe that you understand it enough. I think I understand most of the code now, but am not yet 100% sure on the design and intended use model.. The basic principle is simple and neat. Regards Henrik
Re: client_side and comm_close
Sun 2008-04-20 at 11:21 -0600, Alex Rousskov wrote: > I agree regarding the first initiation point, but disagree that it is > sufficient to guarantee non-recursive behavior: In addition to > callbacks, we have direct function calls to comm and other APIs. The > combination of immediate callbacks and those function calls will cause > "recursion" as control will reenter the jobs multiple times. For > example: > > event loop -> immediate callback -> handler -> comm_* call -> immediate > callback -> handler -> ... No, there is already an AsyncCall in progress so the call from comm_* will always be async. > comm_close(fd) API: > > 1) No I/O callbacks will be dialed after comm_close is called (including > the time when comm_close is running). Yes. > 2) All close callbacks registered before comm_close was called will be > called asynchronously, sometime after comm_close exits. Yes.. or... no. See below. > 3) The code can ask Comm whether a FD associated with a close callback > has been closed. The answer will be "yes" after comm_close is called and > "no" before that. This interface needs to be added. Direct access to > fd_table[fd].flags will not work because the same FD could have been > assigned to another connection already. The code will have to use its > close callback to determine FD status. Right. There is a problem here in that we only have the fd, not a "Comm handle". So we need to defer the actual closing of the fd until all close handlers have executed, but make comm behave as if it was closed. Or the comm_close callbacks should most likely be synchronous to let each user of the fd know the fd is now going away and that they MUST NOT take any other actions on that fd. The second is more in line with what the current code expects, and what should be done for 3.1 I think. For 3.2 we may rethink the comm layer, eliminating this. > 4) Registering any callback after comm_close has been called is wrong. Yes. The fd is semantically gone at that point. 
> [*] We can reliably detect #4 violations by replacing "fd" integers > outside of comm with a unique integer ID or, better, a handler object > containing metadata. That kind of change may be too big for v3.1 though. Yes. Long term we should have a "Comm handle" separate from the fd, with the low-level OS fd only known to comm. And additionally it should be possible to layer things on top of that which further abstract I/O from the fd, such as ssl.. but that's another topic.. Regards Henrik
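Henrik's "Comm handle" idea can be illustrated with a small sketch (all names here are hypothetical, nothing below is real Squid code): pair the fd with a generation number that is bumped each time the descriptor slot is reopened, so late registrations against a closed-and-reused fd — the undetectable case in point #4 — become reliably detectable.

```cpp
// Hypothetical "Comm handle": the fd plus a generation counter, so a
// stale handle never matches a recycled descriptor slot.
struct CommHandle {
    int fd;
    unsigned gen;  // generation at the time the handle was issued
};

struct FdSlot {
    unsigned gen = 0;  // incremented on every reopen of this slot
    bool open = false;
};

const int MaxFd = 4;
FdSlot slots[MaxFd];

CommHandle commOpen(int fd) {
    slots[fd].open = true;
    ++slots[fd].gen;  // new connection, new generation
    return CommHandle{fd, slots[fd].gen};
}

void commClose(const CommHandle &h) {
    if (slots[h.fd].gen == h.gen)  // ignore stale handles entirely
        slots[h.fd].open = false;
}

// A registration attempt can now be validated against the handle,
// not the raw fd, so fd reuse no longer hides violations.
bool handleValid(const CommHandle &h) {
    return slots[h.fd].open && slots[h.fd].gen == h.gen;
}
```

A callback registered with an old handle fails `handleValid()` even after the same fd number has been handed to a new connection, which is exactly the detection the raw-integer fd cannot provide.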
Re: client_side and comm_close
Alex Rousskov wrote: On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote: Maybe it can be easier: The ease of implementation is a separate question. We still have to agree what we are implementing, and the items below attempt to define that. Ambiguity and lack of clear APIs is quite dangerous here (as the related bugs illustrate!). In practice I am proposing to keep the current design but just not allow any AsyncCall to be executed after comm_close is called (currently we allow it). In your design, just do not close the socket with comm_close but schedule a call to close it; this will not allow the fd to be reused by a new connection. We only need a way to identify comm_close handlers and the asyncCall which will do the actual close of the socket. I am not sure what you mean, but the general-purpose AsyncCallQueue will not do these comm-specific checks. We already have custom comm callbacks and a general "Should This Call be Dialed?" interface. The callbacks will check the flags and will not dial if they should not. Please clarify if I am talking about a different problem here. No, you are talking about the same problem, but you are giving a better solution :-) Thank you, Alex. Alex Rousskov wrote: comm_close(fd) API: 1) No I/O callbacks will be dialed after comm_close is called (including the time when comm_close is running). 2) All close callbacks registered before comm_close was called will be called asynchronously, sometime after comm_close exits. 3) The code can ask Comm whether a FD associated with a close callback has been closed. The answer will be "yes" after comm_close is called and "no" before that. This interface needs to be added. Direct access to fd_table[fd].flags will not work because the same FD could have been assigned to another connection already. The code will have to use its close callback to determine FD status. 4) Registering any callback after comm_close has been called is wrong. Comm will assert if it can detect such late registrations. 
Not all late registrations will be detected because the same FD can be already assigned to a different (new) connection[*]. The above comm_close API is easy to implement without massive code changes. Do you think it is the right API? If not, what should be changed? [*] We can reliably detect #4 violations by replacing "fd" integers outside of comm with a unique integer ID or, better, a handler object containing metadata. That kind of change may be too big for v3.1 though.
Re: client_side and comm_close
On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote: > Maybe it can be easier: The ease of implementation is a separate question. We still have to agree what we are implementing, and the items below attempt to define that. Ambiguity and lack of clear APIs is quite dangerous here (as the related bugs illustrate!). > Just keeping the comm_close as is and in AsyncCallsQueue just > cancels/not execute asyncCall's for which the fd_table[fd].flags.closing > is set. Implementing #1 below can indeed be as simple as you sketch above. We need to make sure that the final call that closes the FD and makes fd value available to other connections is placed last (that may already be true and is easy to fix if not). We also need to make sure that the user-level code does not access fd_table based on a possibly stale fd value. That would require more work, but probably still nothing major. > We only need a way to identify comm_close handlers and the asyncCall > which will do the actual close of the socket. I am not sure what you mean, but the general-purpose AsyncCallQueue will not do these comm-specific checks. We already have custom comm callbacks and a general "Should This Call be Dialed?" interface. The callbacks will check the flags and will not dial if they should not. Please clarify if I am talking about a different problem here. Thank you, Alex. > Alex Rousskov wrote: > > comm_close(fd) API: > > > > 1) No I/O callbacks will be dialed after comm_close is called (including > > the time when comm_close is running). > > > > 2) All close callbacks registered before comm_close was called will be > > called asynchronously, sometime after comm_close exits. > > > > 3) The code can ask Comm whether a FD associated with a close callback > > has been closed. The answer will be "yes" after comm_close is called and > > "no" before that. This interface needs to be added. 
Direct access to > > fd_table[fd].flags will not work because the same FD could have been > > assigned to another connection already. The code will have to use its > > close callback to determine FD status. > > > > 4) Registering any callback after comm_close has been called is wrong. > > Comm will assert if it can detect such late registrations. Not all late > > registrations will be detected because the same FD can be already > > assigned to a different (new) connection[*]. > > > > The above comm_close API is easy to implement without massive code > > changes. Do you think it is the right API? If not, what should be > > changed? > > > > [*] We can reliably detect #4 violations by replacing "fd" integers > > outside of comm with a unique integer ID or, better, a handler object > > containing metadata. That kind of change may be too big for v3.1 though.
Re: client_side and comm_close
Henrik Nordstrom wrote: Sat 2008-04-19 at 17:24 +0300, Tsantilas Christos wrote: If I am not wrong, the bug which triggered this discussion is bug2309, which is a squid3.0 bug where AsyncCalls does not exist. Is it? I wasn't aware this was the problem we are discussing. 2309 is from what I can tell completely unrelated to comm_close or asyncness. Exactly what triggers it we don't know yet, other than for some reason two comm_read() calls get scheduled on the same FD. Ok, maybe bug2309 is unrelated to comm_close but it is related to the client_side code complexity, and this is the first problem. From what I can tell 2309 is caused by the comm_read() refactoring; a more low-level commSetSelect() API was used previously, where it was no error to reset the event handler for a comm event while one was already set up on the fd.. What triggered the comm_close discussion was the pconn closing issue from what I remember. I am not saying that all are OK with comm_close. It looks like there is a problem with this function, and this is the second problem. This is why I am saying that there are two different problems here
Re: client_side and comm_close
Maybe it can be easier: Just keep comm_close as is, and in AsyncCallsQueue simply cancel (do not execute) asyncCalls for which fd_table[fd].flags.closing is set. We only need a way to identify comm_close handlers and the asyncCall which will do the actual close of the socket. Alex Rousskov wrote: OK, I think you picked option B then :-). Let me polish it to better match the current level of understanding and the "dropping ERR_COMM_CLOSING" agreement: comm_close(fd) API: 1) No I/O callbacks will be dialed after comm_close is called (including the time when comm_close is running). 2) All close callbacks registered before comm_close was called will be called asynchronously, sometime after comm_close exits. 3) The code can ask Comm whether a FD associated with a close callback has been closed. The answer will be "yes" after comm_close is called and "no" before that. This interface needs to be added. Direct access to fd_table[fd].flags will not work because the same FD could have been assigned to another connection already. The code will have to use its close callback to determine FD status. 4) Registering any callback after comm_close has been called is wrong. Comm will assert if it can detect such late registrations. Not all late registrations will be detected because the same FD can be already assigned to a different (new) connection[*]. The above comm_close API is easy to implement without massive code changes. Do you think it is the right API? If not, what should be changed? [*] We can reliably detect #4 violations by replacing "fd" integers outside of comm with a unique integer ID or, better, a handler object containing metadata. That kind of change may be too big for v3.1 though. Thank you, Alex.
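Christos's cancel-in-the-queue proposal, restated the way Alex prefers (the generic queue stays comm-agnostic and each call decides for itself via a "Should This Call be Dialed?" check), can be modeled with a toy sketch — QueuedCall, CallQueue, and shouldDial are invented names, not the real CommCalls API:

```cpp
#include <deque>
#include <functional>
#include <utility>

// The generic queue knows nothing about comm; each queued call carries
// its own shouldDial() predicate. Comm-specific calls decline to dial
// once the fd's closing flag has been set.
struct QueuedCall {
    std::function<bool()> shouldDial;  // checked at dial time, not at schedule time
    std::function<void()> dial;
};

struct CallQueue {
    std::deque<QueuedCall> calls;

    void schedule(QueuedCall c) { calls.push_back(std::move(c)); }

    int fire() {  // drain the queue, dialing only calls that still should run
        int dialed = 0;
        while (!calls.empty()) {
            QueuedCall c = std::move(calls.front());
            calls.pop_front();
            if (c.shouldDial()) {
                c.dial();
                ++dialed;
            }
        }
        return dialed;
    }
};
```

The key property is that the predicate runs when the call is dialed, so an I/O callback scheduled before comm_close is quietly dropped if the fd entered the closing state in the meantime, while close handlers still fire.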
Re: client_side and comm_close
On Sun, 2008-04-20 at 09:22 +0200, Henrik Nordstrom wrote: > lör 2008-04-19 klockan 23:05 -0600 skrev Alex Rousskov: > > It is easy to implement two kinds of call scheduling, but I am not sure > > I understand the merits of doing so. What do you mean by the "event > > loops" in this context? Are you talking about eventAdd() queue? Or I/O > > event (i.e., comm) callbacks? I am guessing it is the latter. > > Both. The event checks (I/O, timed events etc) run from the main loop > and are always the first AsyncCall callers for anything going on. > > > The immediate calls from comm were not really "safe" (because of the > > recursive/reentrant behavior) so I am not sure what we would achieve > > here. > > As everything going on in Squid is initiated by the event loop you don't > get recursion from this change.. > > main loop -> event loop -> immediate async call -> handler -> other I/O > actions etc but from here always async.. I agree regarding the first initiation point, but disagree that it is sufficient to guarantee non-recursive behavior: In addition to callbacks, we have direct function calls to comm and other APIs. The combination of immediate callbacks and those function calls will cause "recursion" as control will reenter the jobs multiple times. For example: event loop -> immediate callback -> handler -> comm_* call -> immediate callback -> handler -> ... > > Please clarify which of the following comm_close implementation would > > solve the problems you want to solve in this context: > > > > A) Immediately call all not-yet-dialed callbacks for the affected FD. > > All relevant I/O callbacks are called with ERR_COMM_CLOSING flag set (if > > we still have that). The closing callbacks, if any, are called last. > > > > B) Asynchronously call all not-yet-dialed callbacks for the affected FD. > > All relevant I/O callbacks are called with ERR_COMM_CLOSING flag set (if > > we still have that). The closing callbacks, if any, are called last. > > > > C) Other. 
> > Don't call them. See ERR_COMM_CLOSING discussion. > > The I/O calls are no longer relevant at the moment the fd has been > semantically closed (comm_close called by the code). Only the close > callbacks matter at that time. > > Note: The close callbacks as such need to be async, just not the closing > of the fd (or at least at a semantical level). OK, I think you picked option B then :-). Let me polish it to better match the current level of understanding and the "dropping ERR_COMM_CLOSING" agreement: comm_close(fd) API: 1) No I/O callbacks will be dialed after comm_close is called (including the time when comm_close is running). 2) All close callbacks registered before comm_close was called will be called asynchronously, sometime after comm_close exits. 3) The code can ask Comm whether a FD associated with a close callback has been closed. The answer will be "yes" after comm_close is called and "no" before that. This interface needs to be added. Direct access to fd_table[fd].flags will not work because the same FD could have been assigned to another connection already. The code will have to use its close callback to determine FD status. 4) Registering any callback after comm_close has been called is wrong. Comm will assert if it can detect such late registrations. Not all late registrations will be detected because the same FD can be already assigned to a different (new) connection[*]. The above comm_close API is easy to implement without massive code changes. Do you think it is the right API? If not, what should be changed? [*] We can reliably detect #4 violations by replacing "fd" integers outside of comm with a unique integer ID or, better, a handler object containing metadata. That kind of change may be too big for v3.1 though. Thank you, Alex.
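For illustration only, the four points of the polished comm_close API above can be modeled in a few lines (a toy simulation with invented names — Fd, asyncQueue, drainQueue — not the actual Comm implementation): comm_close marks the FD closed at once, drops any queued I/O callbacks, and pushes the close callbacks plus a final fd-releasing step onto an async queue, so the fd cannot be recycled until every close callback has run.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

// Stand-in for the event loop's call queue.
std::deque<std::function<void()>> asyncQueue;

struct Fd {
    bool closed = false;    // "yes" answer to point (3) after comm_close
    bool recycled = false;  // fd slot released for a new connection
    std::vector<std::function<void()>> ioCallbacks;
    std::vector<std::function<void()>> closeCallbacks;
};

void comm_close(Fd &fd) {
    fd.closed = true;                   // (1)+(3): no I/O dialed from here on
    fd.ioCallbacks.clear();             // pending I/O callbacks never dial
    for (auto &cb : fd.closeCallbacks)  // (2): close callbacks go async
        asyncQueue.push_back(cb);
    asyncQueue.push_back([&fd] { fd.recycled = true; });  // actual close last
}

void drainQueue() {  // the event loop dialing queued calls later
    while (!asyncQueue.empty()) {
        auto call = asyncQueue.front();
        asyncQueue.pop_front();
        call();
    }
}
```

Because the fd-releasing step is queued after all close callbacks, a close callback can still safely ask "is my fd closed?" without the slot having been handed to a new connection, which is the ordering point (3) depends on.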
Re: client_side and comm_close
On Sun, 2008-04-20 at 16:51 +0200, Henrik Nordstrom wrote: > Sun 2008-04-20 at 00:38 -0600, Alex Rousskov wrote: > > > This thread is (was) dedicated to the discussion whether _adjusting_ > > v3.1 roadmap to add client-side cleanup is a good idea. > > And I say no on that. The current code works, kind of, and the small > problems we have can be patched up for 3.1.STABLE. Noted. Since nobody advocated and volunteered for doing the client-side cleanup now, I think we should plan doing it for v3.2. If, based on Amos's 4-step plan, we decide there may be time to do this in v3.1 after step #2 is completed, we can always revisit this question. If nobody commits to do this in v3.2, the feature will be pushed back to v3.3, as usual. I have added a Feature page to place this work on the Squid3 roadmap: http://wiki.squid-cache.org/Features/ClientSideCleanup > What is needed for the server component (client_side*) is a major > cleanup of the server code + store + how it interacts with forwarding. > It's a fairly large change and beyond what we should aim for in 3.1. 3.2 > is more likely, maybe 3.3. Quite likely it needs a complete rebuild of > the client_side* code (and rename to http_server), only reusing the > parts that make sense. It's not really that difficult a job that the code > needs to do, but the way it's currently being done makes a big mess of > it. > > client_side interactions with comm_close is just a tiny tiny part of > that, and looking into that before we have defined how comm_close really > should behave isn't meaningful imho. I agree with the technical comments. As for the schedule, I hope we do not have to wait for v3.3, but it is too early to say. Thank you, Alex.
Re: client_side and comm_close
Sat 2008-04-19 at 17:24 +0300, Tsantilas Christos wrote: > If I am not wrong, the bug which triggered this discussion is bug2309, > which is a squid3.0 bug where AsyncCalls does not exist. Is it? I wasn't aware this was the problem we are discussing. 2309 is from what I can tell completely unrelated to comm_close or asyncness. Exactly what triggers it we don't know yet, other than for some reason two comm_read() calls get scheduled on the same FD. From what I can tell 2309 is caused by the comm_read() refactoring; a more low-level commSetSelect() API was used previously, where it was no error to reset the event handler for a comm event while one was already set up on the fd.. What triggered the comm_close discussion was the pconn closing issue from what I remember. Regards Henrik
Re: client_side and comm_close
Thu 2008-04-17 at 20:13 +0300, Tsantilas Christos wrote: > As I can understand, the problem does not exist only in squid3.0 or only > in squid3-trunk. The problems show up mainly in squid3-trunk due to the delays introduced by AsyncCalls, combined with the inability for old code to cancel pending operations when needed, where they could do so just fine before AsyncCalls... The client_side_* and pconn* code isn't designed to deal with this kind of asyncness.. > The main problem with the client_side code is that the comm_read is done by > the ConnStateData class, the comm_writes by the ClientSocketContext > class, and comm_close (and other comm operations) by all. This makes it > very difficult to manage the comm-related problems in the client_side code Yes, but before AsyncCalls it all wound down immediately on comm_close, cancelling all further operations. The FD got closed, the ClientSocketContext invalidated and pending requests cancelled. So while this code is pretty much the same in 3-trunk, it's only a problem after the introduction of AsyncCalls. The AsyncCalls do solve another class of problems however, what has been referred to as "recursion" in this thread. Or, to be specific, the combined fact that comm_close was immediate and that it wasn't always obvious when comm_close would trigger in complex code where comm_close is called from multiple places or deep down in processing chains where the upper layers maybe assume it won't get called. This "recursion" only existed for comm_close, not read/write or timed events, as those always bounced via the event loop before being called. Regards Henrik
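The "recursion" Henrik describes can be demonstrated with a toy dispatcher (hypothetical code, not Squid's): under immediate dispatch a handler that triggers further I/O re-enters the callback machinery while it is still on the stack, so nesting depth grows with the chain; under queued dispatch every handler is bounced via the loop and runs at depth one.

```cpp
#include <algorithm>
#include <deque>
#include <functional>

// Toy dispatcher contrasting immediate vs queued callback delivery.
struct Dispatcher {
    bool immediate;
    std::deque<std::function<void()>> queue;
    int depth = 0, maxDepth = 0;  // track callback nesting on the stack

    void callback(std::function<void()> handler) {
        if (immediate) {          // dial right now, on the caller's stack
            ++depth;
            maxDepth = std::max(maxDepth, depth);
            handler();
            --depth;
        } else {                  // bounce via the loop instead
            queue.push_back(std::move(handler));
        }
    }

    void run() {                  // the "event loop": dial queued handlers
        while (!queue.empty()) {
            auto h = queue.front();
            queue.pop_front();
            ++depth;
            maxDepth = std::max(maxDepth, depth);
            h();
            --depth;
        }
    }
};
```

Chaining three handlers where each one schedules the next reaches stack depth 3 with immediate dispatch but stays at depth 1 with queued dispatch, which is precisely why the pre-AsyncCalls read/write paths (always bounced via the loop) never exhibited this, while the immediate comm_close path did.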
Re: client_side and comm_close
Sun 2008-04-20 at 00:38 -0600, Alex Rousskov wrote: > This thread is (was) dedicated to the discussion whether _adjusting_ > v3.1 roadmap to add client-side cleanup is a good idea. And I say no on that. The current code works, kind of, and the small problems we have can be patched up for 3.1.STABLE. What is needed for the server component (client_side*) is a major cleanup of the server code + store + how it interacts with forwarding. It's a fairly large change and beyond what we should aim for in 3.1. 3.2 is more likely, maybe 3.3. Quite likely it needs a complete rebuild of the client_side* code (and rename to http_server), only reusing the parts that make sense. It's not really that difficult a job that the code needs to do, but the way it's currently being done makes a big mess of it. client_side interactions with comm_close is just a tiny tiny part of that, and looking into that before we have defined how comm_close really should behave isn't meaningful imho. There is already a sufficient amount of large changes in 3.2. Let's try not to make it a significantly bigger step in the evolution of Squid. Focus for 3.1 should now be to prune out bugs and look into what we need to do for 3.2. Regards Henrik
Re: client_side and comm_close
lör 2008-04-19 klockan 23:05 -0600 skrev Alex Rousskov: > It is easy to implement two kinds of call scheduling, but I am not sure > I understand the merits of doing so. What do you mean by the "event > loops" in this context? Are you talking about eventAdd() queue? Or I/O > event (i.e., comm) callbacks? I am guessing it is the latter. Both. The event checks (I/O, timed events etc) run from the main loop and are always the first AsyncCall callers for anything going on. > The immediate calls from comm were not really "safe" (because of the > recursive/reentrant behavior) so I am not sure what we would achieve > here. As everything going on in Squid is initiated by the event loop you don't get recursion from this change.. main loop -> event loop -> immediate async call -> handler -> other I/O actions etc but from here always async.. > I hope answers to earlier questions would clarify the intent. > Again, I would like to understand and agree on the problem/goal before > debating specific solutions... Goal is to get a clean and deterministic API for the event processing without hard to reproduce race conditions. > Please clarify which of the following comm_close implementation would > solve the problems you want to solve in this context: > > A) Immediately call all not-yet-dialed callbacks for the affected FD. > All relevant I/O callbacks are called with ERR_COMM_CLOSING flag set (if > we still have that). The closing callbacks, if any, are called last. > > B) Asynchronously call all not-yet-dialed callbacks for the affected FD. > All relevant I/O callbacks are called with ERR_COMM_CLOSING flag set (if > we still have that). The closing callbacks, if any, are called last. > > C) Other. Don't call them. See ERR_COMM_CLOSING discussion. The I/O calls are no longer relevant at the moment the fd has been semantically closed (comm_close called by the code). Only the close callbacks matter at that time. 
Note: The close callbacks as such need to be async, just not the closing of the fd (or at least at a semantical level). Regards Henrik
Re: client_side and comm_close
Sat 2008-04-19 at 21:01 -0600, Alex Rousskov wrote: > I agree with many of your specific suggestions (and will discuss them > separately), but I want to understand what problem you are proposing to > solve first. For example: > > - Fix a few new bugs introduced by async calls. > - Fix certain comm API problems. > - Improve client-side code. > - Something else. > > This thread was just about improving client-side code (more > specifically, when to do it), but the responses seem to be all over > the map, so I am trying to understand what the goal is before diving into > specific development ideas. I am focusing more on comm than client_side at the moment. Fixing client_side without a well defined and reliable comm layer is not very meaningful. Regards Henrik
Re: client_side and comm_close
On Sun, 2008-04-20 at 11:47 +0800, Adrian Chadd wrote: > > > Yes, then you left it by introducing something which exposed the bugs. > > > > The problems this thread is discussing were known before AsyncCalls. > > But they weren't exhibiting as -bugs-. They were: the assertion in bug2309 (and many other problems I personally worked on) was "discovered" in Squid3 way before AsyncCalls (v3.0 and earlier). > > > Its great that we now know about the bugs - but now you've got a codebase > > > that you're trying to stabilise > > > > This thread is not about stabilizing client-side code. It is about > > changing its key classes and the relationships among them (for many > > reasons). This will certainly destabilize the code short-term. If it > > would not, I would not need to ask others' opinion whether we should wait > > with that work! > > I thought the focus on 3.1 - and I'd check, but the Wiki history for the > Roadmap stuff I put in early in the 3.1 roadmap cycle - was to focus on > stability and performance. ... > _I_ helped start building the 3.1 roadmap, if you remember. _I_ helped > draft the notion that 3.1 should be about performance and stability > fixes. This thread is (was) dedicated to the discussion whether _adjusting_ v3.1 roadmap to add client-side cleanup is a good idea. Since we have more than one person working on the project, we should try to coordinate significant changes. If you think client-side cleanup is a bad idea for v3.1, just post your arguments -- no reason to talk about AsyncCalls or other unrelated matters (which can be discussed in other threads, of course). As for v3.1 original focus that "you put in early", I do not even want to argue about that, and I am happy to assume that your original v3.1 "roadmap" was exactly what we should have done. What matters, though, is the _current_ Squid3 roadmap, which we are trying to base on specific commitments of specific folks. 
It is far from perfect, but I think it is improving and reflects the current reality reasonably well. > If you want people to adopt Squid-3 then you should try and bring > another stable release out with IPv6 (once all of -those- issues are > fixed) and without too much quirkiness, and soon. We already have a Squid3 release that is reasonably stable and improving. I am not sure why we need to rush v3.1 out when the planned features have not been completed yet. (And did not you claim that IPv6 as a Squid feature is not really important? This thread is not about IPv6 though.) Alex.
Re: client_side and comm_close
On Sat, 2008-04-19 at 17:24 +0300, Tsantilas Christos wrote: > I think we have two different problems here, the client_side code and > the comm_close in async calls. You are correct. General async call bugs and general comm API problems have been mentioned on this thread as well. Most of these problems are related, of course, but it is rather important to understand which problem the person is talking about when discussing a specific solution :-). > If I am not wrong, the bug which triggered this discussion is bug2309, > which is a squid3.0 bug where AsyncCalls does not exist. If that is indeed the case, it is a good point! > The problem here > is the client-side code, and the truth is that it needs cleanup. It is > well known to everyone who has tried to fix bugs or add code in this part of > squid3.. Yes, I hope everybody who worked with the client-side code is in agreement that it "needs a clean up". I believe the question opening this thread was "when to do it" (v3.1 or v3.2), not "whether to do it". It is important to note that I can hit bug2309-like assertions in trunk as well. The exact causes may be a little different, but all seem to revolve around comm_close and transaction abort handling (before and after async calls). Thank you, Alex.
Re: client_side and comm_close
On Sat, 2008-04-19 at 11:52 +0200, Henrik Nordstrom wrote: > Fri 2008-04-18 at 08:25 -0600, Alex Rousskov wrote: > > > The changes needed on the client-side are pretty much unrelated to async > > calls. Async calls just make the design problems more obvious and help > > in solving them. Removing async calls from comm will fix a few bugs, but > > will be a step backwards and will make ongoing code cleanup more > > difficult. > > Here is a different proposal. Split async calls in two. Queued and > immediate, with the event loops using immediate calls and everything > else queued. It is easy to implement two kinds of call scheduling, but I am not sure I understand the merits of doing so. What do you mean by the "event loops" in this context? Are you talking about eventAdd() queue? Or I/O event (i.e., comm) callbacks? I am guessing it is the latter. > The simplest way of doing so safely would be to make async > calls immediate if not already within an async call, but adjusting the > event loops is also a possibility. This would make the code behave a lot > more like how it was before async, eliminating most races. The immediate calls from comm were not really "safe" (because of the recursive/reentrant behavior) so I am not sure what we would achieve here. I hope answers to earlier questions would clarify the intent. Again, I would like to understand and agree on the problem/goal before debating specific solutions... > The very few cases which may depend on comm_close being immediate should > be possible to identify by cbdataInternalLock calls, I think, as the state > object previously may disappear immediately otherwise. I can't see any > in the client_side_* code today. But there very likely are assumptions > that after comm_close has been called no further events will fire > positively on that fd (only as ERR_COMM_CLOSING). > I don't remember why comm_close was made async, but the more I think > about it I think it's most likely wrong. 
comm_close SHOULD invalidate > any pending comm operations on that fd, anything else just leads to > nightmares. Please clarify which of the following comm_close implementations would solve the problems you want to solve in this context:

A) Immediately call all not-yet-dialed callbacks for the affected FD. All relevant I/O callbacks are called with the COMM_ERR_CLOSING flag set (if we still have that). The closing callbacks, if any, are called last.

B) Asynchronously call all not-yet-dialed callbacks for the affected FD. All relevant I/O callbacks are called with the COMM_ERR_CLOSING flag set (if we still have that). The closing callbacks, if any, are called last.

C) Other.

Implementation A would lead to reentrant job calls, which is the problem we were trying to solve earlier, I think. Implementation B avoids reentrant/recursive behavior. We can have both if B is the Right Thing to do but A is what some old code required. (I still would like to know what problem we are trying to solve, though; is the problem you are trying to solve specific to comm_close?) Thank you, Alex.
Re: client_side and comm_close
On Sat, Apr 19, 2008, Alex Rousskov wrote: > > But now you have a better understanding of the bugs that are introduced > > with AsyncCalls. You can then plan design changes to the client-side code > > to work around these. > > I am not sure how to rephrase this so that it is clear, but client-side > design changes are needed to fix client-side bugs, not to work around > AsyncCall bugs. AsyncCalls are just a helpful tool here, not the focus. > Removing AsyncCalls will not help but will hurt. I know that client-side changes are required. We've all known that the client-side codebase needs major surgery for various reasons. > I simply do not understand why you focus on AsyncCalls in the context of > this thread when this thread discusses client-side problems that were > there before AsyncCalls and are there after AsyncCalls were added. I'm not focusing on AsyncCalls. I'm focusing on the breakage it has introduced in code which isn't yet ready for it. > > Yes, then you left it by introducing something which exposed the bugs. > > The problems this thread is discussing were known before AsyncCalls. But they weren't exhibiting as -bugs-. > > It's great that we now know about the bugs - but now you've got a codebase > > that you're trying to stabilise > > This thread is not about stabilizing client-side code. It is about > changing its key classes and the relationships among them (for many > reasons). This will certainly destabilize the code short-term. If it > would not, I would not need to ask others' opinion whether we should wait > with that work! I thought the focus of 3.1 (and I'd have to check the Wiki history for the Roadmap stuff I put in early in the 3.1 roadmap cycle) was stability and performance. The recent work has broken the stability, and doing performance work on an unstable codebase is silly.
> > If you back out the async related changes then you'll have a stable codebase > > again - you can then make smaller incremental changes to the codebase > > and test them to make sure you haven't broken anything, rather than spending > > possibly 6+ months finding more surprises. > > I do not think we can fix the problems discussed here by doing small > incremental changes. We are not talking about bug workarounds or a few > places that need internal polishing. And that's where we differed. > > Remember, the initial idea was to use 3.1 to tidy up the codebase to > > stabilise > > it, not tidy up the codebase to unstabilise it. Redesigning chunks of the > > comm codebase -was- on my todo list for 3.1, but then people dumped stuff > > into > > 3 to make it less stable again. > > This thread is not about redesigning comm APIs, and I do not know much > about your todo list, but the Squid 3.1 roadmap has been built in public. If > any proposed Squid 3.1 features went against your plans, you should have > added your feature(s) to the roadmap and discussed conflicts. This > procedure does not guarantee anything (those who work together on the > project have to compromise), but it works better than complaining about > the first steps while approaching the end of the road. _I_ helped start building the 3.1 roadmap, if you remember. _I_ helped draft the notion that 3.1 should be about performance and stability fixes. The introduction of AsyncCalls, whilst a good idea in the long run, just shows that the current client-side codebase isn't ready for it. > And nobody is tidying up the code to make it unstable, obviously. But it is! And instead of backing the changes out - giving us a stable codebase again, but with an understanding of what needs to be changed before the AsyncCalls stuff is reintroduced - people still seem to persist along the same path which brought all the quirky instability into Squid-3 in the first place.
Stop thinking that I'm focusing on AsyncCalls, and start thinking that I'm focusing on software engineering process with a lean towards stability. If you want people to adopt Squid-3 then you should try and bring another stable release out with IPv6 (once all of -those- issues are fixed) and without too much quirkiness, and soon. -- - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support - - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -
Re: client_side and comm_close
On Sat, 2008-04-19 at 11:52 +0200, Henrik Nordstrom wrote: > fre 2008-04-18 klockan 08:25 -0600 skrev Alex Rousskov: > > > The changes needed on the client-side are pretty much unrelated to async > > calls. Async calls just make the design problems more obvious and help > > in solving them. Removing async calls from comm will fix a few bugs, but > > will be a step backwards and will make ongoing code cleanup more > > difficult. > > Here is a different proposal. I agree with many of your specific suggestions (and will discuss them separately), but I want to understand what problem you are proposing to solve first. For example:
- Fix a few new bugs introduced by async calls.
- Fix certain comm API problems.
- Improve client-side code.
- Something else.
This thread was just about improving client-side code (more specifically, when to do it), but the responses seem to be all over the map so I am trying to understand what the goal is before diving into specific development ideas. Thank you, Alex.
Re: client_side and comm_close
On Sat, 2008-04-19 at 10:50 +0800, Adrian Chadd wrote: > On Fri, Apr 18, 2008, Alex Rousskov wrote: > > If the client_side's main problem was a few coding bugs caused by async > > call changes, you may have been right. Unfortunately, our main problem > > here is the client_side design flaws. Removing async calls from comm > > will not undo those flaws. > > I know this. > > The changes needed on the client-side are pretty much unrelated to async > > calls. Async calls just make the design problems more obvious and help > > in solving them. Removing async calls from comm will fix a few bugs, but > > will be a step backwards and will make ongoing code cleanup more > > difficult. > > But now you have a better understanding of the bugs that are introduced > with AsyncCalls. You can then plan design changes to the client-side code > to work around these. I am not sure how to rephrase this so that it is clear, but client-side design changes are needed to fix client-side bugs, not to work around AsyncCall bugs. AsyncCalls are just a helpful tool here, not the focus. Removing AsyncCalls will not help but will hurt. I simply do not understand why you focus on AsyncCalls in the context of this thread when this thread discusses client-side problems that were there before AsyncCalls and are there after AsyncCalls were added. > > Another way to look at it is to note that client_side was already pretty > > stable before async calls (i.e., Squid 3.0). Thus, we have already been > > where you are proposing to go. > > Yes, then you left it by introducing something which exposed the bugs. The problems this thread is discussing were known before AsyncCalls. > It's great that we now know about the bugs - but now you've got a codebase > that you're trying to stabilise This thread is not about stabilizing client-side code. It is about changing its key classes and the relationships among them (for many reasons). This will certainly destabilize the code short-term.
If it would not, I would not need to ask others' opinion whether we should wait with that work! > and you can't guarantee you understand > all of the possible ways it's going to break in the future. True. In fact, nobody can "guarantee he understands all of the possible ways something is going to break in the future". > If you back out the async related changes then you'll have a stable codebase > again - you can then make smaller incremental changes to the codebase > and test them to make sure you haven't broken anything, rather than spending > possibly 6+ months finding more surprises. I do not think we can fix the problems discussed here by doing small incremental changes. We are not talking about bug workarounds or a few places that need internal polishing. > Remember, the initial idea was to use 3.1 to tidy up the codebase to stabilise > it, not tidy up the codebase to unstabilise it. Redesigning chunks of the > comm codebase -was- on my todo list for 3.1, but then people dumped stuff into > 3 to make it less stable again. This thread is not about redesigning comm APIs, and I do not know much about your todo list, but the Squid 3.1 roadmap has been built in public. If any proposed Squid 3.1 features went against your plans, you should have added your feature(s) to the roadmap and discussed conflicts. This procedure does not guarantee anything (those who work together on the project have to compromise), but it works better than complaining about the first steps while approaching the end of the road. And nobody is tidying up the code to make it unstable, obviously. In summary, I do not understand how removing AsyncCalls would help fix client-side design problems that are the subject of this thread. I do not even understand why we are talking about AsyncCalls here. Alex.
Re: client_side and comm_close
Hi all, I think we have two different problems here, the client_side code and the comm_close in async calls. If I am not wrong, the bug which triggered this discussion is bug #2309, which is a squid3.0 bug where AsyncCalls do not exist. The problem here is the client-side code, which truly needs cleanup. That is well known to everyone who has tried to fix bugs or add code in this part of squid3. About comm_close: the AsyncCalls were designed and implemented in such a way as to require the minimum changes in the squid3 code, and this was limiting some things... Henrik Nordstrom wrote: > > Here is a different proposal. Split async calls in two. Queued and > immediate, with the event loops using immediate calls and everything > else queued. The simplest way of doing so safely would be to make async > calls immediate if not already within an async call, but adjusting the > event loops is also a possibility. This would make the code behave a lot > more like how it was before async, eliminating most races. > > > The very few cases which may depend on comm_close being immediate should > be possible to identify by cbdataInternalLock calls I think as the state > object previously may disappear immediately otherwise. I can't see any > in the client_side_* code today. But there very likely are assumptions > that after comm_close has been called no further events will fire > positively on that fd (only as COMM_ERR_CLOSING). > > I don't remember why comm_close was made async, but the more I think I think (but am not sure now) there were cases where some of the pending asyncCalls had to be executed. > about it I think it's most likely wrong. comm_close SHOULD invalidate > any pending comm operations on that fd, anything else just leads to > nightmares. To invalidate any pending comm operation for a particular fd you would have to scan the whole AsyncCalls queue, which may be costly in a busy proxy with thousands of asyncCalls scheduled (maybe use secondary asyncCall indexes for each fd?)
If we had only AsyncJob-based asyncCalls it would be easy to not execute (invalidate) AsyncCalls which refer to invalid AsyncJobs, but functions scheduled the old way still exist... > > Related to this I also think the COMM_ERR_CLOSING callback from comm > should be dropped. Nearly all handlers looking for COMM_ERR_CLOSING do > nothing but return immediately to abort execution.. and the few > exceptions there are can be dealt with by a close callback if at all > needed:
> errorSendComplete, invalidates the err object
> ftpStateData::dataRead, sets a boolean that the read is done
> HttpStateData::handleRequestBodyProducerAborted injects a COMM_ERR_CLOSING
> HttpStateData::readReply sets a boolean that the read is done
> ServerStateData::sentRequestBody clears requestSender
> the other 20 or so users of comm callbacks just return immediately on COMM_ERR_CLOSING, "aborting" the call.. > > Regards > Henrik
Re: client_side and comm_close
On Sat, Apr 19, 2008, Henrik Nordstrom wrote: > I don't remember why comm_close was made async, but the more I think > about it I think it's most likely wrong. comm_close SHOULD invalidate > any pending comm operations on that fd, anything else just leads to > nightmares. The back-end of comm_close() may have to be async to properly support windows overlapped IO in the future. You have to cancel pending IOs, which IIRC may actually fire before the cancel can be processed. It's been a few years since I toyed with overlapped IO so I can't be sure. Someone with more Windows Overlapped IO experience should chime in. Also, I reckon it would be a smart idea to sit down and code up some very basic Windows overlapped IO examples (like a tcp proxy! :) to properly explore its behaviour before any changes to comm are made. (That's what I'll be doing before my comm changes in the cacheboy branch, as I really don't want to use libevent/libev's "windows" mode for IO.) Adrian
Re: client_side and comm_close
fre 2008-04-18 klockan 08:25 -0600 skrev Alex Rousskov: > The changes needed on the client-side are pretty much unrelated to async > calls. Async calls just make the design problems more obvious and help > in solving them. Removing async calls from comm will fix a few bugs, but > will be a step backwards and will make ongoing code cleanup more > difficult. Here is a different proposal. Split async calls in two. Queued and immediate, with the event loops using immediate calls and everything else queued. The simplest way of doing so safely would be to make async calls immediate if not already within an async call, but adjusting the event loops is also a possibility. This would make the code behave a lot more like how it was before async, eliminating most races. The very few cases which may depend on comm_close being immediate should be possible to identify by cbdataInternalLock calls I think, as the state object previously may disappear immediately otherwise. I can't see any in the client_side_* code today. But there very likely are assumptions that after comm_close has been called no further events will fire positively on that fd (only as COMM_ERR_CLOSING). I don't remember why comm_close was made async, but the more I think about it I think it's most likely wrong. comm_close SHOULD invalidate any pending comm operations on that fd, anything else just leads to nightmares. Related to this I also think the COMM_ERR_CLOSING callback from comm should be dropped. Nearly all handlers looking for COMM_ERR_CLOSING do nothing but return immediately to abort execution..
and the few exceptions there are can be dealt with by a close callback if at all needed:
errorSendComplete, invalidates the err object
ftpStateData::dataRead, sets a boolean that the read is done
HttpStateData::handleRequestBodyProducerAborted injects a COMM_ERR_CLOSING
HttpStateData::readReply sets a boolean that the read is done
ServerStateData::sentRequestBody clears requestSender
the other 20 or so users of comm callbacks just return immediately on COMM_ERR_CLOSING, "aborting" the call.. Regards Henrik
Re: client_side and comm_close
On Fri, Apr 18, 2008, Alex Rousskov wrote: > > I'd suggest another option - roll back all of the async calls changes to the > > comm code, stabilise the codebase without it and re-evaluate what should > > occur (in smaller chunks, rather than dropping in a new comm manager) > > before reintroducing it. > > If the client_side's main problem was a few coding bugs caused by async > call changes, you may have been right. Unfortunately, our main problem > here is the client_side design flaws. Removing async calls from comm > will not undo those flaws. I know this. > The changes needed on the client-side are pretty much unrelated to async > calls. Async calls just make the design problems more obvious and help > in solving them. Removing async calls from comm will fix a few bugs, but > will be a step backwards and will make ongoing code cleanup more > difficult. But now you have a better understanding of the bugs that are introduced with AsyncCalls. You can then plan design changes to the client-side code to work around these. > Another way to look at it is to note that client_side was already pretty > stable before async calls (i.e., Squid 3.0). Thus, we have already been > where you are proposing to go. Yes, then you left it by introducing something which exposed the bugs. It's great that we now know about the bugs - but now you've got a codebase that you're trying to stabilise and you can't guarantee you understand all of the possible ways it's going to break in the future. If you back out the async related changes then you'll have a stable codebase again - you can then make smaller incremental changes to the codebase and test them to make sure you haven't broken anything, rather than spending possibly 6+ months finding more surprises. Remember, the initial idea was to use 3.1 to tidy up the codebase to stabilise it, not tidy up the codebase to unstabilise it.
Redesigning chunks of the comm codebase -was- on my todo list for 3.1, but then people dumped stuff into 3 to make it less stable again. Adrian
Re: client_side and comm_close
On Fri, 2008-04-18 at 16:15 +0800, Adrian Chadd wrote: > On Wed, Apr 16, 2008, Alex Rousskov wrote: > > > In short, we have several related problems here: (a) client_side code is > > incapable of reliably identifying whether comm_close has been called; > > (b) ConnStateData::isOpen may not work anymore; (c) client_side code > > uses many different ways to identify whether more data should be read > > from the connection; (d) comm_close is used a lot but no longer has an > > immediate effect and some client_side code may still depend on that > > effect to be immediate; (e) client_side comm handlers descend very deep, > > making it difficult to propagate errors and comm_close status up. > > > > We should decide whether to continue patching holes here and there or > > clean up the client_side*cc connection management mess for good. Should > > we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this > > a v3.2 project? Or am I exaggerating the problems since common cases > > usually work fine? > > I'd suggest another option - roll back all of the async calls changes to the > comm code, stabilise the codebase without it and re-evaluate what should > occur (in smaller chunks, rather than dropping in a new comm manager) > before reintroducing it. If the client_side's main problem was a few coding bugs caused by async call changes, you may have been right. Unfortunately, our main problem here is the client_side design flaws. Removing async calls from comm will not undo those flaws. The changes needed on the client-side are pretty much unrelated to async calls. Async calls just make the design problems more obvious and help in solving them. Removing async calls from comm will fix a few bugs, but will be a step backwards and will make ongoing code cleanup more difficult. Another way to look at it is to note that client_side was already pretty stable before async calls (i.e., Squid 3.0). Thus, we have already been where you are proposing to go. Alex.
Re: client_side and comm_close
On Fri, 2008-04-18 at 18:34 +1200, Amos Jeffries wrote: > Alex Rousskov wrote: > > On Fri, 2008-04-18 at 15:40 +1200, Amos Jeffries wrote: > > > >> IMO, This should be queued as part of the 3.1 cleanups, but not as > >> urgently as some other issues. > >> > >> The priorities as I see them now are: > >> - immediately fixable and urgent bugs. > >> - completing the dedicated 3.1 feature list (DEVEL is overdue). > >> - cleanups for fixing more intrusive or design bugs (such as this, and > >> the 407 one). > >> - shakedown for release cycle. > >> > >> We have a few months of shakedown before anything like a good stable. > >> Plenty of time for cleanup not blocking new features. > > > > I estimate that properly cleaning up the client side would delay v3.1 > > stable release by 2-3 months unless we suddenly get more experts > > involved. Is that delay acceptable to you? Others? > > If we were in a position to start it now, maybe. But we are not I think. You are probably right. FWIW, I will not be able to work on that until eCAP is done. > My vote is for the timeline above, leaving step-3 as optional, the > decision whether to even start it deferred until step 2 is fully completed. > We may later decide that it is worthwhile or vital, but I don't think > anyone should start in on more major sweeping changes until we know what all the > fallout from the currently underway set is going to be. In the unlikely event somebody with enough client-side knowledge can start on this now, I would probably support it (and help with the design). Otherwise, I agree that your plan is the best option available. Thank you, Alex.
Re: client_side and comm_close
On Wed, Apr 16, 2008, Alex Rousskov wrote: > In short, we have several related problems here: (a) client_side code is > incapable of reliably identifying whether comm_close has been called; > (b) ConnStateData::isOpen may not work anymore; (c) client_side code > uses many different ways to identify whether more data should be read > from the connection; (d) comm_close is used a lot but no longer has an > immediate effect and some client_side code may still depend on that > effect to be immediate; (e) client_side comm handlers descend very deep, > making it difficult to propagate errors and comm_close status up. > > We should decide whether to continue patching holes here and there or > clean up the client_side*cc connection management mess for good. Should > we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this > a v3.2 project? Or am I exaggerating the problems since common cases > usually work fine? I'd suggest another option - roll back all of the async calls changes to the comm code, stabilise the codebase without it and re-evaluate what should occur (in smaller chunks, rather than dropping in a new comm manager) before reintroducing it. I think we've seen that there's still a gap in how development branches are tested and that the best way we have to test code is to put it in production :) Adrian
Re: client_side and comm_close
Alex Rousskov wrote: On Fri, 2008-04-18 at 15:40 +1200, Amos Jeffries wrote: IMO, This should be queued as part of the 3.1 cleanups, but not as urgently as some other issues. The priorities as I see them now are: - immediately fixable and urgent bugs. - completing the dedicated 3.1 feature list (DEVEL is overdue). - cleanups for fixing more intrusive or design bugs (such as this, and the 407 one). - shakedown for release cycle. We have a few months of shakedown before anything like a good stable. Plenty of time for cleanup not blocking new features. I estimate that properly cleaning up the client side would delay v3.1 stable release by 2-3 months unless we suddenly get more experts involved. Is that delay acceptable to you? Others? If we were in a position to start it now, maybe. But we are not I think. My vote is for the timeline above, leaving step-3 as optional, the decision whether to even start it deferred until step 2 is fully completed. We may later decide that it is worthwhile or vital, but I don't think anyone should start in on more major sweeping changes until we know what all the fallout from the currently underway set is going to be. Amos -- Please use Squid 2.6.STABLE19 or 3.0.STABLE4
Re: client_side and comm_close
On Fri, 2008-04-18 at 15:40 +1200, Amos Jeffries wrote: > IMO, This should be queued as part of the 3.1 cleanups, but not as > urgently as some other issues. > > The priorities as I see them now are: > - immediately fixable and urgent bugs. > - completing the dedicated 3.1 feature list (DEVEL is overdue). > - cleanups for fixing more intrusive or design bugs (such as this, and > the 407 one). > - shakedown for release cycle. > > We have a few months of shakedown before anything like a good stable. > Plenty of time for cleanup not blocking new features. I estimate that properly cleaning up the client side would delay v3.1 stable release by 2-3 months unless we suddenly get more experts involved. Is that delay acceptable to you? Others? Thank you, Alex.
Re: client_side and comm_close
Alex Rousskov wrote: On Thu, 2008-04-17 at 20:13 +0300, Tsantilas Christos wrote: Alex Rousskov wrote: In reply to comment #8 for bug #2309 comm_read assertion failed: "ccb->active == false" http://www.squid-cache.org/bugs/show_bug.cgi?id=2309 Note to the one looking into this bug: The assert happens if a comm_read() is called while there already is a comm_read() pending. Yes. I would suspect the half closed client check has been wrongly adopted to call comm_read where it used commSetSelect periodically before.. The above may be true, but there are other conditions causing the same assertion in my tests, with and without the pending comm_close patch from Christos. As far as I can understand, the problem does not exist only in squid3.0 or only in squid3-trunk. The main problem with the client_side code is that comm_read is done by the ConnStateData class, comm_writes by the ClientSocketContext class, and comm_close (and other comm operations) by all. This makes it very difficult to manage the comm-related problems in the client_side code. Yes. If we decide to fix this, there will be a single object responsible for client connection management. Others will simply use its services as needed. Comm cannot be such an object because it is too low-level. Currently, the functionality is spread among many classes. We should decide whether to continue patching holes here and there or clean up the client_side*cc connection management mess for good. Should we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this a v3.2 project? Or am I exaggerating the problems since common cases usually work fine? The question remains. IMO, This should be queued as part of the 3.1 cleanups, but not as urgently as some other issues. The priorities as I see them now are: - immediately fixable and urgent bugs. - completing the dedicated 3.1 feature list (DEVEL is overdue). - cleanups for fixing more intrusive or design bugs (such as this, and the 407 one). - shakedown for release cycle.
We have a few months of shakedown before anything like a good stable. Plenty of time for cleanup not blocking new features. Amos -- Please use Squid 2.6.STABLE19 or 3.0.STABLE4
Re: client_side and comm_close
On Thu, 2008-04-17 at 20:13 +0300, Tsantilas Christos wrote: > Alex Rousskov wrote: > > In reply to comment #8 for bug #2309 > > comm_read assertion failed: "ccb->active == false" > > http://www.squid-cache.org/bugs/show_bug.cgi?id=2309 > > > >> Note to the one looking into this bug: The assert happens if a > >> comm_read() is called while there already is a comm_read() pending. > > > > Yes. > > > >> I would suspect the half closed client check has been wrongly > >> adopted to call comm_read where it used commSetSelect periodically > >> before.. > > The above may be true, but there are other conditions causing the same > > assertion in my tests, with and without the pending comm_close patch > > from Christos. > > As far as I can understand, the problem does not exist only in squid3.0 or only > in squid3-trunk. > > The main problem with the client_side code is that comm_read is done by > the ConnStateData class, comm_writes by the ClientSocketContext > class, and comm_close (and other comm operations) by all. This makes it > very difficult to manage the comm-related problems in the client_side code Yes. If we decide to fix this, there will be a single object responsible for client connection management. Others will simply use its services as needed. Comm cannot be such an object because it is too low-level. Currently, the functionality is spread among many classes. > > We should decide whether to continue patching holes here and there or > > clean up the client_side*cc connection management mess for good. Should > > we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this > > a v3.2 project? Or am I exaggerating the problems since common cases > > usually work fine? The question remains. Thank you, Alex.
Re: client_side and comm_close
Hi all, Alex Rousskov wrote: > In reply to comment #8 for bug #2309 > comm_read assertion failed: "ccb->active == false" > http://www.squid-cache.org/bugs/show_bug.cgi?id=2309 > >> Note to the one looking into this bug: The assert happens if a >> comm_read() is called while there already is a comm_read() pending. > > Yes. > >> I would suspect the half closed client check has been wrongly >> adopted to call comm_read where it used commSetSelect periodically >> before.. > The above may be true, but there are other conditions causing the same > assertion in my tests, with and without the pending comm_close patch > from Christos. > As far as I can understand, the problem does not exist only in squid3.0 or only in squid3-trunk. The main problem with the client_side code is that comm_read is done by the ConnStateData class, comm_writes by the ClientSocketContext class, and comm_close (and other comm operations) by all. This makes it very difficult to manage the comm-related problems in the client_side code. > In short, we have several related problems here: (a) client_side code is > incapable of reliably identifying whether comm_close has been called; > (b) ConnStateData::isOpen may not work anymore; (c) client_side code > uses many different ways to identify whether more data should be read > from the connection; (d) comm_close is used a lot but no longer has an > immediate effect and some client_side code may still depend on that > effect to be immediate; (e) client_side comm handlers descend very deep, > making it difficult to propagate errors and comm_close status up. > > We should decide whether to continue patching holes here and there or > clean up the client_side*cc connection management mess for good. Should > we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this > a v3.2 project? Or am I exaggerating the problems since common cases > usually work fine? > > Thank you, > > Alex.
client_side and comm_close
In reply to comment #8 for bug #2309 comm_read assertion failed: "ccb->active == false" http://www.squid-cache.org/bugs/show_bug.cgi?id=2309 > Note to the one looking into this bug: The assert happens if a > comm_read() is called while there already is a comm_read() pending. Yes. > I would suspect the half closed client check has been wrongly > adopted to call comm_read where it used commSetSelect periodically > before.. The above may be true, but there are other conditions causing the same assertion in my tests, with and without the pending comm_close patch from Christos. In short, we have several related problems here: (a) client_side code is incapable of reliably identifying whether comm_close has been called; (b) ConnStateData::isOpen may not work anymore; (c) client_side code uses many different ways to identify whether more data should be read from the connection; (d) comm_close is used a lot but no longer has an immediate effect and some client_side code may still depend on that effect to be immediate; (e) client_side comm handlers descend very deep, making it difficult to propagate errors and comm_close status up. We should decide whether to continue patching holes here and there or clean up the client_side*cc connection management mess for good. Should we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this a v3.2 project? Or am I exaggerating the problems since common cases usually work fine? Thank you, Alex.