Re: client_side and comm_close

2008-09-04 Thread Alex Rousskov
> On Sun, 20 Apr 2008 22:44:22 -0600, Alex Rousskov wrote:
> 
> > What if we let the close callback determine how it should be called?
> > Old code gets an "immediate" close callback (the default in v3.1). The
> > code that was written to handle async calls uses asynchronous callbacks.
> > This approach may allow for a smoother migration to 100% asynchronous
> > calls without introducing new complex inter-call dependencies. I believe
> > this approach is also easy to implement without changing much of the
> > existing code (old or new).
> 
> That's fine for me. In fact, a very good migration plan.

I have implemented something very similar to the above and,
unfortunately, it does not work well. The plan would have worked
beautifully if all comm_close users (old and new) were independent of
each other. In the real world, there are some natural dependencies.

What happens is that an asynchronous write handler call for "new" code
is scheduled first but may be dialed _after_ the synchronous close
handler call for the "old" code. When "old" and "new" code are managing
related state (e.g., client-side HTTP and REQMOD transactions), the
out-of-order execution of callbacks creates problems.

The good news is that I think I found another simple workaround, thanks
to Christos' earlier work on bug #2253. Instead of mixing sync and async
calls, Comm now makes sure that _queued_ Comm callbacks that were
scheduled before comm_close get their flags reset to COMM_ERR_CLOSING
before being dialed.

The above hack preserves callback order and prevents situations where
a Comm user, after receiving a successful I/O callback, starts modifying
Comm-related state and hits an assertion because the descriptor is
already in a "closing" state.
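To make the idea concrete, here is a minimal stand-alone sketch of the
flag-rewriting workaround; the type and function names are illustrative
stand-ins, not Squid's actual CommCalls/AsyncCallQueue classes:

// Callbacks already queued for a descriptor keep their place in the queue,
// but comm_close() rewrites their result flag to COMM_ERR_CLOSING, so they
// are still dialed in order and see a "closing" result instead of a
// successful I/O.
#include <deque>
#include <functional>
#include <iostream>

enum CommFlag { COMM_OK, COMM_ERR_CLOSING };

struct QueuedCommCallback {
    int fd;
    CommFlag flag;
    std::function<void(int, CommFlag)> handler;
};

static std::deque<QueuedCommCallback> callQueue; // stand-in for the async call queue

void comm_close(int fd) {
    // Rewrite the flags of already-scheduled callbacks instead of cancelling
    // them, preserving callback order.
    for (auto &call : callQueue)
        if (call.fd == fd)
            call.flag = COMM_ERR_CLOSING;
    // ... close callbacks and the final FD-closing step would be queued here ...
}

void drainQueue() {
    while (!callQueue.empty()) {
        QueuedCommCallback call = callQueue.front();
        callQueue.pop_front();
        call.handler(call.fd, call.flag); // dialed in the original order
    }
}

int main() {
    callQueue.push_back({42, COMM_OK, [](int fd, CommFlag flag) {
        std::cout << "write handler on FD " << fd
                  << (flag == COMM_ERR_CLOSING ? ": closing" : ": ok") << "\n";
    }});
    comm_close(42);
    drainQueue(); // prints "write handler on FD 42: closing"
}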

Even if this hack solves a few current problems, it is only a temporary
solution because modifying queued callbacks is ugly, not possible with
threads, and should not be necessary in well-written code. The Comm
notes I posted earlier today document the plans to remove the
COMM_ERR_CLOSING flag and, eventually, this hack.

It is possible that ongoing testing will reveal problems with this
approach and we will have to go back to the drawing board, of course.

Stay tuned,

Alex.




Re: client_side and comm_close

2008-04-21 Thread Alex Rousskov
On Mon, 2008-04-21 at 23:45 +1200, Amos Jeffries wrote:
> >> IMO, the proposed API's _very_ first two lines of code in comm_close are to
> >> register a special Comm callback to perform the fclose() call, and then to
> >> immediately set fd_table flag closed for the rest of the comm_close
> >> process.
> > 
> > Agreed on the flag, disagreed on the call. The special/internal Comm
> > call (to self) should be scheduled last (after all close callbacks) and
> > not first because the close handler might need access to some FD-related
> > info. That info should be preserved until all close handlers have been
> > called.
> > 
> >> With that condition at the start we can guarantee that any registrations
> >> made during the close sequence are either non-FD-relevant or caught.
> > 
> > Yes, the flag is sufficient for that. The internal "close for good" call
> > can still be last.
> 
> I was thinking the close-for-good call would get caught as an fd 
> operation on closed fd by the Async stuff if set after the flag.

Actually, the internal close-for-good call handler asserts that the flag
is set! This internal handler does not have the structure or the
restrictions of the public comm_* functions...

> >> The special Comm callback is only special in that it is not required to
> >> check flags open before fclose()'ing the system-level FD, which will allow
> >> new connections to open on the FD.
> > 
> > It is special because it is calling an internal comm method not some
> > external close handler. Its profile and parameters are different.  I
> > would not call it a callback because of that, but the label is not
> > important. It is not a close callback in terms of
> > comm_add_close_handler.
> 
> Yet it seems to me it needs to be run as an async after the other async 
> close-handlers are run. Due to the non-determinism of the async timing.

Yes, it must be called asynchronously. The call is scheduled after all
close callbacks are scheduled so that it will be dialed last.
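
A rough stand-alone sketch of that ordering, with made-up names rather
than the real Comm internals: the closing flag is set immediately, the
registered close callbacks are queued first, and the internal
close-for-good step is queued last and asserts that the flag is already
set:

#include <cassert>
#include <functional>
#include <iostream>
#include <map>
#include <queue>
#include <vector>

struct FdState {
    bool closing = false;
    std::vector<std::function<void()>> closeHandlers;
};

static std::map<int, FdState> fdTable;               // stand-in for fd_table
static std::queue<std::function<void()>> asyncQueue; // stand-in for the async call queue

void comm_add_close_handler(int fd, std::function<void()> handler) {
    assert(!fdTable[fd].closing);                    // no registrations after comm_close
    fdTable[fd].closeHandlers.push_back(std::move(handler));
}

void commCloseForGood(int fd) {
    assert(fdTable[fd].closing);                     // the internal step requires the flag
    std::cout << "FD " << fd << " really closed; the number may now be reused\n";
    fdTable.erase(fd);
}

void comm_close(int fd) {
    fdTable[fd].closing = true;                      // flag set immediately
    for (const auto &h : fdTable[fd].closeHandlers)  // close callbacks scheduled first...
        asyncQueue.push(h);
    asyncQueue.push([fd] { commCloseForGood(fd); }); // ...internal close scheduled last
}

int main() {
    comm_add_close_handler(7, [] { std::cout << "close handler: tear down FD-related state\n"; });
    comm_close(7);
    while (!asyncQueue.empty()) {                    // the event loop dials calls in order
        asyncQueue.front()();
        asyncQueue.pop();
    }
}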

Alex.




Re: client_side and comm_close

2008-04-21 Thread Amos Jeffries

Alex Rousskov wrote:
> On Mon, 2008-04-21 at 16:02 +1200, Amos Jeffries wrote:
>>> On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote:
>>>> Maybe it can be easier:
>>>
>>> The ease of implementation is a separate question. We still have to
>>> agree what we are implementing, and the items below attempt to define
>>> that. Ambiguity and lack of clear APIs is quite dangerous here (as the
>>> related bugs illustrate!).
>>>
>>>> Just keeping the comm_close as is and in AsyncCallsQueue just
>>>> cancels/not execute asyncCall's for which the fd_table[fd].flags.closing
>>>> is set.
>>>
>>> Implementing #1 below can indeed be as simple as you sketch above. We
>>> need to make sure that the final call that closes the FD and makes fd
>>> value available to other connections is placed last (that may already be
>>> true and is easy to fix if not).
>>
>> Not true.
>>
>> IMO, the proposed API's _very_ first two lines of code in comm_close are to
>> register a special Comm callback to perform the fclose() call, and then to
>> immediately set fd_table flag closed for the rest of the comm_close
>> process.
>
> Agreed on the flag, disagreed on the call. The special/internal Comm
> call (to self) should be scheduled last (after all close callbacks) and
> not first because the close handler might need access to some FD-related
> info. That info should be preserved until all close handlers have been
> called.
>
>> With that condition at the start we can guarantee that any registrations
>> made during the close sequence are either non-FD-relevant or caught.
>
> Yes, the flag is sufficient for that. The internal "close for good" call
> can still be last.

I was thinking the close-for-good call would get caught as an fd
operation on closed fd by the Async stuff if set after the flag.

>> The special Comm callback is only special in that it is not required to
>> check flags open before fclose()'ing the system-level FD, which will allow
>> new connections to open on the FD.
>
> It is special because it is calling an internal comm method not some
> external close handler. Its profile and parameters are different. I
> would not call it a callback because of that, but the label is not
> important. It is not a close callback in terms of
> comm_add_close_handler.

Yet it seems to me it needs to be run as an async after the other async
close-handlers are run. Due to the non-determinism of the async timing.

>> Between the initial comm_close() call and the special Comm callback, we
>> don't need to care if callbacks write their shutdown statements to the fd
>> (it's still technically open) but the closed flag prevents comm accepting
>> any new delayed event registrations or reads.
>
> Exactly. Our only problem is with code that calls comm with a stale fd,
> after that fd has been really closed and a new connection was opened
> with the same fd. That's not a new problem and we will solve it in v3.2
> using comm handlers.
>
> I hope the above puts us on the same page about the implementation
> sketch for the comm_close API, but please yell if something still seems
> broken.

Okay. It looks the way I would implement it from scratch.

Amos
--
Please use Squid 2.6.STABLE19 or 3.0.STABLE4


Re: client_side and comm_close

2008-04-21 Thread Amos Jeffries

Alex Rousskov wrote:
> On Mon, 2008-04-21 at 15:48 +1200, Amos Jeffries wrote:
>>> comm_close(fd) API:
>>>
>>> 1) No I/O callbacks will be dialed after comm_close is called (including
>>> the time when comm_close is running).
>>
>> Sounds good.
>>
>>> 2) All close callbacks registered before comm_close was called will be
>>> called asynchronously, sometime after comm_close exits.
>>
>> Sounds good.
>>
>>> 3) The code can ask Comm whether a FD associated with a close callback
>>> has been closed. The answer will be "yes" after comm_close is called and
>>> "no" before that. This interface needs to be added. Direct access to
>>> fd_table[fd].flags will not work because the same FD could have been
>>> assigned to another connection already. The code will have to use its
>>> close callback to determine FD status.
>>
>> Sounds good, BUT, direct access to fd_table pieces may need to be blocked
>> entirely (private:) so code is forced to go through the Comm API properly.
>
> Yes, except if we want to avoid modifying old code that can still access
> those flags directly because it gets immediate close callbacks (more on
> that below).
>
>> (2) states that the higher-level close callbacks may be run at any time.
>> i.e. after the callback (3) refers to is run. This leaves a window open for
>> disaster, unless the closing callbacks are made immediate, and back we go
>> to recursion...
>
> Those are the same close callbacks! There are no low-level and
> high-level close callbacks here. The external code can use its stored
> callback pointer to get FD status even after that close callback has
> been called. There is no problem with that. The callback will not look
> at fd_table in that case, it will just say "yes, the fd is closed as far
> as you should be concerned".
>
> And, per recent suggestions, old code will get immediate close callbacks
> so it does not need to be modified to use the close callback pointer to
> ask about FD status.

Alright. I got to this branch of the thread before the other which makes
things clearer. Same for the below.

>>> 4) Registering any callback after comm_close has been called is wrong.
>>> Comm will assert if it can detect such late registrations. Not all late
>>> registrations will be detected because the same FD can be already
>>> assigned to a different (new) connection[*].
>>
>> That non-detection seems to me to be a worry. The same problem in (3)
>> occurs here. (4) can guarantee that the closing callbacks don't play nasty
>> re-registrations. But only if they are called immediately instead of
>> scheduled.
>
> Sorry, I do not understand. The "late" registration of a close callback
> can come from anywhere. The old code relies on fd only. There is no way
> for comm to distinguish whether the caller is using a FD from a closed
> connection or a new one. This problem exists in the old code as well so
> there is no new danger here! This problem can only be solved when we
> have comm handlers of one sort or another.
>
>>> The above comm_close API is easy to implement without massive code
>>> changes. Do you think it is the right API? If not, what should be
>>> changed?
>>
>> Apart from the worry with immediate vs delayed closing callbacks.
>>
>> To reduce that worry somewhat, I think the callbacks which actually use
>> ERR_COMM_CLOSING for anything other than immediate abort will need to be
>> replaced with two; a normal callback that checks the FD is open, and a
>> simple closing callback.
>
> I am confused by the "normal callback that checks the FD is open" part.
> What is that for? Are you talking about some internal comm calls to
> close the FD? I believe that handler code that currently does not ignore
> ERR_COMM_CLOSING notifications will need to be moved into the close
> handler code (because only the close handler will be called if we
> eliminate ERR_COMM_CLOSING).

Never mind. I was saying it wrong, but meaning what you have re-stated.

Amos
--
Please use Squid 2.6.STABLE19 or 3.0.STABLE4


Re: client_side and comm_close

2008-04-20 Thread Alex Rousskov
On Mon, 2008-04-21 at 16:02 +1200, Amos Jeffries wrote:
> > On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote:
> >> Maybe it can be easier:
> >
> > The ease of implementation is a separate question. We still have to
> > agree what we are implementing, and the items below attempt to define
> > that. Ambiguity and lack of clear APIs is quite dangerous here (as the
> > related bugs illustrate!).
> >
> >> Just keeping the comm_close as is and in AsyncCallsQueue just
> >> cancels/not execute asyncCall's for which the fd_table[fd].flags.closing
> >> is set.
> >
> > Implementing #1 below can indeed be as simple as you sketch above. We
> > need to make sure that the final call that closes the FD and makes fd
> > value available to other connections is placed last (that may already be
> > true and is easy to fix if not).
> 
> Not true.
> 
> IMO, the proposed API's _very_ first two lines of code in comm_close are to
> register a special Comm callback to perform the fclose() call, and then to
> immediately set fd_table flag closed for the rest of the comm_close
> process.

Agreed on the flag, disagreed on the call. The special/internal Comm
call (to self) should be scheduled last (after all close callbacks) and
not first because the close handler might need access to some FD-related
info. That info should be preserved until all close handlers have been
called.

> With that condition at the start we can guarantee that any registrations
> made during the close sequence are either non-FD-relevant or caught.

Yes, the flag is sufficient for that. The internal "close for good" call
can still be last.

> The special Comm callback is only special in that it is not required to
> check flags open before fclose()'ing the system-level FD, which will allow
> new connections to open on the FD.

It is special because it is calling an internal comm method not some
external close handler. Its profile and parameters are different.  I
would not call it a callback because of that, but the label is not
important. It is not a close callback in terms of
comm_add_close_handler.

> Between the initial comm_close() call and the special Comm callback, we
> don't need to care if callbacks write their shutdown statements to the fd
> (it's still technically open) but the closed flag prevents comm accepting
> any new delayed event registrations or reads.

Exactly. Our only problem is with code that calls comm with a stale fd,
after that fd has been really closed and a new connection was opened
with the same fd. That's not a new problem and we will solve it in v3.2
using comm handlers.

I hope the above puts us on the same page about the implementation
sketch for the comm_close API, but please yell if something still seems
broken.

Thank you,

Alex.




Re: client_side and comm_close

2008-04-20 Thread Alex Rousskov
On Mon, 2008-04-21 at 15:48 +1200, Amos Jeffries wrote:
> > comm_close(fd) API:
> >
> > 1) No I/O callbacks will be dialed after comm_close is called (including
> > the time when comm_close is running).
> >
> Sounds good.
> 
> > 2) All close callbacks registered before comm_close was called will be
> > called asynchronously, sometime after comm_close exits.
> >
> 
> Sounds good.
> 
> > 3) The code can ask Comm whether a FD associated with a close callback
> > has been closed. The answer will be "yes" after comm_close is called and
> > "no" before that. This interface needs to be added. Direct access to
> > fd_table[fd].flags will not work because the same FD could have been
> > assigned to another connection already. The code will have to use its
> > close callback to determine FD status.
> 
> Sounds good, BUT, direct access to fd_table pieces may need to be blocked
> entirely (private:) so code is forced to go through the Comm API properly.

Yes, except if we want to avoid modifying old code that can still access
those flags directly because it gets immediate close callbacks (more on
that below).

> (2) states that the higher-level close callbacks may be run at any time.
> i.e. after the callback (3) refers to is run. This leaves a window open for
> disaster, unless the closing callbacks are made immediate, and back we go
> to recursion...

Those are the same close callbacks! There are no low-level and
high-level close callbacks here. The external code can use its stored
callback pointer to get FD status even after that close callback has
been called. There is no problem with that. The callback will not look
at fd_table in that case, it will just say "yes, the fd is closed as far
as you should be concerned".

And, per recent suggestions, old code will get immediate close callbacks
so it does not need to be modified to use the close callback pointer to
ask about FD status.

> > 4) Registering any callback after comm_close has been called is wrong.
> > Comm will assert if it can detect such late registrations. Not all late
> > registrations will be detected because the same FD can be already
> > assigned to a different (new) connection[*].
> 
> That non-detection seems to me to be a worry. The same problem in (3)
> occurs here. (4) can guarantee that the closing callbacks don't play nasty
> re-registrations. But only if they are called immediately instead of
> scheduled.

Sorry, I do not understand. The "late" registration of a close callback
can come from anywhere. The old code relies on fd only. There is no way
for comm to distinguish whether the caller is using a FD from a closed
connection or a new one. This problem exists in the old code as well so
there is no new danger here! This problem can only be solved when we
have comm handlers of one sort or another.

> > The above comm_close API is easy to implement without massive code
> > changes. Do you think it is the right API? If not, what should be
> > changed?
> 
> Apart from the worry with immediate vs delayed closing callbacks.
> 
> To reduce that worry somewhat, I think the callbacks which actually use
> ERR_COMM_CLOSING for anything other than immediate abort will need to be
> replaced with two; a normal callback that checks the FD is open, and a
> simple closing callback.

I am confused by the "normal callback that checks the FD is open" part.
What is that for? Are you talking about some internal comm calls to
close the FD? I believe that handler code that currently does not ignore
ERR_COMM_CLOSING notifications will need to be moved into the close
handler code (because only the close handler will be called if we
eliminate ERR_COMM_CLOSING).

Thanks,

Alex.




Re: client_side and comm_close

2008-04-20 Thread Henrik Nordstrom
sön 2008-04-20 klockan 22:56 -0600 skrev Alex Rousskov:

> - I hope we can avoid similar problems with comm_cancel_read because the
> two or three places using that call can be adjusted to cancel the read
> callback directly. We have already talked about that elsewhere.

It's not only comm_cancel_read but also commSetTimeout.

Regards
Henrik



Re: client_side and comm_close

2008-04-20 Thread Alex Rousskov
On Mon, 2008-04-21 at 01:11 +0200, Henrik Nordstrom wrote:
> sön 2008-04-20 klockan 22:29 +0300 skrev Tsantilas Christos:
> > I am not saying that all are OK with the comm_close. Looks that there is 
> > a problem with this function, and this is the second problem
> > 
> > This is why I am saying that there are two different problems here
> 
> There are actually three distinct problems discussed...
> 
> a) client_side* is a big tangled mess beyond any repair, broken by
> design and patched around too many times, making it very fragile each
> time touched.
> 
> b) comm is unable to cancel pending calls from comm_ calls using
> callback pointers (callers not using AsyncCall) when asked to (either
> explicitly, such as comm_read_cancel, or implicitly by comm_close). Basically
> an AsyncCall migration API issue with no caller/receiver boundary for
> the AsyncCall. I haven't looked into how things work out when the
> caller of comm_* gives the AsyncCall object, but at least then it's
> possible for that comm_* caller to cancel the call..
> 
> c) comm_close is now async, while some of the code really expects it to be
> synchronous, allowing for immediate teardown of everything related..
> 
> The first is just what it is. A big ugly mess of code which has been around
> for too long. Somewhat cleaned up in Squid-3, but still battling with
> the fundamental design issues of the environment where the code
> operates...
> 
> The second is a plain bug imho. The open question is if the bug is to be
> fixed, or if it can be eliminated by converting everything over to
> AsyncCall...
> 
> The last is a reality in the current code, but probably not a desired
> property of the code..

Agreed. It looks like we are zeroing in on v3.1 solution for (b) and, in
part, (c):

- We are almost done defining comm_close API. We may be done if you
agree with my suggestion to make the close callback decide how it should
be called (with the old code-friendly default).

- We already know how to cancel pending I/O calls after comm_close.

- I hope we can avoid similar problems with comm_cancel_read because the
two or three places using that call can be adjusted to cancel the read
callback directly. We have already talked about that elsewhere.


Thank you,

Alex.




Re: client_side and comm_close

2008-04-20 Thread Alex Rousskov
On Mon, 2008-04-21 at 00:54 +0200, Henrik Nordstrom wrote:
> sön 2008-04-20 klockan 11:21 -0600 skrev Alex Rousskov:
> > I agree regarding the first initiation point, but disagree that it is
> > sufficient to guarantee non-recursive behavior: In addition to
> > callbacks, we have direct function calls to comm and other APIs. The
> > combination of immediate callbacks and those function calls will cause
> > "recursion" as control will reenter the jobs multiple times. For
> > example:
> > 
> > event loop -> immediate callback -> handler -> comm_* call -> immediate
> > callback -> handler -> ...
> 
> No, there is already an AsyncCall in progress so the call from comm_*
> will always be async.

Ah, I think I see what you mean now. I would very much like to avoid or
at least minimize this mixture of asynchronous and immediate calls,
especially if one call is going to determine the other call "timing".
Let's see what we really need once comm_close API is finalized.

> > comm_close(fd) API:
> > 
> > 1) No I/O callbacks will be dialed after comm_close is called (including
> > the time when comm_close is running).
> 
> Yes.
> 
> > 2) All close callbacks registered before comm_close was called will be
> > called asynchronously, sometime after comm_close exits.
> 
> Yes.. or... no. See below.
> 
> > 3) The code can ask Comm whether a FD associated with a close callback
> > has been closed. The answer will be "yes" after comm_close is called and
> > "no" before that. This interface needs to be added. Direct access to
> > fd_table[fd].flags will not work because the same FD could have been
> > assigned to another connection already. The code will have to use its
> > close callback to determine FD status.
> 
> Right. There is a trouble here in that we only have the fd, not a "Comm
> handle". So we need to defer the actual closing of the fd until all
> close handlers have executed, but make comm behave as if it was closed.
> 
> Or the comm_close callbacks should most likely be synchronous to let
> each user of the fd know the fd is now going away and that they MUST NOT
> take any other actions on that fd.
> 
> The second is more in line with what the current code expects,

Yes, we are on the same page here and #4 below.

> and what should be done for 3.1 I think.

What if we let the close callback determine how it should be called?
Old code gets an "immediate" close callback (the default in v3.1). The
code that was written to handle async calls uses asynchronous callbacks.
This approach may allow for a smoother migration to 100% asynchronous
calls without introducing new complex inter-call dependencies. I believe
this approach is also easy to implement without changing much of the
existing code (old or new).
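
A tiny sketch of that migration idea, using made-up names rather than
the actual AsyncCall classes: each close callback carries a flag saying
whether it wants an immediate (old-style, the suggested v3.1 default) or
an asynchronous, queued call:

#include <functional>
#include <iostream>
#include <queue>

struct CloseCallback {
    std::function<void()> handler;
    bool wantsAsync = false;   // default: old code gets an immediate call
};

static std::queue<std::function<void()>> asyncQueue;

void dispatchCloseCallback(const CloseCallback &cb) {
    if (cb.wantsAsync)
        asyncQueue.push(cb.handler);  // dialed later, from the event loop
    else
        cb.handler();                 // dialed right now, inside comm_close
}

int main() {
    dispatchCloseCallback({[] { std::cout << "old-style handler, called immediately\n"; }, false});
    dispatchCloseCallback({[] { std::cout << "async-aware handler, called from the queue\n"; }, true});
    while (!asyncQueue.empty()) { asyncQueue.front()(); asyncQueue.pop(); }
}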

> For 3.2 we may rethink the comm layer, eliminating this.

Thank you,

Alex.

> > 4) Registering any callback after comm_close has been called is wrong.
> 
> yes. The fd is semantically gone at that point.
> 
> > [*] We can reliably detect #4 violations by replacing "fd" integers
> > outside of comm with a unique integer ID or, better, a handler object
> > containing metadata. That kind of change may be too big for v3.1 though.
> 
> Yes. Long term we should have a "Comm handle" separate from the fd, with
> the lowlevel OS fd only known to comm.
> 
> And additionally it should be possible to layer things on that which
> further abstracts I/O from the fd, such as ssl.. but that's another
> topic..




Re: client_side and comm_close

2008-04-20 Thread Alex Rousskov
On Sun, 2008-04-20 at 22:58 +0300, Tsantilas Christos wrote:
> Alex Rousskov wrote:
> > On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote:
> >> Maybe it can be easier:
> > 
> > The ease of implementation is a separate question. We still have to
> > agree what we are implementing, and the items below attempt to define
> > that. Ambiguity and lack of clear APIs is quite dangerous here (as the
> > related bugs illustrate!).
> 
> In practice I am proposing to keep the current design but just not
> allow any AsyncCall to be executed after comm_close is called (now we are
> allowing it).

One of the problems is that the "current design" is partially undefined
and, hence, is being interpreted or broken differently by different
code. That's why I want to polish and document the API while we are
fixing related problems.

I know you prefer to talk about specific "code" but "keeping the current
design" does not mean much to me in this context, unfortunately (for the
reasons mentioned above).

> In your design just do not close the socket in comm_close but
> schedule a call to close it; this will not allow the fd to be used by a new
> connection.

Actually, I do not think the proposed rules mentioned closing the
socket! Who cares about such a minor internal detail :-?

The FD must be closed sometime after the last close callback returns, I
guess. I will add that to the comm_close rules we are hashing out in
this thread (I want to refresh my memory on overlapping I/Os per Adrian's
suggestion first).

Thank you,

Alex.




Re: client_side and comm_close

2008-04-20 Thread Amos Jeffries
> On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote:
>> Maybe it can be easier:
>
> The ease of implementation is a separate question. We still have to
> agree what we are implementing, and the items below attempt to define
> that. Ambiguity and lack of clear APIs is quite dangerous here (as the
> related bugs illustrate!).
>
>> Just keeping the comm_close as is and in AsyncCallsQueue just
>> cancels/not execute asyncCall's for which the fd_table[fd].flags.closing
>> is set.
>
> Implementing #1 below can indeed be as simple as you sketch above. We
> need to make sure that the final call that closes the FD and makes fd
> value available to other connections is placed last (that may already be
> true and is easy to fix if not).

Not true.

IMO, the proposed API's _very_ first two lines of code in comm_close are to
register a special Comm callback to perform the fclose() call, and then to
immediately set fd_table flag closed for the rest of the comm_close
process.

With that condition at the start we can guarantee that any registrations
made during the close sequence are either non-FD-relevant or caught.

The special Comm callback is only special in that it is not required to
check flags open before fclose()'ing the system-level FD, which will allow
new connections to open on the FD.

Between the initial comm_close() call and the special Comm callback, we
don't need to care if callbacks write their shutdown statements to the fd
(it's still technically open) but the closed flag prevents comm accepting
any new delayed event registrations or reads.


> We also need to make sure that the
> user-level code does not access fd_table based on a possibly stale fd
> value. That would require more work, but probably still nothing major.
>
>> We only need a way to identify comm_close handlers and the asyncCall
>> which will do the actual close of the socket.
>
> I am not sure what you mean, but the general-purpose AsyncCallQueue will
> not do these comm-specific checks. We already have custom comm callbacks
> and a general "Should This Call be Dialed?" interface. The callbacks
> will check the flags and will not dial if they should not. Please
> clarify if I am talking about a different problem here.
>
> Thank you,
>
> Alex.
>
>> Alex Rousskov wrote:
>> > comm_close(fd) API:
>> >
>> > 1) No I/O callbacks will be dialed after comm_close is called
>> (including
>> > the time when comm_close is running).
>> >
>> > 2) All close callbacks registered before comm_close was called will be
>> > called asynchronously, sometime after comm_close exits.
>> >
>> > 3) The code can ask Comm whether a FD associated with a close callback
>> > has been closed. The answer will be "yes" after comm_close is called
>> and
>> > "no" before that. This interface needs to be added. Direct access to
>> > fd_table[fd].flags will not work because the same FD could have been
>> > assigned to another connection already. The code will have to use its
>> > close callback to determine FD status.
>> >
>> > 4) Registering any callback after comm_close has been called is wrong.
>> > Comm will assert if it can detect such late registrations. Not all
>> late
>> > registrations will be detected because the same FD can be already
>> > assigned to a different (new) connection[*].
>> >
>> > The above comm_close API is easy to implement without massive code
>> > changes. Do you think it is the right API? If not, what should be
>> > changed?
>> >
>> > [*] We can reliably detect #4 violations by replacing "fd" integers
>> > outside of comm with a unique integer ID or, better, a handler object
>> > containing metadata. That kind of change may be too big for v3.1
>> though.
>
>
>




Re: client_side and comm_close

2008-04-20 Thread Amos Jeffries
> On Sun, 2008-04-20 at 09:22 +0200, Henrik Nordstrom wrote:
>> lör 2008-04-19 klockan 23:05 -0600 skrev Alex Rousskov:
>> > It is easy to implement two kinds of call scheduling, but I am not
>> sure
>> > I understand the merits of doing so. What do you mean by the "event
>> > loops" in this context? Are you talking about eventAdd() queue? Or I/O
>> > event (i.e., comm) callbacks? I am guessing it is the latter.
>>
>> Both. The event checks (I/O, timed events etc) run from the main loop
>> and are always the first AsyncCall callers for anything going on.
>>
>> > The immediate calls from comm were not really "safe" (because of the
>> > recursive/reentrant behavior) so I am not sure what we would achieve
>> > here.
>>
>> As everything going on in Squid is initiated by the event loop you don't
>> get recursion from this change..
>>
>> main loop -> event loop -> immediate async call -> handler -> other I/O
>> actions etc but from here always async..
>
> I agree regarding the first initiation point, but disagree that it is
> sufficient to guarantee non-recursive behavior: In addition to
> callbacks, we have direct function calls to comm and other APIs. The
> combination of immediate callbacks and those function calls will cause
> "recursion" as control will reenter the jobs multiple times. For
> example:
>
> event loop -> immediate callback -> handler -> comm_* call -> immediate
> callback -> handler -> ...
>
>> > Please clarify which of the following comm_close implementation would
>> > solve the problems you want to solve in this context:
>> >
>> > A) Immediately call all not-yet-dialed callbacks for the affected FD.
>> > All relevant I/O callbacks are called with ERR_COMM_CLOSING flag set
>> (if
>> > we still have that). The closing callbacks, if any, are called last.
>> >
>> > B) Asynchronously call all not-yet-dialed callbacks for the affected
>> FD.
>> > All relevant I/O callbacks are called with ERR_COMM_CLOSING flag set
>> (if
>> > we still have that). The closing callbacks, if any, are called last.
>> >
>> > C) Other.
>>
>> Don't call them. See ERR_COMM_CLOSING discussion.
>>
>> The I/O calls are no longer relevant at the moment the fd has been
>> semantically closed (comm_close called by the code). Only the close
>> callbacks matter at that time.
>>
>> Note: The close callbacks as such need to be async, just not the closing
>> of the fd (or at least at a semantical level).
>
> OK, I think you picked option B then :-). Let me polish it to better
> match the current level of understanding and the "dropping
> ERR_COMM_CLOSING" agreement:
>
> comm_close(fd) API:
>
> 1) No I/O callbacks will be dialed after comm_close is called (including
> the time when comm_close is running).
>
Sounds good.

> 2) All close callbacks registered before comm_close was called will be
> called asynchronously, sometime after comm_close exits.
>

Sounds good.

> 3) The code can ask Comm whether a FD associated with a close callback
> has been closed. The answer will be "yes" after comm_close is called and
> "no" before that. This interface needs to be added. Direct access to
> fd_table[fd].flags will not work because the same FD could have been
> assigned to another connection already. The code will have to use its
> close callback to determine FD status.

Sounds good, BUT, direct access to fd_table pieces may need to be blocked
entirely (private:) so code is forced to go through the Comm API properly.

(2) states that the higher-level close callbacks may be run at any time.
i.e. after the callback (3) refers to is run. This leaves a window open for
disaster, unless the closing callbacks are made immediate, and back we go
to recursion...

> 4) Registering any callback after comm_close has been called is wrong.
> Comm will assert if it can detect such late registrations. Not all late
> registrations will be detected because the same FD can be already
> assigned to a different (new) connection[*].

That non-detection seems to me to be a worry. The same problem in (3)
occurs here. (4) can guarantee that the closing callbacks don't play nasty
re-registrations. But only if they are called immediately instead of
scheduled.

>
> The above comm_close API is easy to implement without massive code
> changes. Do you think it is the right API? If not, what should be
> changed?

Apart from the worry with immediate vs delayed closing callbacks.

To reduce that worry somewhat, I think the callbacks which actually use
ERR_COMM_CLOSING for anything other than immediate abort will need to be
replaced with two; a normal callback that checks the FD is open, and a
simple closing callback.

The rest can be left to StackableIO/abstract-handles cleanup. I have some
design ideas there, but that's another topic.

>
> [*] We can reliably detect #4 violations by replacing "fd" integers
> outside of comm with a unique integer ID or, better, a handler object
> containing metadata. That kind of change may be too big for v3.1 though.


Amos



Re: client_side and comm_close

2008-04-20 Thread Amos Jeffries
> sön 2008-04-20 klockan 00:38 -0600 skrev Alex Rousskov:
>
>> This thread is (was) dedicated to the discussion whether _adjusting_
>> v3.1 roadmap to add client-side cleanup is a good idea.
>
> And I say no on that. The current code works kind of, and the small
> problems we have can be patched up for 3.1.STABLE.
>
> What is needed for the server component (client_side*) is a major
> cleanup of the server code + store + how it interacts with forwarding.
> It's a fairly large change and beyond what we should aim for in 3.1. 3.2
> is more likely, maybe 3.3. Quite likely it needs a complete rebuild of
> the client_side* code (and rename to http_server), only reusing the
> parts that makes sense. It's not really that difficult job that code
> needs to do, but the way it's currently being done makes a big mess of
> it.
>
> client_side interactions with comm_close is just a tiny tiny part of
> that, and looking into that before we have defined how comm_close really
> should behave isn't meaningful imho.
>
> There is already a sufficient amount of large changes in 3.2. Let's try not
> to make it a significantly bigger step in the evolution of Squid. Focus
> for 3.1 should now be to prune out bugs and look into what we need to do
> for 3.2.

+1. In case I haven't voted yet.

Amos




Re: client_side and comm_close

2008-04-20 Thread Henrik Nordstrom
sön 2008-04-20 klockan 22:29 +0300 skrev Tsantilas Christos:
> I am not saying that all are OK with the comm_close. Looks that there is 
> a problem with this function, and this is the second problem
> 
> This is why I am saying that there are two different problems here

There are actually three distinct problems discussed...

a) client_side* is a big tangled mess beyond any repair, broken by
design and patched around too many times, making it very fragile each
time touched.

b) comm is unable to cancel pending calls from comm_ calls using
callback pointers (callers not using AsyncCall) when asked to (either
explicitly, such as comm_read_cancel, or implicitly by comm_close). Basically
an AsyncCall migration API issue with no caller/receiver boundary for
the AsyncCall. I haven't looked into how things work out when the
caller of comm_* gives the AsyncCall object, but at least then it's
possible for that comm_* caller to cancel the call..

c) comm_close is now async, while some of the code really expects it to be
synchronous, allowing for immediate teardown of everything related..

The first is just what it is. A big ugly mess of code which has been around
for too long. Somewhat cleaned up in Squid-3, but still battling with
the fundamental design issues of the environment where the code
operates...

The second is a plain bug imho. The open question is if the bug is to be
fixed, or if it can be eliminated by converting everything over to
AsyncCall...

The last is a reality in the current code, but probably not a desired
property of the code..

Regards
Henrik



Re: client_side and comm_close

2008-04-20 Thread Henrik Nordstrom
sön 2008-04-20 klockan 22:19 +0300 skrev Christos Tsantilas:

> The important files are the CommCalls.* AsyncCalls.* and ICAP/AsyncJob.*
> But I believe that you understand it enough.

I think I understand most of the code now, but not entirely 100% sure on
the design and intended use model yet..

Basic principle is simple and neat.

Regards
Henrik



Re: client_side and comm_close

2008-04-20 Thread Henrik Nordstrom
sön 2008-04-20 klockan 11:21 -0600 skrev Alex Rousskov:
> I agree regarding the first initiation point, but disagree that it is
> sufficient to guarantee non-recursive behavior: In addition to
> callbacks, we have direct function calls to comm and other APIs. The
> combination of immediate callbacks and those function calls will cause
> "recursion" as control will reenter the jobs multiple times. For
> example:
> 
> event loop -> immediate callback -> handler -> comm_* call -> immediate
> callback -> handler -> ...

No, there is already an AsyncCall in progress so the call from comm_*
will always be async.

> comm_close(fd) API:
> 
> 1) No I/O callbacks will be dialed after comm_close is called (including
> the time when comm_close is running).

Yes.

> 2) All close callbacks registered before comm_close was called will be
> called asynchronously, sometime after comm_close exits.

Yes.. or... no. See below.

> 3) The code can ask Comm whether a FD associated with a close callback
> has been closed. The answer will be "yes" after comm_close is called and
> "no" before that. This interface needs to be added. Direct access to
> fd_table[fd].flags will not work because the same FD could have been
> assigned to another connection already. The code will have to use its
> close callback to determine FD status.

Right. There is a trouble here in that we only have the fd, not a "Comm
handle". So we need to defer the actual closing of the fd until all
close handlers have executed, but make comm behave as if it was closed.

Or the comm_close callbacks should most likely be synchronous to let
each user of the fd know the fd is now going away and that they MUST NOT
take any other actions on that fd.

The second is more in line with what the current code expects, and what
should be done for 3.1 I think.

For 3.2 we may rethink the comm layer, eliminating this.

> 4) Registering any callback after comm_close has been called is wrong.

yes. The fd is semantically gone at that point.

> [*] We can reliably detect #4 violations by replacing "fd" integers
> outside of comm with a unique integer ID or, better, a handler object
> containing metadata. That kind of change may be too big for v3.1 though.

Yes. Long term we should have a "Comm handle" separate from the fd, with
the low-level OS fd only known to comm.

And additionally it should be possible to layer things on that which
further abstracts I/O from the fd, such as ssl.. but that's another
topic..

Regards
Henrik



Re: client_side and comm_close

2008-04-20 Thread Tsantilas Christos

Alex Rousskov wrote:

> On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote:
>> Maybe it can be easier:
>
> The ease of implementation is a separate question. We still have to
> agree what we are implementing, and the items below attempt to define
> that. Ambiguity and lack of clear APIs is quite dangerous here (as the
> related bugs illustrate!).

In practice I am proposing to keep the current design but just not
allow any AsyncCall to be executed after comm_close is called (now we are
allowing it).

In your design just do not close the socket in comm_close but
schedule a call to close it; this will not allow the fd to be used by a new
connection.

>> We only need a way to identify comm_close handlers and the asyncCall
>> which will do the actual close of the socket.
>
> I am not sure what you mean, but the general-purpose AsyncCallQueue will
> not do these comm-specific checks. We already have custom comm callbacks
> and a general "Should This Call be Dialed?" interface. The callbacks
> will check the flags and will not dial if they should not. Please
> clarify if I am talking about a different problem here.

No, you are talking about the same problem but you are giving a better
solution :-)

> Thank you,
>
> Alex.
>
>> Alex Rousskov wrote:
>>> comm_close(fd) API:
>>>
>>> 1) No I/O callbacks will be dialed after comm_close is called (including
>>> the time when comm_close is running).
>>>
>>> 2) All close callbacks registered before comm_close was called will be
>>> called asynchronously, sometime after comm_close exits.
>>>
>>> 3) The code can ask Comm whether a FD associated with a close callback
>>> has been closed. The answer will be "yes" after comm_close is called and
>>> "no" before that. This interface needs to be added. Direct access to
>>> fd_table[fd].flags will not work because the same FD could have been
>>> assigned to another connection already. The code will have to use its
>>> close callback to determine FD status.
>>>
>>> 4) Registering any callback after comm_close has been called is wrong.
>>> Comm will assert if it can detect such late registrations. Not all late
>>> registrations will be detected because the same FD can be already
>>> assigned to a different (new) connection[*].
>>>
>>> The above comm_close API is easy to implement without massive code
>>> changes. Do you think it is the right API? If not, what should be
>>> changed?
>>>
>>> [*] We can reliably detect #4 violations by replacing "fd" integers
>>> outside of comm with a unique integer ID or, better, a handler object
>>> containing metadata. That kind of change may be too big for v3.1 though.








Re: client_side and comm_close

2008-04-20 Thread Alex Rousskov
On Sun, 2008-04-20 at 22:01 +0300, Tsantilas Christos wrote:
> Maybe it can be easier:

The ease of implementation is a separate question. We still have to
agree what we are implementing, and the items below attempt to define
that. Ambiguity and lack of clear APIs is quite dangerous here (as the
related bugs illustrate!).

> Just keeping the comm_close as is and in AsyncCallsQueue just 
> cancels/not execute asyncCall's for which the fd_table[fd].flags.closing 
> is set.

Implementing #1 below can indeed be as simple as you sketch above. We
need to make sure that the final call that closes the FD and makes fd
value available to other connections is placed last (that may already be
true and is easy to fix if not). We also need to make sure that the
user-level code does not access fd_table based on a possibly stale fd
value. That would require more work, but probably still nothing major.

> We only need a way to identify comm_close handlers and the asyncCall 
> which will do the actual close of the socket.

I am not sure what you mean, but the general-purpose AsyncCallQueue will
not do these comm-specific checks. We already have custom comm callbacks
and a general "Should This Call be Dialed?" interface. The callbacks
will check the flags and will not dial if they should not. Please
clarify if I am talking about a different problem here.
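
As an illustration of that "Should This Call be Dialed?" check, here is a
simplified sketch with hypothetical names rather than the actual CommCalls
classes: a Comm-specific call consults the descriptor's closing flag just
before being dialed and skips itself if comm_close has already run:

#include <functional>
#include <iostream>
#include <map>

struct FdFlags { bool closing = false; };
static std::map<int, FdFlags> fdTable;  // stand-in for fd_table

class CommIoCall {
public:
    CommIoCall(int fd, std::function<void(int)> handler)
        : fd_(fd), handler_(std::move(handler)) {}

    // The generic queue asks every call this question just before dialing it.
    bool canDial() const { return !fdTable[fd_].closing; }

    void dial() {
        if (!canDial()) {
            std::cout << "skipping I/O callback: FD " << fd_ << " is closing\n";
            return;
        }
        handler_(fd_);
    }

private:
    int fd_;
    std::function<void(int)> handler_;
};

int main() {
    CommIoCall readCall(3, [](int fd) { std::cout << "read handler ran on FD " << fd << "\n"; });
    readCall.dial();             // dialed normally
    fdTable[3].closing = true;   // comm_close(3) has run in the meantime
    readCall.dial();             // now quietly skipped
}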

Thank you,

Alex.

> Alex Rousskov wrote: 
> > comm_close(fd) API:
> > 
> > 1) No I/O callbacks will be dialed after comm_close is called (including
> > the time when comm_close is running).
> > 
> > 2) All close callbacks registered before comm_close was called will be
> > called asynchronously, sometime after comm_close exits.
> > 
> > 3) The code can ask Comm whether a FD associated with a close callback
> > has been closed. The answer will be "yes" after comm_close is called and
> > "no" before that. This interface needs to be added. Direct access to
> > fd_table[fd].flags will not work because the same FD could have been
> > assigned to another connection already. The code will have to use its
> > close callback to determine FD status.
> > 
> > 4) Registering any callback after comm_close has been called is wrong.
> > Comm will assert if it can detect such late registrations. Not all late
> > registrations will be detected because the same FD can be already
> > assigned to a different (new) connection[*].
> > 
> > The above comm_close API is easy to implement without massive code
> > changes. Do you think it is the right API? If not, what should be
> > changed?
> > 
> > [*] We can reliably detect #4 violations by replacing "fd" integers
> > outside of comm with a unique integer ID or, better, a handler object
> > containing metadata. That kind of change may be too big for v3.1 though.




Re: client_side and comm_close

2008-04-20 Thread Tsantilas Christos

Henrik Nordstrom wrote:

> lör 2008-04-19 klockan 17:24 +0300 skrev Tsantilas Christos:
>> If I am not wrong the bug which triggered this discussion is the bug2309
>> which is squid3.0 bug where AsyncCalls does not exists.
>
> Is it? I wasn't aware this was the problem we are discussing. 2309 is
> from what I can tell completely unrelated to comm_close or asyncness.
> Exactly what triggers it we don't know yet, other than for some reason
> two comm_read() calls get scheduled on the same FD.

Ok, maybe bug 2309 is unrelated to comm_close, but it is related
to the client_side code complexity, and this is the first problem.

> From what I can tell, 2309 is caused by the comm_read() refactoring,
> where a more low-level commSetSelect() API was used previously and it
> was no error to reset the event handler for a comm event while one was
> already set up on the fd..
>
> What triggered the comm_close discussion was the pconn closing issue
> from what I remember.

I am not saying that all are OK with the comm_close. Looks that there is
a problem with this function, and this is the second problem

This is why I am saying that there are two different problems here





Re: client_side and comm_close

2008-04-20 Thread Tsantilas Christos

Maybe it can be easier:
Just keeping the comm_close as is and in AsyncCallsQueue just 
cancels/not execute asyncCall's for which the fd_table[fd].flags.closing 
is set.
We only need a way to identify comm_close handlers and the asyncCall 
which will do the actual close of the socket.




Alex Rousskov wrote:

OK, I think you picked option B then :-). Let me polish it to better
match the current level of understanding and the "dropping
ERR_COMM_CLOSING" agreement:

comm_close(fd) API:

1) No I/O callbacks will be dialed after comm_close is called (including
the time when comm_close is running).

2) All close callbacks registered before comm_close was called will be
called asynchronously, sometime after comm_close exits.

3) The code can ask Comm whether a FD associated with a close callback
has been closed. The answer will be "yes" after comm_close is called and
"no" before that. This interface needs to be added. Direct access to
fd_table[fd].flags will not work because the same FD could have been
assigned to another connection already. The code will have to use its
close callback to determine FD status.

4) Registering any callback after comm_close has been called is wrong.
Comm will assert if it can detect such late registrations. Not all late
registrations will be detected because the same FD can be already
assigned to a different (new) connection[*].

The above comm_close API is easy to implement without massive code
changes. Do you think it is the right API? If not, what should be
changed?

[*] We can reliably detect #4 violations by replacing "fd" integers
outside of comm with a unique integer ID or, better, a handler object
containing metadata. That kind of change may be too big for v3.1 though.

Thank you,

Alex.







Re: client_side and comm_close

2008-04-20 Thread Alex Rousskov
On Sun, 2008-04-20 at 09:22 +0200, Henrik Nordstrom wrote:
> lör 2008-04-19 klockan 23:05 -0600 skrev Alex Rousskov:
> > It is easy to implement two kinds of call scheduling, but I am not sure
> > I understand the merits of doing so. What do you mean by the "event
> > loops" in this context? Are you talking about eventAdd() queue? Or I/O
> > event (i.e., comm) callbacks? I am guessing it is the latter.
> 
> Both. The event checks (I/O, timed events etc) run from the main loop
> and are always the first AsyncCall callers for anything going on.
> 
> > The immediate calls from comm were not really "safe" (because of the
> > recursive/reentrant behavior) so I am not sure what we would achieve
> > here.
> 
> As everything going on in Squid is initiated by the event loop you don't
> get recursion from this change..
> 
> main loop -> event loop -> immediate async call -> handler -> other I/O
> actions etc but from here always async..

I agree regarding the first initiation point, but disagree that it is
sufficient to guarantee non-recursive behavior: In addition to
callbacks, we have direct function calls to comm and other APIs. The
combination of immediate callbacks and those function calls will cause
"recursion" as control will reenter the jobs multiple times. For
example:

event loop -> immediate callback -> handler -> comm_* call -> immediate
callback -> handler -> ...
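
A tiny self-contained illustration of that reentrancy concern, using
hypothetical names: when a comm_* call dials its callback immediately, the
handler is re-entered before its previous invocation returns; queueing the
callback instead keeps every invocation at the same stack depth:

#include <functional>
#include <iostream>
#include <queue>

static std::queue<std::function<void()>> asyncQueue;
static bool useImmediateCallbacks = true;
static int depth = 0;

void handler(int remaining);

void comm_write(int remaining) {
    auto callback = [remaining] { handler(remaining); };
    if (useImmediateCallbacks)
        callback();                // control re-enters the handler right here
    else
        asyncQueue.push(callback); // the handler runs later, from the event loop
}

void handler(int remaining) {
    ++depth;
    std::cout << "handler invoked at stack depth " << depth << "\n";
    if (remaining > 0)
        comm_write(remaining - 1); // a direct comm call made from inside a handler
    --depth;
}

int main() {
    handler(2);                    // immediate callbacks: depth climbs to 3
    useImmediateCallbacks = false;
    handler(2);                    // queued callbacks: every invocation reports depth 1
    while (!asyncQueue.empty()) { asyncQueue.front()(); asyncQueue.pop(); }
}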

> > Please clarify which of the following comm_close implementation would
> > solve the problems you want to solve in this context:
> > 
> > A) Immediately call all not-yet-dialed callbacks for the affected FD.
> > All relevant I/O callbacks are called with ERR_COMM_CLOSING flag set (if
> > we still have that). The closing callbacks, if any, are called last.
> > 
> > B) Asynchronously call all not-yet-dialed callbacks for the affected FD.
> > All relevant I/O callbacks are called with ERR_COMM_CLOSING flag set (if
> > we still have that). The closing callbacks, if any, are called last.
> > 
> > C) Other.
> 
> Don't call them. See ERR_COMM_CLOSING discussion.
> 
> The I/O calls are no longer relevant at the moment the fd has been
> semantically closed (comm_close called by the code). Only the close
> callbacks matter at that time.
> 
> Note: The close callbacks as such need to be async, just not the closing
> of the fd (or at least at a semantical level).

OK, I think you picked option B then :-). Let me polish it to better
match the current level of understanding and the "dropping
ERR_COMM_CLOSING" agreement:

comm_close(fd) API:

1) No I/O callbacks will be dialed after comm_close is called (including
the time when comm_close is running).

2) All close callbacks registered before comm_close was called will be
called asynchronously, sometime after comm_close exits.

3) The code can ask Comm whether a FD associated with a close callback
has been closed. The answer will be "yes" after comm_close is called and
"no" before that. This interface needs to be added. Direct access to
fd_table[fd].flags will not work because the same FD could have been
assigned to another connection already. The code will have to use its
close callback to determine FD status.

4) Registering any callback after comm_close has been called is wrong.
Comm will assert if it can detect such late registrations. Not all late
registrations will be detected because the same FD can be already
assigned to a different (new) connection[*].

The above comm_close API is easy to implement without massive code
changes. Do you think it is the right API? If not, what should be
changed?

[*] We can reliably detect #4 violations by replacing "fd" integers
outside of comm with a unique integer ID or, better, a handler object
containing metadata. That kind of change may be too big for v3.1 though.
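
For illustration, a small self-contained sketch of rule 3; the CloseCallback
class and its method names below are hypothetical, not an interface Comm
actually provides. The owning code keeps a pointer to its registered close
callback and asks that object, not fd_table, whether its descriptor has been
closed:

#include <iostream>
#include <map>
#include <memory>
#include <vector>

class CloseCallback {
public:
    explicit CloseCallback(int fd) : fd_(fd) {}
    void noteClosed() { closed_ = true; }      // set by Comm while comm_close runs
    bool fdClosed() const { return closed_; }  // queried by the owning job (rule 3)
    void dial() const { std::cout << "close callback dialed for FD " << fd_ << "\n"; }
private:
    int fd_;
    bool closed_ = false;
};

static std::map<int, std::vector<std::shared_ptr<CloseCallback>>> closeCallbacks;

std::shared_ptr<CloseCallback> comm_add_close_handler(int fd) {
    auto cb = std::make_shared<CloseCallback>(fd);
    closeCallbacks[fd].push_back(cb);
    return cb; // the caller keeps this pointer
}

void comm_close(int fd) {
    for (const auto &cb : closeCallbacks[fd]) {
        cb->noteClosed(); // the answer flips to "yes" as soon as comm_close is called
        cb->dial();       // per rule 2 this would really be scheduled, not dialed here
    }
    closeCallbacks.erase(fd);
}

int main() {
    auto cb = comm_add_close_handler(10);
    std::cout << std::boolalpha << "closed? " << cb->fdClosed() << "\n"; // false
    comm_close(10);
    std::cout << "closed? " << cb->fdClosed() << "\n";                   // true
}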

Thank you,

Alex.




Re: client_side and comm_close

2008-04-20 Thread Alex Rousskov
On Sun, 2008-04-20 at 16:51 +0200, Henrik Nordstrom wrote:
> sön 2008-04-20 klockan 00:38 -0600 skrev Alex Rousskov:
> 
> > This thread is (was) dedicated to the discussion whether _adjusting_
> > v3.1 roadmap to add client-side cleanup is a good idea.
> 
> And I say no on that. The current code works kind of, and the small
> problems we have can be patched up for 3.1.STABLE.

Noted. Since nobody advocated and volunteered for doing the client-side
cleanup now, I think we should plan to do it for v3.2. If, based on Amos'
4-step plan, we decide there may be time to do this in v3.1 after step
#2 is completed, we can always revisit this question. If nobody commits
to do this in v3.2, the feature will be pushed back to v3.3, as usual.

I have added a Feature page to place this work on Squid3 roadmap:
http://wiki.squid-cache.org/Features/ClientSideCleanup

> What is needed for the server component (client_side*) is a major
> cleanup of the server code + store + how it interacts with forwarding.
> It's a fairly large change and beyond what we should aim for in 3.1. 3.2
> is more likely, maybe 3.3. Quite likely it needs a complete rebuild of
> the client_side* code (and rename to http_server), only reusing the
> parts that makes sense. It's not really that difficult job that code
> needs to do, but the way it's currently being done makes a big mess of
> it.
> 
> client_side interactions with comm_close is just a tiny tiny part of
> that, and looking into that before we have defined how comm_close really
> should behave isn't meaningful imho.

I agree with the technical comments. As for the schedule, I hope we do
not have to wait for v3.3, but it is too early to say.

Thank you,

Alex.




Re: client_side and comm_close

2008-04-20 Thread Henrik Nordstrom
lör 2008-04-19 klockan 17:24 +0300 skrev Tsantilas Christos:
> If I am not wrong the bug which triggered this discussion is the bug2309
> which is squid3.0 bug where AsyncCalls does not exists.

Is it? I wasn't aware this was the problem we are discussing. 2309 is
from what I can tell completely unrelated to comm_close or asyncness.
Exactly what triggers it we don't know yet, other than for some reason
two comm_read() calls get scheduled on the same FD.

From what I can tell, 2309 is caused by the comm_read() refactoring,
where a more low-level commSetSelect() API was used previously and it
was no error to reset the event handler for a comm event while one was
already set up on the fd..
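
A small sketch of that difference, with hypothetical names rather than the
real comm functions: a commSetSelect()-style call silently replaces the
pending handler, while a comm_read()-style call can treat a second read on
the same FD as a caller bug and assert, which is roughly how bug 2309
manifests:

#include <cassert>
#include <functional>
#include <iostream>
#include <map>

static std::map<int, std::function<void(int)>> pendingReads; // at most one read per FD

void commSetSelectStyle(int fd, std::function<void(int)> handler) {
    pendingReads[fd] = std::move(handler);  // old style: quietly replace whatever was there
}

void commReadStyle(int fd, std::function<void(int)> handler) {
    // new style: a second read on the same FD is treated as a caller bug
    assert(pendingReads.find(fd) == pendingReads.end() && "double comm_read on the same FD");
    pendingReads[fd] = std::move(handler);
}

int main() {
    commSetSelectStyle(5, [](int) { std::cout << "first handler (never runs)\n"; });
    commSetSelectStyle(5, [](int) { std::cout << "second handler wins\n"; });
    pendingReads[5](5);
    pendingReads.erase(5);

    commReadStyle(5, [](int) {});
    // commReadStyle(5, [](int) {});  // would trip the assert, roughly as in bug 2309
}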

What triggered the comm_close discussion was the pconn closing issue
from what I remember.

Regards
Henrik



Re: client_side and comm_close

2008-04-20 Thread Henrik Nordstrom
tor 2008-04-17 klockan 20:13 +0300 skrev Tsantilas Christos:

> As I can understand the problem does not exist only in squid3.0 or only
> in squid3-trunk.

The problems show up mainly in squid3-trunk due to the delays introduced
by AsyncCalls, combined with the inability for old code to cancel
pending operations when needed where they could do so just fine before
AsyncCalls... The client_side_* and pconn* code isn't designed to deal
with this kind of asyncness..

> The main problem with the client_side code is that the comm_read is done by
> the ConnStateData class, the comm_writes by the ClientSocketContext
> class, and comm_close (and other comm operations) by all. This makes it
> very difficult to manage the comm-related problems in client_side code

Yes, but before AsyncCalls it all wound down immediately on comm_close,
cancelling all further operations. The FD got closed, the
ClientSocketContext invalidated and pending requests cancelled. So while
this code is pretty much the same in 3-trunk it's only a problem after
the introduction of AsyncCalls.

The AsyncCalls do solve another class of problems, however: what has been
referred to as "recursion" in this thread. Or, to be specific, the
combined fact that comm_close was immediate and that it wasn't always
obvious when comm_close would trigger in complex code where comm_close
is called from multiple places or deep down in processing chains where
the upper layers maybe assume it won't get called.

This "recursion" only existed for comm_close, not read/write or timed
events as those always bounced via the event loop before called.
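
As a minimal sketch of why that bouncing matters (hypothetical names, not
Squid code): when close handlers are queued onto the event loop instead of
being dialed inside comm_close itself, a handler deep in the call stack is
never reentered and never has its state destroyed underneath it.

#include <deque>
#include <functional>
#include <iostream>

static std::deque<std::function<void()>> eventQueue;   // stand-in for the event loop queue

struct Conn {
    bool closing = false;

    void close() {                     // comm_close-like entry point
        if (closing)
            return;
        closing = true;
        // Queue the close handler instead of calling it here; dialing it
        // directly would run it inside whatever handler invoked close(),
        // possibly destroying state that caller is still using.
        eventQueue.push_back([] { std::cout << "close handler ran\n"; });
    }
};

int main() {
    Conn c;

    // A write handler somewhere deep in a processing chain decides to abort:
    eventQueue.push_back([&c] {
        c.close();                     // only queues the close handler
        std::cout << "write handler finished without being reentered\n";
    });

    while (!eventQueue.empty()) {      // the main/event loop dials one call at a time
        auto call = eventQueue.front();
        eventQueue.pop_front();
        call();
    }
}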

Regards
Henrik



Re: client_side and comm_close

2008-04-20 Thread Henrik Nordstrom
sön 2008-04-20 klockan 00:38 -0600 skrev Alex Rousskov:

> This thread is (was) dedicated to the discussion whether _adjusting_
> v3.1 roadmap to add client-side cleanup is a good idea.

And I say no on that. The current code works kind of, and the small
problems we have can be patched up for 3.1.STABLE.

What is needed for the server component (client_side*) is a major
cleanup of the server code + store + how it interacts with forwarding.
It's a fairly large change and beyond what we should aim for in 3.1. 3.2
is more likely, maybe 3.3. Quite likely it needs a complete rebuild of
the client_side* code (and rename to http_server), only reusing the
parts that make sense. It's not really that difficult a job that the code
needs to do, but the way it's currently being done makes a big mess of
it.

client_side interactions with comm_close is just a tiny tiny part of
that, and looking into that before we have defined how comm_close really
should behave isn't meaningful imho.
 
There is already a sufficient amount of large changes in 3.2. Let's try not
to make it a significantly bigger step in the evolution of Squid. Focus
for 3.1 should now be to prune out bugs and look into what we need to do
for 3.2.

Regards
Henrik



Re: client_side and comm_close

2008-04-20 Thread Henrik Nordstrom
lör 2008-04-19 klockan 23:05 -0600 skrev Alex Rousskov:
> It is easy to implement two kinds of call scheduling, but I am not sure
> I understand the merits of doing so. What do you mean by the "event
> loops" in this context? Are you talking about eventAdd() queue? Or I/O
> event (i.e., comm) callbacks? I am guessing it is the latter.

Both. The event checks (I/O, timed events etc) run from the main loop
and are always the first AsyncCall callers for anything going on.

> The immediate calls from comm were not really "safe" (because of the
> recursive/reentrant behavior) so I am not sure what we would achieve
> here.

As everything going on in Squid is initiated by the event loop, you don't
get recursion from this change:

main loop -> event loop -> immediate async call -> handler -> other I/O
actions etc, but from here on always async.
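
A rough sketch of that dispatch rule (assumed semantics, not an existing
Squid API): a call scheduled from the event loop itself is dialed
immediately, while anything scheduled from inside another async call is
queued, so everything below the first handler stays asynchronous.

#include <deque>
#include <functional>
#include <iostream>

static std::deque<std::function<void()>> queuedCalls;
static bool insideAsyncCall = false;

void scheduleCall(std::function<void()> call) {
    if (insideAsyncCall) {
        queuedCalls.push_back(std::move(call));  // already async: preserve ordering
    } else {
        insideAsyncCall = true;                  // first call made from the event loop
        call();                                  // dial immediately
        insideAsyncCall = false;
    }
}

int main() {
    // main loop -> event loop -> immediate async call -> handler -> queued from then on
    scheduleCall([] {
        std::cout << "I/O handler (dialed immediately)\n";
        scheduleCall([] { std::cout << "follow-up call (queued)\n"; });
    });

    while (!queuedCalls.empty()) {               // the event loop drains the queue
        auto call = queuedCalls.front();
        queuedCalls.pop_front();
        scheduleCall(std::move(call));           // not inside a call here, so immediate
    }
}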

> I hope answers to earlier questions would clarify the intent.
> Again, I would like to understand and agree on the problem/goal before
> debating specific solutions...

The goal is to get a clean and deterministic API for event processing
without hard-to-reproduce race conditions.

> Please clarify which of the following comm_close implementation would
> solve the problems you want to solve in this context:
> 
> A) Immediately call all not-yet-dialed callbacks for the affected FD.
> All relevant I/O callbacks are called with the COMM_ERR_CLOSING flag set (if
> we still have that). The closing callbacks, if any, are called last.
> 
> B) Asynchronously call all not-yet-dialed callbacks for the affected FD.
> All relevant I/O callbacks are called with the COMM_ERR_CLOSING flag set (if
> we still have that). The closing callbacks, if any, are called last.
> 
> C) Other.

Don't call them. See the COMM_ERR_CLOSING discussion.

The I/O calls are no longer relevant at the moment the fd has been
semantically closed (comm_close called by the code). Only the close
callbacks matter at that time.

Note: The close callbacks as such need to be async, just not the closing
of the fd (or at least at a semantic level).
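
Put together, the behaviour argued for here might look roughly like the
following sketch (hypothetical structures, not the real comm internals):
comm_close marks the descriptor as closing at once, never dials the I/O
callbacks that were still queued for it, and schedules only the close
callbacks, still asynchronously.

#include <deque>
#include <functional>
#include <iostream>
#include <map>
#include <vector>

struct PendingIo { int fd; std::function<void()> callback; bool cancelled; };

static std::deque<PendingIo> ioQueue;                      // queued I/O callbacks
static std::map<int, std::vector<std::function<void()>>> closeHandlers;
static std::map<int, bool> fdClosing;

void commCloseSketch(int fd) {
    if (fdClosing[fd])
        return;                                            // already closing
    fdClosing[fd] = true;                                  // semantically closed right now
    for (auto &io : ioQueue)
        if (io.fd == fd)
            io.cancelled = true;                           // never dialed, no "closing" pass
    for (auto &handler : closeHandlers[fd])
        ioQueue.push_back({fd, handler, false});           // close callbacks stay asynchronous
    closeHandlers[fd].clear();
}

int main() {
    const int fd = 42;
    closeHandlers[fd].push_back([] { std::cout << "close handler\n"; });
    ioQueue.push_back({fd, [] { std::cout << "read handler (never runs)\n"; }, false});

    commCloseSketch(fd);                                   // called by some job

    while (!ioQueue.empty()) {                             // the event loop
        PendingIo io = ioQueue.front();
        ioQueue.pop_front();
        if (!io.cancelled)
            io.callback();
    }
}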

Regards
Henrik



Re: client_side and comm_close

2008-04-20 Thread Henrik Nordstrom
lör 2008-04-19 klockan 21:01 -0600 skrev Alex Rousskov:

> I agree with many of your specific suggestions (and will discuss them
> separately), but I want to understand what problem you are proposing to
> solve first. For example:
> 
> - Fix a few new bugs introduced by async calls.
> - Fix certain comm API problems.
> - Improve client-side code.
> - Something else.
> 
> This thread was just about improving client-side code (more
> specifically, when to do it), but the responses seem to be all over
> the map, so I am trying to understand what the goal is before diving into
> specific development ideas.

I am focusing more on comm than client_side at the moment. Fixing
client_side without a well defined and reliable comm layer is not very
meaningful.

Regards
Henrik



Re: client_side and comm_close

2008-04-19 Thread Alex Rousskov
On Sun, 2008-04-20 at 11:47 +0800, Adrian Chadd wrote:

> > > Yes, then you left it by introducing something which exposed the bugs.
> > 
> > The problems this thread is discussing were known before AsyncCalls.
> 
> But they weren't exhibiting as -bugs-.

They were: the assertion in bug 2309 (and many other problems I personally
worked on) was "discovered" in Squid3 way before AsyncCalls (v3.0 and
earlier).

> > > Its great that we now know about the bugs - but now you've got a codebase
> > > that you're trying to stabilise 
> > 
> > This thread is not about stabilizing client-side code. It is about
> > changing its key classes and the relationships among them (for many
> > reasons). This will certainly destabilize the code short-term. If it
> > would not, I would not need to ask others opinion whether we should wait
> > with that work!
> 
> I thought the focus on 3.1 - and I'd check, but the Wiki history for the
> Roadmap stuff I put in early in the 3.1 roadmap cycle - was to focus on
> stability and performance.
...
> _I_ helped start building the 3.1 roadmap, if you remember. _I_ helped
> draft the notion that 3.1 should be about performance and stability
> fixes.

This thread is (was) dedicated to the discussion whether _adjusting_
v3.1 roadmap to add client-side cleanup is a good idea. Since we have
more than one person working on the project, we should try to coordinate
significant changes.  If you think client-side cleanup is a bad idea for
v3.1, just post your arguments -- no reason to talk about AsyncCalls or
other unrelated matters (which can be discussed in other threads, of
course).

As for v3.1 original focus that "you put in early", I do not even want
to argue about that, and I am happy to assume that your original v3.1
"roadmap" was exactly what we should have done.

What matters though, is the _current_ Squid3 roadmap, which we are
trying to base on specific commitments of specific folks. It is far from
perfect, but I think it is improving and reflects the current reality
reasonably well.

> If you want people to adopt Squid-3 then you should try and bring
> another stable release out with IPv6 (once all of -those- issues are
> fixed) and without too much quirkiness, and soon.

We already have a Squid3 release that is reasonably stable and
improving. I am not sure why we need to rush v3.1 out when the planned
features have not been completed yet. (And didn't you claim that IPv6
as a Squid feature is not really important? This thread is not about
IPv6 though.)

Alex.




Re: client_side and comm_close

2008-04-19 Thread Alex Rousskov
On Sat, 2008-04-19 at 17:24 +0300, Tsantilas Christos wrote:

> I think we have two different problems here, the client_side code  and
> the comm_close in async calls.

You are correct. General async call bugs and general comm API problems
have been mentioned on this thread as well. Most of these problems are
related, of course, but it is rather important to understand which
problem the person is talking about when discussing a specific
solution :-).

> If I am not wrong, the bug which triggered this discussion is bug 2309,
> which is a squid3.0 bug where AsyncCalls does not exist.

If that is indeed the case, it is a good point!

> The problem here
> is the client-side code, which, truth be told, needs a clean up. That is
> well known to everyone who has tried to fix bugs or add code in this part
> of squid3.

Yes, I hope everybody who worked with the client-side code is in
agreement that it "needs a clean up". I believe the question opening
this thread was "when to do it" (v3.1 or v3.2) not "whether to do it".

It is important to note that I can hit bug2309-like assertions in trunk
as well. The exact causes may be a little different, but all seem to
revolve around comm_close and transaction abort handling (before and
after async calls).

Thank you,

Alex.




Re: client_side and comm_close

2008-04-19 Thread Alex Rousskov
On Sat, 2008-04-19 at 11:52 +0200, Henrik Nordstrom wrote:
> fre 2008-04-18 klockan 08:25 -0600 skrev Alex Rousskov:
> 
> > The changes needed on the client-side are pretty much unrelated to async
> > calls. Async calls just make the design problems more obvious and help
> > in solving them. Removing async calls from comm will fix a few bugs, but
> > will be a step backwards and will make ongoing code cleanup more
> > difficult.
> 
> Here is a different proposal. Split async calls in two. Queued and
> immediate, with the event loops using immediate calls and everything
> else queued.

It is easy to implement two kinds of call scheduling, but I am not sure
I understand the merits of doing so. What do you mean by the "event
loops" in this context? Are you talking about eventAdd() queue? Or I/O
event (i.e., comm) callbacks? I am guessing it is the latter.

> The simplest way of doing so safely would be to make async
> calls immediate if not already within an async call, but adjusting the
> event loops is also a possibility. This would make the code behave a lot
> more like how it was before async, eliminating most races.

The immediate calls from comm were not really "safe" (because of the
recursive/reentrant behavior) so I am not sure what we would achieve
here. I hope answers to earlier questions would clarify the intent.
Again, I would like to understand and agree on the problem/goal before
debating specific solutions...

> The very few cases which may depend on comm_close being immediate should
> be possible to identify by cbdataInternalLock calls I think as the state
> object previously may disappear immediately otherwise. I can't see any
> in the client_side_* code today. But there very likely are assumptions
> that after comm_close has been called no further events will fire
> positively on that fd (only as COMM_ERR_CLOSING).

> I don't remember why comm_close was made async, but the more I think
> about it I think it's most likely wrong. comm_close SHOULD invalidate
> any pending comm operations on that fd, anything else just leads to
> nightmares.

Please clarify which of the following comm_close implementation would
solve the problems you want to solve in this context:

A) Immediately call all not-yet-dialed callbacks for the affected FD.
All relevant I/O callbacks are called with the COMM_ERR_CLOSING flag set (if
we still have that). The closing callbacks, if any, are called last.

B) Asynchronously call all not-yet-dialed callbacks for the affected FD.
All relevant I/O callbacks are called with the COMM_ERR_CLOSING flag set (if
we still have that). The closing callbacks, if any, are called last.

C) Other.

Implementation A would lead to reentrant job calls, which is the problem
we were trying to solve earlier, I think. Implementation B avoids
reentrant/recursive behavior. We can have both if B is the Right Thing
to do but A is what some old code required.

(I still would like to know what problem we are trying to solve though;
Is the problem you are trying to solve specific to comm_close?)

Thank you,

Alex.




Re: client_side and comm_close

2008-04-19 Thread Adrian Chadd
On Sat, Apr 19, 2008, Alex Rousskov wrote:

> > But now you have a better understanding of the bugs that are introduced
> > with AsyncCalls. You can then plan design changes to the client-side code
> > to work around these.
> 
> I am not sure how to rephrase this so that it is clear, but client-side
> design changes are needed to fix client-side bugs, not to work around
> AsyncCall bugs. AsyncCalls are just a helpful tool here, not the focus.
> Removing AsyncCalls will not help but will hurt.

I know that client-side changes are required. We've all known that
the client-side codebase needs major surgery for various reasons.

> I simply do not understand why you focus on AsyncCalls in the context of
> this thread when this thread discusses client-side problems that were
> there before AsyncCalls and are there after AsyncCalls were added.

I'm not focusing on AsyncCalls. I'm focusing on the breakage that it's
introduced in code which isn't yet ready for it.

> > Yes, then you left it by introducing something which exposed the bugs.
> 
> The problems this thread is discussing were known before AsyncCalls.

But they weren't exhibiting as -bugs-.

> > Its great that we now know about the bugs - but now you've got a codebase
> > that you're trying to stabilise 
> 
> This thread is not about stabilizing client-side code. It is about
> changing its key classes and the relationships among them (for many
> reasons). This will certainly destabilize the code short-term. If it
> would not, I would not need to ask others opinion whether we should wait
> with that work!

I thought the focus on 3.1 - and I'd check, but the Wiki history for the
Roadmap stuff I put in early in the 3.1 roadmap cycle - was to focus on
stability and performance.

The recent work has broken the stability, and doing performance work on an
unstable codebase is silly.

> > If you back out the async related changes then you'll have a stable codebase
> > again - you can then make smaller incremental changes to the codebase
> > and test them to make sure you haven't broken anything, rather than spending
> > possibly 6 + months finding more surprises.
> 
> I do not think we can fix the problems discussed here by doing small
> incremental changes. We are not talking about bug workarounds or a few
> places that need internal polishing.

And thats where we differed.

> > Remember, the initial idea was to use 3.1 to tidy up the codebase to 
> > stabilise
> > it, not tidy up the codebase to unstabilise it. Redesigning chunks of the
> > comm codebase -was- on my todo list for 3.1, but then people dumped stuff 
> > into
> > 3 to make it less stable again.
> 
> This thread is not about redesigning comm APIs, and I do not know much
> about your todo list, but Squid 3.1 roadmap has been built in public. If
> any proposed Squid 3.1 features went against your plans, you should have
> added your feature(s) to the roadmap and discussed conflicts. This
> procedure does not guarantee anything (those who work together on the
> project have to compromise), but it works better than complaining about
> the first steps while approaching the end of the road.

_I_ helped start building the 3.1 roadmap, if you remember. _I_ helped
draft the notion that 3.1 should be about performance and stability fixes.
The introduction of AsyncCalls, whilst a good idea in the long run,
just shows that the current client-side codebase isn't ready for it.

> And nobody is tidying up the code to make it unstable, obviously.

But it is! And instead of backing the changes out - giving us a stable
codebase again, but with an understanding of what needs to be changed before
the AsyncCalls stuff is reintroduced - people still seem to persist along
the same path which brought all the quirky instability into Squid-3 in the
first place.

Stop thinking that I'm focusing on AsyncCalls, and start thinking that I'm
focusing on software engineering process with a lean towards stability.
If you want people to adopt Squid-3 then you should try and bring another
stable release out with IPv6 (once all of -those- issues are fixed)
 and without too much quirkiness, and soon.

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: client_side and comm_close

2008-04-19 Thread Alex Rousskov
On Sat, 2008-04-19 at 11:52 +0200, Henrik Nordstrom wrote:
> fre 2008-04-18 klockan 08:25 -0600 skrev Alex Rousskov:
> 
> > The changes needed on the client-side are pretty much unrelated to async
> > calls. Async calls just make the design problems more obvious and help
> > in solving them. Removing async calls from comm will fix a few bugs, but
> > will be a step backwards and will make ongoing code cleanup more
> > difficult.
> 
> Here is a different proposal.

I agree with many of your specific suggestions (and will discuss them
separately), but I want to understand what problem you are proposing to
solve first. For example:

- Fix a few new bugs introduced by async calls.
- Fix certain comm API problems.
- Improve client-side code.
- Something else.

This thread was just about improving client-side code (more
specifically, when to do it), but the responses seem to be all over
the map, so I am trying to understand what the goal is before diving into
specific development ideas.

Thank you,

Alex.




Re: client_side and comm_close

2008-04-19 Thread Alex Rousskov
On Sat, 2008-04-19 at 10:50 +0800, Adrian Chadd wrote:
> On Fri, Apr 18, 2008, Alex Rousskov wrote:

> > If the client_side's main problem was a few coding bugs caused by async
> > call changes, you may have been right. Unfortunately, our main problem
> > here is the client_side design flaws. Removing async calls from comm
> > will not undo those flaws.
>
> I know this.

> > The changes needed on the client-side are pretty much unrelated to async
> > calls. Async calls just make the design problems more obvious and help
> > in solving them. Removing async calls from comm will fix a few bugs, but
> > will be a step backwards and will make ongoing code cleanup more
> > difficult.
>
> But now you have a better understanding of the bugs that are introduced
> with AsyncCalls. You can then plan design changes to the client-side code
> to work around these.

I am not sure how to rephrase this so that it is clear, but client-side
design changes are needed to fix client-side bugs, not to work around
AsyncCall bugs. AsyncCalls are just a helpful tool here, not the focus.
Removing AsyncCalls will not help but will hurt.

I simply do not understand why you focus on AsyncCalls in the context of
this thread when this thread discusses client-side problems that were
there before AsyncCalls and are there after AsyncCalls were added.

> > Another way to look at it is to note that client_side was already pretty
> > stable before async calls (i.e., Squid 3.0). Thus, we have already been
> > where you are proposing to go.
> 
> Yes, then you left it by introducing something which exposed the bugs.

The problems this thread is discussing were known before AsyncCalls.

> Its great that we now know about the bugs - but now you've got a codebase
> that you're trying to stabilise 

This thread is not about stabilizing client-side code. It is about
changing its key classes and the relationships among them (for many
reasons). This will certainly destabilize the code short-term. If it
would not, I would not need to ask others opinion whether we should wait
with that work!

> and you can't guarantee you understand
> all of the possible ways its going to break in the future.

True. In fact, nobody can "guarantee he understands all of the possible
ways something is going to break in the future".

> If you back out the async related changes then you'll have a stable codebase
> again - you can then make smaller incremental changes to the codebase
> and test them to make sure you haven't broken anything, rather than spending
> possibly 6 + months finding more surprises.

I do not think we can fix the problems discussed here by doing small
incremental changes. We are not talking about bug workarounds or a few
places that need internal polishing.

> Remember, the initial idea was to use 3.1 to tidy up the codebase to stabilise
> it, not tidy up the codebase to unstabilise it. Redesigning chunks of the
> comm codebase -was- on my todo list for 3.1, but then people dumped stuff into
> 3 to make it less stable again.

This thread is not about redesigning comm APIs, and I do not know much
about your todo list, but Squid 3.1 roadmap has been built in public. If
any proposed Squid 3.1 features went against your plans, you should have
added your feature(s) to the roadmap and discussed conflicts. This
procedure does not guarantee anything (those who work together on the
project have to compromise), but it works better than complaining about
the first steps while approaching the end of the road.

And nobody is tidying up the code to make it unstable, obviously.

In summary, I do not understand how removing AsyncCalls would help fix
client-side design problems that are the subject of this thread. I do
not even understand why we are talking about AsyncCalls here.

Alex.




Re: client_side and comm_close

2008-04-19 Thread Tsantilas Christos
Hi all,
  I think we have two different problems here: the client_side code and
the comm_close in async calls.
If I am not wrong, the bug which triggered this discussion is bug 2309,
which is a squid3.0 bug where AsyncCalls does not exist. The problem here
is the client-side code, which, truth be told, needs a clean up. That is
well known to everyone who has tried to fix bugs or add code in this part
of squid3.

About comm_close: the AsyncCalls were designed and implemented in such a
way as to require the minimum of changes to the squid3 code, and this was
limiting some things...

Henrik Nordstrom wrote:
> 
> Here is a different proposal. Split async calls in two. Queued and
> immediate, with the event loops using immediate calls and everything
> else queued. The simplest way of doing so safely would be to make async
> calls immediate if not already within an async call, but adjusting the
> event loops is also a possibility. This would make the code behave a lot
> more like how it was before async, eliminating most races.
> 
> 
> The very few cases which may depend on comm_close being immediate should
> be possible to identify by cbdataInternalLock calls I think as the state
> object previously may disappear immediately otherwise. I can't see any
> in the client_side_* code today. But there very likely are assumptions
> that after comm_close has been called no further events will fire
> positively on that fd (only as COMM_ERR_CLOSING).
> 
> I don't remember why comm_close was made async, but the more I think

I think (but I am not sure now) there were cases where some of the pending
asyncCalls had to be executed.

> about it I think it's most likely wrong. comm_close SHOULD invalidate
> any pending comm operations on that fd, anything else just leads to
> nightmares.

To invalidate any pending comm operation for a particular fd you would have
to scan the whole AsyncCalls queue, which may be costly in a busy proxy with
thousands of asyncCalls scheduled (maybe use secondary asyncCalls indexes
for each fd?).
If we had only AsyncJob-based asyncCalls it would be easy to not
execute (invalidate) AsyncCalls which refer to no longer valid
asyncJobs, but there still exist functions which are scheduled using the
old way...
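
A small sketch of the secondary-index idea (hypothetical structures, not
the real AsyncCall queue): calls that depend on a descriptor are also
recorded in a per-fd list, so comm_close can cancel them without scanning
the whole global queue.

#include <deque>
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <vector>

struct AsyncCall {
    std::function<void()> body;
    bool cancelled = false;
};

static std::deque<std::shared_ptr<AsyncCall>> globalQueue;
static std::map<int, std::vector<std::weak_ptr<AsyncCall>>> perFdIndex;

void scheduleFdCall(int fd, std::function<void()> body) {
    auto call = std::make_shared<AsyncCall>();
    call->body = std::move(body);
    globalQueue.push_back(call);
    perFdIndex[fd].push_back(call);      // the secondary index: O(1) extra work per call
}

void cancelCallsFor(int fd) {            // what comm_close could use
    for (auto &weak : perFdIndex[fd])
        if (auto call = weak.lock())
            call->cancelled = true;      // no scan of globalQueue needed
    perFdIndex.erase(fd);
}

int main() {
    scheduleFdCall(7, [] { std::cout << "read callback for fd 7\n"; });
    scheduleFdCall(9, [] { std::cout << "read callback for fd 9\n"; });

    cancelCallsFor(7);                   // fd 7 is being closed

    while (!globalQueue.empty()) {       // the event loop dials whatever is left
        auto call = globalQueue.front();
        globalQueue.pop_front();
        if (!call->cancelled)
            call->body();
    }
}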

> 
> Related to this I also think the COMM_ERR_CLOSING callback from comm
> should be dropped. Nearly all handlers looking for COMM_ERR_CLOSING do
> nothing else than return immediately to abort execution, and the few
> exceptions there are can be dealt with by a close callback if at all
> needed:
> 
> errorSendComplete, invalidates the err object
> 
> ftpStateData::dataRead, sets a boolean that the read is done
> 
> HttpStateData::handleRequestBodyProducerAborted injects a
> COMM_ERR_CLOSING
> 
> HttpStateData::readReply sets a boolean that the read is done
> 
> ServerStateData::sentRequestBody clears requestSender
> 
> 
> the other 20 or so users of comm callbacks just return immediately on
> COMM_ERR_CLOSING, "aborting" the call.
> 
> Regards
> Henrik
> 
> 



Re: client_side and comm_close

2008-04-19 Thread Adrian Chadd
On Sat, Apr 19, 2008, Henrik Nordstrom wrote:

> I don't remember why comm_close was made async, but the more I think
> about it I think it's most likely wrong. comm_close SHOULD invalidate
> any pending comm operations on that fd, anything else just leads to
> nightmares.

The back-end of comm_close() may have to be async to properly support
Windows overlapped IO in the future. You have to cancel pending IOs,
which IIRC may actually fire before the cancel can be processed. It's been
a few years since I toyed with overlapped IO so I can't be sure.

Someone with more Windows Overlapped IO experience should chime in.
Also, I reckon it would be a smart idea to sit down and code up some
very basic Windows overlapped IO examples (like a tcp proxy! :) to
properly explore its behaviour before any changes to comm are made.

(Thats what I'll be doing before my comm changes in the cacheboy branch,
as I really don't want to use libevent/libev's "windows" mode for IO.)
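
For reference, a minimal Win32 sketch of the race being described, assuming
WSAStartup() has already been done and 's' is an already-connected
overlapped SOCKET: an overlapped WSARecv is issued, CancelIo is requested,
and the completion that follows may still carry real data if the read
finished before the cancellation was processed.

#include <winsock2.h>
#include <windows.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

// Assumes WSAStartup() has been called and 's' is a connected overlapped socket.
void cancelPendingRead(SOCKET s)
{
    char buffer[4096];
    WSABUF buf;
    buf.len = sizeof(buffer);
    buf.buf = buffer;

    WSAOVERLAPPED ov = {};
    ov.hEvent = WSACreateEvent();

    DWORD received = 0, flags = 0;
    int rc = WSARecv(s, &buf, 1, &received, &flags, &ov, NULL);
    if (rc == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING) {
        printf("WSARecv failed: %d\n", WSAGetLastError());
        WSACloseEvent(ov.hEvent);
        return;
    }

    CancelIo((HANDLE)s);   // ask Windows to cancel the pending I/O on this handle

    // Wait for the completion either way: it may report an abort, or real
    // data that arrived before the cancellation took effect.
    if (WSAGetOverlappedResult(s, &ov, &received, TRUE, &flags))
        printf("read completed with %lu bytes despite the cancel\n", received);
    else if (WSAGetLastError() == WSA_OPERATION_ABORTED)
        printf("read was cancelled before any data arrived\n");
    else
        printf("read failed: %d\n", WSAGetLastError());

    WSACloseEvent(ov.hEvent);
}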



Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: client_side and comm_close

2008-04-19 Thread Henrik Nordstrom
fre 2008-04-18 klockan 08:25 -0600 skrev Alex Rousskov:

> The changes needed on the client-side are pretty much unrelated to async
> calls. Async calls just make the design problems more obvious and help
> in solving them. Removing async calls from comm will fix a few bugs, but
> will be a step backwards and will make ongoing code cleanup more
> difficult.

Here is a different proposal. Split async calls in two. Queued and
immediate, with the event loops using immediate calls and everything
else queued. The simplest way of doing so safely would be to make async
calls immediate if not already within an async call, but adjusting the
event loops is also a possibility. This would make the code behave a lot
more like how it was before async, eliminating most races.


The very few cases which may depend on comm_close being immediate should
be possible to identify by cbdataInternalLock calls I think as the state
object previously may disappear immediately otherwise. I can't see any
in the client_side_* code today. But there very likely are assumptions
that after comm_close has been called no further events will fire
positively on that fd (only as COMM_ERR_CLOSING).

I don't remember why comm_close was made async, but the more I think
about it I think it's most likely wrong. comm_close SHOULD invalidate
any pending comm operations on that fd, anything else just leads to
nightmares.

Related to this I also think the COMM_ERR_CLOSING callback from comm
should be dropped. Nearly all handlers looking for COMM_ERR_CLOSING do
nothing else than return immediately to abort execution, and the few
exceptions there are can be dealt with by a close callback if at all
needed:

errorSendComplete, invalidates the err object

ftpStateData::dataRead, sets a boolean that the read is done

HttpStateData::handleRequestBodyProducerAborted injects a
COMM_ERR_CLOSING

HttpStateData::readReply sets a boolean that the read is done

ServerStateData::sentRequestBody clears requestSender


the other 20 or so users of comm callbacks just return immediately on
COMM_ERR_CLOSING, "aborting" the call.
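
As an illustration of what dropping the flag means for a typical caller
(hypothetical handler shapes, not the exact Squid signatures): instead of
every I/O handler starting with an "is the fd closing?" check, the cleanup
moves into a close callback registered once on the descriptor.

#include <cstddef>
#include <functional>
#include <iostream>
#include <vector>

enum CommFlag { COMM_OK, COMM_ERR_CLOSING };   // stand-ins for the real comm codes

// Old style: the I/O handler gets dialed one last time just to hear "closing".
void oldReadHandler(CommFlag flag, size_t len) {
    if (flag == COMM_ERR_CLOSING)
        return;                                // ~20 handlers do nothing more than this
    std::cout << "consumed " << len << " bytes\n";
}

// New style: I/O handlers are simply never dialed after comm_close; anything
// that must happen at close time lives in a close callback instead.
struct Descriptor {
    std::vector<std::function<void()>> closeCallbacks;
    void addCloseHandler(std::function<void()> cb) { closeCallbacks.push_back(std::move(cb)); }
    void close() {
        for (auto &cb : closeCallbacks)
            cb();                              // in real code these would be async calls
        closeCallbacks.clear();
    }
};

int main() {
    oldReadHandler(COMM_ERR_CLOSING, 0);       // old: a wasted dial just to abort

    Descriptor fd;
    fd.addCloseHandler([] { std::cout << "invalidate err object, clear requestSender, ...\n"; });
    fd.close();                                // new: only the close callback runs
}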

Regards
Henrik



Re: client_side and comm_close

2008-04-18 Thread Adrian Chadd
On Fri, Apr 18, 2008, Alex Rousskov wrote:

> > I'd suggest another option - roll back all of the async calls changes to the
> > comm code, stabilise the codebase without it and re-evaluate what should
> > occur (in smaller chunks, rather than dropping in a new comm manager)
> > before reintroducing it.
> 
> If the client_side's main problem was a few coding bugs caused by async
> call changes, you may have been right. Unfortunately, our main problem
> here is the client_side design flaws. Removing async calls from comm
> will not undo those flaws.

I know this.

> The changes needed on the client-side are pretty much unrelated to async
> calls. Async calls just make the design problems more obvious and help
> in solving them. Removing async calls from comm will fix a few bugs, but
> will be a step backwards and will make ongoing code cleanup more
> difficult.

But now you have a better understanding of the bugs that are introduced
with AsyncCalls. You can then plan design changes to the client-side code
to work around these.

> Another way to look at it is to note that client_side was already pretty
> stable before async calls (i.e., Squid 3.0). Thus, we have already been
> where you are proposing to go.

Yes, then you left it by introducing something which exposed the bugs.
Its great that we now know about the bugs - but now you've got a codebase
that you're trying to stabilise and you can't guarantee you understand
all of the possible ways its going to break in the future.
If you back out the async related changes then you'll have a stable codebase
again - you can then make smaller incremental changes to the codebase
and test them to make sure you haven't broken anything, rather than spending
possibly 6 + months finding more surprises.

Remember, the initial idea was to use 3.1 to tidy up the codebase to stabilise
it, not tidy up the codebase to unstabilise it. Redesigning chunks of the
comm codebase -was- on my todo list for 3.1, but then people dumped stuff into
3 to make it less stable again.



Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: client_side and comm_close

2008-04-18 Thread Alex Rousskov
On Fri, 2008-04-18 at 16:15 +0800, Adrian Chadd wrote:
> On Wed, Apr 16, 2008, Alex Rousskov wrote:
> 
> > In short, we have several related problems here: (a) client_side code is
> > incapable of reliably identifying whether comm_close has been called;
> > (b) ConnStateData::isOpen may not work anymore; (c) client_side code
> > uses many different ways to identify whether more data should be read
> > from the connection; (d) comm_close is used a lot but no longer has an
> > immediate effect and some client_side code may still depend on that
> effect to be immediate; (e) client_side comm handlers descend very deep,
> > making it difficult to propagate errors and comm_close status up.
> > 
> > We should decide whether to continue patching holes here and there or
> > clean up the client_side*cc connection management mess for good. Should
> > we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this
> > a v3.2 project? Or am I exaggerating the problems since common cases
> > usually work fine?
> 
> I'd suggest another option - roll back all of the async calls changes to the
> comm code, stabilise the codebase without it and re-evaluate what should
> occur (in smaller chunks, rather than dropping in a new comm manager)
> before reintroducing it.

If the client_side's main problem was a few coding bugs caused by async
call changes, you may have been right. Unfortunately, our main problem
here is the client_side design flaws. Removing async calls from comm
will not undo those flaws.

The changes needed on the client-side are pretty much unrelated to async
calls. Async calls just make the design problems more obvious and help
in solving them. Removing async calls from comm will fix a few bugs, but
will be a step backwards and will make ongoing code cleanup more
difficult.


Another way to look at it is to note that client_side was already pretty
stable before async calls (i.e., Squid 3.0). Thus, we have already been
where you are proposing to go.

Alex.




Re: client_side and comm_close

2008-04-18 Thread Alex Rousskov
On Fri, 2008-04-18 at 18:34 +1200, Amos Jeffries wrote:
> Alex Rousskov wrote:
> > On Fri, 2008-04-18 at 15:40 +1200, Amos Jeffries wrote:
> > 
> >> IMO, This should be queued as part of the 3.1 cleanups, but not as 
> >> urgently as some other issues.
> >>
> >> The priorities as I see them now are:
> >>   - immediately fixable and urgent bugs.
> >>   - completing the dedicated 3.1 feature list (DEVEL is overdue).
> >>   - cleanups for fixing more intrusive or design bugs (such as this, and 
> >> the 407 one).
> >>   - shakedown for release cycle.
> >>
> >> We have a few months of shakedown before anything like a good stable. 
> >> Plenty of time for cleanup not blocking new features.
> > 
> > I estimate that properly cleaning up the client side would delay v3.1
> > stable release by 2-3 months unless we suddenly get more experts
> > involved. Is that delay acceptable to you? Others?
> 
> If we were in a position to start it now, maybe. But we are not I think.

You are probably right. FWIW, I will not be able to work on that until
eCAP is done.

> My vote is for the timeline above, leaving step 3 as optional, with the 
> decision whether to even start it deferred until step 2 is fully completed.

> We may later decide that it is worthwhile or vital, but I don't think 
> anyone should start in on more major sweeping until we know what all the 
> fallout from the currently underway set is going to be.

In the unlikely event somebody with enough client-side knowledge can
start on this now, I would probably support it (and help with the
design). Otherwise, I agree that your plan is the best option available.

Thank you,

Alex.




Re: client_side and comm_close

2008-04-18 Thread Adrian Chadd
On Wed, Apr 16, 2008, Alex Rousskov wrote:

> In short, we have several related problems here: (a) client_side code is
> incapable of reliably identifying whether comm_close has been called;
> (b) ConnStateData::isOpen may not work anymore; (c) client_side code
> uses many different ways to identify whether more data should be read
> from the connection; (d) comm_close is used a lot but no longer has an
> immediate effect and some client_side code may still depend on that
> effect to be immediate; (e) client_side comm handlers descend very deep,
> making it difficult to propagate errors and comm_close status up.
> 
> We should decide whether to continue patching holes here and there or
> clean up the client_side*cc connection management mess for good. Should
> we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this
> a v3.2 project? Or am I exaggerating the problems since common cases
> usually work fine?

I'd suggest another option - roll back all of the async calls changes to the
comm code, stabilise the codebase without it and re-evaluate what should
occur (in smaller chunks, rather than dropping in a new comm manager)
before reintroducing it.

I think we've seen that there's still a gap in how development branches are
tested and that the best way we have to test code is to put it in production :)




Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: client_side and comm_close

2008-04-17 Thread Amos Jeffries

Alex Rousskov wrote:
> On Fri, 2008-04-18 at 15:40 +1200, Amos Jeffries wrote:
> 
>> IMO, This should be queued as part of the 3.1 cleanups, but not as 
>> urgently as some other issues.
>>
>> The priorities as I see them now are:
>>   - immediately fixable and urgent bugs.
>>   - completing the dedicated 3.1 feature list (DEVEL is overdue).
>>   - cleanups for fixing more intrusive or design bugs (such as this, and 
>> the 407 one).
>>   - shakedown for release cycle.
>>
>> We have a few months of shakedown before anything like a good stable. 
>> Plenty of time for cleanup not blocking new features.
> 
> I estimate that properly cleaning up the client side would delay v3.1
> stable release by 2-3 months unless we suddenly get more experts
> involved. Is that delay acceptable to you? Others?

If we were in a position to start it now, maybe. But we are not I think.

My vote is for the timeline above, leaving step 3 as optional, with the 
decision whether to even start it deferred until step 2 is fully completed.

We may later decide that it is worthwhile or vital, but I don't think 
anyone should start in on more major sweeping until we know what all the 
fallout from the currently underway set is going to be.


Amos
--
Please use Squid 2.6.STABLE19 or 3.0.STABLE4


Re: client_side and comm_close

2008-04-17 Thread Alex Rousskov

On Fri, 2008-04-18 at 15:40 +1200, Amos Jeffries wrote:

> IMO, This should be queued as part of the 3.1 cleanups, but not as 
> urgently as some other issues.
> 
> The priorities as I see them now are:
>   - immediately fixable and urgent bugs.
>   - completing the dedicated 3.1 feature list (DEVEL is overdue).
>   - cleanups for fixing more intrusive or design bugs (such as this, and 
> the 407 one).
>   - shakedown for release cycle.
> 
> We have a few months of shakedown before anything like a good stable. 
> Plenty of time for cleanup not blocking new features.

I estimate that properly cleaning up the client side would delay v3.1
stable release by 2-3 months unless we suddenly get more experts
involved. Is that delay acceptable to you? Others?

Thank you,

Alex.




Re: client_side and comm_close

2008-04-17 Thread Amos Jeffries

Alex Rousskov wrote:
> On Thu, 2008-04-17 at 20:13 +0300, Tsantilas Christos wrote:
>> Alex Rousskov wrote:
>>> In reply to comment #8 for bug #2309 
>>> comm_read assertion failed: "ccb->active == false"
>>> http://www.squid-cache.org/bugs/show_bug.cgi?id=2309
>>>
>>>> Note to the one looking into this bug: The assert happens if a 
>>>> comm_read() is called while there already is a comm_read() pending.
>>>
>>> Yes.
>>>
>>>> I would suspect the half closed client check has been wrongly 
>>>> adopted to call comm_read where it used commSetSelect periodically
>>>> before..
>>>
>>> The above may be true, but there are other conditions causing the same
>>> assertion in my tests, with and without the pending comm_close patch
>>> from Christos.
>>
>> As I understand it, the problem does not exist only in squid3.0 or only
>> in squid3-trunk.
>>
>> The main problem with the client_side code is that the comm_reads are done
>> by the ConnStateData class, the comm_writes by the ClientSocketContext
>> class, and comm_close (and other comm operations) by all of them. This makes
>> it very difficult to manage the comm-related problems in the client_side code.
> 
> Yes. If we decide to fix this, there will be a single object responsible
> for client connection management. Others will simply use its services as
> needed. Comm cannot be such an object because it is too low-level.
> Currently, the functionality is spread among many classes.
> 
>>> We should decide whether to continue patching holes here and there or
>>> clean up the client_side*cc connection management mess for good. Should
>>> we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this
>>> a v3.2 project? Or am I exaggerating the problems since common cases
>>> usually work fine?
> 
> The question remains.


IMO, This should be queued as part of the 3.1 cleanups, but not as 
urgently as some other issues.


The priorities as I see them now are:
 - immediately fixable and urgent bugs.
 - completing the dedicated 3.1 feature list (DEVEL is overdue).
 - cleanups for fixing more intrusive or design bugs (such as this, and 
the 407 one).
 - shakedown for release cycle.

We have a few months of shakedown before anything like a good stable. 
Plenty of time for cleanup not blocking new features.


Amos
--
Please use Squid 2.6.STABLE19 or 3.0.STABLE4


Re: client_side and comm_close

2008-04-17 Thread Alex Rousskov
On Thu, 2008-04-17 at 20:13 +0300, Tsantilas Christos wrote:
> Alex Rousskov wrote:
> > In reply to comment #8 for bug #2309 
> > comm_read assertion failed: "ccb->active == false"
> > http://www.squid-cache.org/bugs/show_bug.cgi?id=2309
> > 
> >> Note to the one looking into this bug: The assert happens if a 
> >> comm_read() is called while there already is a comm_read() pending.
> > 
> > Yes.
> > 
> >> I would suspect the half closed client check has been wrongly 
> >> adopted to call comm_read where it used commSetSelect periodically
> >> before..
> > The above may be true, but there are other conditions causing the same
> > assertion in my tests, with and without the pending comm_close patch
> > from Christos.
> > 
> 
> As I understand it, the problem does not exist only in squid3.0 or only
> in squid3-trunk.
> 
> The main problem with the client_side code is that the comm_reads are done
> by the ConnStateData class, the comm_writes by the ClientSocketContext
> class, and comm_close (and other comm operations) by all of them. This makes
> it very difficult to manage the comm-related problems in the client_side code.

Yes. If we decide to fix this, there will be a single object responsible
for client connection management. Others will simply use its services as
needed. Comm cannot be such an object because it is too low-level.
Currently, the functionality is spread among many classes.
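
A purely hypothetical interface sketch of such an object (names invented
for illustration): reads, writes and closing all go through one owner, and
everyone else asks it instead of poking comm or fd_table directly.

#include <functional>
#include <string>
#include <vector>

class ClientConnection {
public:
    using IoHandler = std::function<void(const std::string &data)>;
    using Handler = std::function<void()>;

    explicit ClientConnection(int fd) : fd_(fd) {}

    // The only entry points for comm activity on this client socket; the
    // HTTP-parsing and reply-writing classes would call these instead of
    // issuing comm_read/comm_write/comm_close themselves.
    void read(IoHandler onData) { lastReader_ = std::move(onData); }   // stub
    void write(std::string data, Handler onDone) { (void)data; (void)onDone; } // stub
    void close() { closing_ = true; }  // would schedule close handlers asynchronously

    // Everyone else asks this object; nobody inspects fd_table flags directly.
    bool isOpen() const { return !closing_; }
    void onClose(Handler cb) { closeHandlers_.push_back(std::move(cb)); }

private:
    int fd_;
    bool closing_ = false;
    IoHandler lastReader_;
    std::vector<Handler> closeHandlers_;
};

int main() {}   // interface sketch only; the method bodies above are stubs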

> > We should decide whether to continue patching holes here and there or
> > clean up the client_side*cc connection management mess for good. Should
> > we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this
> > a v3.2 project? Or am I exaggerating the problems since common cases
> > usually work fine?

The question remains.

Thank you,

Alex.




Re: client_side and comm_close

2008-04-17 Thread Tsantilas Christos
Hi all,
Alex Rousskov wrote:
> In reply to comment #8 for bug #2309 
> comm_read assertion failed: "ccb->active == false"
> http://www.squid-cache.org/bugs/show_bug.cgi?id=2309
> 
>> Note to the one looking into this bug: The assert happens if a 
>> comm_read() is called while there already is a comm_read() pending.
> 
> Yes.
> 
>> I would suspect the half closed client check has been wrongly 
>> adopted to call comm_read where it used commSetSelect periodically
>> before..
> The above may be true, but there are other conditions causing the same
> assertion in my tests, with and without the pending comm_close patch
> from Christos.
> 

As I understand it, the problem does not exist only in squid3.0 or only
in squid3-trunk.

The main problem with the client_side code is that the comm_reads are done
by the ConnStateData class, the comm_writes by the ClientSocketContext
class, and comm_close (and other comm operations) by all of them. This makes
it very difficult to manage the comm-related problems in the client_side code.


> In short, we have several related problems here: (a) client_side code is
> incapable of reliably identifying whether comm_close has been called;
> (b) ConnStateData::isOpen may not work anymore; (c) client_side code
> uses many different ways to identify whether more data should be read
> from the connection; (d) comm_close is used a lot but no longer has an
> immediate effect and some client_side code may still depend on that
> effect to be immediate; (e) client_side comm handlers descend very deep,
> making it difficult to propagate errors and comm_close status up.
> 
> We should decide whether to continue patching holes here and there or
> clean up the client_side*cc connection management mess for good. Should
> we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this
> a v3.2 project? Or am I exaggerating the problems since common cases
> usually work fine?
> 
> Thank you,
> 
> Alex.
> 
> 
> 



client_side and comm_close

2008-04-16 Thread Alex Rousskov
In reply to comment #8 for bug #2309 
comm_read assertion failed: "ccb->active == false"
http://www.squid-cache.org/bugs/show_bug.cgi?id=2309

> Note to the one looking into this bug: The assert happens if a 
> comm_read() is called while there already is a comm_read() pending.

Yes.

> I would suspect the half closed client check has been wrongly 
> adopted to call comm_read where it used commSetSelect periodically
> before..

The above may be true, but there are other conditions causing the same
assertion in my tests, with and without the pending comm_close patch
from Christos.

In short, we have several related problems here: (a) client_side code is
incapable of reliably identifying whether comm_close has been called;
(b) ConnStateData::isOpen may not work anymore; (c) client_side code
uses many different ways to identify whether more data should be read
from the connection; (d) comm_close is used a lot but no longer has an
immediate effect and some client_side code may still depend on that
effect to be immediate; (e) client_side comm handlers descend very deep,
making it difficult to propagate errors and comm_close status up.

We should decide whether to continue patching holes here and there or
clean up the client_side*cc connection management mess for good. Should
we continue to patch isolated v3.0 problems and cleanup v3.1? Or is this
a v3.2 project? Or am I exaggerating the problems since common cases
usually work fine?

Thank you,

Alex.