Alex Rousskov wrote:
On 07/19/2010 05:26 AM, Amos Jeffries wrote:
Alex Rousskov wrote:
On 07/18/2010 04:52 AM, Amos Jeffries wrote:
This updated my previous submission for the comm connection opener
model.
diff since last time:
- removed the multi-path pathways. They are now isolated at the upper
level. ConnOpener now handles only one Comm::Connection.
- implemented ConnOpener as an AsyncJob child.
What has changed from HEAD:
ConnectionDetails objects have been renamed Comm::Connection and been
extended to hold the FD and Squids' socket flags.
...
On COMM_ERR_CONNECT and COMM_TIMEOUT the conn will be !isOpen() or NULL
and should not be relied on.
If the connection pointer is NULL, a job opening multiple connections
would not be able to tell which connection just failed. Let's not hide
connection pointer from the initiator.
* Can Adaptation::Icap::Xaction::closeConnection set connection to NULL?
In the reuseConnection == false case, it can after the connection->close().
Why not in both reuse and no-reuse cases?
Um... okay yes. In both. Fixed.
* ConnOpener::start(), doneAll(), and swanSong() should remain protected
as they are in the AsyncJob.
Done. Apart from doneAll(). It is required to be accessible to the
ConnectRetry wrapper until comm write callbacks are turned into methods.
ConnectRetry is broken in several ways: It partially and incorrectly
emulates async call protections, it calls start() directly, and it
results in at least two calls to start() for on job (which is like
having two calls to main() in a program).
Instead of this mess, please have ConnectRetry wrapper to schedule an
async call for ConnOpener::tryConnecting() or a similar job method and
move your existing comm_connect_addr() code there. No deleteThis or
doneAll access would be necessary.
Ew. A call to schedule a call?
We do gain a lot though. Done.
* Should ConnOpener::swanSong() call comm_close instead of just close()?
No. Until ConnOpener() has completed successfully there is no other
valid user of the conn->fd. The only handlers to deal with are
ConnOpeners' earlyAbort and timeout handlers, which are explicitly
cancelled and removed.
comm_close does a lot of additional handler and callback state updating
over multiple callbacks. This is all unnecessary waste of cycles on a
closed or maybe half-open FD.
I doubt the "we are the only fd user so we are going to bypass Comm when
we want to" is a good approach. ConnOpener calls many Comm methods.
Those methods may create some state in Comm internal tables. We should
call comm_close if we call comm_open instead of assuming that Comm has
no state associated with our descriptor.
If calling comm_close results in too many useless callbacks in our case,
we should remove the no-longer-needed callbacks _before_ closing the
descriptor.
Sorry, I responded to that without taking another look at the code.
ConnOpener::swanSong() actually calls the shared and public
Connection::close() which calls comm_close() and unsets the FD.
The behaviour you describe as should be happening was in fact happening.
Upper level removing its callbacks, lower level being told to close its FD.
* ConnOpener::callCallback() clears the close handler and timeout
depending on solo state, which is kind of wrong, but the best thing to
do is to remove that cleanup code from callCallback() completely because
the same code is already present in swanSong() and because it is
unrelated to calling the callback.
Ideally yes. I'm not certain enough about the state of the Job on the
second (ConnectRetry) legacy entrance to remove either at this point.
Do not let one bug cause others. I have discussed ConnectRetry fixes
above, but swanSong and callCallback should not depend on what kind of
bugs we have in ConnectRetry or we will be going in circles.
Okay. Fixed. And the crashes which resulted from that fix have in turn
been fixed... by rolling Comm::Connection out as the server FD handle
used by http.cc.
One commit early, but its worked and was not as large as I had
thought. Though the non-http servers have yet to have their rollout to
fix the same problems over there.
Do you want to audit the code at this stage or wait till I finish that
next level of rollout into the Servers' ?
* Do not call start() directly. Only AsyncStart should call it. And it
should not be called more than once per job object lifetime. It is
similar to main() in C/C++.
Fixed. Start now does comm_openex(), creates the calls and schedules the
connect one. Which contains the switch statement from the old code and
loops async.
* It may not be a good idea to separate calls.earlyabort and
calls.timeout existence from the existence of the corresponding Comm
close or timeout handler. Set them when setting the Comm handler. Clear
them when clearing the handler.
Fixed.
* FwdState::unregister() used to prevent FwdState from closing the
unregistered connection. The new code does not. Was the old code buggy?
Only in Comm::Connection semantics. Added an alternative version and
deprecated the old unregister(int) with a hack to keep it going on the
old semantics.
I will try to remember to revisit this when you post the patch with the
changes described above, but I may forget.
New code is this pair of methods:
void
FwdState::unregister(Comm::ConnectionPointer conn)
{
debugs(17, 3, HERE << entry->url() );
assert(serverConnection() == conn);
assert(conn->isOpen());
comm_remove_close_handler(conn->fd, fwdServerClosedWrapper, this);
}
// Legacy method to be removed in favor of the above as soon as possible
void
FwdState::unregister(int fd)
{
debugs(17, 3, HERE << entry->url() );
assert(fd == serverConnection()->fd);
assert(fd > -1);
comm_remove_close_handler(fd, fwdServerClosedWrapper, this);
serverConnection()->fd = -1;
}
Yes, I do see that the fd = -1 was critical to the old code semantics.
This is a problem. unregister() can prevent the fwdServerClosedWrapper
being called from that point on. From a days running of Squid that seems
to be sufficient.
The better solution may be to not call ->close() or comm_close()
directly anymore from FwdState or any of the Servers. Relying on the
Comm::Connection to do its own closing on destruct.
And why is it called "paths"? Each paths element (i.e., a path) only
contains one point (the next hop). A path is a sequence of points.
Should we call the "paths" member "hops", "peers", or "servers"?
Sequence of two location data points are stored in each conn:
(local,remote)
The paths vector<> holds a series of such.
Though I agree paths is not a good name, and I am leaning towards
remoteDestinations or somesuch now.
"destinations" is fine with me though other choices are shorter. I would
drop the "remote" prefix as it adds no meaning and may be confusing when
a remote address is "localhost".
Overall, exposing connection pointer to Servers seems wrong, especially
because the pointer may change from one call to another. A Server that
needs that pointer, should store it at initialization time, just like it
stores fd in the current code.
You have "TODO: store Comm::Connection instead of FD". This is an XXX
and not just a TODO because of the above concerns.
This auditing task is scoped to do the connection setup and get
split-stack IPv6 a chance of going operational. Which means as little
upper layer change in the Servers as we can cope with.
I appreciate the goal of minimizing the upper-level changes, but when
the connection pointer change is outside Server control, it becomes
dangerous to offer the Server access to it. It already caused one bug
and we could easily miss more hidden ones. At the very least, let's add
a "current connection, may change between calls" comment for the conn()
method.
Is there any reason why the CommCalls have unique params classes and
dialers for CommTimeoutCb* and CommCloseCb* ? I'd kind of like to roll
them and CommConnectCb* into passing CommCommonCb* if thats possible,
and cut ~25% of the CommCalls code on the next cleanup step.
You can merge CommTimeoutCb and CommCloseCb because they have the same
arguments and such. The merge will remove a few lines of simple code,
but you may lose the ability to know which callback is segfaulting based
on call object alone, which is sometimes useful when dealing with core
dumps. I doubt it is worth it, but have no strong objections.
If the cleanup you have mentioned focuses on removing old function-style
callbacks instead, then you would be able to remove a lot of CommCalls
complexity and improve debugging.
Yes.
Converting old-style function calls to use CommCommonCbParams is fine,
but, again, it would be better to get rid of those calls instead.
All changed. Running my checks again now before commit and re-submit.
It would be nice if you could run it with valgrind. A few bugs I was
lucky to spot (e.g., double-destruction and peer leak) would have been
identified by valgrind.
Will do.
Amos
--
Please be using
Current Stable Squid 2.7.STABLE9 or 3.1.5