Alex Rousskov wrote:
On 06/09/2010 05:44 AM, Amos Jeffries wrote:
(since last submission I've implemented most of Alex's suggestions)


At the design level, the following problems need to be addressed, IMO:

Connection dual-use: Replace unconnected connections with some sort of
connection destination class and use it to configure the connection
opener (ConnectStateData) object for solo and paths cases. The
connection opener should return a single working Connection object on
success (among other things).

Um, one of the middling-important goals here is to reduce copying of the comm data properties from one allocated object to another and onto the stack, while preserving the handles (refcounted pointers) already stored by callers (FTP and DNS for starters). I'm happy to make a typedef to differentiate the scopes for readability, though it seems to me that will just add more complexity.


ConnectStateData nature: The connection opener class should become an
AsyncJob and treated as such because it is.

Okay. Underway.


Either separate or merge single- and multi-destination cases in
ConnectStateData: The current code still merges some but not all
"single-destination" code with "handling one of the multi-destinations"
code, resulting in bugs. We should fully merge the two similar cases.


Specific questions, suggestions, and lower-level comments are below.


Done. Dropped multi-path code from comm layer.
Forwarding has retryOrBail logic that is used to restart early-aborted paths. Re-using that to loop over the vector for single-path opening attempts looks okay.
Building and will need to re-test later.


Peer selection has been extended to do DNS lookups on the peers chosen
for forwarding to and produce a vector<> of possible connection
endpoints (squid local IP via tcp_outgoing_address or tproxy) and remote
server.

ConnectStateData has been moved into src/comm/ (not yet namespaced) and
had all its DNS lookup operations dropped. To be replaced by a looping
process of attempting to open a socket and join a link as described by
some Comm::Connection or vector<> of same.
 Its running time is limited by connect_timeout.
 connect_retries sets the number of attempts to open each path.
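The retry loop described above can be sketched standalone (hypothetical names and a fake connect step, not the real Squid classes): try each candidate path up to connect_retries times and stop at the first success.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for Comm::Connection and the opener loop.
struct Connection {
    int fd = -1;                          // -1 means "not open"
    bool isOpen() const { return fd >= 0; }
};

// Attempt each candidate path up to 'retries' times, returning the
// index of the first path that opens, or -1 if all attempts fail.
// 'tryConnect' stands in for a single connect(2) attempt on one path.
int openFirstWorking(std::vector<Connection> &paths, int retries,
                     const std::function<bool(Connection &)> &tryConnect)
{
    for (size_t i = 0; i < paths.size(); ++i) {
        for (int attempt = 0; attempt < retries; ++attempt) {
            if (tryConnect(paths[i]))
                return static_cast<int>(i);   // first working path wins
        }
    }
    return -1; // every path exhausted its connect_retries budget
}
```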

Can ConnectStateData handle DNS lookups? Does the peer selection
algorithm need tight control over those lookups before we try to open a
connection to a peer?

No, ConnectStateData doing its own DNS was the v4/v6 blocker for split-stack IPv6 support on BSD. It prevents outgoing address selection based on higher-scoped HttpRequest details.

Pre-seeded peer DNS results would probably be a better performance boost than doing new lookups each time. I've not found it to be a particularly slow lookup though: since the peers are almost certainly static or constantly looked up, the DNS is kept fresh in the ipcache.


ConnectStateData::connect() will go away and do some async work. It will
come back at some point by calling the handler with COMM_OK,
COMM_ERR_CONNECT, or COMM_TIMEOUT and pointers to the Comm::Connection or
vector (whichever was passed in).
 On COMM_OK the Comm::ConnectionPointer, or the first entry of the vector,
will be an open Comm::ConnectionPointer which we can now use.
 On COMM_ERR_CONNECT the vector will be empty (all tried and
discarded), and the single Comm::ConnectionPointer will be !isOpen() or NULL.
 On COMM_TIMEOUT the content is as per COMM_ERR_CONNECT, except that the
vector may still contain untried paths, though all closed.
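The three result cases above amount to an invariant on what the handler may assume about the paths it gets back. A standalone sketch of that contract, with hypothetical minimal types (not Squid's):

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of the callback contract described above.
enum Status { COMM_OK, COMM_ERR_CONNECT, COMM_TIMEOUT };

struct Connection {
    int fd = -1;
    bool isOpen() const { return fd >= 0; }
};

// Returns true when a (status, paths) pair honours the documented
// invariants: COMM_OK implies the first path is open; COMM_ERR_CONNECT
// implies every path was tried and discarded; COMM_TIMEOUT may leave
// untried paths behind, but all of them closed.
bool contractHolds(Status s, const std::vector<Connection> &paths)
{
    switch (s) {
    case COMM_OK:
        return !paths.empty() && paths.front().isOpen();
    case COMM_ERR_CONNECT:
        return paths.empty();
    case COMM_TIMEOUT:
        for (const Connection &c : paths)
            if (c.isOpen())
                return false;  // leftovers must be closed
        return true;
    }
    return false;
}
```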

FD opening, FD problems, connection errors, timeouts, early remote
TCP_RST or NACK closure during the setup are all now wrapped out of
sight inside ConnectStateData.

Who is responsible for closing the opened connection descriptor if the
requestor disappears?

The old requestor code could close the FD on job exit because it had
access to the descriptor. With the new code, the descriptor should not
be available to the requestor until the connection is established.

[ I would appreciate if we do not argue exactly which requestors may
disappear -- it is bound to happen sooner or later, and I am tired of
fixing hard-to-find bugs in the code that simply assumed otherwise
because it seemed safe at the time. Let's at least write the new code
the right way. If it is an async job, we should never assume it is
around after AsyncStart. ]

Yes. 100% agreed.

If the requestor disappears and everything goes pear-shaped, the Comm::Connection has just enough smarts to close the FD in its own destructor and prevent leaks.

In the case where the original requestor has died, ConnectStateData or the dialer parameter details are left as the only holder of the ConnectionPointer, and FD closure will be actioned when that state gets dropped by the dialer.

We still have the case where an object holding a Comm::Connection pointer leaks at a higher level. But there is nothing extra comm can do about that.
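The destructor behaviour described above is plain RAII. A minimal sketch of the idea, with a fake close() so it is testable (hypothetical code, not the real Comm::Connection):

```cpp
#include <cassert>

// Records the last descriptor "closed", standing in for comm_close().
static int closedFd = -1;
static void fakeClose(int fd) { closedFd = fd; }

// A connection that closes its descriptor in its own destructor, so
// the FD cannot leak even if every requestor has disappeared.
class Connection {
public:
    explicit Connection(int fd = -1) : fd_(fd) {}
    ~Connection() {
        if (fd_ >= 0)
            fakeClose(fd_);   // last holder gone: close the FD
    }
    bool isOpen() const { return fd_ >= 0; }
private:
    Connection(const Connection &);            // copying prohibited
    Connection &operator=(const Connection &); // (pre-C++11 style)
    int fd_;
};
```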


* Comm::Connection assignment operator does not handle assignment to
self. Please either fix the operator or make the assignment impossible
(if it is not needed).

Done.
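For reference, the usual shape of a self-assignment-safe operator= (returning a non-const reference, per the next point), illustrated on a hypothetical class rather than the actual Comm::Connection fix:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical class owning a C string, to show the hazard: an
// operator= that frees its own buffer before copying must first
// check for assignment to self.
class Host {
public:
    explicit Host(const char *name) : name_(strdup(name)) {}
    ~Host() { free(name_); }
    Host &operator=(const Host &other) {
        if (this == &other)
            return *this;   // without this guard, h = h frees name_ first
        free(name_);
        name_ = strdup(other.name_);
        return *this;
    }
    const char *name() const { return name_; }
private:
    Host(const Host &);     // copy-construction not needed here
    char *name_;
};
```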


* Comm::Connection assignment operator should return non-const reference
to follow typical C/C++ assignment expectations.


Done.


* Do we really need to support Comm::Connection copying and assignment?
Seems like a very dangerous operation given the fact that Connection
closes it descriptor upon destruction. For example, the following code
is currently possible but is totally wrong:

  void foo(Connection c) { ... }

  Connection c;
  c.fd = openex...
  foo(c);

If we do not use Comm::Connection copying and assignment now, I would
strongly recommend prohibiting it. Otherwise, please point me to the
code that does.

Um. Ident needs to duplicate the received Comm::Connection before playing with some of its fields and using the raw received data. I'll make that a memcpy() and unset the required fields instead of an assignment copy.


* Do you plan on converting idle pconn pools to use Comm::Connection?


Yes. pconn and pinned FD are next on the list to convert after this bit is merged.


* Please use "const reference" types for Pointer methods parameters by
default. This avoids needless locking/unlocking during parameter
copying. For example,

ConnectStateData(Comm::PathsPointer paths, AsyncCall::Pointer handler)

can be optimized as

ConnectStateData(const Comm::PathsPointer &paths, const
AsyncCall::Pointer &handler)

Okay, for methods I understand.
That example is a bad choice though: the constructor needs to lock the references.
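The refcount churn being avoided can be shown with a toy smart pointer (hypothetical, not Squid's RefCount template): pass-by-value copies the pointer and touches the count, a const reference does not:

```cpp
#include <cassert>

// Counts how many times the "refcount" was locked via a copy.
static int lockOps = 0;

// Toy refcounting pointer: copying it locks the referent.
struct Pointer {
    Pointer() {}
    Pointer(const Pointer &) { ++lockOps; }  // "lock" on copy
};

void byValue(Pointer p) { (void)p; }          // copies: one lock/unlock cycle
void byConstRef(const Pointer &p) { (void)p; } // no copy, no refcount traffic
```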



* Do not free the host before deleting self because the destructor will
do it for you:

    callback = NULL;
    safe_free(host);
    delete this;


Done.


* The ConnectStateData code still has slightly different code executed
for solo and path cases, code where solo/path decision is tested in
slightly different ways, etc.

I have already commented on that and you responded that fixing this bug
is too inefficient. I disagree. There is no need to allocate paths if
they are not used.

If you provide a simple currentDestination() accessor for the current
path/solo destination and a goingSolo() test method, the code will not
become less efficient but may become correct or at least clearer.

Another efficient option is to have two classes. One opens a connection
for a single destination. The other one uses the first class to open a
connection for one of several destinations. This is an arguably a
cleaner design because it clearly separates two different problems at
the expense of creating two jobs instead of one.

I think one of these two changes must be made. I will point out a few
specific problems that the lack of uniform solo/paths treatment has
caused below, but this is not an exhaustive list.


Now dumped the multi-path code in comm layer.



* Who creates the descriptor for the solo case if the requestor did not
open it? There is such code for the paths case but not for the solo
case.  For example, comm_openex() call was removed from the ICAP code,
but I do not see the corresponding call in ConnectStateData::connect.

Um, okay, I've seen what you mean. Making ConnectStateData a pure solo-path opener.
Forwarding has retryOrBail logic which is now used to loop over paths. But tunnel needs some equivalent added at a later date to fix the CONNECT retry bugs.


If paths code can create descriptors on its own, without requestor
participation, the solo code should be able to do the same.


* The assertion below seems to contradict the code logic

        /* catch the error case. */
        assert(paths != NULL && solo != NULL);


Dead with paths.


* Why is the code below not executed for the solo case?

    if (paths != NULL && paths->size() > 0) {
        fd = (*paths)[0]->fd;
        debugs(5, 3, HERE << "FD " << fd);
        comm_remove_close_handler(fd, ConnectStateData::EarlyAbort, this);
        commSetTimeout(fd, -1, NULL, NULL);
    }


Fixed.


* Why is the code below not executed for the solo case?

        if (connstart == 0) {
            connstart = squid_curtime;
        }

Fixed.




* Who sets the ConnectStateData::EarlyAbort handler? What is the handler
for?

Ouch. Should have been ConnectStateData::connect(). Fixed.
It's to catch the partially-open FD cases.



* ConnectStateData is not owned by anyone, deletes self, and is
scheduling/receiving async calls. It should be converted into an
AsyncJob with proper start/done methods. As a free bonus, this will
greatly help with debugging.

Sweet. Underway.

When I get my head around the Dialer parameters usage, those ConnectCbFunPtr will die as well.



* Do not pass a pointer to the data member to an async job for long-term
use. For example, when the Forward object disappears, the copy of the
&paths pointer in ConnectStateData will point to garbage:

        ConnectStateData *cs = new ConnectStateData(&paths, call);

I did not check for similar bugs elsewhere.


Fixed with the dropping of ConnectStateData::paths.


* You need to prohibit copy-construction, not parameter-less creation
via a default constructor:

/* These objects may NOT be created without connections to act on. Do not 
define this operator. */
    ConnectStateData();

No default constructor is generated if there are explicit ones.

Meh. Thanks. Done.
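A sketch of the resulting idiom (pre-C++11 style, hypothetical class body): any explicit constructor already suppresses the implicit default constructor, so only the copy machinery needs declaring private and undefined:

```cpp
#include <cassert>

// These objects may NOT be copied. Declaring an explicit constructor
// means no implicit default constructor is generated, so prohibiting
// copying is what actually requires the private, undefined members.
class ConnOpener {
public:
    explicit ConnOpener(int fd) : fd_(fd) {}
    int fd() const { return fd_; }
private:
    ConnOpener(const ConnOpener &);            // not defined: no copying
    ConnOpener &operator=(const ConnOpener &); // not defined: no assignment
    int fd_;
};
```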


* The return type of the assignment operator should be a non-const
reference. This is not really a runtime bug, but please fix:

    const ConnectStateData operator =(const ConnectStateData &c);


Done.


* Why take a mini pause and schedule a zero-delay event here? Can we
just proceed with the next step instead? We are inside an async call
handler so I do not see any need to make us "more async":

eventAdd("ConnectStateData::Connect", ConnectStateData::Connect, this, 0.0, 0);


The old code said it was to prevent super-fast loops; I didn't change it.
Pre-async cruft, probably. Changed.


* Here is a suggestion for the ConnectStateData description, repeated
from the previous email, just to make sure it is not lost in the noise:

/// Async-opener of a Comm connection.
/// Can find the first "working" among multiple destinations.

If you agree with the above scope, let's s/ConnectStateData/ConnOpener/g


Ah agree. A nicer name too. Changed.
Though the multi-path support is now dead, so I'm taking the first line of the two.


* The new Must() below does not seem to match the new allocation code.
If the connection is not NULL, we will have a memory leak so we should
not imply that it may not be NULL in Must():

+    Must(connection == NULL || !connection->isOpen());
     const Adaptation::Service &s = service();
     if (!TheConfig.reuse_connections)
         disableRetries(); // this will also safely drain pconn pool
+    connection = new Comm::Connection;


* Please replace

+    // TODO:  where do we get the DNS info for the ICAP server host ??
+    // Ip::Address will do a BLOCKING lookup if s.cfg().host is a ...

with

// TODO: Avoid blocking lookup if s.cfg().host is a ...


Done.


* Can you add a host() getter/setter for ConnectStateData? Not a big
deal, but it bothers me that we are duplicating so many xstrdup() calls
(all users except Ident!) and leaving the host member open to abuse.


Done. Added a NULL wrapper check as well.


* In the code below, can we set the connection pointer to NULL instead
of dragging it (and the connection object) around after closeConnection()?

         writer = NULL;
         reader = NULL;
         connector = NULL;
-        connection = -1;
     }
 }

Particular references can be set to NULL if no longer needed. Just be careful of logging, which often needs these details long after the active code has finished.

Setting an AccessLogEntry field and unsetting connection local field is probably the way to go.


* Can the new code below be moved after the Xaction state is checked and
properly synced? The current implementation leaves Xaction in an
inconsistent state: we are connecting and the connection have timed out.

+    if (io.flag == COMM_TIMEOUT) {
+        handleCommTimedout();
+        return;
+    }
+
     Must(connector != NULL);
     connector = NULL;

Inconsistent? That is the state.
We are connecting, the connection attempt has taken too long, and it has already been closed.

If that is the wrong handler for a timed out comm action, then please point me at the right one.

This is as far as I can get today. Will try for more later.


* Note how the dual-use Connection prompts you to allocate and then get
rid of a "temporary" connection in the ICAP code below:

+    connection = io.conn;

This waste would not exist if we did not use Connection objects as
destination detail holders.


* Adaptation::Icap::Xaction::fillPendingStatus and
Adaptation::Icap::Xaction::fillDoneStatus will now crash when connection
pointer is NULL. Please fix.


* Please move (connection != NULL && connection->isOpen()) into a
dedicated haveConnection() const method instead of duplicating the same
code many times.
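A sketch of the suggested accessor on a minimal hypothetical Xaction (modern C++ used for brevity; not the actual ICAP class):

```cpp
#include <cassert>

// Minimal stand-ins to show the dedicated accessor: one place for
// the NULL-and-open test instead of repeating
// (connection != NULL && connection->isOpen()) everywhere.
struct Connection {
    int fd = -1;
    bool isOpen() const { return fd >= 0; }
};

class Xaction {
public:
    Connection *connection = nullptr;  // may legitimately be NULL
    bool haveConnection() const {
        return connection != nullptr && connection->isOpen();
    }
};
```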


* Xaction is not the only ICAP code using the connection member. Please
fix all the affected code. For example:

void Adaptation::Icap::ModXact::startReading()
{
    Must(connection >= 0);



* Please shorten the comment below to "///< ICAP server connection" even
though it is currently not very accurate if we are not reusing a pconn
and have not connected yet (due to dual-use of Connection).

-    int connection;     // FD of the ICAP server connection
+    Comm::ConnectionPointer connection;     // Handle to the ICAP server connection



* Does connect_retries apply to ICAP connections? If yes, the
"forwarding" description below is a little misleading.

+       This sets the maximum number of connection attempts for each
+       potential host address selected by forwarding.

Can we clarify how one can get multiple potential host addresses here?
And whether we are talking about multiple domain names or multiple IP
addresses?


* Not a big deal, but I would rather see us _removing_ such personal
artifacts than adding more of them:

+ * AUTHOR: Amos Jeffries
+ * Copyright (c) 2010, Amos Jeffries <[email protected]>

Technically, we can all add our names to the majority of Squid files
because any active developer added at least something there, but it only
wastes ink and time, IMO.


* Unclosed open paren in the comment below. Why not use C++ // one-line
comments for one-line comments?

/* set the new one (unless it is NULL */



* Please remove Comm::PathsPointer typedef. It is very misleading
because we name refcounting pointers that way. Most of my review, I
assumed it is a refcounting pointer... I probably missed bugs because of
that.

If you do want to have a typedef for Paths pointer (even though the
typedef name is longer than using the raw pointer type!), consider
something like

    typedef Paths* PathsRawPtr;


I have not fully reviewed all the code but I hope the above is
sufficient to help you move forward. Please CC me if you post a new
version for review.


Thank you,

Alex.


Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.4
