On 02/05/2014 07:55 AM, Amos Jeffries wrote: > Here we have the combined patch for what I have been calling part 1 - > Comm::TcpReceiver interface, part 2 - ConnStateData object API upgrade, > and part 3 - HttpStateData object API upgrade.
I am puzzled about the status of this work. There seems to be unfinished changes in some critical paths of the code such as HttpStateData::mayNeedToReadMore() but you are posting this as a PATCH, implying it is ready for commit from your point of view. Could you please clarify whether you consider these changes ready for trunk commit and use? > * Kept the TcpReceiver name for now since Alex is disagreeing with > callign the interface an "agent". I am thinking of calling it just > "Comm::Tcp" in its current form. Once we agree on a description for the new class, we should be able to name it. Until then, let's keep the wrong name to minimize renaming overheads. I do not recall objecting to Agent. Agent is the right name if that is what you are writing (it is difficult to be sure based on the current code but I will discuss this below). > * The clear thing to keep in mind is that this TcpReceiver object is > *just* the interface design for the I/O actions. It needs to be > inherited by some other object which performs the processing logics on > the I/O data. So, currently, you define TcpReceiver as a base for classes that do [TCP?] I/O. That is a start, although I hope we can be more specific than that. Something like this might be a step forward: '''Common base for classes reading (and writing) variable-size protocol messages from (and to) a single TCP stream. Contains basic I/O code and complex algorithms for coordinating I/O activity with message receiving (and sending) states.''' Please note that I am not saying the above description is correct! That description still does not address SSL concerns, for example. However, it may be better than the current nothing because it is difficult to review a class without a stated scope or intended purpose (wrong or otherwise). > + MemBuf inBuf; The above undocumented TcpReceiver data member may be critical to defining TcpReceiver and related classes correctly. What does that buffer contain? Raw TCP bytes? I do not think so. Do we read raw TCP bytes into that buffer? Not quite. Those bytes did not necessarily got there from a TCP connection. They may have been through an SSL decoding layer using ssl_read_method() and Comm I/O API (at least). That buffer actually contains "final protocol" bytes, for whatever protocol the kid agent is implementing (HTTP, Gopher, etc.). The whole TcpReceiver class is not about TCP at all! If that is correct, the class description can be adjusted to something like: '''Common base for classes reading (and writing) variable-size protocol messages from (and to) a single Comm Connection. Contains basic connection management code and complex algorithms for coordinating I/O activity with message receiving (and sending) states.''' Again, I am not saying the above is correct, but I think it reflects the code you are writing better (than nothing or than the TCP-based definition I wrote above). And once we agree on what is the correct definition, we can discuss the class and member names to match it; I am not focusing on the naming problems right now. > +bool > +Comm::TcpReceiver::doneAll() const > +{ > + return stoppedSending() && stoppedReceiving() && !inBuf.hasContent() && > AsyncJob::doneAll(); > +} If we are done sending and receiving why would having buffered content (the third guard above) matter? What can we do with that content (in the scope of this class) that does not involve sending or receiving? > +void > +Comm::TcpReceiver::releaseConnection(const char *reason) > +{ > + // Used by clients to release the connection before > + // storing it in a Pconn pool for reuse. > + comm_remove_close_handler(tcp->fd, closed_); > + closed_->cancel(reason); s/clients/callers/ or s/clients/kids/ to avoid the impression that this code is specific to Client agents. It is not. "release" is kind of undefined in this context. I think you are talking about the need to stop monitoring connection for I/O. Consider rephrasing the comment to be more precise. > +void > +Comm::TcpReceiver::releaseConnection(const char *reason) > +{ ... > + // XXX: remove half-closed handler ?? > +} If the method is used exclusively to prepare a connection for entering a PconnPool (or similar actions), then the connection Must() not be half-closed (and the method should be renamed?). Otherwise, yes, we should remove the half-closed monitor to be in sync with the method definition. > +void > +Comm::TcpReceiver::stopReadingXXX() > +{ > + /* NP: This is a hack needed to allow TunnelStateData > + * to take control of a socket despite any scheduled read. > + */ > + if (reading()) { > + comm_read_cancel(tcp->fd, reader_); > + reader_ = NULL; > + } > +} I recommend removing the above comment. We already marked the method with XXX and documented its dangers. The above comment does not add much value, will get stale, and raises questions like "Why is this method here and not in the one specific caller it was designed for?". If you remove the comment, the reader can still search for stopReadingXXX() callers. Callers should document why they are calling an *XXX() method. > + // useless to read() after aborting read() > + if (stoppedReceiving()) > + return; It is not "useless", it is wrong. And stoppedReceiving() does not mean we aborted reading. We could just stopped reading after EOF. Consider: // do not start now if we have already stopped or something like that. > + /// Inject a fake request as if the client had sent it. > + // httpsSslBumpAccessCheckDone() should be doing explicit state setup > inste > ad of parsing. > + bool injectFakeRequestXXX(MemBuf &fake); As implemented, the method is not really specific to clients, requests, fake messages, or even messages. It just prepends some protocol bytes. Technically, there is nothing XXX about the method either! If you keep it, consider naming it inject(). If there is something XXX about it that I missed, let's use injectXXX(). I think we should document that the message is injected before existing buffered content, if any. Please move caller-specific explanation of the XXX to the caller where it is less likely to get out of sync. > +bool > +Comm::TcpReceiver::injectFakeRequestXXX(MemBuf &fake) > +{ > + if (inBuf.hasContent()) > + fake.append(inBuf.content(), inBuf.contentSize()); > + > + inBuf.reset(); > + inBuf.append(fake.content(), fake.contentSize()); > + > + return processReadBuffer(inBuf); > +} Let's not combine injection with processing. Let the caller call processReadBuffer() if/when they are ready for that. It would make the method more powerful but simpler. And you would not have to fix its documentation to describe the return value, etc. Another layer of problems with this method are two similar asserts: One when the "fake" buffer is too small and the other one is when the inBuf is too small to accommodate the combined content. I will focus on the former. We should not assume that the fake MemBuf can accommodate the entire inBuf. MemBufs have limited capacity and it is too much to ask the caller to give us the right buffer. If you disagree, please adjust the method description to explain the requirements on the buffer parameter. If you agree, make the buffer parameter constant and use a temporary buffer (easy) or some kind of new MemBuf method with memmove() to inject content. The code should not assert (because I might be able to trigger such an assert by stuffing the input buffer with a lot of content). A throwing Must() would be OK in this hypothetical case IMO because it will just terminate a single transaction. Overall, I am not sure this method is needed. The caller can do the manipulations on its own, right? It may be easier for the caller to add the required capacity checks than implement this general code correctly. > +Comm::TcpReceiver::TcpReceiver() : > + AsyncJob("Comm::TcpReceiver"), > + tcp(), > + stoppedReceiving_(NULL), > + stoppedSending_(NULL), > + closed_(), > + reader_() > +{} If you want to explicitly list data members with default constructors, please list all of them, including inBuf. > + /// Attempt to read some data. > + /// Will call processReadBuffer() when there is data to process. > + void readSomeData(); Let's explicitly say that the method "May do nothing." It may be important for the caller not to rely on more data coming in after calling this method. > + if (!maybeMakeSpaceAvailable()) { > + stopReceiving("full read buffer - but processing does not free > any space"); > + // fall through to setup the half-closed monitoring > + } else { > + // schedule another read() - unless aborted by > maybeMakeSpaceAvailable() failure > + readSomeData(); > + return; // wait for the results of this attempt. > + } The "unless aborted by..." part is wrong -- we have already ensured that there _is_ space available above. Otherwise, the "wait for the results of this attempt" would become a bug -- the results would never come if the buffer is full. That is all I had time for today, sorry. More to come. Thank you, Alex.