Re: [PATCH] Comm::TcpReceiver - part 1, 2, and 3

Amos Jeffries Sat, 01 Mar 2014 01:56:20 -0800

On 1/03/2014 5:23 p.m., Alex Rousskov wrote:
> On 02/08/2014 04:00 AM, Amos Jeffries wrote:
>> On 8/02/2014 8:05 p.m., Alex Rousskov wrote:
>>> Agent will have pure virtual functions, of course. Event the existing
>>> ClientAgent (i.e., ServerStateData) does!
> 
> 
>> Oh you are looking at Agent as part of the hard-coded inheritence chain
>> of the protocol manager class?
>>  I have been looking at it as a dynamicaly assigned member of the
>> manager class. So that we can pass an Agent between say ConnStateData
>> and TunnelStateData. Or switch a ConnStateData Agent pointer between
>> TcpAgent and TlsAgent when doing ssl-bump on a client connection.
> 
> 
> I have provided my working definition of Agent on January 24, 2014:
> 
>   Agent: Common base for Client and Server agents. Uses Transport.


Fine. Now the definitions of those "Client and Server agents" ...

>   Client: Common base for HttpClient, FtpClient, IcapClient, etc.
>   Server: Common base for HttpServer, FtpServer, etc.
>   HttpClient: Sends HTTP requests. Receives HTTP responses.
>   FtpServer: Receives FTP commands. Sends FTP replies.
>   ...
> 
>   TransportLayer: Common base for TcpLayer, SslLayer, and other layers.
>   TcpLayer: A final TransportLayer using a TCP socket Comm APIs for I/O.
>   SslLayer: A TransportLayer using another TransportLayer for I/O.
>   ...

Problem: the Comm:: layer and fd_* layer APIs provide an abstracted sent
of functions that (currently) provide *all* of the above layers behind a
single open/close/read/write/set-timeout API. In similar separation as
OSI layer 4 and layer 5 have a definite shear line between them.

fd_* provides the socket state API. While Comm:: / comm_* provides the
semantics abstraction API.

This is the cause of my attempted TcpReceiver class having rarely any
TCP-specific operations. Because it was built from code semantically
above that shear line in the abstraction (the Agent tasks in your
definition as you noted below).


> 
>   Transport: Contains one or more ConnLayers. Handles I/O multilayering.

ConnLayer? are you meaning UDP/TCP/SSL FD and read(2)/write(2)/etc.

Please can you also explain what you mean by "I/O multilayering". You
seem to be defining it as something separate from both
UDP/TCP/TLS/SSL/etc low-level protocols (xLayer above) and HTTP/FTP/etc
high-level protocols (clients/servers above).

My understanding of the term is protocol stacks like
WebSockets-over-HTTP-over-TLS-over-TCP-over-IP-over-...

I am also struggling to see how what it does differs from TransportLayer.


> 
>   Connection: A socket with a peer address. We have that already.
> 

Overall my impression of this description is that Comm:: and comm_*()
API is the Transport -> xLayer -> TransportLayer stack of classes
without any actual class wrapper.
 Am I right?


FWIW: Since these transport pieces all seem to be scoped at where the
Comm:: and comm_* and fd_* code APIs exist today I'm doubtful we should
be even discussing them much further at this stage. This project patch
is explicitly trying *not* to change any of that layers code just yet.

I thought TcpReceiver would be what you define Transport. But I agree it
does seem to be far too high level semantics for that.


> 
> Whether Agent is a dynamically assigned member of some other class
> should not be relevant to what Agent is. We should strive to define
> things for what they are/do, not for how they may be used (to the extent
> possible).
> 
> IIRC what ConnStateData does today, it is an HttpServer in the above
> definition (i.e., an Agent).

ConnStateData in trunk seems to be a conflated base Agent with HttpAgent
and HttpServer operations.

Ideally the HttpServer operations are more properly done by
ClientSocketContext via the use of ClientHttpRequest for the message
semantic operations.


> 
> Overall, I think your TcpReceiver class is stuck somewhere between
> 
> * Agent (that is how you actuall use TcpReceiver in the patch!) and
> 
> * Transport (that is how you talk about TcpReceiver on this thread when
> explaining/justifying some of its internals).
> 

Agreed. However I think its only stuck-between in the details that its
been placed in Comm:: where the Agent semantics dont make sense, and
named wrongly after a Transport API which it does not specifically provide.

If I am understanding you right this class should be named AnyP::Agent
in the generic protocol library instead of attempting to be part of
Comm:: related to TCP.

 Do we agree on that?



> This is just my impression, of course! If I am on the right track, then
> I suggest you pick the Agent design because
> 
> a) that is how you _use_ your class, which is more important than the
> implementation details; and
> 
> b) it is usually easier to design and implement classes you can use
> immediately rather than classes that you have to just keep around for
> future use.
> 
> and we can make much better progress from there.
> 

Okay. I agree on the design. I have been confused about how several of
the definitions map to Squid code (existing or after re-write). I hope
my questions and comments above have clarified where that confusion is.


> 
>> The server-facing Agents already are separate as you noted. They are
>> decended from ServerStateData ... but the manager over there is called
>> FwdState rather than ConnStateData/ClientSocketContexxt.
> 
> I am not sure what an "Agent manager" is, but I know that FwdState today
> is essentially a connection opening code. Its relationship with Agents
> is pretty much limited to starting them (with a freshly opened or reused
> idle/pinned connection).

Sorry my confusion getting in the way. It was a shared generic term for
what you are defining slightly more specifically as clients/servers.


<snip the TCP vs TLS layer talks>
Okay, good points. Lets shelve this until it is more relevant.

> 
>>  That is easier to handle with a distinct TlsWhatever agent than relying
>> on the comm_*() abstraction layer to determine a read(2) / write(2)
>> equivalent (from the global fd_table BTW).
> 
> If we can agree on what HTTP and FTP Agents are first, we may have a
> better chance of being able to add Peek and Splice (and other
> complexities) to that as a second stage. Even if new features are going
> to require some redesign, the initial agreement would be more valuable
> than the current stalemate, where we cannot even agree on what
> TcpReceiver class is (and, hence, how it should be named)!
> 

Okay this is what I know of the protocols requirements:

* HTTP/1 protocol Agent requires semantics for sending bytes and
receiving bytes over a single Transport.

* HTTP/2 Agent requires semantics for sending bytes and receiving bytes
over a single Transport which may also need to switch arbitrarily
between TcpTransport and TlsTransport at byte boundaries as defined by
the HTTP/2 protocol framing.
 NP: the peek-n-splice BIO methods over a TcpTransport would seem
necessary for this.

* Secure-HTTP Agent (RFC2660 which I would like to implement soonish)
requires an HTTP/1 Agent.

* ICY protocol Agent requires an HTTP/1 Agent.

* Tunnel (blind relay) requires two Agents with independent Transport each.
  One for each of client and server channels.

* FTP requires two Agents requires semantics for sending bytes and
receiving bytes over TcpTransport. Also the ability for both Agents to
share a TcpTransport. Also data channel Agent requires ability to block
ctrl channel Agent from reading.
  One for send/receive on ctrl channel, one for receive on data channel.
Shared Transport when "data channel" is an ASCII mode transfer via ctrl
socket.

In general the clients/servers need to be able to pass around and share
either Agent or Transport, whichever layer has control over both the
input buffer data and socket FD.

In summary overall Agent requires:
 * semantics for sending bytes
 * semantics for receiving bytes
 * ability to switch Transport type arbitrarily
 * ability to block another Agent reading on its Transport
 * maybe ability to switch between client/server owners/users.

> 
>>>>> That buffer actually contains "final protocol" bytes, for whatever
>>>>> protocol the kid agent is implementing (HTTP, Gopher, etc.). The whole
>>>>> TcpReceiver class is not about TCP at all!
>>>>
>>>> It's about bytes going to/from a TCP connection.
>>>
>>> Nope. The transport and any encodings/encryptions on top of it are
>>> pretty much irrelevant to the current (and future) TcpReceiver code. If
>>> you disagree, please show me important TCP-specific code in your current
>>> TcpReceiver implementation.
> 
> 
> You have not responded to the above paragraph. Does it mean that you
> agree that connection transport and any encodings/encryptions on top of
> it are pretty much irrelevant to the current (and future) TcpReceiver code?
> 

Yes.

> 
>>>> '''Common base for classes reading (and writing) from (and to) a single
>>>> TCP Comm Connection. Contains basic connection management code and
>>>> complex algorithms for coordinating I/O activity with message receiving
>>>> (and sending) states.'''
>>>
>>> Please show me the TCP specific code in your TcpReceiver implementation.
>>>
>>
>> The algorithm model behind readSomeData() assumes a streaming protocol
>> with single input stream with serialized read(2).
>>  - eliminating direct use by SCTP, FTP, or other multi-connection protocols.
> 
> I do not know about SCTP, but current TcpReceiver does not eliminate
> direct use by FTP. FTP agents can use this code for the control channel.

Ah. You are talking about FTP Agents controlling both channels. I am
talking about a TcpReceiver being used on each channel separately.



> 
>> Use of comm_read() and Comm::Write() relies on a socket with TCP_STREAM
>> socket type,
> 
> It does not. Existing IPC code uses both comm_read() and Comm::Write()
> for UDS sockets.
> 
> 
>> "Comm::Connection tcp;" is likewise a TCP_STREAM connection
> 
> Comm::Connection is not TCP_STREAM connection. IPC uses Comm::Connection
> for UDS connections.
> 
> 
>> assuming direct read(2)/write(2) access is possible.
> 
> Comm::Connection does not make that assumption. The I/O methods are
> handled at a separate layer. This is why TCP, SSL, and UDS code can use
> the same Connection and high-level read/write API. That part of the
> design is valuable/good. We should not break it. High-level message
> (HTTP, FTP, etc.) handling code should not care what the transport is.
> 
> 
>>  - all of these eliminate its use for UDP or UDS based protocol
>> receiving/sending.
> 
> As the existing/working code illustrates, they do not.
> 
> 
>>  We could just as easily (for the model, not the code) have the child
>> class doing TLS decryption of encrypted bytes read(2) directly off the
>> socket in to TcpReceiver. Given the plans I've seen about
>> Peek-and-Splice using BIO handlers we will probably end up doing exactly
>> that in some code.
> 
> You are right, except "use transport X with encoding Y for HTTP" is not
> a "parent-child" relationship. It is a "use" relationship. Implementing
> it using Agent child classes would be wrong. That relationship should be
> implemented via class data members instead.
> 

Yes :-)

> 
>>> In other words (and simplifying a bit):
>>>
>>>   /// raw TCP bytes (WRONG: could be SSL-decrypted TCP)
>>>   MemBuf inBuf;
>>>
>>>   /// opaque received bytes (OK but a little vague)
>>>   MemBuf inBuf;
>>
>> Explicitly accurate for what is in there though. Which is why I left it
>> undocumented. "inBuf" seems clear enough.
> 
> There is a significant difference among the three inBuf descriptions I
> have posted so, no, inBuf without a description is not clear enough.
> 

Fine. "opaque received bytes" it is then.

> 
>>>   /// received bytes that are yet-unprocessed by our kids;
>>>   /// opaque for us, but kids know the protocol these bytes
>>>   /// are using (e.g., HTTP, FTP, etc) (Better)
>>>   MemBuf inBuf;
>>>
>>> The actual final description may also mention what we expect the kids to
>>> do with the buffer (copy, leave just the unparsed bytes, empty on exit,
>>> etc.).
>>
>> We expect them to process it. Whatever that may mean to the child. This
>> class places no restriction or expectation on that.
> 
> Your own TcpReceiver code treats empty inBuf specially so, yes, you are
> already placing expectations and restrictions (and that is OK; we just
> need to agree on them and disclose them).
> 

?? empty is a state of buffer space (empty,some,full) unrelated to any
processing applied on bytes when they exist or knowledge of what they
contain.

Um. Do we really need to continue this sub-thread? seems to be
distracting from the main issues.


> 
>> +bool
>> +Comm::TcpReceiver::injectFakeRequestXXX(MemBuf &fake)
> ...
>>>>> Overall, I am not sure this method is needed. The caller can do the
>>>>> manipulations on its own, right? It may be easier for the caller to add
>>>>> the required capacity checks than implement this general code correctly.
> 
>>>> Ideally not. I am trying to move the inBuf to a private member so the
>>>> childs should only be playing with the buffer they get passed by
>>>> processReadBuffer(MemBuf &)
> 
>>> As you know by now, I do not think that hiding input buffer from kids is
>>> going to work unless you force kids to copy it. TcpReceiver does not
>>> control all the events in a transaction life and, hence, cannot prohibit
>>> access to the buffered message which may be waiting for those events.
> 
>> By making this a private/protected member we restrict access to it from
>> objects outside the child class and restrict parsing/processing actions
>> on it to the code where it is passed as parameter to
>> processReadbuffer(Membuf &).
> 
> Yes, but we were discussing whether injectFakeRequestXXX() is needed, to
> which you replied that you are trying to make inBuf private (not
> protected!). So, if you now agree that kids should have access to the
> buffer, let's remove injectFakeRequestXXX() with all if its problems.
> 

That was my reasons for making it private. Not agreement on keeping it
public. We want the I/O data to have to pass through parsing/processing
layer before anything else gets access to its contents.


Also, I do not as yet see any reason to avoid having it private in the
Agent model. If we do need it in future fine, but this effort has
revealed only a few places abusing the buffer that need to be fixed and
I'm not happy conflating this big patch with that extra work just yet.

Amos

Re: [PATCH] Comm::TcpReceiver - part 1, 2, and 3

Reply via email to