Here are a few more notes, to add to the unfinished review posted earlier. The first and the last one are minor, but the rest is serious/important.
On 01/05/2014 08:21 PM, Amos Jeffries wrote: > + /// the protocol label for this message > + const AnyP::ProtocolVersion & messageProtocol() const {return > msgProtocol_;} s/this message/parsed message/ or something like that to clarify what "this message" we are talking about in a context of a parser class. You may also want to add "unset until ..." note. > /** Initialize a new parser. > * Presenting it a buffer to work on and the current length of available > data. > * NOTE: This is *not* the buffer size, just the parse-able data length. > * The parse routines may be called again later with more data. > */ > Parser(const char *aBuf, int len) { reset(aBuf,len); } This new interface implies that the Parser stores/uses raw input buffer between parse() calls. I do not think this is a good idea. This approach already leads to time bombs where the patched code adjusts the size/offsets but assumes there was no buf pointer change (which will not be true if we start using more complex buffering approaches after migrating to SBuf and such): > +void > +Http1::RequestParser::noteBufferShift(int64_t n) > +{ > + bufsiz -= n; ... > + // shift the parser resume point to match buffer content > + parseOffset_ -= n; ... > +} With the old non-incremental parsers, the buffer pointer and various offsets were reset before every parse call: > /* Begin the parsing */ > PROF_start(parseHttpRequest); > HttpParserInit(&parser_, in.buf, in.notYetUsed); > > /* Process request */ > Http::ProtocolVersion http_ver; > ClientSocketContext *context = parseHttpRequest(this, &parser_, > &method, &http_ver); As we migrate to incremental parsers, we have a choice to make: * Either the parser and the caller are going to maintain their own raw buffers, with the caller appending to the parser's buffer when new bytes arrive and the parser consuming whatever buffer [prefix] it no longer needs. * Or the parser is going to rely on the callers buffer. Storing a reference to that buffer during parse() calls is a good idea, but the buffer is going to be presumed invalid and unused between those parse() calls. We should either not store any offsets between calls at all (if they are not needed) OR require the callers to consume exactly what the parser tells them to consume before giving the parser a new buffer to parse. In the latter case, the stored offsets will always be based on the first buffer byte that the parser has seen and will not depend on any buffer shifts. For efficiency reasons, I am pretty sure we want the latter, which is what the existing/unpatched code essentially gives us, ironically. We need to document and continue to make use of it correctly (making any necessary adjustments, of course). If the latter approach is what you are trying to implement, * we need neither the noteBufferShift() nor Parser constructor parameters, * we need to pass the buffer to the parse() call, and * the parseOffset_ either needs to be maintained differently or removed from Parser completely. These are important API changes that we need to get right now, before we start making the parser more complex and truly incremental. Note that the new RequestParser appears to be using parseOffset_ but it is just [poorly] shadowing the old/existing req.start and req.end members. We need to document what those old offsets mean and use them as needed (or remove them if they are not needed!). > request_parse_status = Http::scOkay; // HTTP/0.9 > + completedState_ = HTTP_PARSE_FIRST; > + parseOffset_ = line_end; > request_parse_status = Http::scOkay; // treat as HTTP/0.9 > + completedState_ = HTTP_PARSE_FIRST; > + parseOffset_ = req.end; > request_parse_status = Http::scOkay; > + parseOffset_ = req.end+1; // req.end is the \n byte. Next parse step > needs to start *after* that byte. > + completedState_ = HTTP_PARSE_FIRST; The above is one example where parseOffset_ is maintained inconsistently. Removing parseOffset_ (until it is actually needed) would fix that. > + * \return true if garbage whitespace was found > + */ > +bool > +Http1::RequestParser::skipGarbageLines() The current code returns true even if the garbage was not found during the call. Please clarify whether the return result applies to the call (naturally) or to the history of parsing this request (unnaturally). However, I would just make this method return void. I do not think the [fixed] caller really needs to know what happened here. With this email, I am done reviewing this iteration of the parser upgrade patch. Thank you, Alex.