On 05/19/2014 12:29 AM, Amos Jeffries wrote: > Updated patch attached. PREVIEW since the file renaming has been > omitted, as have incomplete Tokenizer related changes.
I have not reviewed most of the details, but the preview looks good to me overall. New temporary SBuf allocations for message parts will be removed as you polish the parsing code, right? Or are you proposing to commit those performance regressions first? At some point, please check for The Big Three rule violations in new or rewritten classes. Removing _unused_ copying is fine if it is easier than correctly supporting all three. Please add "explicit" to any new/modified single-required-parameter constructor or document why we want to support silent type conversion. Parser(SBuf) is the one I noticed but there may be others. > /** HTTP protocol parser. > * > * Works on a raw character I/O buffer and tokenizes the content into > + * either an error state or HTTP procotol major sections: It is not possible to tokenize something into an error state (unless you are parsing errors), and major protocol sections are not exactly "tokens" either. Suggestion: /** Parses raw buffer to extract major HTTP message sections: ... (the existence of an error state is obvious for any parser and does not need to be documented in the class description IMO, but you can add it if you prefer, of course). Please consider s/isDone()/done()/ -- see the last blob in "Member naming" item #3 at http://wiki.squid-cache.org/Squid3CodingGuidelines Consider s/doneBytes/parsedBytes/ although I think that method is not needed right now, as discussed further below. > + /** Whether the parser is already done processing the buffer. > + * Use to determine between incomplete data and errors results > + * from the parse. > + */ > + bool isDone() const {return parsingStage_==HTTP_PARSE_DONE;} s/determine/distinguish/ However, it is not clear how and why isDone() should be used for distinguishing errors from "need more data". I suspect that this needs to be renamed or redescribed but I am not sure about your intent here. Perhaps this would work? /// Whether we finished parsing (successfully or otherwise). Personally, I find the "reverse" methods like "needsMoreData()" easier to work with, but that is a matter of taste: if (parser->needsMoreData()) return; // wait for more bytes (clear and simple) if (!parsedAll) ... does not need more and failed to parse: must be an error ... versus: if (!parser->isDone()) What does isDone() return on errors? And we are negating, so ... if (!parsedOk) ... handle error or does it need more data? ... > + if (hp.doneBytes()) { > + // we are done with some of the buffer. update the ConnStateData > copy now. > + csd->in.buf = hp.buf; > + } I recommend removing the doneBytes() condition to simplify code and minimize the number of different paths we need to worry about: // the parser may have consumed some data, keep us synced csd->in.buf = hp.buf; If our code is any good, the removal of the guard will not introduce any significant performance penalty but will simplify the code logic significantly: The buffers are always the same after parsing. > +namespace Http { > +namespace One { ... > /** HTTP protocol parser. Placement inside Http::One suggests that this is an HTTP/1 protocol parser, not a generic HTTP protocol parser. Please document as such if the placement is correct. > + Parser(const SBuf &aBuf) { reset(aBuf); } > - void reset(const char *aBuf, int len); > + void reset(const SBuf &aBuf); Have you tried removing the above constructor and reset() method while supplying the buffer directly to the parse method as a parameter instead? I suspect it may be better than doing this direct manipulation of the parser buffer: > + if (!parser_ || parser_->isDone()) > + parser_ = new Http1::RequestParser(in.buf); > + else // update the buffer space being parsed > + parser_->buf = in.buf; Thank you, Alex.