On 2014-01-07 10:23, Kinkie wrote:
Hi,
  here's a merge proposal for Parser::Tokenizer, an implementation of
the API suggested by Alex.

The feature branch is available at lp:~squid/squid/sbuf-tokenizer

in src/parser/Tokenizer.h
* documentation calls the second parameter of token() 'delimiters', but the code calls it 'whitespace'. Please be consistent, keeping in mind there is no dependency on it being whitespace (e.g. "Foo:,,,blah,,,\n" will have delimiters "," to be skipped - see the sketch after this list).
 + the .cc needs to be updated to match whichever name is chosen.

* comment "Use three prefix() calls instead" is inaccurate.
The code may require non-3 numbers of prefix() calls. You can remove "three" and possibly also "calls" from that statement.

* the "(a token)" suffix in the comments on all the skip() methods seems unnecessary, and 'token' means different things at the different layers of the parser we will be dealing with. Best not to confuse future developers.

* there is a double empty line after the class definition.
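For illustration of the delimiters point above, a minimal sketch assuming the two-parameter token(SBuf &, const CharacterSet &) signature from the branch; the "commas" set and the exact token() semantics are my assumptions, not branch code:

  // a set of delimiters that is clearly not whitespace
  static const CharacterSet commas("commas", ",");

  Parser::Tokenizer tk(SBuf("Foo:,,,blah,,,\n"));
  SBuf word;
  if (tk.token(word, commas)) {
      // the "," bytes act as delimiters to be skipped even though they
      // are not whitespace, which is why 'delimiters' is the better name
  }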


Regarding the semantics of prefix():
It would be useful to be able to limit how far prefix() will search down the buffer. A third SBuf::size_type N=npos parameter would allow us to do things like search for up to 16 bytes of method name and error out quickly on 32KB of random hex data. - limiting token() would be nice too, but as long as prefix() provides it we can point callers at prefix() for all cases where length safety is needed.
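Purely as a sketch of the suggestion (the parameter name, the bool return and the methodChars set are placeholders, not existing code):

  bool prefix(SBuf &returnedToken, const CharacterSet &tokenChars,
              SBuf::size_type limit = SBuf::npos);

  // caller side: accept at most 16 bytes of method name so that a 32KB
  // buffer of random hex data fails fast instead of being scanned and
  // copied in full
  SBuf method;
  if (!tk.prefix(method, methodChars, 16)) {
      // reject the request-line without walking the whole buffer
  }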


in src/parser/testTokenizer.cc:
* no need for empty line after squid.h

* please use the standard token types now defined in CharacterSet for these test parses, instead of defining incorrect sets with the same names (e.g. "whitespace", "crlf"). - New custom sets are of course okay, but they should be clearly marked as custom designs (see the sketch after this list).

* testTokenizer::testCharacterSet() does nothing.

* simple construct/destruct tests are missing.
- how do we know the buffer bytes parsed are actually the ones given, before testing prefix()/token()/skip()? - how can we be sure the Tokenizer is not triggering a wasteful data copy in its constructor? - the sketch after this list shows one possible starting shape.

* testTokenizer::testTokenizerPrefix() seems wrong in several ways.
- Just provide random input with the various edge cases you can think of embedded (a fuzz test might be good there). - It is currently (badly) replicating things more properly found in the testHttpParser units, and implying that those calls are how HTTP is actually parsed (ALPHA-only method name, etc.).
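For example, a construct-and-parse test could look roughly like this; the test name, the CharacterSet::ALPHA member and the bool return of prefix() are assumptions on my part:

  void
  testTokenizer::testConstruction()   // hypothetical test name
  {
      const SBuf input("GET /index.html HTTP/1.1\r\n");
      Parser::Tokenizer tk(input);

      // nothing consumed yet
      CPPUNIT_ASSERT(!tk.atEnd());

      // reuse the standard sets now in CharacterSet (e.g. ALPHA, WSP)
      // rather than redefining "whitespace"/"crlf" locally; any custom
      // set should carry a clearly custom label such as "test-delims"
      SBuf word;
      CPPUNIT_ASSERT(tk.prefix(word, CharacterSet::ALPHA));
      CPPUNIT_ASSERT(word == SBuf("GET"));
  }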


in src/parser/testTokenizer.h
* Please order the tests so that the simplest operations come first. - with any functions tested in dependency order (e.g. if token() calls skip(), then test skip() first).

* Please test the full API, including simple things like atEnd() and reset().

* Please keep the tests for simple APIs like this cleanly separated.
- a test should set up a specific state, run one API call, then check that the resulting state is correct. - testTokenizer::testTokenizerSkip() is making needless calls to prefix(), which consumes the buffer, apparently to verify the results of the previous skip(). + instead, add "#define private public" in front of the Tokenizer.h include and use SBuf operations on buf_ directly to verify its state without consuming any of it. The buf_ checks could also be added throughout the rest of the unit tests, alongside the existing checks of the split-off token SBuf contents.
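A rough sketch of that shape (the skip(CharacterSet) overload and the CharacterSet(label, chars) constructor used here are my assumptions about the branch's API):

  #define private public    // before the include, to reach Tokenizer::buf_
  #include "parser/Tokenizer.h"

  void
  testTokenizer::testTokenizerSkip()
  {
      static const CharacterSet spaces("spaces", " ");

      // set up a specific state
      Parser::Tokenizer tk(SBuf("  body"));

      // run one API call
      CPPUNIT_ASSERT(tk.skip(spaces));

      // verify the resulting state directly, without consuming any of it
      CPPUNIT_ASSERT(tk.buf_ == SBuf("body"));
      CPPUNIT_ASSERT(!tk.atEnd());
  }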

Amos
