On 2014-01-07 10:23, Kinkie wrote:
Hi,
  here's a merge proposal for Parser::Tokenizer, an implementation of
the API suggested by Alex.

The feature branch is available at lp:~squid/squid/sbuf-tokenizer

in src/parser/Tokenizer.h
* documentation calls the second parameter of token() 'delimiters', but the code calls it 'whitespace'. Please be consistent, keeping in mind there is no dependency on it being whitespace (e.g. "Foo:,,,blah,,,\n" will have delimiters "," to be skipped - see the sketch after this list).
 + the .cc needs to be updated to match whichever name is chosen.

* comment "Use three prefix() calls instead" is inaccurate.
The code may require non-3 numbers of prefix() calls. You can remove "three" and possibly also "calls" from that statement.

* the "(a token)" suffix in the comments on all the skip() methods seems unnecessary, and 'token' means different things at the different layers of the parser we will be dealing with. Best not to confuse future developers.

* there is a double empty line after the class definition.
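For illustration of the delimiters point above, a minimal sketch assuming the two-parameter token(SBuf &, const CharacterSet &) signature from the branch; the "commas" set and the exact token() semantics are my assumptions, not branch code:

  // a set of delimiters that is clearly not whitespace
  static const CharacterSet commas("commas", ",");

  Parser::Tokenizer tk(SBuf("Foo:,,,blah,,,\n"));
  SBuf word;
  if (tk.token(word, commas)) {
      // the "," bytes act as delimiters to be skipped even though they
      // are not whitespace, which is why 'delimiters' is the better name
  }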


Regarding the semantics of prefix():
It would be useful to be able to limit how far prefix() will search down the buffer. A third SBuf::size_type N=npos parameter would allow us to do things like search for up to 16 bytes of method name and error out quickly on 32KB of random hex data. - limiting token() would be nice too, but as long as prefix() provides it we can point callers at prefix() for all cases where length safety is needed.
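Purely as a sketch of the suggestion (the parameter name, the bool return and the methodChars set are placeholders, not existing code):

  bool prefix(SBuf &returnedToken, const CharacterSet &tokenChars,
              SBuf::size_type limit = SBuf::npos);

  // caller side: accept at most 16 bytes of method name so that a 32KB
  // buffer of random hex data fails fast instead of being scanned and
  // copied in full
  SBuf method;
  if (!tk.prefix(method, methodChars, 16)) {
      // reject the request-line without walking the whole buffer
  }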


in src/parser/testTokenizer.cc:
* no need for empty line after squid.h

* please use the standard token types now defined in CharacterSet for these test parses, instead of defining incorrect sets with the same names (e.g. "whitespace", "crlf"). - New custom sets are of course okay, but they should be clearly marked as custom designs (see the sketch after this list).

* testTokenizer::testCharacterSet() does nothing.

* simple construct/destruct tests are missing.
- how do we know the buffer bytes parsed are actually the ones given, before testing prefix()/token()/skip()? - how can we be sure the Tokenizer is not triggering a wasteful data copy in its constructor? - the sketch after this list shows one possible starting shape.

* testTokenizer::testTokenizerPrefix() seems wrong in several ways.
- Just provide random input with the various edge cases you can think of embedded (a fuzz test might be good there). - It is currently (badly) replicating things more properly found in the testHttpParser units, and implying that those calls are how HTTP is actually parsed (ALPHA-only method name, etc.).
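For example, a construct-and-parse test could look roughly like this; the test name, the CharacterSet::ALPHA member and the bool return of prefix() are assumptions on my part:

  void
  testTokenizer::testConstruction()   // hypothetical test name
  {
      const SBuf input("GET /index.html HTTP/1.1\r\n");
      Parser::Tokenizer tk(input);

      // nothing consumed yet
      CPPUNIT_ASSERT(!tk.atEnd());

      // reuse the standard sets now in CharacterSet (e.g. ALPHA, WSP)
      // rather than redefining "whitespace"/"crlf" locally; any custom
      // set should carry a clearly custom label such as "test-delims"
      SBuf word;
      CPPUNIT_ASSERT(tk.prefix(word, CharacterSet::ALPHA));
      CPPUNIT_ASSERT(word == SBuf("GET"));
  }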


in src/parser/testTokenizer.h
* Please order the tests so that the simplest operations come first. - with any functions tested in dependency order (e.g. if token() calls skip(), then test skip() first).

* Please test the full API, including simple things like atEnd() and reset().

* Please keep the tests for simple APIs like this cleanly separated.
- a test should set up a specific state, run one API call, then check that the resulting state is correct. - testTokenizer::testTokenizerSkip() is making needless calls to prefix(), which consumes the buffer, apparently to verify the results of the previous skip(). + instead, add "#define private public" in front of the Tokenizer.h include and use SBuf operations on buf_ directly to verify its state without consuming any of it. The buf_ checks could also be added throughout the rest of the unit tests, alongside the existing checks of the split-off token SBuf contents.
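A rough sketch of that shape (the skip(CharacterSet) overload and the CharacterSet(label, chars) constructor used here are my assumptions about the branch's API):

  #define private public    // before the include, to reach Tokenizer::buf_
  #include "parser/Tokenizer.h"

  void
  testTokenizer::testTokenizerSkip()
  {
      static const CharacterSet spaces("spaces", " ");

      // set up a specific state
      Parser::Tokenizer tk(SBuf("  body"));

      // run one API call
      CPPUNIT_ASSERT(tk.skip(spaces));

      // verify the resulting state directly, without consuming any of it
      CPPUNIT_ASSERT(tk.buf_ == SBuf("body"));
      CPPUNIT_ASSERT(!tk.atEnd());
  }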

Amos
