Ping. Amos
On 7/01/2014 12:14 p.m., Amos Jeffries wrote: > On 2014-01-07 10:23, Kinkie wrote: >> Hi, >> here's a merge proposal for Parser::Tokenizer, an implementation of >> the API suggested by Alex. >> >> The feature branch is available at lp:~squid/squid/sbuf-tokenizer > > in src/parser/Tokeniser.h > * documentation call the second parameter of token() 'delimiters' but > the code calls it 'whitespace'. Please be consistent, keeping in mind > there is no dependency on it being whitespace (ie "Foo:,,,blah,,,\n" > will have delims "," to be skipped) > + the .cc needs to be updated to match the choice here. > > * comment "Use three prefix() calls instead" is inaccurate. > The code may require non-3 numbers of prefix() calls. You can remove > "three" and possibly also "calls" from that statement. > > * the comments suffix "(a token)" on all the skip() methods seem > unnecessary, and 'token' means different things to the different layers > of the parser we will be dealing with. Best not to confuse future dev. > > * double-empty line after class definition. > > > Regarding the semantics of prefix(): > It would be useful to have the capacity for limiting how long the > prefix will search down the buffer. A third SBuf::size_type N=npos > parameter would allow us to do things like search for up to 16 bytes of > method name and error quickly on 32KB of random hex data. > - token() would be nice limiting too, but as long as prefix provides it > we can refer to prefix() for all cases where length safety is needed. > > > in src/parser/testTokenizer.cc: > * no need for empty line after squid.h > > * please use the standard token types now defined in CharacterSet for > these test parses instead of defining some wrong sets with same names > (ie "whitespace", "crlf"). > - New custom sets are of course okay but should be clearly custom designs. > > * testTokenizer::testCharacterSet() does nothing. > > * simple construct/destruct tests are missing. > - how do we know the buffer bytes parsed are actually the ones given > before testing prefix()/token()/skip() ? > - how can we be sure the Tokenizer is not triggering a wasteful > data-copy in its constructor? > > * testTokenizer::testTokenizerPrefix() seems wrong in several ways. > - Just provide random input with various edge cases you can think of > embeded (a fuzz test might be good there). > - It is currently (badly) replicating things more properly found in > testHttpParser units and implying that those calls might be how the HTTP > is actually parsed (ALPHA-only method name etc). > > > in src/parser/testTokenizer.h > * Please order the testing such that the simplest operations are tested > first. > - with any functions tested in dependency order (eg token() calls > skip()? then test skip() first) > > * Please test the full API. Including simple things like atEnd(), reset(). > > * Please keep tests for simple APIs like this cleanly separated. > - a test should setup a specific state, run some API call, then test > the resulting state is correct. > - testTokenizer::testTokenizerSkip() is making needless tests of > prefix() which consumes buffer while apparently verifying previous > skip() state results. > + instead add "#define private public" in front of the Tokenizer.h > include and use SBuf operation on buf_ directly to verify its state > without consuming any of it. The buf_ checks could also be added through > the rest of the unit tests alongside the split-off token SBuf content > checks. > > Amos >