Re: [PATCH] Tokenizer

Amos Jeffries Mon, 12 May 2014 03:16:29 -0700

Ping.

Amos


On 7/01/2014 12:14 p.m., Amos Jeffries wrote:
> On 2014-01-07 10:23, Kinkie wrote:
>> Hi,
>>   here's a merge proposal for Parser::Tokenizer, an implementation of
>> the API suggested by Alex.
>>
>> The feature branch is available at lp:~squid/squid/sbuf-tokenizer
> 
> in src/parser/Tokeniser.h
> * documentation call the second parameter of token() 'delimiters' but
> the code calls it 'whitespace'. Please be consistent, keeping in mind
> there is no dependency on it being whitespace (ie "Foo:,,,blah,,,\n"
> will have delims "," to be skipped)
>  + the .cc needs to be updated to match the choice here.
> 
> * comment "Use three prefix() calls instead" is inaccurate.
>  The code may require non-3 numbers of prefix() calls. You can remove
> "three" and possibly also "calls" from that statement.
> 
> * the comments suffix "(a token)" on all the skip() methods seem
> unnecessary, and 'token' means different things to the different layers
> of the parser we will be dealing with. Best not to confuse future dev.
> 
> * double-empty line after class definition.
> 
> 
> Regarding the semantics of prefix():
>   It would be useful to have the capacity for limiting how long the
> prefix will search down the buffer. A third SBuf::size_type N=npos
> parameter would allow us to do things like search for up to 16 bytes of
> method name and error quickly on 32KB of random hex data.
>  - token() would be nice limiting too, but as long as prefix provides it
> we can refer to prefix() for all cases where length safety is needed.
> 
> 
> in src/parser/testTokenizer.cc:
> * no need for empty line after squid.h
> 
> * please use the standard token types now defined in CharacterSet for
> these test parses instead of defining some wrong sets with same names
> (ie "whitespace", "crlf").
>  - New custom sets are of course okay but should be clearly custom designs.
> 
> * testTokenizer::testCharacterSet() does nothing.
> 
> * simple construct/destruct tests are missing.
>   - how do we know the buffer bytes parsed are actually the ones given
> before testing prefix()/token()/skip() ?
>   - how can we be sure the Tokenizer is not triggering a wasteful
> data-copy in its constructor?
> 
> * testTokenizer::testTokenizerPrefix() seems wrong in several ways.
>  - Just provide random input with various edge cases you can think of
> embeded (a fuzz test might be good there).
>  - It is currently (badly) replicating things more properly found in
> testHttpParser units and implying that those calls might be how the HTTP
> is actually parsed (ALPHA-only method name etc).
> 
> 
> in src/parser/testTokenizer.h
> * Please order the testing such that the simplest operations are tested
> first.
>  - with any functions tested in dependency order (eg token() calls
> skip()? then test skip() first)
> 
> * Please test the full API. Including simple things like atEnd(), reset().
> 
> * Please keep tests for simple APIs like this cleanly separated.
>  - a test should setup a specific state, run some API call, then test
> the resulting state is correct.
>  - testTokenizer::testTokenizerSkip() is making needless tests of
> prefix() which consumes buffer while apparently verifying previous
> skip() state results.
>   + instead add "#define private public" in front of the Tokenizer.h
> include and use SBuf operation on buf_ directly to verify its state
> without consuming any of it. The buf_ checks could also be added through
> the rest of the unit tests alongside the split-off token SBuf content
> checks.
> 
> Amos
>

Re: [PATCH] Tokenizer

Reply via email to