On 06/02/2014 10:02 PM, Amos Jeffries wrote:
> On 3/06/2014 6:07 a.m., Alex Rousskov wrote:
>> On 06/02/2014 11:27 AM, Kinkie wrote:
>>>>> + * At least one terminating delimiter is required. \0 may be passed
>>>>> + * as a delimiter to treat end of buffer content as the end of token.
>>>>
>>>>> +    if (tokenLen == SBuf::npos && !delimiters['\0']) {
>>>> The above is from the actually committed code (not the code posted
>>>> for review). Please remove \0 as a special case. As far as I can
>>>> tell, the reason it was added is already covered well enough by the
>>>> existing API. The addition of this special case is likely to just
>>>> confuse and break things. For example, adding \0 to delimiters for
>>>> the purpose of disabling this special case will also tell Tokenizer
>>>> to skip \0 characters in input, which is often wrong.

> But it is not wrong that often. On direct network input such as we are
> dealing with it is more often a "right" delimiter. Consider
> \0-terminated string values inside NTLM, Kerberos/GSSAPI, ICP, HTCP
> packets, our own SSL helper 'eom' line endings, HTTP/2 using \0
> delimiter for Cookies, etc.

Grammars that use 0-terminated strings will add \0 to delimiter sets as
needed. Grammars that do not use 0-terminated strings will not add \0 to
delimiter sets. In either case, assigning a secondary meaning to \0
breaks parsing and spoils the API. There is no need for these problems,
as Kinkie's alternative API demonstrates. The only argument perhaps
worth having is whether that more complex API is needed at all.

> The use case for token() is finding a { D* T+ D+ } sequence.

Sure, the above is the definition of the new token() API. When it comes
to actual use cases, I suspect any correct parsing code calling token()
will not need a special Tokenizer API to exclude "{T+} followed by
{T+}" cases. In other words, all the necessary checks can and should be
done without that special API. If you have a use case where the new API
is required, please share it.


Cheers,

Alex.
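P.S. To make the point concrete, here is a minimal sketch of the
{ D* T+ D+ } semantics with no \0 special case, written against
std::string rather than the real SBuf/Tokenizer classes (the token()
function below is illustrative only, not Squid code). Note how a caller
parsing \0-terminated fields simply puts '\0' into the delimiter set
like any other character, so token() itself needs no special branch:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Illustrative stand-in for Tokenizer::token(). Consumes { D* T+ D+ }:
// optional leading delimiters, the token, and at least one terminating
// delimiter. End-of-buffer is NOT implicitly treated as a delimiter;
// grammars that use \0-terminated strings add '\0' to `delims`.
static bool token(std::string &buf, const std::string &delims, std::string &out)
{
    const std::size_t start = buf.find_first_not_of(delims); // skip D*
    if (start == std::string::npos)
        return false; // nothing but delimiters; no token
    const std::size_t end = buf.find_first_of(delims, start); // require D+
    if (end == std::string::npos)
        return false; // no terminating delimiter: incomplete token
    out = buf.substr(start, end - start); // the T+ part
    buf.erase(0, end + 1); // consume token plus one delimiter
    return true;
}
```

Usage then looks the same for both kinds of grammars; only the delimiter
set differs:

```cpp
std::string t;
std::string line(" abc def");
token(line, " ", t);                 // t == "abc"
token(line, " ", t);                 // false: "def" has no terminator

std::string ntlmField("user\0host", 9);
token(ntlmField, std::string("\0", 1), t); // t == "user"
```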