Re: [RFC] Tokenizer API

Kinkie Tue, 10 Dec 2013 02:09:00 -0800

I'd need to check.
To align the others to our IRC discussion: I'm fine with the design
Alex and you are suggesting.
My only suggestions are to have remaining() return SBuf instead of
const SBuf &, and to have a few predefined CharacterSets.



On Tue, Dec 10, 2013 at 10:52 AM, Amos Jeffries <squ...@treenet.co.nz> wrote:
> On 10/12/2013 9:38 p.m., Robert Collins wrote:
>> On 10 December 2013 19:13, Amos Jeffries <squ...@treenet.co.nz> wrote:
>>
>>> The problem with comparing input strings to a SBuf of characters is that
>>> parsing a input of length N againt charset of size M takes O(N*M) time.
>>
>> Huh? There are linear time parsers with PEGs. Or maybe I don't
>> understand one of your preconditions to come up with an N*M complexity
>> here.
>
> The brute-force way is to scan each N position for the M delimiters
> individually.
>  --> SBuf uses while(0..N) {memchr(M)} ... O(N*M)
>
>
> PS. I suspect the linear time parsers you know of are converting the PEG
> into bool-array first before doing linear time scan with it or using
> multiple CPU threads to do a parallel linear scan on small M.
>
> Amos
>



-- 
    /kinkie

Re: [RFC] Tokenizer API

Reply via email to