On Fri, Nov 30, 2012 at 6:23 PM, Heikki Linnakangas <hlinnakan...@vmware.com
> wrote:

> On 30.11.2012 13:20, Alexander Korotkov wrote:
>
>> On Thu, Nov 29, 2012 at 5:25 PM, Heikki Linnakangas<hlinnakangas@**
>> vmware.com <hlinnakan...@vmware.com>
>>
>>> wrote:
>>>
>>
>>  Would it be safe to simply stop short the depth-first search on overflow,
>>> and proceed with the graph that was constructed up to that point?
>>>
>>
>> For depth-first it's not. But your proposal naturally makes sense. I've
>> changed it to breadth-first search. And then it's safe to mark all
>> unprocessed states as final when overflow. It means that we assume every
>> unprocessed branch to immediately finish with matching (this can give us
>> more false positives but no false negatives).
>> For overflow of matrix collection, it's safe to do just OR between all the
>> trigrams.
>> New version of patch is attached.
>>
>
> Thanks, sounds good.
>
> I've spent quite a long time trying to understand the transformation the
> getState/addKeys/addAcrs functions do to the original CNFA graph. I think
> that still needs more comments to explain the steps involved in it.
>
> One thing that occurs to me is that it might be better to delay expanding
> colors to characters. You could build and transform the graph, and even
> create the paths, all while operating on colors. You would end up with
> lists of "color trigrams", consisting of sequences of three colors that
> must appear in the source string. Only at the last step you would expand
> the color trigrams to real character trigrams. I think that would save a
> lot of processing while building the graph, if you have colors that contain
> many characters. It would allow us to do better than the fixed small
> MAX_COLOR_CHARS limit. For example with MAX_COLOR_CHARS = 4 in the current
> patch, it's a shame that we can't do anything with a fairly simple regexp
> like "^a[b-g]h$"
>

Nice idea to delay expanding colors to characters! Obviously, we should
delay expanding inly alphanumerical characters. Because non-alphanumberical
characters influence graph structure. Trying to implement...

------
With best regards,
Alexander Korotkov.

Reply via email to