Re: Transliteration preferring longest match

Luke Palmer Thu, 15 Dec 2005 13:56:30 -0800

On 12/15/05, Brad Bowman <[EMAIL PROTECTED]> wrote:
> Why does the longest input sequence win?
>    Is it for some consistency that I'm not seeing? Some exceedingly
> common use case?  The rule seems unnecessarily restrictive.


Hmm.  Good point.  You see, the longest token wins because that's an
exceedingly common rule in lexers, and you can't sort regular
expressions the way you can sort strings, so there needs to be special
machinery in there.
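
For plain string keys the difference between the two policies is easy to
show. What follows is only a rough sketch in Python, not the Perl 6
transliteration operator itself; the function name transliterate and the
longest_first flag are made up for illustration. With longest_first the
keys are sorted by length before scanning; without it, the order in which
the table lists them decides.

```python
def transliterate(text, table, longest_first=True):
    """Replace keys of `table` found in `text`, scanning left to right.

    Hypothetical helper for illustration only; it is not the Perl 6 API.
    """
    keys = list(table)
    if longest_first:
        # Stable sort: longer keys are tried first, ties keep table order.
        keys.sort(key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            # Skip empty keys so the scan always makes progress.
            if key and text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No key matches here: copy one character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


table = {"a": "1", "ab": "2"}
longest = transliterate("abab a", table)                        # "22 1"
in_order = transliterate("abab a", table, longest_first=False)  # "1b1b 1"
```

With string keys the caller could do that sort by hand, which is the sense
in which strings can be sorted and regular expressions cannot.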

There are two rather weak arguments to keep the longest token rule:

    * We could compile the transliteration into a DFA and make it
fast.  Premature optimization.
    * We could generalize transliteration to work on rules as well.

In fact, I think the first Perl module I ever wrote was
Regexp::Subst::Parallel, which did precisely the second of these. 
That's one of the easy things that was hard in Perl (but I guess
that's what CPAN is for).  Hmm... none of these is really a compelling
argument either way.
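
To make the second bullet concrete, here is a minimal sketch of parallel
substitution over patterns with a longest-match rule, again in Python. It
is not the interface of Regexp::Subst::Parallel; parallel_subst and the
(pattern, replacement) rule format are assumptions made for the example.
At each position every pattern is tried, the longest non-empty match wins
(ties go to the earlier rule), and replaced text is never rescanned.

```python
import re


def parallel_subst(text, rules):
    """Apply all (pattern, replacement) rules in one left-to-right pass.

    Hypothetical helper for illustration only.
    """
    compiled = [(re.compile(pattern), repl) for pattern, repl in rules]
    out = []
    i = 0
    while i < len(text):
        best = None
        for pattern, repl in compiled:
            m = pattern.match(text, i)
            # Ignore empty matches; only a strictly longer match replaces
            # the current best, so ties go to the earlier rule.
            if m and m.end() > i and (best is None or m.end() > best[0].end()):
                best = (m, repl)
        if best is None:
            # Nothing matches here: copy one character through unchanged.
            out.append(text[i])
            i += 1
        else:
            m, repl = best
            # expand() fills in backreferences such as \1 in the template.
            out.append(m.expand(repl))
            i = m.end()
    return "".join(out)


rules = [(r"a+", "A"), (r"a+b", "B"), (r"(\d+)", r"<\1>")]
result = parallel_subst("aab aa 42", rules)            # "B A <42>"
swapped = parallel_subst("ab", [("a", "b"), ("b", "a")])  # "ba"
```

Note that trying every pattern at every position is exactly the "special
machinery" a lexer would replace with a DFA.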

Luke
