On 12/15/05, Brad Bowman <[EMAIL PROTECTED]> wrote:
> Why does the longest input sequence win?
> Is it for some consistency that I'm not seeing? Some exceedingly
> common use case? The rule seems unnecessarily restrictive.

Hmm. Good point. You see, the longest token wins because that's an
exceedingly common rule in lexers, and you can't sort regular
expressions the way you can sort strings, so there needs to be special
machinery in there.

There are two rather weak arguments to keep the longest token rule:

* We could compile the transliteration into a DFA and make it fast.
  Premature optimization.
* We could generalize transliteration to work on rules as well.

In fact, I think the first Perl module I ever wrote was
Regexp::Subst::Parallel, which did precisely the second of these.
That's one of the easy things that was hard in Perl (but I guess
that's what CPAN is for).

Hmm.. none of these is really a compelling argument either way.

Luke
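
For readers following along, here is a rough sketch of what "longest token wins" means for a parallel substitution over plain strings. It is written in Python purely for illustration. The function name parallel_subst and its mapping argument are hypothetical and are not taken from Regexp::Subst::Parallel or from the Perl 6 design. It assumes literal, non-empty keys only.

```python
import re

def parallel_subst(text, mapping):
    """Replace all keys of `mapping` in a single left-to-right pass.

    Hypothetical helper for illustration: keys are literal strings,
    and when several keys match at the same position the longest wins.
    """
    if not mapping:
        return text
    # Python's regex alternation picks the first alternative that
    # matches, so listing longer keys first gives longest-match
    # behaviour for literal strings.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is looked up in the original text, so replacements
    # are never re-scanned (this is what makes it "parallel").
    return pattern.sub(lambda m: mapping[m.group(0)], text)

print(parallel_subst("abc", {"a": "1", "ab": "2"}))
print(parallel_subst("ab", {"a": "b", "b": "a"}))
```

Sorting by length only works because the keys here are fixed strings. A regular expression has no single length to sort by, which is the reason given above for needing special machinery once patterns are allowed.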