Re: Transliteration preferring longest match

Larry Wall Thu, 15 Dec 2005 14:36:16 -0800

On Thu, Dec 15, 2005 at 06:50:19PM +0100, Brad Bowman wrote:
: 
: Hi,
: 
: S05 describes an array version of trans for transliteration:
:  ( http://dev.perl.org/perl6/doc/design/syn/S05.html#Transliteration )
: 
:   The array version can map one-or-more characters to one-or-more 
:   characters:
: 
:      $str.=trans( [' ',      '<',    '>',    '&'    ] =>
:                   ['&nbsp;', '&lt;', '&gt;', '&amp;' ]);
: 
:   In the case that more than one sequence of input characters matches,
:   the longest one wins. In the case of two identical sequences the first
:   in order wins.
: 
: Why does the longest input sequence win?
: Is it for some consistency that I'm not seeing? Some exceedingly
: common use case?  The rule seems unnecessarily restrictive.


On the contrary, it frees the user from having to worry about the order.
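
To make the order-independence concrete, here is a rough Python model of
the longest-match rule as S05 words it. It is only a sketch, not the
Perl 6 implementation: the name trans_longest and the list-of-pairs
argument are made up for illustration.

```python
def trans_longest(text, pairs):
    """Replace source strings in text, preferring the longest match.

    pairs is a list of (source, replacement) tuples. This is a
    hypothetical stand-in for the array form of trans.
    """
    # Longest sources first. sorted() is stable, so among sources of
    # equal length the one listed first still wins.
    ordered = sorted(pairs, key=lambda p: len(p[0]), reverse=True)
    out = []
    i = 0
    while i < len(text):
        for src, dst in ordered:
            if src and text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            # No source matches here: copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


short_first = [("-", "MINUS"), ("-->", "ARROW")]
long_first = [("-->", "ARROW"), ("-", "MINUS")]

# Both orderings give the same result under the longest-match rule.
print(trans_longest("a --> b", short_first))
print(trans_longest("a --> b", long_first))
```

Under this model the caller can list the pairs in any order and the
longer source string is still the one that applies.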

: The "first in order" rule is more flexible: the user can sort their
: arrays to produce the longest-input rule, or use another order if that is
: preferred.

What possible use is a user-ordered rule set?  If you put the shorter
entry first, the longer one can never be reached.  It's not like you
can backtrack into a transliteration and pick a different entry.
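
The unreachable-entry problem can be seen in a small Python model of a
pure "first in order" rule. Again this is only a sketch under
assumptions: trans_first and its list-of-pairs argument are hypothetical
names, not anything from S05.

```python
def trans_first(text, pairs):
    """Replace source strings in text, taking the first pair that matches.

    pairs is a list of (source, replacement) tuples, tried strictly in
    the order given. Hypothetical model of a user-ordered rule set.
    """
    out = []
    i = 0
    while i < len(text):
        for src, dst in pairs:
            if src and text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            # No source matches here: copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


short_first = [("-", "MINUS"), ("-->", "ARROW")]
long_first = [("-->", "ARROW"), ("-", "MINUS")]

# With the shorter source first, "-->" never gets a chance to match,
# because "-" always matches at the same position before it is tried.
print(trans_first("a --> b", short_first))
# Only the longest-first ordering lets the longer entry apply.
print(trans_first("a --> b", long_first))
```

In this model the only ordering in which every entry can ever apply is
the one that puts longer sources ahead of their own prefixes, which is
what the longest-match rule gives without any sorting.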

: The first transliteration example even uses sort in
: the pair-wise form:
: 
:   $str.trans( %mapping.pairs.sort );

That seems like a useless use of sort, and probably defeats the optimizer
as well.

: Can we drop the longest preference?

Doesn't hurt anything, and can probably help.  Plus we already have
the longest token rule in there for magical hash matching in rules,
so it's likely the optimizer will already know how to handle it,
or something like it.

Larry
