On Fri, Dec 16, 2005 at 01:29:11PM +0100, Ruud H.G. van Tol wrote:
: John Macdonald:
:
: > [trans]
: > If a shorter rule is allowed to match first, then the longer
: > rule can be removed from the match set, at least for constant
: > string matches.
:
: It is not about the length of the rules, but about the length of the
: matches.
They're the same thing when only fixed strings are allowed. We've only
been calling them "rules" for lack of a better term, but they aren't
rules, or even regexes. Transliteration is "literal".
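Because only fixed strings are involved, the longest rule and the longest match coincide. Here is a minimal Python sketch of that longest-match behavior over a table of literal strings. The function name `transliterate` and the example table are illustrative assumptions, not Perl's actual tr/// implementation:

```python
def transliterate(text: str, table: dict[str, str]) -> str:
    """Replace fixed-string keys of `table` in `text`, preferring the longest
    key that matches at each position; unmatched characters are copied through."""
    # Try longer keys first so that '==' is never shadowed by '='.
    keys = sorted((k for k in table if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(table[k])
                i += len(k)
                break
        else:
            # No key matches here: copy one character unchanged and move on.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate("a == b = c", {"=": "ASSIGN", "==": "EQ"}))  # a EQ b ASSIGN c
```

A real implementation would more likely walk a trie than try every key at every position.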
: If both \s+ and \h+ match the same length, should \h+ then be honored
: because it is more specific?
If you want ordered matches of real rules, you should use /<@array>/
in a real match. The pattern equivalent to tr/// involves /<%hash>/
instead somehow.
: And are we only talking about matches at the same position? (Stepping
: through the input-buffer character-by-character, and testing each
: pattern.)
Both s/// and tr/// have ways of skipping non-matching characters, but
they're not the same.
It would be a useful exercise to write tr/// in terms of s///.
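As a rough take on that exercise, here is one way to express a multi-character tr/// as a single global substitution, sketched in Python with its `re` module standing in for s///. The name `tr_via_subst` is hypothetical. Python's alternation is ordered (the first alternative that matches wins), so the keys must be sorted longest first to get tr///-style longest-match behavior:

```python
import re


def tr_via_subst(text: str, table: dict[str, str]) -> str:
    """Emulate a multi-character tr/// with one regex substitution (a sketch)."""
    keys = [k for k in table if k]  # an empty key would match everywhere
    if not keys:
        return text
    # Escape each literal key and list longer keys first, because `re`
    # takes the first alternative that matches, not the longest one.
    alternation = "|".join(re.escape(k) for k in sorted(keys, key=len, reverse=True))
    # The callback looks the matched text back up in the table; text that no
    # key matches is skipped over and left unchanged by re.sub.
    return re.sub(alternation, lambda m: table[m.group(0)], text)


print(tr_via_subst("a == b = c", {"=": "ASSIGN", "==": "EQ"}))  # a EQ b ASSIGN c
```

Like tr///, re.sub does not rescan replacement text, so an output string that happens to equal some key is not translated a second time.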
It occurs to me that it'd be awfully useful to have a kind of hash
that returns any unmatched key unchanged. But there's actually a
subtle conflict between how you want to use the hash on the left
and on the right. They have the same keys but different values.
s/(<%match>)/%replace{$0}/
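In Python terms, a hash that returns any unmatched key unchanged is just a dict whose `__missing__` hook hands the key back. A minimal sketch (the class name `PassThroughDict` is hypothetical):

```python
class PassThroughDict(dict):
    """A dict that returns any missing key unchanged instead of raising KeyError."""

    def __missing__(self, key):
        # dict.__getitem__ calls this on a failed lookup; echo the key back.
        return key


replace = PassThroughDict({"==": "EQ", "=": "ASSIGN"})
print(replace["=="], replace["+"])  # EQ +
```

Only subscripting goes through `__missing__`; `.get()` and `in` still behave like an ordinary dict.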
The way we've got hashes defined currently on the left, the lookup finds
an additional rule to continue parsing, on the assumption that the key
of %match is merely the first keyword of some longer construct. But the
value can't simultaneously be a subsequent rule *and* a replacement value,
so we end up looking up the same string twice in two different hashes.
(Even if the first one is actually doing a trie internally or some such,
it's still effectively a hash lookup, and why do it twice?) Maybe there's
some way to write the rule in the value of %match that matches zero
width but returns a useful value, so it'd be something more like:
s/(<%match>)/$0[0]/
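Python has no zero-width rule that returns a value, but a loosely analogous trick avoids the second lookup: give each literal key its own capturing group, so the group that matched identifies the replacement directly. This is a sketch under that assumption, with hypothetical helper names `compile_table` and `apply_table`; it is not how Perl 6 rules work:

```python
import re


def compile_table(table: dict[str, str]):
    """Build one regex whose matching group number identifies the replacement,
    so the matched text is never looked up again in a second hash."""
    keys = sorted((k for k in table if k), key=len, reverse=True)
    # One capturing group per literal key, longest first; (?!) never matches,
    # which keeps an empty table from matching the empty string everywhere.
    pattern = re.compile("|".join(f"({re.escape(k)})" for k in keys) or "(?!)")
    # values[i] is the replacement for capturing group i + 1.
    values = [table[k] for k in keys]
    return pattern, values


def apply_table(text: str, pattern, values) -> str:
    # m.lastindex is the number of the (only) group that took part in the match.
    return pattern.sub(lambda m: values[m.lastindex - 1], text)


pattern, values = compile_table({"=": "ASSIGN", "==": "EQ"})
print(apply_table("a == b = c", pattern, values))  # a EQ b ASSIGN c
```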
: > If, for example, '=' can match without
: > preferring to try first for '==' then you'll never match '=='
: > without syntactic help to force a backtracking retry.
:
: If rules will match in order of appearance, it is up to the user to put
: the rules in the right order.
:
: Some help can be provided, like a warning when an 'ab' precedes an
: 'abc', and maybe even when an 'a*' precedes an 'a+'.
Yes, that would be useful for /<@array>/ analysis. But tr/// ain't that.
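For the ordered /<@array>/ case, the suggested warning might start out like this Python sketch. It only handles literal alternatives; the 'a*' before 'a+' case would need real regex analysis and is not attempted. The function name `check_shadowed` is hypothetical:

```python
import warnings


def check_shadowed(alternatives: list[str]) -> list[tuple[str, str]]:
    """Flag each earlier literal alternative that is a prefix of (or equal to)
    a later one. Under first-match-wins ordering the earlier one is tried first,
    so the later one is reached only if something after it forces a backtrack."""
    shadowed = []
    for i, early in enumerate(alternatives):
        for late in alternatives[i + 1:]:
            if late.startswith(early):
                shadowed.append((early, late))
                warnings.warn(f"{early!r} precedes {late!r} and will be tried first")
    return shadowed


print(check_shadowed(["=", "==", "+"]))  # [('=', '==')]
```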
Larry