On Fri, Dec 16, 2005 at 01:29:11PM +0100, Ruud H.G. van Tol wrote:
: John Macdonald:
:
: > [trans]
: > If a shorter rule is allowed to match first, then the longer
: > rule can be removed from the match set, at least for constant
: > string matches.
:
: It is not about the length of the rules, but about the length of the
: matches.

They're the same thing when only fixed strings are allowed.  We've only
been calling them "rules" for lack of a better term, but they aren't
rules, or even regexes.  Transliteration is "literal".

: If both \s+ and \h+ match the same length, should then \h+ be honored
: because it is more specific?

If you want ordered matches of real rules, you should use /<@array>/
in a real match.  The pattern equivalent to tr/// involves /<%hash>/
instead somehow.

: And are we only talking about matches at the same position? (Stepping
: through the input-buffer character-by-character, and testing each
: pattern.)

Both s/// and tr/// have ways of skipping non-matching characters, but
they're not the same.  It would be a useful exercise to write tr///
in terms of s///.

It occurs to me that it'd be awfully useful to have a kind of hash
that returns any unmatched key unchanged.  But there's actually a
subtle conflict between how you want to use the hash on the left and
on the right.  They have the same keys but different values.

    s/(<%match>)/%replace{$0}/

The way we've got hashes defined currently on the left, the lookup
finds an additional rule to continue parsing, on the assumption that
the key of %match is merely the first keyword of some longer construct.
But the value can't simultaneously be a subsequent rule *and* a
replacement value, so we end up looking up the same string twice in two
different hashes.  (Even if the first one is actually doing a trie
internally or some such, it's still effectively a hash lookup, and why
do it twice?)  Maybe there's some way to write the rule in the value
of %match that matches zero width but returns a useful value, so it'd
be something more like:

    s/(<%match>)/$0[0]/

: > If, for example, '=' can match without
: > preferring to try first for '==' then you'll never match '=='
: > without syntactic help to force a backtracking retry.
:
: If rules will match in order of appearance, it is to the user to put the
: rules in the right order.
:
: Some help can be provided, like a warning when an 'ab' precedes an
: 'abc', and maybe even when an 'a*' precedes an 'a+'.

Yes, that would be useful for /<@array>/ analysis.  But tr/// ain't that.

Larry
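
The exercise of writing tr/// in terms of s/// might be sketched roughly
as follows. This is Python rather than Perl 6, and the names `translit`
and `PassThrough` are hypothetical, made up for illustration. It assumes
fixed-string keys only, emulates longest-match by sorting the keys before
building an ordered alternation, and still does the "match, then look up
again" step that the first s/// form above does. `PassThrough` is one
possible reading of a hash that returns any unmatched key unchanged.

```python
import re


class PassThrough(dict):
    """Hypothetical hash that returns any unmatched key unchanged."""

    def __missing__(self, key):
        return key


def translit(text, table):
    """Replace each fixed-string key of `table` found in `text` by its value.

    Python's regex alternation is ordered (the first alternative that
    matches wins), so sorting keys longest-first makes '==' win over '='.
    """
    if not table:
        return text
    # Fixed strings only: escape each key so nothing is read as a pattern.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # The alternation finds the key; the dict lookup finds the replacement.
    # Text between matches is skipped over and copied through untouched.
    return pattern.sub(lambda m: table[m.group(0)], text)


if __name__ == "__main__":
    print(translit("a == b = c", {"=": ":=", "==": "eq"}))
    print("".join(PassThrough({"a": "x"})[c] for c in "abc"))
```

With the keys left in their original order ('=' before '=='), the
alternation would never reach '==', which is the ordering problem
described in the quoted text.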