Re: Balanced Matches in Regexps? + tr and hashes
On Sat, 2002-08-17 at 14:31, Brent Dax wrote:
> Peter Behroozi:
> # After reading over Apocalypse 5 one more time, I noticed that
> # balanced matches (like capturing nested parenthetical
> # comments ((like this))) had been glossed over in the
> # rejection of RFC 145. What was not even mentioned in the
>
>     rule parenthesized { \( ( <-[()]> | <parenthesized> ) \) }

So that would mean that to match nested tables, I would have to write

    rule nested_tables {
        <start_table>
        [ <!before start_table> <!before end_table> . | <nested_tables> ]
        <end_table>
    }

or maybe even

    rule balanced { <_[0]> [ <!before _[0]> <!before _[1]> . | <self> ] <_[1]> };
    $html =~ /<balanced(start_table, end_table)>/;

Forgiving lookahead syntax errors on my part, that isn't as bad as I had thought. Thanks for pointing that out.

However, since you forced me to read through A5 again, I now have another question :). Since we can now do

    $string.tr %hash;

what happens when the keys of %hash have overlapping ranges, by accident or otherwise? Are there any other options than reporting an overlap (hard), auto-sorting the key-value pairs (medium), or not allowing hashes (easy)?

Peter Behroozi
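[Aside: the overlap question can be made concrete with a small sketch. This is illustrative Python rather than Perl 6, and the `tr_with_hash` helper and its first-key-that-matches policy are my own assumptions, not anything specified in A5; the point is only that once keys overlap (one key is a prefix of another), the result depends on which key the implementation tries first.]

```python
def tr_with_hash(s, table):
    """Naive hash-driven tr: at each position, try the keys in the
    order given and substitute the first one that matches; copy the
    character through unchanged if no key matches."""
    out = []
    i = 0
    keys = list(table)          # key order as supplied by the caller
    while i < len(s):
        for k in keys:
            if s.startswith(k, i):
                out.append(table[k])
                i += len(k)     # consume the whole matched key
                break
        else:
            out.append(s[i])    # no key matched here
            i += 1
    return "".join(out)

# "ab" is a prefix of "abc", so key order changes the answer:
print(tr_with_hash("abc", {"ab": "X", "abc": "Y"}))  # "Xc"
print(tr_with_hash("abc", {"abc": "Y", "ab": "X"}))  # "Y"
```

Since hash keys have no inherent order, the two calls above are "the same" hash as far as the programmer is concerned, which is exactly why the overlap has to be resolved by some explicit rule (report it, sort it, or forbid it).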
Re: Balanced Matches in Regexps? + tr and hashes
On 17 Aug 2002, Peter Behroozi wrote:
: However, since you forced me to read through A5 again, I now have
: another question :). Since we can now do
:
:     $string.tr %hash;
:
: what happens when the keys of %hash have overlapping ranges by accident
: or otherwise? Are there any other options than reporting an overlap
: (hard), auto-sorting the key-value pairs (medium), or not allowing
: hashes (easy)?

Doing tr efficiently generally requires precompilation, so in the case of a hash, the compiled result would be stored as a run-time property. So we can really do whatever processing we want, on the assumption that the hash will change much less frequently than it gets used.

Alternatively, we could restrict hashes to single-character translations. But under UTF-8 that doesn't guarantee a constant string length (as measured in bytes). And I would guess that hashes would be used for even longer sequences of characters too, so some amount of preprocessing would be desirable to determine whether one key is a prefix of another. Maybe it wants to get translated to some sort of trie parser. Really just depends on how much memory we want to throw at it to make it fast.

Larry
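[Aside: a minimal sketch of the trie idea, in Python rather than Perl. Nothing here is Perl's actual implementation; the `TrTrie` class and its longest-match-wins policy are assumptions made for illustration. The point is the shape of the trade Larry describes: pay once at compile time to build a trie from the hash keys, then each translation is a single left-to-right scan, and prefix overlaps among keys are resolved deterministically.]

```python
class TrTrie:
    """Compile a translation hash into a trie once, then translate
    strings by longest-prefix matching against that trie."""

    def __init__(self, table):
        self.root = {}              # each node: char -> child node
        for key, repl in table.items():
            node = self.root
            for ch in key:
                node = node.setdefault(ch, {})
            node[""] = repl         # "" marks "a key ends at this node"

    def translate(self, s):
        out, i = [], 0
        while i < len(s):
            node, j = self.root, i
            best = None             # (end index, replacement) of longest match
            while j < len(s) and s[j] in node:
                node = node[s[j]]
                j += 1
                if "" in node:
                    best = (j, node[""])
            if best:                # longest key matching at position i wins
                i = best[0]
                out.append(best[1])
            else:                   # no key matches; copy char through
                out.append(s[i])
                i += 1
        return "".join(out)

t = TrTrie({"ab": "X", "abc": "Y", "b": "Z"})
print(t.translate("ababc"))   # "XY": "ab" -> "X", then "abc" -> "Y"
```

The compile step is where the memory goes (one trie node per distinct key prefix), and it is also the natural place to detect and report overlapping keys if that were the chosen policy instead.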