Re: Balanced Matches in Regexps? + tr and hashes

2002-08-17 Thread Peter Behroozi

On Sat, 2002-08-17 at 14:31, Brent Dax wrote:
> Peter Behroozi:
> # After reading over Apocalypse 5 one more time, I noticed that
> # balanced matches (like capturing nested parenthetical
> # comments ((like this))) had been glossed over in the
> # rejection of RFC 145.  What was not even mentioned in the
>
>    rule parenthesized { \( ( <-[()]> | <parenthesized> ) \) }
>

So that would mean to match nested tables, I would have to write

rule nested_tables { <start_table> [ <!before <start_table>><!before
<end_table>> . | <nested_tables> ] <end_table> }



or maybe even



rule balanced { <_[0]> [ <!before <_[0]>><!before <_[1]>> . | <self> ]
<_[1]> };

$html =~ /<balanced(<start_table>, <end_table>)>/;


Lookahead syntax errors on my part aside, that isn't as bad as I had
thought.  Thanks for pointing that out.
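
For readers who want the recursion spelled out, here is a minimal Python sketch of the idea behind the parameterized rule above: match an opening token, then either ordinary characters or a nested balanced span, then the closing token. The name match_balanced and its parameters are hypothetical, and the sketch assumes the inner alternative is meant to repeat until the closing token is found.

```python
def match_balanced(text, start, open_tok, close_tok):
    """Return the end index of a balanced open_tok ... close_tok span
    that begins at text[start], or None if there is no such span."""
    if not text.startswith(open_tok, start):
        return None
    pos = start + len(open_tok)
    while pos < len(text):
        if text.startswith(close_tok, pos):
            # Closing token found: the span ends just past it.
            return pos + len(close_tok)
        if text.startswith(open_tok, pos):
            # Nested span: recurse, as the rule does by calling itself.
            pos = match_balanced(text, pos, open_tok, close_tok)
            if pos is None:
                return None
        else:
            # Any other single character (the "." guarded by the lookaheads).
            pos += 1
    return None  # input ended before the closing token appeared


html = "<table>a<table>b</table>c</table>tail"
end = match_balanced(html, 0, "<table>", "</table>")
```

Here html[:end] covers the outer table together with the nested one, and an unbalanced input such as "((a)" yields None.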

However, since you forced me to read through A5 again, I now have
another question :).  Since we can now do

$string.tr %hash;

what happens when the keys of %hash have overlapping ranges by accident
or otherwise?  Are there any other options than reporting an overlap
(hard), auto-sorting the key-value pairs (medium), or not allowing
hashes (easy)?


Peter Behroozi




Re: Balanced Matches in Regexps? + tr and hashes

2002-08-17 Thread Larry Wall

On 17 Aug 2002, Peter Behroozi wrote:
: However, since you forced me to read through A5 again, I now have
: another question :).  Since we can now do
: 
: $string.tr %hash;
: 
: what happens when the keys of %hash have overlapping ranges by accident
: or otherwise?  Are there any other options than reporting an overlap
: (hard), auto-sorting the key-value pairs (medium), or not allowing
: hashes (easy)?

Doing tr efficiently generally requires precompilation, so in the
case of a hash, the compiled result would be stored as a run-time
property.  So we can really do whatever processing we want, on
the assumption that the hash will change much less frequently than
it gets used.  Alternatively, we could restrict hashes to single
character translations.  But under UTF-8 that doesn't guarantee a
constant string length (as measured in bytes).  But I would guess that
hashes would be used for even longer sequences of characters too,
so some amount of preprocessing would be desirable to determine if
one key was a prefix of another.  Maybe it wants to get translated
to some sort of trie parser.  Really just depends on how much memory
we want to throw at it to make it fast.

Larry
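
As an illustration of the trie idea mentioned above, the following Python sketch precompiles a translation table into a nested-dict trie once and then translates with it. The names compile_trie and translate are hypothetical, and the longest-match policy for keys that are prefixes of other keys is an assumption; the thread does not settle how such overlaps should be resolved.

```python
def compile_trie(table):
    """Build a nested-dict trie from a {source: replacement} mapping."""
    root = {}
    for key, value in table.items():
        if not key:
            raise ValueError("empty key cannot be translated")
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[None] = value  # None marks "a key ends at this node"
    return root


def translate(text, trie):
    """Replace each longest matching key in text with its value."""
    out = []
    pos = 0
    while pos < len(text):
        node = trie
        best = None  # (end index, replacement) of the longest key so far
        i = pos
        while i < len(text) and text[i] in node:
            node = node[text[i]]
            i += 1
            if None in node:
                best = (i, node[None])
        if best is None:
            out.append(text[pos])  # no key starts here: copy one character
            pos += 1
        else:
            out.append(best[1])
            pos = best[0]
    return "".join(out)


trie = compile_trie({"a": "1", "ab": "2", "abc": "3"})
result = translate("abcabxa", trie)
```

The trie is built once per table and reused for every string, which matches the assumption that the hash changes far less often than it is used. The cost is memory proportional to the total length of the keys.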