Re: Text::Balanced v Parse::RecDescent

Andrew Savige Tue, 03 Dec 2002 17:26:08 -0800

On 4 December 2002, Damian Conway wrote:
> Here's a postmaturely optimized solution that makes use of the
> (?>...) metasyntax to prevent expensive and useless backtracking.
> 
> You should find it runs very much faster.
> 
> -----cut----------cut----------cut----------cut----------cut-----
> 
> use re 'eval';
> 
> our $quoted = qr/ ' [^'\\]* (?> (?> \\. [^'\\]* )* ) '  # Match 'str'
>                  | " [^"\\]* (?> (?> \\. [^"\\]* )* ) "  # Match "str"
>                  /x;
> 
> our $element = qr/ (?> (?> [^'"{},]+ )          # Match non-special characters
>                       | \\.                      # Match escaped anything
>                       | $quoted                  # Match quoted anything
>                       | (??{$nested})            # Match {...,...,...}
>                     )
>                   /xs;
> 
> our $nested  = qr/ [{]                          # Match {
>                     (?> (?: $element , )* )      # Match list of subelements
>                     $element?                    # Match last subelement
>                     [}]                          # Match }
>                   /x;
> 
> 
> $data = <DATA>;
> 
> @fields = $data =~ m/\G ( $element ) ,? /gx;    # Capture elements repeatedly
> 
> use Data::Dumper 'Dumper';
> print Dumper( @fields );
> 
> __DATA__
> {1}, hello one two three four five six seven eight nine heat-death-of-the-universe
> 
> -----cut----------cut----------cut----------cut----------cut-----
> 
> Note that I also added a \G to the actual m// matcher, to ensure that
> the sequence of elements matched is actually sequential (i.e. no
> convenient skipping of inconvenient non-elements in the middle).
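
To make the effect of \G concrete, here is a minimal sketch. The input string and the variable names are hypothetical and not taken from the program above; the point is only that, without \G, a m//g match in list context may skip over text it cannot match, whereas with \G each match must start exactly where the previous one ended.

```perl
use strict;
use warnings;

# Hypothetical input: '!' is not a word character, so it cannot be part of a field.
my $text = 'ab,cd!ef,gh';

# Without \G the engine is free to skip ahead to the next place a match can start.
my @loose = $text =~ m/ (\w+) ,? /gx;

# With \G every match must begin exactly where the previous match ended,
# so matching stops at the first character that does not fit.
my @anchored = $text =~ m/ \G (\w+) ,? /gx;

print "loose:    @loose\n";      # loose:    ab cd ef gh
print "anchored: @anchored\n";   # anchored: ab cd
```

The anchored form silently stops at the first unmatched character, so comparing pos() against the length of the input afterwards would be one way to detect an incomplete parse.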


It definitely runs a lot faster.
However, for the original test data:

__DATA__
abc, ',def'  "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'

your original program correctly prints:

$VAR1 = 'abc';
$VAR2 = ' \',def\'  "\\"ab\'c,}" xyz ';
$VAR3 = ' fred IN { 1, "x}y",3 } x';
$VAR4 = ' \'z\'
';

while your postmaturely optimized one prints:

$VAR1 = 'abc';
$VAR2 = ' ';
$VAR3 = '\',def\'';
$VAR4 = '  ';
$VAR5 = '"\\"ab\'c,}"';
$VAR6 = ' xyz ';
$VAR7 = ' fred IN ';

I tried replacing just $nested or just $element with the original
unoptimized regex but that did not help.
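
One possible explanation, offered as a guess and not confirmed anywhere in this thread: the outer group of the optimized $element has no quantifier, so each "element" is a single piece (one run of ordinary characters, one escape, one quoted string, or one nested block) instead of a run of such pieces. That would account for both the over-splitting and the stop at ' fred IN ', since the nested { 1, "x}y",3 } can no longer be matched. The sketch below assumes the intent was "one or more pieces" and adds a + inside the atomic group; everything else follows the quoted program, with the test line supplied through a here-document instead of __DATA__.

```perl
use strict;
use warnings;
use re 'eval';

our ($quoted, $element, $nested);

$quoted = qr/ ' [^'\\]* (?> (?> \\. [^'\\]* )* ) '    # single-quoted string
            | " [^"\\]* (?> (?> \\. [^"\\]* )* ) "    # double-quoted string
            /x;

$element = qr/ (?> (?: (?> [^'"{},]+ )      # run of ordinary characters
                     | \\.                  # escaped character
                     | $quoted              # quoted string
                     | (??{ $nested })      # nested braces, resolved at match time
                   )+                       # assumed fix: one or more pieces
               )
             /xs;

$nested = qr/ [{]
              (?> (?: $element , )* )       # leading subelements, each followed by a comma
              $element?                     # optional last subelement
              [}]
            /x;

# The original test line, including its trailing newline.
my $data = <<'END';
abc, ',def'  "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'
END

my @fields = $data =~ m/ \G ( $element ) ,? /gx;

print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

Tracing this by hand against the test line, it should yield the same four fields as the original program. Whether it keeps the speed-up on the long test line is something I have not measured.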

Will the new Perl 6 pattern matching be a vast improvement for this
sort of parsing problem?

/-\


