Re: [Perl-unix-users] regex performance

Grant Hopwood Wed, 13 Jun 2001 11:16:39 -0700
>   Dan Jablonsky <[EMAIL PROTECTED]>
>at    06/13/2001 12:22 PM

>Is there an alternative? What I am trying to do is
>isolate some patterns with each line of a text file
>and then make small changes to those pieces and/or
>switching the position of some of those pieces. Is it
>possible to do that without back referencing?
>For instance I start with:

>ABc Sun May 20 19:45:30, 2001 XYZ

>(tabs between the date and both fields to the right
>and left, all other spaces are spaces) and I need to
>get something like:

>ABcD Sun May 20 19:45:30 XY Z

>(tab between XY and Z, a new field).

>The way I do it now is:
>$row=~s/^([A-Z][A-Z][a-z])\t([A-Z][a-z][a-z]\s[A-Z][a-z][a-z]\s{1,2}\d{1,2}\s\d{2}:\d{2}:\d{2}).*?\t([A-Z])([A-Z])([A-Z]).*?/$1D\t$2\t$3$4\t$5/

Someone may come up with another solution a little faster, but off the top 
of my head, the following regex benchmarks at half the time.

my $row = "ABc\tSun May 20 19:45:30, 2001\tXYZ";

# The following saves typing all those [A-Z]'s (no speed gain from that,
# but it is easier to read).
# It also drops the brace {} quantifiers, which can slow a regex down,
# and the period (.) matches, which lead to backtracking and can slow
# a regex down as well.

# Only rebuild the row if the match succeeded; otherwise $1..$4 would
# still hold whatever an earlier successful match left in them.
if ($row =~ /^
                ([^\t]*)\t   # Anything that isn't a tab (ABc)
                ([^,]*),     # Anything that isn't a comma (Sun May 20 19:45:30)
                [^\t]*\t     # Anything that isn't a tab, discard.
                ([A-Z][A-Z]) # Two capital letters (XY)
                ([A-Z])      # One capital letter (Z)
                /x) {

    # Concatenation should generally be faster than substitution, which
    # kind of 'slices, dices, and stretches' a string.
    $row = "${1}D\t$2\t$3\t$4";
}
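
The timing comparison can be reproduced with the core Benchmark module. What follows is only a rough sketch: the sub names by_substitution and by_capture are made up for illustration, and the substitution is written with \t before $5, on the assumption that a tab between XY and Z is what the original poster intended. Actual ratios will vary with the Perl version and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = "ABc\tSun May 20 19:45:30, 2001\tXYZ";

# The original approach: one large substitution.
# (Assumption: \t before $5, to get the tab between XY and Z.)
sub by_substitution {
    my ($row) = @_;
    $row =~ s/^([A-Z][A-Z][a-z])\t([A-Z][a-z][a-z]\s[A-Z][a-z][a-z]\s{1,2}\d{1,2}\s\d{2}:\d{2}:\d{2}).*?\t([A-Z])([A-Z])([A-Z]).*?/${1}D\t$2\t$3$4\t$5/;
    return $row;
}

# The suggested approach: negated character classes, then concatenation.
sub by_capture {
    my ($row) = @_;
    if ($row =~ /^([^\t]*)\t([^,]*),[^\t]*\t([A-Z][A-Z])([A-Z])/) {
        return "${1}D\t$2\t$3\t$4";
    }
    return $row;    # leave rows that do not match untouched
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    substitution => sub { by_substitution($input) },
    capture      => sub { by_capture($input) },
});
```

Both subs return the same string for the sample row, so the table compares like with like. Note that by_capture drops anything after the third capital letter, whereas the substitution leaves it in place.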

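As for doing it with no back references at all: since the fields are tab-separated, split, index and substr may be enough. This is a minimal sketch rather than a tested replacement. The sub name reshape_row is hypothetical, and unlike the regex it does not check that the last field really starts with three capital letters.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild one row without any capture groups.
sub reshape_row {
    my ($row) = @_;

    # Split into at most three tab-separated fields.
    my ($code, $date, $tail) = split /\t/, $row, 3;
    return $row unless defined $tail;    # not the expected layout

    # Drop the comma and everything after it (", 2001").
    my $comma = index($date, ',');
    $date = substr($date, 0, $comma) if $comma >= 0;

    # Append the D, then break the last field after its first two characters.
    return join "\t", $code . 'D', $date, substr($tail, 0, 2), substr($tail, 2);
}

print reshape_row("ABc\tSun May 20 19:45:30, 2001\tXYZ"), "\n";
```

Whether this beats the regex would have to be measured on real data; the main appeal is that each step is easy to read and change.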

Grant Hopwood.
Valero Energy Corp.
(210)370-2380
PGP Public Key: Ldap://certserver.pgp.com