> Dan Jablonsky <[EMAIL PROTECTED]>
>at 06/13/2001 12:22 PM
>Is there an alternative? What I am trying to do is
>isolate some patterns with each line of a text file
>and then make small changes to those pieces and/or
>switching the position of some of those pieces. Is it
>possible to do that without back referencing?
>For instance I start with:
>ABc Sun May 20 19:45:30, 2001 XYZ
>(tabs between the date and both fields to the right
>and left, all other spaces are spaces) and I need to
>get something like:
>ABcD Sun May 20 19:45:30 XY Z
>(tab between XY and Z, a new field).
>The way I do it now is:
>$row=~s/^([A-Z][A-Z][a-z])\t([A-Z][a-z][a-z]\s[A-Z][a-z][a-z]\s{1,2}\d{1,2}\s\d{2}:\d{2}:\d{2}).*?\t([A-Z])([A-Z])([A-Z]).*?/$1D\t$2\t$3$4\t$5/
Someone may come up with a faster solution, but off the top of my head,
the following match-and-concatenate approach benchmarks at about half the
time of your substitution.
my $row = "ABc\tSun May 20 19:45:30, 2001\tXYZ";
# The following cuts down on a lot of unnecessary typing of [A-Z]'s (no
# speed gain, but it is easier to read).
# It also cuts out the brace {} quantifiers, which can slow a regex down,
# and the lazy .*? matches, which lead to backtracking and slow it down
# further.
if ($row =~ /^
    ([^\t]*)\t      # Anything that isn't a tab (ABc)
    ([^,]*),        # Anything that isn't a comma (Sun May 20 19:45:30)
    [^\t]*\t        # Anything that isn't a tab, discard.
    ([A-Z][A-Z])    # Two capital letters (XY)
    ([A-Z])         # One capital letter (Z)
    /x) {
    # Concatenation should generally be faster than substitution, which
    # kind of 'slices, dices, and stretches' a string.
    # Only rebuild the row when the match succeeded, so $1..$4 are valid.
    $row = "${1}D\t$2\t$3\t$4";
}
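If you want to check the timing claim on your own machine, something like the sketch below should do it. It uses the core Benchmark module; the sub names by_substitution and by_match_concat are just names I made up for illustration, and the single sample row is the one from your message. Timings will vary with Perl version and input, so treat the result as a rough comparison only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample row from the original question (tabs around the date field).
my $input = "ABc\tSun May 20 19:45:30, 2001\tXYZ";

# Approach 1: the single substitution from the question.
sub by_substitution {
    my ($row) = @_;
    $row =~ s/^([A-Z][A-Z][a-z])\t([A-Z][a-z][a-z]\s[A-Z][a-z][a-z]\s{1,2}\d{1,2}\s\d{2}:\d{2}:\d{2}).*?\t([A-Z])([A-Z])([A-Z]).*?/${1}D\t$2\t$3$4\t$5/;
    return $row;
}

# Approach 2: match with negated character classes, then concatenate.
sub by_match_concat {
    my ($row) = @_;
    if ($row =~ /^
        ([^\t]*)\t      # first field, up to the tab
        ([^,]*),        # date and time, up to the comma
        [^\t]*\t        # rest of the date field, discarded
        ([A-Z][A-Z])    # first two capitals
        ([A-Z])         # last capital
        /x) {
        return "${1}D\t$2\t$3\t$4";
    }
    return $row;    # leave rows that do not match unchanged
}

# Run each approach for at least one CPU second and print a comparison table.
cmpthese(-1, {
    substitution => sub { by_substitution($input) },
    match_concat => sub { by_match_concat($input) },
});
```

Both subs work on a copy of the row, so the sample string is not modified between iterations.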
Grant Hopwood.
Valero Energy Corp.
(210)370-2380
PGP Public Key: Ldap://certserver.pgp.com