RE: trying to understand how regex works

Joseph Youngquist Tue, 13 Aug 2002 06:56:07 -0700

I'd add the check for the garbage before the split. I'm not sure it would
save any noticeable run time, but I think it would reduce the amount of
checking needed after the split function.


next if /value_garbage/;  # assuming value_garbage is the exact string.

or you can use:

while (<FILE>) {
        $p = "N";
        next if /value_garbage/;    # skip garbage lines before splitting
        my @f = split /\s*\|\s*/, $_;
        if (@f != 30) {
                print "Field count is ", scalar @f, " should be 30\n";
                # error processing ...
        }
        if ($f[1] =~ /   ...
                ...

This again assumes that value_garbage is a fixed string. If not, then, well,
"if, elsif" away :)
But I would absolutely use the split function.
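
Putting the two ideas together (garbage check first, then split), a minimal
sketch might look like the one below. It is only an illustration: the
@garbage list, the parse_line name and the handling of the trailing pipe are
my assumptions, not something taken from the real files.

```perl
use strict;
use warnings;

# Hypothetical garbage values; the real ones are not given in this thread.
my @garbage = ('value_garbage1', 'value_garbage2', 'value_garbage3');

# One alternation, compiled once; quotemeta escapes any regex metacharacters.
my $alt        = join '|', map { quotemeta } @garbage;
# Match a garbage value only when it is a whole field (between pipes or line ends).
my $garbage_re = qr/(?:^|\|)\s*(?:$alt)\s*(?:\||$)/;

# Returns an array ref of fields, or undef if the line should be skipped.
sub parse_line {
    my ($line, $expected) = @_;
    chomp $line;
    return if $line =~ $garbage_re;        # cheap check before the split
    my @f = split /\s*\|\s*/, $line, -1;   # -1 keeps trailing empty fields
    pop @f if @f && $f[-1] eq '';          # assumes each line ends with a pipe
    return if @f != $expected;             # wrong field count: skip the line
    return \@f;
}
```

In the read loop this would be used as
"my $f = parse_line($_, 30) or next;" and the fields are then $f->[0] to
$f->[29].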

Joe Youngquist

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]]On Behalf Of
$Bill Luebkert
Sent: Tuesday, August 13, 2002 12:39 AM
To: Dan Jablonsky
Cc: [EMAIL PROTECTED]
Subject: Re: trying to understand how regex works


Dan Jablonsky wrote:
> Hi all,
> I guess it must be a simple problem, but it's a
> mystery to me.
> I got 30 "fields" all separated by pipes in some files
> with many many lines. Some of the fields need to be
> changed, but mostly I have to drop any line that has
> certain values in certain fields.
> So I start by skipping any field that has garbage in
> it:
> open FOUT, ">>/some/path/outputfile.txt";
> open FILE, "</some/path/inputfile.txt";
> while (<FILE>) {
> $p = "N";
> next if (/.*?\|value_garbage1\|.*?/ ||
> /.*?\|value_garbage2\|.*?/ ||
> /.*?\|value_garbage3\|.*?/);
>       #and then I continue with an if
>       if(/(.*?)\|(.*?)\|....30 times/){
>               $p="Y";
>               do something to $1; #change field 1
>               do something to $3; #change field 3
>               $fld1=$newfld1;
>               $fld2=$2;
>               $fld3=$newfld3;
>               $fld4=$4;....and so on
>       }
>       print FOUT "$fld1|$fld2|...|$fld30|\n" if ($p eq "Y");
> #print the whole thing to the new output
> }
>
> Well, it happens that some of the lines are completely
> out of whack and the regex simply stops there - it
> doesn't exit, no errors but goes into an infinite loop
> even though I don't know how exactly is this possible.
> My second if states clearly (or not so clearly) that
> if the line does not have 30 fields it should skip the
> block, it should NOT print anything at the handle and
> should get the next line.
> For whatever reason, the first time it encounters a
> line with fewer than 30 fields, it just loops without
> end.
> I tried to solve this by replacing the .*? in the
> references by the actual format of each field and
> suddenly it started working but now the regex is a
> hundred times slower and the only thing that speeds it
> up is to go back to the .*? that really goes fast as
> long as the regex "is true". I mean if I have 30
> fields all the time, the regex works OK and it goes
> very fast.
>
> Anybody cares to explain this to me?

No, but I'll offer an alternative.

while (<FILE>) {
        $p = "N";
        my @f = split /\s*\|\s*/, $_;
        if (@f != 30) {
                print "Field count is ", scalar @f, " should be 30\n";
                # error processing ...
        }
        if ($f[1] =~ /   ...
                ...
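
As for the apparent endless loop, a likely explanation is backtracking. In
/(.*?)\|(.*?)\|.../ the dot can also match a pipe, so when a line has too few
fields the engine retries a huge number of ways of distributing the text
among the 30 groups before it gives up. If a single regex is still wanted, an
anchored pattern whose fields cannot cross a pipe should fail quickly
instead. A rough sketch, where the field_regex name is made up for
illustration and a trailing pipe on each line is assumed:

```perl
use strict;
use warnings;

# Build an anchored pattern for exactly $n pipe-terminated fields.
# [^|]* cannot cross a pipe, so each field has only one possible extent
# and a line with the wrong field count is rejected without heavy backtracking.
sub field_regex {
    my ($n) = @_;
    my $pat = '^' . ('([^|]*)\|' x $n) . '$';
    return qr/$pat/;
}

my $re   = field_regex(30);
my $good = join('|', 1 .. 30) . '|';   # 30 fields, trailing pipe
my $bad  = join('|', 1 .. 12) . '|';   # only 12 fields

if (my @fields = $good =~ $re) {
    print "matched ", scalar @fields, " fields\n";
}
print "short line rejected\n" unless $bad =~ $re;
```

The split approach above is still simpler; this only shows that the slowness
comes from the pattern, not from regexes as such.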

--
   ,-/-  __      _  _         $Bill Luebkert   ICQ=162126130
  (_/   /  )    // //       DBE Collectibles   Mailto:[EMAIL PROTECTED]
   / ) /--<  o // //      http://dbecoll.tripod.com/ (Free site for Perl)
-/-' /___/_<_</_</_     Castle of Medieval Myth & Magic
http://www.todbe.com/

_______________________________________________
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
