Hi Barry,

On Thu, Nov 10, 2011 at 2:34 AM, Barry Brevik <bbre...@stellarmicro.com> wrote:

> Below is some test code that will be used in a larger program.
>
> In the code below I have a regular expression who's intent is to look 
> for  " <1 or more characters> , <1 or more characters> " and replace 
> the comma with |. (the white space is just for clarity).
>
> IAC, the regex works, that is, it matches, but it only replaces the 
> final match. I have just re-read the camel book section on regexes and 
> have tried many variations, but apparently I'm too close to it to see 
> what must be a simple answer.
>
> BTW, if you guys think I'm posting too often, please say so.
>
> Barry Brevik

> ============================================
> use strict;
> use warnings;
>
> my $csvLine = qq|  "col , 1"  ,  col___'2' ,  col-3, "col,4"|;
>
> print "before comma substitution: $csvLine\n\n";
>
> $csvLine =~ s/(\x22.+),(.+\x22)/$1|$2/s;
>
> print "after comma substitution.: $csvLine\n\n";
>

Tobias already gave you a solution, and I also think that using Text::CSV or 
Text::CSV_XS is a much better fit for this task than plain regexes. For example, 
one day you might encounter a line that has an embedded " escaped using \.

Then, even if your regex worked earlier, such a line can break it.
And what if there was already a | in the original string?

Nevertheless, let me also try to explain the issue you had with the regex, 
as it can come up in other situations.

First, I'd probably use a plain " instead of \x22, as that makes it easier for 
the reader to see what you are looking for.

Second, the /s at the end probably has no value. It only changes the behavior 
of . so that it also matches newlines. If you don't have newlines in your string 
(e.g. because you are processing a file line by line), then the /s has no effect. 
Without it the expression becomes:

$csvLine =~ s/(".+),(.+")/$1|$2/;
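
To see what /s actually changes, here is a minimal sketch. The two-line string 
in $text is made up for the demonstration and is not part of your data.

```perl
use strict;
use warnings;

# Hypothetical two-line string, used only for illustration.
my $text = "first,\nsecond";

# Without /s the dot does not match "\n", so .+ cannot cross the line break.
my $without_s = $text =~ /first.+second/  ? 1 : 0;

# With /s the dot also matches "\n", so the same pattern succeeds.
my $with_s    = $text =~ /first.+second/s ? 1 : 0;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```

On a string that has no newline in the part being matched, both variants behave 
the same, which is why the flag can be dropped here.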

Then, before going on, you need to check what this really matches, so I 
replaced the above with

if ($csvLine =~ s/(".+),(.+")/$1|$2/) {
    print "match: <$1><$2>\n";
}

and got

match: <"col , 1" , col___'2' , col-3, "col><4">

You see, the .+ is greedy: starting from the first " it matches as much as it 
can. You'd be better off telling it to match as little as possible by adding an 
extra ? after the quantifier.

if ($csvLine =~ /(".+?),(.+?")/) {
    print "match: <$1><$2>\n";
}

prints this:
match: <"col >< 1">
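
If you want to compare the two behaviors side by side, something like the 
following sketch should do. It uses the test line from your post; the variable 
names for the captures are my own.

```perl
use strict;
use warnings;

# The test line from the original post.
my $csvLine = qq|  "col , 1"  ,  col___'2' ,  col-3, "col,4"|;

# Greedy: the first .+ runs to the end and backtracks only to the last comma.
my ($greedy_left, $greedy_right) = $csvLine =~ /(".+),(.+")/;

# Non-greedy: .+? stops at the first comma after the opening quote.
my ($lazy_left, $lazy_right) = $csvLine =~ /(".+?),(.+?")/;

print "greedy: <$greedy_left><$greedy_right>\n";
print "lazy:   <$lazy_left><$lazy_right>\n";
```

The greedy version splits at the last comma in the line, the non-greedy one at 
the first comma after the opening quote.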

Finally, you need to do the substitution globally, that is, not just once but 
as many times as possible:
$csvLine =~ s/(".+?),(.+?")/$1|$2/g;

And the output is
after comma substitution.: "col | 1" , col___'2' , col-3, "col|4"
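
One caveat is easier to see in code. The pattern quietly assumes that every 
quoted field contains a comma. The sketch below wraps the same substitution in 
a helper (the name replace_quoted_commas is made up) and runs it on your line 
and on a second, invented line whose first quoted field has no comma.

```perl
use strict;
use warnings;

# Same substitution as above, wrapped in a helper (hypothetical name).
sub replace_quoted_commas {
    my ($line) = @_;
    $line =~ s/(".+?),(.+?")/$1|$2/g;
    return $line;
}

# The line from the original post: both quoted commas are replaced.
my $ok = replace_quoted_commas(qq|  "col , 1"  ,  col___'2' ,  col-3, "col,4"|);

# Invented line: "a" has no comma, so the match runs on past its closing
# quote and replaces the separator after it instead.
my $broken = replace_quoted_commas(q{"a", b, "c,d"});

print "$ok\n";
print "$broken\n";   # prints: "a"| b, "c,d"
```

In the second case a field separator is turned into | while the comma inside 
"c,d" is left alone, which is exactly the kind of surprise a real parser avoids.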

But again, for CSV files that can have embedded commas or quotes, it is better 
to use one of the real CSV parsers.
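
As a rough sketch of what that could look like with Text::CSV (a CPAN module, 
so it has to be installed first): I am assuming here that the allow_whitespace 
option is what you want for the blanks around your quotes and separators, so 
please check the options against your real data.

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module, not part of core Perl

# The test line from the original post.
my $csvLine = qq|  "col , 1"  ,  col___'2' ,  col-3, "col,4"|;

# allow_whitespace: accept and strip blanks around quotes and separators.
my $csv = Text::CSV->new({ binary => 1, allow_whitespace => 1 })
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag;

my @fields;
if ($csv->parse($csvLine)) {
    @fields = $csv->fields;    # commas inside quotes stay inside their field
} else {
    die "Could not parse line: " . $csv->error_input . "\n";
}

print "field: <$_>\n" for @fields;
```

With the fields already separated there is no need to swap the embedded commas 
for | at all.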

regards

Gabor

--

Gabor Szabo

http://szabgab.com/perl_tutorial.html

 

_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
