If I were you, I'd use parse_line() from Text::ParseWords, which ships with Perl as a core module.
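Here is a minimal sketch of what I mean. It assumes one record per line with fields separated by runs of spaces or tabs; your sample data got wrapped in the message, so the exact layout is a guess. parse_line() takes a delimiter regex, a "keep" flag and the line. With keep set to 0 the surrounding quotes are stripped, so "" comes back as an empty string. Be aware that it also treats single quotes as quote characters and processes backslash escapes, so an apostrophe in an unquoted field can throw it off.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Assumed layout: one record per line, fields separated by spaces/tabs.
my @lines = (
    qq{"" "This is 2nd field" 3 4\n},
    qq{1 2 "The field may consist of (meta) characters" ""\n},
);

for my $line (@lines) {
    chomp $line;          # a trailing newline would otherwise add an undef field
    $line =~ s/^\s+//;    # leading whitespace would otherwise add an empty first field
    # Split on runs of whitespace outside quotes; keep = 0 strips the
    # quotes, so "" comes back as an empty string.
    my @fields = parse_line('\s+', 0, $line);
    print join('|', map { "[$_]" } @fields), "\n";
}
```

For the two sample records this prints `[]|[This is 2nd field]|[3]|[4]` and `[1]|[2]|[The field may consist of (meta) characters]|[]`. No placeholder tokens and no metacharacter escaping are needed.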
On Mon, Apr 7, 2008 at 9:11 PM, <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I am writing a Perl script to parse a file. The fields in the file are
> separated by spaces/tabs. However, certain fields may be empty or
> consist of multiple words; such fields are double-quoted, which makes
> it difficult for me to use split.
>
> Example of data:
> "" "This is 2nd field"
> 3 4
> 1 2
> "" 4
> 1 2 "The field may consist of (meta) characters" ""
>
>
> Here is what I am doing:
> $line =~ s/""/__EMPTY__/g;          # mark empty fields first
> while ($line =~ /(".*?")/) {        # loop until every quoted string is replaced
>     my $tmp1 = $1;
>     my $tmp2 = $1;
>     $tmp1 =~ s/"//g;
>     $tmp1 =~ s/ /__SPACE__/g;
>     $line =~ s/\Q$tmp2\E/$tmp1/;    # \Q...\E escapes every metacharacter in $tmp2
> }
> my @tmp = split /\s+/, $line;
> foreach my $i (0 .. $#tmp) {
>     $tmp[$i] =~ s/__SPACE__/ /g;
>     $tmp[$i] =~ s/__EMPTY__//g;
>     # Store data
> }
>
>
> 1. Substitute "" with __EMPTY__.
> 2. While the line matches ".*?" (non-greedy match), remember the
>    content between the quotes.
> 3. Assign this content to $tmp1 and $tmp2. Remove the " characters
>    from $tmp1 and replace ' ' with __SPACE__.
> 4. Escape the metacharacters in $tmp2, i.e. (meta) becomes \(meta\).
> 5. Replace $tmp2 in the line with $tmp1 (non-global).
> 6. Do a split /\s+/.
> 7. Replace __EMPTY__ with an empty string.
> 8. Replace __SPACE__ with " ".
>
> Does anyone have a neater and more efficient way, either with split or
> with a regexp?
>
>
> Thanks
> Shu Teng
>
>
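If you would rather stick to a plain regexp, as you asked, a single global match can pick the fields out directly in one pass. This is only a sketch: split_record is a made-up helper name, and it assumes quotes are balanced and never escaped inside a field (parse_line handles backslash escapes; this does not).

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on whitespace, treating a
# double-quoted run (possibly empty) as a single field.
# Assumes balanced quotes and no escaped quotes inside a field.
sub split_record {
    my ($line) = @_;
    # Each match is either a quoted field (captured with its quotes) or a
    # run of non-whitespace characters; the quotes are then stripped.
    return map { /^"(.*)"\z/s ? $1 : $_ } $line =~ /("[^"]*"|\S+)/g;
}

my @fields = split_record(q{1 2 "The field may consist of (meta) characters" ""});
print join('|', map { "[$_]" } @fields), "\n";
```

This prints `[1]|[2]|[The field may consist of (meta) characters]|[]`. Because the quoted alternative is tried first at each position, spaces inside quotes never act as separators, and leading whitespace is simply skipped.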