Hi all,

I am writing a Perl script to parse a file. The data in the file is
separated by spaces/tabs. However, certain fields may be empty or
consist of multiple words, in which case they are double-quoted, and
this makes it difficult for me to do a simple split.

Example of data:
""   "This is 2nd field"    3    4
1    2    ""    4
1    2    "The field may consist of (meta) characters"   ""


What I am doing is this:
   $line=~s/""/__EMPTY__/g;             # mark empty fields first, so $1 is not clobbered in the loop
   while ($line=~/(".*?")/) {           # loops until every double-quoted string is replaced
      $tmp1=$1;
      $tmp2=$1;
      $tmp1=~s/"//g;
      $tmp1=~s/ /__SPACE__/g;
      $tmp2=~s/([\(\)])/\\$1/g;         # escape the metacharacters in $tmp2
      $line=~s/$tmp2/$tmp1/;            # $tmp2 is used as a pattern, hence the escaping
   }
   @tmp=split /\s+/, $line;
   foreach $i (0..$#tmp) {
      $tmp[$i]=~s/__SPACE__/ /g;
      $tmp[$i]=~s/__EMPTY__//g;
      # Store data
   }


Substitute "" with __EMPTY__.
While the line matches ".*?" (non-greedy match), remember the quoted
string.
Assign it to $tmp1 and $tmp2. Remove " from $tmp1 and replace ' '
with __SPACE__.
Escape the metacharacters in $tmp2, i.e. (meta) becomes \(meta\).
Substitute $tmp2 with $tmp1 in the line (non-global).
Do a split /\s+/.
Replace __EMPTY__ with the empty string.
Replace __SPACE__ with " ".
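
To make the steps above easier to try out, here is a minimal
self-contained sketch of them. The subroutine name split_fields and the
variable names are made up for illustration. It also uses \Q...\E in
place of the hand-written escaping step, which I assume is equivalent
for this purpose since it escapes every metacharacter rather than only
parentheses. Like the steps above, it protects only spaces inside
quotes, so a tab inside a quoted field would still be split.

```perl
use strict;
use warnings;

# split_fields is a hypothetical name; it follows the steps listed above.
sub split_fields {
    my ($line) = @_;
    $line =~ s/""/__EMPTY__/g;              # mark empty fields
    while ($line =~ /(".*?")/) {            # next quoted string, non-greedy
        my $quoted = $1;                    # captured text, quotes included
        (my $plain = $quoted) =~ s/"//g;    # copy it and drop the quotes
        $plain =~ s/ /__SPACE__/g;          # protect embedded spaces
        $line =~ s/\Q$quoted\E/$plain/;     # \Q...\E escapes metacharacters
    }
    my @fields = split /\s+/, $line;
    for my $f (@fields) {                   # restore the placeholders
        $f =~ s/__SPACE__/ /g;
        $f =~ s/__EMPTY__//g;
    }
    return @fields;
}

my @f = split_fields(qq{1    2    "The field may consist of (meta) characters"   ""});
print join('|', @f), "\n";
```

The placeholders __EMPTY__ and __SPACE__ are assumed never to occur in
the real data; if they could, a different marker would be needed.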

Does anyone have a neater and more efficient way, either with split or
with a regexp?


Thanks
Shu Teng
