Jonathan Kaye wrote:
Harold Fuchs wrote:


Not using a Calc macro. If it were me I'd export the sheet as a CSV file,
write a Perl script to generate a new [correctly formatted] CSV file and
import that into a new sheet. I doubt a suitable Perl script would be more
than about 10 lines of *un*obfuscated code.

Hi Harold,
I tried it out using Unicon on the CSV file. I used ISO 8859-15 encoding, which took care of the kinkier characters. It's a bit more than 10 lines, but once you take out the I/O stuff and the pretty formatting for ease of reading, it comes to about that. I had to use "=" as the field delimiter, since commas are crucial to splitting the records. The unary "\" operator is a test for non-nullness. Here's the code:
---------------------------------------------------------------
procedure main()
        datadir := "/home/jdkaye/MYPROGS/Data/"
        outdir := "/home/jdkaye/MYPROGS/Output/"
        intext := open(datadir || "8_sept_sample3.csv") | stop("can't open data file")
        outtext := open(outdir || "8_sept_sample3_fixed.csv", "w") | stop("can't open output file")
        while entry := read(intext) do {
                entry ? if ((gloss := tab(upto('='))) & rem := tab(0)) then {
                          if gloss == "" then
                            next
                          while \find(",", gloss) do {
                              gloss ? if ((gl := tab(upto(','))) & move(1) & nrem := tab(0)) then {
                              write(outtext, gl, rem)
                              gloss := nrem
                              }
                          }
                          write(outtext, gloss, rem)
                }
        }
end
---------------------------------------------------------------------------
Not too bad, eh? Thanks for the tip.
Jonathan

Hmmm.

Exactly 10 lines of Perl:

     #!/usr/bin/perl
     while (<>) {
         ($field1,@fields)=split(/;/,$_);
         $field1 =~ s/"//g;
         @subfields=split(/,/,$field1);
         $list=join(";",@fields);
         foreach $subfield (@subfields) {
             print "\"$subfield\";$list";
         }
     }

Assuming the program is named "splitter.pl", use it as
   splitter.pl <input_file >output_file

in other words the script reads stdin and writes stdout.

NB I saved the spreadsheet in CSV format using semicolon as the delimiter to avoid confusion with the commas in column A. Now I can split the columns on semicolon and the column A value on comma without parsing problems.
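
To show what happens to a single record, here is a minimal sketch that wraps the same split/join logic in a subroutine and runs it on one made-up line. The name split_record and the sample record are hypothetical, chosen only for illustration; they are not part of the script above or of the real spreadsheet.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one semicolon-delimited record into one row
# per comma-separated value found in the first column.
sub split_record {
    my ($line) = @_;
    my ($field1, @fields) = split /;/, $line;
    $field1 =~ s/"//g;               # drop the quotes around column A
    my $list = join ";", @fields;    # remaining columns, trailing newline included
    my @rows;
    foreach my $subfield (split /,/, $field1) {
        push @rows, "\"$subfield\";$list";
    }
    return @rows;
}

# Made-up sample record, not taken from the real spreadsheet.
print split_record("\"cat,feline\";noun;animal\n");
```

For that sample line the sketch prints two rows, "cat";noun;animal and "feline";noun;animal, i.e. one row per value that was in column A, each followed by the unchanged remaining columns.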

Not a bad guess :-)

--
Harold Fuchs
London, England
Please reply *only* to [email protected]
