Ross Johnson wrote:
> In case Greg isn't aware of it -
> OOo files (including the new opendoc formats) are just zip files with
> OOo extensions instead of .zip. Try:
> unzip -p File.odp content.xml | perl -p -e "s/<[^>]*>//g;s/ +//;"
> should work without needing to unpack first.
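For comparison, the same "read content.xml straight out of the zip" idea can be sketched in Python with the standard-library `zipfile` and `xml.etree.ElementTree` modules. Using an XML parser instead of a regular expression means entities such as `&amp;` are decoded and each paragraph comes out whole. This is a minimal sketch, not either poster's method: the function name `odf_paragraphs` is hypothetical, and it assumes the text lives in `text:p`/`text:h` elements under either the OpenDocument namespace or the older OpenOffice.org 1.x namespace used by `.sxi` files.

```python
import xml.etree.ElementTree as ET
import zipfile

# Assumed text namespaces: OpenDocument (.odp, .odt) and OOo 1.x (.sxi, .sxw).
TEXT_NAMESPACES = (
    "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "http://openoffice.org/2000/text",
)
# ElementTree spells namespaced tags as "{namespace}localname".
PARA_TAGS = {f"{{{ns}}}{name}" for ns in TEXT_NAMESPACES for name in ("p", "h")}


def odf_paragraphs(path):
    """Hypothetical helper: return the non-empty paragraph/heading texts.

    content.xml is read directly from the zip container, like `unzip -p`,
    so nothing is unpacked to disk. A paragraph nested inside another
    (e.g. a footnote body) is also included in its parent's text.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    paragraphs = []
    for elem in root.iter():  # document order
        if elem.tag in PARA_TAGS:
            text = "".join(elem.itertext()).strip()
            if text:
                paragraphs.append(text)
    return paragraphs
```

Calling `odf_paragraphs("File.odp")` would return a list of strings, one per paragraph or heading, which can then be printed or searched.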
No sooner had I sent my previous message than I found that OpenOffice has
a built-in document converter!
As people in this thread have pointed out, retrieving text from an OOo
document isn't hard, since it's just zipped XML that can be parsed with a
Perl script. The problem is getting Microsoft PowerPoint documents into
a zipped XML format in the first place.
I've found that File -> Wizards -> Document Converter does exactly that.
Just tell it you want to convert PowerPoint documents (not templates),
point it at your source directory and at the directory where you want
the results written, and you're away.
I then find that

    unzip -p $file.sxi content.xml | perl -w -p -e "s/<[^>]*>/\n/g;s/ +//;s/\n\n*/\n/g;"

works rather well for extracting the text.
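The pipeline above turns every tag into a line break and then squeezes out the blank lines, leaving each run of text on its own line. A rough Python analogue using only the standard library might look like the following. It is a sketch under stated assumptions: the function name `sxi_text_lines` is hypothetical, and `content.xml` is assumed to be UTF-8 encoded. Two small differences from the Perl one-liner: Perl's `s/ +//` removes only the first run of spaces on each line, whereas this sketch trims leading and trailing whitespace from every line, and this sketch also decodes entities such as `&lt;` with `html.unescape`.

```python
import html
import re
import zipfile


def sxi_text_lines(path):
    """Hypothetical helper mirroring the unzip | perl pipeline.

    Returns the document's text runs, one per list entry.
    """
    with zipfile.ZipFile(path) as zf:
        xml = zf.read("content.xml").decode("utf-8")  # assumes UTF-8
    # Like s/<[^>]*>/\n/g: every tag (including <?xml ...?>) becomes a newline.
    text = re.sub(r"<[^>]*>", "\n", xml)
    lines = []
    for line in text.split("\n"):
        line = line.strip()  # trims both ends, unlike s/ +//
        if line:  # like s/\n\n*/\n/g: drop the empty lines left behind
            lines.append(html.unescape(line))  # &amp; -> &, &lt; -> <, ...
    return lines
```

Printing `"\n".join(sxi_text_lines("File.sxi"))` would give output similar to the shell pipeline.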
many thanks,
Greg