I don't know whether this works with the new OO format
(v2+), but for earlier versions, to get the text out of
the "content.xml" file I would run:
cat content.xml | perl -p -e "s/<[^>]*>//g;s/ +//;"
Hey, thanks, that is exactly what I was looking for !!!
JC
Depending upon what I wanted to do next, I could
redirect to a file, append to an existing file, etc.
This works very nicely for OO documents. I just tried
SXIs and it *seems* to work okay. Test it to see if
it works as is, or have fun tweaking it to suit your
specific needs.
regards,
farmerdude
--- JC Helary <[EMAIL PROTECTED]> wrote:
Having tried various combinations of 'strings' and 'sed', I have
concluded that the text cannot be reliably extracted without some
more intelligent parsing of the PPT format. OO obviously performs
this parsing since all the PPT files open flawlessly in
OpenOffice.org Impress.
Is there any way I can, using OpenOffice.org, create a macro to
extract the text from all of these files? There must be something
better than 1500 copy/paste operations!
Greg,
1) there is no save-as-text option in OOo for presentation files.
2) all the content is there in the converted OD file, in the XML
3) there was recently an announcement about an OOo batch conversion
utility
With 3) you transform the PPT files to OD format. Because of 1) you
can't use that directly, but thanks to 2) and smart XML parsers/
conversion tools you can readily access the textual data by
removing _all_ the XML tags.
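
The last step might look roughly like the sketch below, once the PPT
files have already been converted (the conversion utility itself is
not shown, since its name and options are not given here). The
assumptions are mine: the converted files end in .odp, unzip and perl
are installed, and the function names odf_text and dump_all_text and
the directory name in the comment are invented for illustration. XML
entities such as &amp; are not decoded.

```shell
#!/bin/sh
# odf_text FILE (hypothetical helper): print the text found in the
# content.xml member of one converted file.
odf_text() {
    # -0777 makes perl read the whole stream at once, so tags that
    # span line breaks are removed as well.
    unzip -p "$1" content.xml |
        perl -0777 -p -e 's/<[^>]*>/ /g; s/\s+/ /g; s/^ //; s/ $/\n/'
}

# dump_all_text DIR (hypothetical helper): write NAME.txt next to
# every NAME.odp found in DIR (default: current directory).
dump_all_text() {
    dir=${1:-.}
    for f in "$dir"/*.odp; do
        [ -e "$f" ] || continue    # no .odp files: skip the literal pattern
        odf_text "$f" > "${f%.odp}.txt"
    done
}

# Example call (directory name is an example only):
#   dump_all_text ./converted
```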
I have never tried that because I never _had_ to dump to text, but
my feeling is that what you ask, although a little unorthodox, is
possible with a few tricks.
Jean-Christophe