In case Greg isn't aware of it - OOo files (including the new opendoc formats) are just zip files with OOo extensions instead of .zip. Try:

    unzip -p File.odp content.xml | perl -p -e "s/<[^>]*>//g;s/ +//;"

That should work without needing to unpack first.

On Wed, 2006-02-01 at 12:48 +0900, JC Helary wrote:
> > I don't know if this works for the new format for OO
> > (v2+), but for earlier versions to get the text from
> > the "content.xml" files I would;
> >
> > cat content.xml | perl -p -e "s/<[^>]*>//g;s/ +//;"
>
> Hey, thanks, that is exactly what I was looking for !!!
>
> JC
>
> > Depending upon what I wanted to do next, I could
> > redirect to a file, append to an existing file, etc.
> >
> > This works very nicely for OO documents. I just tried
> > SXIs and it *seems* to work okay. Test it to see if
> > it works as is, or have fun to tweak to suit your
> > specific needs.
> >
> > regards,
> >
> > farmerdude
> >
> >
> > --- JC Helary <[EMAIL PROTECTED]> wrote:
> >
> >>> Having tried various combinations of 'strings' and 'sed', I have
> >>> concluded that the text cannot be reliably extracted without some
> >>> more intelligent parsing of the PPT format. OO obviously performs
> >>> this parsing since all the PPT files open flawlessly in
> >>> OpenOffice.org Impress.
> >>>
> >>> Is there any way I can, using OpenOffice.org, create a macro to
> >>> extract the text from all of these files? There must be something
> >>> better than 1500 copy/paste operations!
> >>
> >> Greg,
> >>
> >> 1) there is not save to text in OOo for presentation files.
> >> 2) all the contents is there in the converted OD file, in the xml
> >> 3) there was recently an announcement about an OOo batch conversion
> >> utility
> >>
> >> with 3) you transform the PPT files to OD format, since 1) you
> >> can't use that directly but thanks to 2) and smart XML parsers/
> >> conversion tools you can readily access the textual data by
> >> removing _all_ the xml tags.
> >>
> >> I have never tried that because I never _had_ to dump to text but
> >> my feeling is that what you ask, although a little unorthodox is
> >> possible with a few tricks.
> >>
> >> Jean-Christophe
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
