JC,

No worries. Your mileage may vary, but I hope it works for you. I have a script that unzips the OO files and then runs the Perl one-liner quoted below. I'm sure someone could improve upon it :)
farmerdude

--- JC Helary <[EMAIL PROTECTED]> wrote:

> > I don't know if this works for the new format for OO
> > (v2+), but for earlier versions to get the text from
> > the "content.xml" files I would;
> >
> > cat content.xml | perl -p -e "s/<[^>]*>//g;s/ +//;"
>
> Hey, thanks, that is exactly what I was looking for !!!
>
> JC
>
> > Depending upon what I wanted to do next, I could
> > redirect to a file, append to an existing file, etc.
> >
> > This works very nicely for OO documents. I just tried
> > SXIs and it *seems* to work okay. Test it to see if
> > it works as is, or have fun tweaking it to suit your
> > specific needs.
> >
> > regards,
> >
> > farmerdude
> >
> > --- JC Helary <[EMAIL PROTECTED]> wrote:
> >
> >>> Having tried various combinations of 'strings' and 'sed', I have
> >>> concluded that the text cannot be reliably extracted without some
> >>> more intelligent parsing of the PPT format. OO obviously performs
> >>> this parsing since all the PPT files open flawlessly in
> >>> OpenOffice.org Impress.
> >>>
> >>> Is there any way I can, using OpenOffice.org, create a macro to
> >>> extract the text from all of these files? There must be something
> >>> better than 1500 copy/paste operations!
> >>
> >> Greg,
> >>
> >> 1) there is no save to text in OOo for presentation files.
> >> 2) all the content is there in the converted OD file, in the xml
> >> 3) there was recently an announcement about an OOo batch conversion
> >> utility
> >>
> >> with 3) you transform the PPT files to OD format, since 1) you
> >> can't use that directly but thanks to 2) and smart XML parsers/
> >> conversion tools you can readily access the textual data by
> >> removing _all_ the xml tags.
> >>
> >> I have never tried that because I never _had_ to dump to text but
> >> my feeling is that what you ask, although a little unorthodox, is
> >> possible with a few tricks.
> >>
> >> Jean-Christophe
