In case Greg isn't aware of it -

OOo files (including the new OpenDocument formats) are just zip archives
with OOo extensions instead of .zip. Try:

unzip -p File.odp content.xml | perl -p -e 's/<[^>]*>//g; s/^ +//;'

That should work without needing to unpack the archive first.
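
For anyone who would rather script this than pipe through unzip and perl,
here is a minimal Python sketch of the same idea: read content.xml straight
out of the archive and throw away the tags. The function name odf_text is
made up for illustration, and the tag stripping is a plain regex, not a
real XML parse. It puts a line break wherever a tag was, so formatted runs
inside one paragraph can end up on separate lines.

```python
import html
import re
import zipfile


def odf_text(path):
    """Return the text of content.xml inside an OOo/OpenDocument file.

    Hypothetical helper: tags are removed with a regex, so this is a
    rough text dump, not a faithful rendering of the document.
    """
    # The file is an ordinary zip archive, so no unpacking to disk is needed.
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("content.xml").decode("utf-8")

    # Replace every tag with a newline so adjacent paragraphs do not run together.
    text = re.sub(r"<[^>]*>", "\n", xml)

    # Turn entities such as &amp; back into plain characters.
    text = html.unescape(text)

    # Collapse runs of whitespace and drop the empty lines left behind by tags.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Calling odf_text("File.odp") returns a string, so handling a whole
directory is just a loop over the file names that writes each result out.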

On Wed, 2006-02-01 at 12:48 +0900, JC Helary wrote:
> > I don't know if this works for the new format for OO
> > (v2+), but for earlier versions to get the text from
> > the "content.xml" files I would run:
> >
> > cat content.xml | perl -p -e "s/<[^>]*>//g;s/ +//;"
> 
> Hey, thanks, that is exactly what I was looking for !!!
> 
> JC
> 
> > Depending upon what I wanted to do next, I could
> > redirect to a file, append to an existing file, etc.
> >
> > This works very nicely for OO documents.  I just tried
> > SXIs and it *seems* to work okay.  Test it to see if
> > it works as is, or have fun tweaking it to suit your
> > specific needs.
> >
> > regards,
> >
> > farmerdude
> >
> >
> > --- JC Helary <[EMAIL PROTECTED]> wrote:
> >
> >>> Having tried various combinations of 'strings' and 'sed', I have
> >>> concluded that the text cannot be reliably extracted without some
> >>> more intelligent parsing of the PPT format.  OO obviously performs
> >>> this parsing since all the PPT files open flawlessly in
> >>> OpenOffice.org Impress.
> >>>
> >>> Is there any way I can, using OpenOffice.org, create a macro to
> >>> extract the text from all of these files?  There must be something
> >>> better than 1500 copy/paste operations!
> >>
> >> Greg,
> >>
> >> 1) there is no save-to-text option in OOo for presentation files.
> >> 2) all the content is there in the converted OD file, in the XML.
> >> 3) there was recently an announcement about an OOo batch conversion
> >> utility.
> >>
> >> With 3) you convert the PPT files to OD format. Because of 1) you
> >> can't save that to text directly, but thanks to 2) and smart XML
> >> parsers/conversion tools you can readily access the textual data by
> >> removing _all_ the XML tags.
> >>
> >> I have never tried that because I never _had_ to dump to text, but
> >> my feeling is that what you ask, although a little unorthodox, is
> >> possible with a few tricks.
> >>
> >> Jean-Christophe
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
