The following link may provide a starting point for your batch
conversion. Impress can also export or save presentations as HTML,
which may be more convenient than scanning XML files. I would guess
that the export also splits most of the text from the graphics for
you as part of the conversion.

http://www.oooforum.org/forum/viewtopic.phtml?t=673
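
If you go the route described in the quoted message below (convert to
OpenDocument, then strip the XML tags), something like the following
might work. It is only a rough sketch and assumes the PPT files have
already been converted to .odp. The function names are made up for
illustration; content.xml is the standard member of an OpenDocument
package that holds the slide text. Whitespace elements such as tabs
and line breaks are simply dropped here.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument "text" namespace; paragraphs are text:p, headings text:h.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TEXT_TAGS = ("{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS)


def extract_text_from_content_xml(xml_bytes):
    """Return the non-empty paragraphs/headings found in content.xml."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for elem in root.iter():
        if elem.tag in TEXT_TAGS:
            # itertext() joins the text of nested spans, dropping the tags.
            text = "".join(elem.itertext()).strip()
            if text:
                lines.append(text)
    return lines


def extract_text_from_odp(path):
    """Open an .odp file (a zip archive) and extract its text lines."""
    with zipfile.ZipFile(path) as odp:
        return extract_text_from_content_xml(odp.read("content.xml"))
```

For a batch run you could loop over the converted files (for example
with glob.glob("*.odp")) and write the lines returned by
extract_text_from_odp() to one text file per presentation.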

On Wed, 2006-02-01 at 12:09 +0900, JC Helary wrote:
> > Having tried various combinations of 'strings' and 'sed', I have  
> > concluded that the text cannot be reliably extracted without some  
> > more intelligent parsing of the PPT format.  OO obviously performs  
> > this parsing since all the PPT files open flawlessly in  
> > OpenOffice.org Impress.
> >
> > Is there any way I can, using OpenOffice.org, create a macro to  
> > extract the text from all of these files?  There must be something  
> > better than 1500 copy/paste operations!
> 
> Greg,
> 
> 1) there is no save-to-text option in OOo for presentation files.
> 2) all the content is there in the converted OD file, in the XML
> 3) there was recently an announcement about an OOo batch conversion
> utility
> 
> With 3) you transform the PPT files to OD format. Because of 1) you
> can't save that directly as text, but thanks to 2) and smart XML
> parsers/conversion tools you can readily access the textual data by
> removing _all_ the XML tags.
> 
> I have never tried that because I never _had_ to dump to text, but
> my feeling is that what you ask, although a little unorthodox, is
> possible with a few tricks.
> 
> Jean-Christophe
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
