The following link may provide a starting point for your batch conversion. Impress can also export or save presentations as HTML, which may be more convenient than scanning XML files. I would guess that the HTML conversion will also separate most of the text from the graphics for you.
http://www.oooforum.org/forum/viewtopic.phtml?t=673

On Wed, 2006-02-01 at 12:09 +0900, JC Helary wrote:
> > Having tried various combinations of 'strings' and 'sed', I have
> > concluded that the text cannot be reliably extracted without some
> > more intelligent parsing of the PPT format. OO obviously performs
> > this parsing since all the PPT files open flawlessly in
> > OpenOffice.org Impress.
> >
> > Is there any way I can, using OpenOffice.org, create a macro to
> > extract the text from all of these files? There must be something
> > better than 1500 copy/paste operations!
>
> Greg,
>
> 1) there is no save to text in OOo for presentation files.
> 2) all the content is there in the converted OD file, in the xml
> 3) there was recently an announcement about an OOo batch conversion
> utility
>
> with 3) you transform the PPT files to OD format, since 1) you can't
> use that directly but thanks to 2) and smart XML parsers/conversion
> tools you can readily access the textual data by removing _all_ the
> xml tags.
>
> I have never tried that because I never _had_ to dump to text but my
> feeling is that what you ask, although a little unorthodox, is
> possible with a few tricks.
>
> Jean-Christophe
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
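The quoted suggestion is to convert the PPT files to OpenDocument and then pull the text out of the XML. That step could be sketched roughly as below. This is a minimal illustration, not a tested tool. It assumes the files have already been converted to .odp, for example with the batch conversion utility mentioned above, which is not named here. It also relies on an .odp file being a ZIP archive whose slide content lives in content.xml.

Rather than removing every tag blindly, the sketch keeps paragraph boundaries (`text:p`) and groups the text by slide (`draw:page`). The function names `slides_from_content_xml`, `slides_text` and `dump_directory` are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# OpenDocument namespace URIs used inside content.xml
DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def slides_from_content_xml(data: bytes) -> list[list[str]]:
    """Return the non-empty paragraphs of each slide found in content.xml.

    Each <draw:page> is one slide; the text of each <text:p> (including
    nested <text:span> elements) becomes one string. Extra spaces stored
    as empty <text:s/> elements carry no character data and are lost.
    """
    root = ET.fromstring(data)
    slides = []
    for page in root.iter(f"{{{DRAW}}}page"):
        paras = ["".join(p.itertext()) for p in page.iter(f"{{{TEXT}}}p")]
        slides.append([t for t in paras if t.strip()])
    return slides


def slides_text(odp_path) -> list[list[str]]:
    """Open an .odp file (a ZIP archive) and parse its content.xml."""
    with zipfile.ZipFile(odp_path) as zf:
        return slides_from_content_xml(zf.read("content.xml"))


def dump_directory(folder: str) -> None:
    """Write a .txt file next to every .odp file found in folder."""
    for odp in sorted(Path(folder).glob("*.odp")):
        lines = []
        for n, paras in enumerate(slides_text(odp), start=1):
            lines.append(f"--- slide {n} ---")
            lines.extend(paras)
        odp.with_suffix(".txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Calling `dump_directory("converted")` would write, for example, `converted/talk.txt` next to `converted/talk.odp`. Speaker notes are stored inside each `draw:page` as well, so they would appear in that slide's output.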
