Check this: http://wiki.apache.org/jakarta-lucene-data/attachments/PowerPoint/attachments/PPT2Text.java

--- Ryan Rhodes <[EMAIL PROTECTED]> wrote:

> Hi Ralph,
>
> I haven't tested the PPT extractor with any other languages. I remember
> reading about other people having problems with different character sets
> though.
>
> Could you send a before and after example file here or to bugzilla?
>
> -Ryan Rhodes
>
>
> -----Original Message-----
> From: Ralph Scheuer [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 28, 2004 10:01 AM
> To: slide
> Subject: MSPowerPointExtractor problem
>
> Hello everybody,
>
> When I was searching for a Java class to extract text from PowerPoint
> files, I accidentally discovered Slide.
>
> I pulled the MSPowerPointExtractor class and some other stuff it
> depends on via CVS and tried it for some text extraction.
>
> The method I used looks very similar to the provided example main
> method (see below).
>
> However, when I tried to extract text from a German PowerPoint
> presentation, I had some problems with the encoding. I did not know
> which encoding to use; converting the output to ISO Latin 1 with my
> text editor solved only part of the problem (some German Umlaute were
> displayed correctly, some were not).
>
> Is this a known issue or am I doing something wrong? Any hints for me?
>
> Thanks in advance.
>
> Ralph Scheuer
>
> BTW, I am using Mac OS X 10.3.4 with JDK 1.4.2_03; the native encoding
> on this platform is MacRoman.
>
>
> public static String contentStringForData(NSData data){
>
>     StringBuffer buf = new StringBuffer();
>     try{
>         ByteArrayInputStream input = data.stream();
>         MSPowerPointExtractor ex = new MSPowerPointExtractor(null, null);
>
>         Reader reader = ex.extract(input);
>
>         int c;
>         do
>         {
>             c = reader.read();
>
>             buf.append((char)c);
>         }
>         while( c != -1 );
>     }catch(Exception e){
>
>     }
>
>     return buf.toString();
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
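
One thing worth ruling out in the quoted method, separately from the encoding question: the do/while loop appends before it checks for end of stream, so the final -1 is cast to a char and ends up in the result as a stray U+FFFF. The empty catch block would also hide any exception from the extractor. Below is a minimal sketch of a safer read loop, not a fix for the extractor itself. Assumptions: a JDK newer than the 1.4.2 mentioned above (StringBuilder and StandardCharsets), and the names ExtractedText and readAll are made up for illustration. A StringReader stands in for the Reader that ex.extract(input) returns in the quoted code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Slide.
class ExtractedText {

    /** Reads every character from the reader and stops at end of stream. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder buf = new StringBuilder();
        char[] chunk = new char[4096];
        int n;
        // read(char[]) returns -1 at end of stream; nothing is appended then.
        while ((n = reader.read(chunk)) != -1) {
            buf.append(chunk, 0, n);
        }
        return buf.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the extractor's Reader; the text is "Gruesse" spelled
        // with u-umlaut and sharp s, written as escapes to keep the source ASCII.
        String text = readAll(new StringReader("Gr\u00fc\u00dfe"));

        // Choose the output encoding explicitly instead of relying on the
        // platform default (MacRoman on the machine described above).
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(latin1.length + " " + utf8.length); // prints "5 7"
    }
}
```

If the String returned by readAll already contains wrong characters, the problem is on the extractor side and no output encoding will repair it. If the String is correct, writing it with an explicit charset and opening the file with that same charset should show all Umlaute properly.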
