Check this:

http://wiki.apache.org/jakarta-lucene-data/attachments/PowerPoint/attachments/PPT2Text.java

--- Ryan Rhodes <[EMAIL PROTECTED]> wrote:

> Hi Ralph,
> 
> I haven't tested the PPT extractor with any other languages.  I remember
> reading about other people having problems with different character sets
> though.
> 
> Could you send a before and after example file here or to bugzilla?
> 
> -Ryan Rhodes
> 
> 
> -----Original Message-----
> From: Ralph Scheuer [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, July 28, 2004 10:01 AM
> To: slide
> Subject: MSPowerPointExtractor problem
> 
> Hello everybody,
> 
> When I was searching for a Java class to extract text from PowerPoint 
> files, I accidentally discovered Slide.
> 
> I pulled the MSPowerPointExtractor class and some other stuff it 
> depends on via CVS and tried it for some text extraction.
> 
> The method I used looks very similar to the provided example main 
> method (see below).
> 
> However, when I tried to extract text from a German PowerPoint 
> presentation, I had some problems with the encoding. I did not know 
> which encoding to use. Converting the output to ISO Latin 1 with my 
> text editor solved only part of the problem (some German umlauts were 
> displayed correctly, some were not).
> 
> Is this a known issue or am I doing something wrong? Any hints for me?
> 
> Thanks in advance.
> 
> Ralph Scheuer
> 
> BTW, I am using Mac OS X 10.3.4 with JDK 1.4.2_03; the native encoding 
> on this platform is MacRoman.
> 
> 
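>      // Extracts the text of a PowerPoint file held in an NSData object.
>      public static String contentStringForData(NSData data){
>       
>       StringBuffer buf = new StringBuffer();
>       try{
>           ByteArrayInputStream input = data.stream();
>           MSPowerPointExtractor ex = new MSPowerPointExtractor(null, null);
>       
>           Reader reader = ex.extract(input);
>       
>           // Check for end of stream (-1) before appending, so that
>           // (char)-1 is never added to the buffer.
>           int c;
>           while( (c = reader.read()) != -1 )
>               {
>                   buf.append((char)c);
>               }
>       }catch(Exception e){
>           // Report the failure instead of silently discarding it.
>           e.printStackTrace();
>       }
>       
>       return buf.toString();
>      }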
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
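
Independent of the extractor itself, two things in the quoted method can distort the output: the loop has to stop before appending the -1 end-of-stream marker, and the resulting String has to be written out with an explicit charset, otherwise the platform default (MacRoman on that machine) is used. Below is a minimal sketch of both steps, assuming a current JDK. The class and method names (ReaderEncodingSketch, readAll, toUtf8) are made up for illustration, and a StringReader stands in for the Reader returned by the extractor. How MSPowerPointExtractor decodes the bytes of the PPT file is not covered here and would have to be checked separately.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ReaderEncodingSketch {

    // Drain a Reader into a String; the -1 end-of-stream marker is
    // tested before anything is appended, so it never reaches the buffer.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder buf = new StringBuilder();
        char[] chunk = new char[4096];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            buf.append(chunk, 0, n);
        }
        return buf.toString();
    }

    // Encode text with an explicit charset instead of relying on the
    // platform default encoding.
    public static byte[] toUtf8(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the Reader returned by the extractor.
        Reader reader = new StringReader("Gr\u00fc\u00dfe aus K\u00f6ln");
        String text = readAll(reader);
        byte[] bytes = toUtf8(text);
        System.out.println(text.length() + " chars, " + bytes.length + " bytes");
    }
}
```

If the umlauts are already wrong in the String returned by readAll, the problem lies in how the bytes were decoded upstream, and re-encoding the output in a text editor will not repair it.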


=====
"No one can earn a million dollars honestly."- William Jennings Bryan (1860-1925) 

"Make everything as simple as possible, but not simpler."- Albert Einstein (1879-1955)

"It is dangerous to be sincere unless you are also stupid."- George Bernard Shaw (1856-1950)


        
                