Hmmmmm,

Basically, we have concentrated on the English language, so we never faced
any problems. It has become a new task for our team now :-)

Thanks to Ralph for pointing out that problem.

We will work on it and let the Jakarta team know :-)
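
As a starting point, here is a minimal sketch of what the console output below suggests. The octal escapes \326, \374, \337 and \344 are the ISO-8859-1 byte values of the German characters, so decoding those bytes with an explicit charset, instead of relying on the platform default, should give the right text. The class and method names are made up for illustration and are not part of Slide or POI, and I am only assuming that the extractor hands back single-byte Latin-1 text; that still needs to be checked against the PPT format.

```java
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Slide or POI.
final class UmlautDecodeSketch {

    // The bytes behind "Gr\374\337e" from the console output, written as octal literals.
    static final byte[] RAW = { 'G', 'r', (byte) 0374, (byte) 0337, 'e' };

    // Decode with an explicit charset instead of the platform default
    // (MacRoman on Ralph's machine). Assumption: the bytes are ISO-8859-1.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws Exception {
        // Encode the output explicitly as well, so the result does not
        // depend on the default encoding of the JVM.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(decodeLatin1(RAW));
    }
}
```

Whether the terminal or browser then shows the characters correctly still depends on it being set to UTF-8.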

Regards
Sudhakar





--- Ralph Scheuer <[EMAIL PROTECTED]> wrote:

> Ryan,
> 
> thanks for your reply.
> 
> I have also seen the posts on this subject from Sudhakar, who seems to 
> be contributing a whole lot of code here, which is a great thing. But 
> the problem also persists in that code, so I think we should solve this 
> encoding problem in your code first (which is simpler; the fix could 
> later be integrated into Sudhakar's code if that gets checked in or 
> whatever...).
> 
> I have tested this with a simple PPT file containing just the following 
> text:
> 
> Umlaut-Test
> Ökologie, Mühsal, Grüße, Grätsche
> 
> I get the following console output with this text:
> 
> Umlaut-Test
> \326kologie, M\374hsal, Gr\374\337e, Gr\344tsche
> 
> Here is the output I get in a web browser (through a web app, "view 
> HTML source" mode):
> 
> Umlaut-Test �kologie, M�hsal, Gr�?e, Gr?tsche
> 
> German "umlaute" and other special characters work fine that way 
> whenever I extract text from Word documents or Excel spreadsheets using 
> POI and Ryan Ackley's TextMining framework.
> 
> Just for the record: I have only tested this on my own configuration: 
> Mac OS X 10.3.4, Java 1.4.2_03 so I have no idea how these classes 
> might behave on Linux or Windows. Can anybody confirm this? I have seen 
> some German names on this list ;-)
> 
> Thanks for all the work you put into this.
> 
> Ralph Scheuer
> 
> On 01.08.2004 at 08:07, Ryan Rhodes wrote:
> 
> > Hi Ralph,
> >
> > I haven't tested the PPT extractor with any other languages.  I remember
> > reading about other people having problems with different character sets
> > though.
> >
> > Could you send a before and after example file here or to bugzilla?
> >
> > -Ryan Rhodes
> >
> >
> > -----Original Message-----
> > From: Ralph Scheuer [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 28, 2004 10:01 AM
> > To: slide
> > Subject: MSPowerPointExtractor problem
> >
> > Hello everybody,
> >
> > When I was searching for a Java class to extract text from PowerPoint
> > files, I accidentally discovered Slide.
> >
> > I pulled the MSPowerPointExtractor class and some other stuff it
> > depends on via CVS and tried it for some text extraction.
> >
> > The method I used looks very similar to the provided example main
> > method (see below).
> >
> > However, when I tried to extract text from a German PowerPoint
> > presentation, I had some problems with the encoding. I did not know
> > which encoding to use; converting the output to ISO Latin 1 with my
> > text editor solved only part of the problem (some German umlauts were
> > displayed correctly, some were not).
> >
> > Is this a known issue or am I doing something wrong? Any hints for me?
> >
> > Thanks in advance.
> >
> > Ralph Scheuer
> >
> > BTW, I am using Mac OS X 10.3.4 with JDK 1.4.2_03; the native encoding
> > on this platform is MacRoman.
> >
> >
> >      public static String contentStringForData(NSData data){
> >     
> >     StringBuffer buf = new StringBuffer();
> >     try{
> >         ByteArrayInputStream input = data.stream();
> >         MSPowerPointExtractor ex = new MSPowerPointExtractor(null, null);
> >     
> >         Reader reader = ex.extract(input);
> >     
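> >         int c;
> >         // Stop at end of stream before appending, so -1 is never cast to a char.
> >         while ((c = reader.read()) != -1) {
> >             buf.append((char) c);
> >         }
> >     } catch (Exception e) {
> >         // Report the failure instead of silently returning a partial string.
> >         e.printStackTrace();
> >     }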
> >     
> >     return buf.toString();
> >      }
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> 
> 
> 
> 


=====
"No one can earn a million dollars honestly."- William Jennings Bryan (1860-1925) 

"Make everything as simple as possible, but not simpler."- Albert Einstein (1879-1955)

"It is dangerous to be sincere unless you are also stupid."- George Bernard Shaw 
(1856-1950)


        
                

