Hmm, basically we have concentrated on the English language, so we never faced any problems. It has become a new task for our team now :-)
Thanks to Ralph for pointing out that problem. We will work on it and let the Jakarta team know :-)

Regards
Sudhakar

--- Ralph Scheuer <[EMAIL PROTECTED]> wrote:
> Ryan,
>
> thanks for your reply.
>
> I have also seen the posts from Sudhakar on this subject who seems to
> be contributing a whole lot of code here - which is a great thing but
> in this code the problem also persists so I think we solve this
> encoding problem in your code (which is simpler - the fix could later
> be integrated into Sudhakar's code if this is checked in or
> whatever...).
>
> I have tested this with a simple PPT file containing just the following
> text:
>
> Umlaut-Test
> Ökologie, Mühsal, Grüße, Grätsche
>
> I get the following console output with this text:
>
> Umlaut-Test
> \326kologie, M\374hsal, Gr\374\337e, Gr\344tsche
>
> Here is the output I get in a web browser (through a web app, "view
> HTML source" mode):
>
> Umlaut-Test �kologie, M�hsal, Gr�?e, Gr?tsche
>
> German "umlaute" and other special characters work fine that way
> whenever I extract text from Word documents or Excel spreadsheets using
> POI and Ryan Ackley's TextMining framework.
>
> just for the record: I have only tested this on my own configuration:
> Mac OS X 10.3.4, Java 1.4.2_03 so I have no idea how these classes
> might behave on Linux or Windows. Can anybody confirm this? I have seen
> some German names on this list ;-)
>
> Thanks for all the work you put into this.
>
> Ralph Scheuer
>
> Am 01.08.2004 um 08:07 schrieb Ryan Rhodes:
>
> > Hi Ralph,
> >
> > I haven't tested the PPT extractor with any other languages. I remember
> > reading about other people having problems with different character sets
> > though.
> >
> > Could you send a before and after example file here or to bugzilla?
> >
> > -Ryan Rhodes
> >
> >
> > -----Original Message-----
> > From: Ralph Scheuer [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 28, 2004 10:01 AM
> > To: slide
> > Subject: MSPowerPointExtractor problem
> >
> > Hello everybody,
> >
> > When I was searching for a Java class to extract text from PowerPoint
> > files, I accidentally discovered Slide.
> >
> > I pulled the MSPowerPointExtractor class and some other stuff it
> > depends on via CVS and tried it for some text extraction.
> >
> > The method I used looks very similar to the provided example main
> > method (see below).
> >
> > However, when I tried to extract text from a German PowerPoint
> > presentation, I had some problems with the encoding. I did not know
> > which encoding to use, converting the output to ISO Latin 1 with my
> > text editor solved only part of the problem (some German Umlaute were
> > displayed correctly, some were not).
> >
> > Is this a known issue or am I doing something wrong? Any hints for me?
> >
> > Thanks in advance.
> >
> > Ralph Scheuer
> >
> > BTW, I am using Mac OS X 10.3.4 with JDK 1.4.2_03, the native encoding
> > on this platform is MacRoman.
> >
> >
> > public static String contentStringForData(NSData data){
> >
> >     StringBuffer buf = new StringBuffer();
> >     try{
> >         ByteArrayInputStream input = data.stream();
> >         MSPowerPointExtractor ex = new MSPowerPointExtractor(null, null);
> >
> >         Reader reader = ex.extract(input);
> >
> >         // Read until end of stream; the -1 marker itself is not appended.
> >         int c;
> >         while( (c = reader.read()) != -1 )
> >         {
> >             buf.append((char)c);
> >         }
> >     }catch(Exception e){
> >         // Report the failure instead of silently swallowing it.
> >         e.printStackTrace();
> >     }
> >
> >     return buf.toString();
> > }
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]

=====
"No one can earn a million dollars honestly."- William Jennings Bryan (1860-1925)
"Make everything as simple as possible, but not simpler."- Albert Einstein (1879-1955)
"It is dangerous to be sincere unless you are also stupid."- George Bernard Shaw (1856-1950)
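
The octal escapes in the console output quoted above (\326, \374, \337, \344) are the ISO-8859-1 codes for Ö, ü, ß and ä, which suggests the extracted text is Latin-1 and only gets garbled when something later decodes or displays it with a different charset. Below is a minimal Java sketch of that effect. It assumes Latin-1 input and does not call MSPowerPointExtractor at all; the class and method names (UmlautDecodingSketch, decode) are made up for illustration, and it uses current JDK APIs rather than 1.4.2.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Slide or POI.
final class UmlautDecodingSketch {

    // Decodes raw bytes into a String using an explicitly named charset,
    // instead of relying on the platform default.
    public static String decode(byte[] raw, Charset charset) {
        StringBuilder buf = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(raw), charset)) {
            int c;
            // Stop at end of stream; -1 is never appended.
            while ((c = reader.read()) != -1) {
                buf.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // The test text from the thread, written with Unicode escapes
        // so the source file stays pure ASCII.
        String original = "\u00D6kologie, M\u00FChsal, Gr\u00FC\u00DFe, Gr\u00E4tsche";

        // Assumption: the bytes on the wire are ISO-8859-1 (Latin-1).
        byte[] raw = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the matching charset restores the text.
        String matching = decode(raw, StandardCharsets.ISO_8859_1);
        // Decoding the same bytes with another charset damages the umlauts.
        String mismatched = decode(raw, StandardCharsets.UTF_8);

        System.out.println("round trip intact:   " + matching.equals(original));
        System.out.println("mismatch is garbled: " + !mismatched.equals(original));
    }
}
```

If the extractor really does hand back Latin-1 data, naming the charset at every byte-to-character boundary (reading, writing to the console, sending to the browser) would be the thing to check first, since the default on Mac OS X 10.3 is MacRoman.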
