DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://nagoya.apache.org/bugzilla/show_bug.cgi?id=17824>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=17824

about reading ms. doc file

           Summary: about reading ms. doc file
           Product: POI
           Version: unspecified
          Platform: Sun
        OS/Version: Other
            Status: NEW
          Severity: Major
          Priority: Other
         Component: HDF
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

When I read an MS Word .doc file using the HDF classes, I run into a serious problem. If the text is not Unicode and contains only English characters, everything works. But when the document contains Unicode or UTF-8 text, not all of the data is read: the reader stops partway through. For example, if demo.doc contains

    ����

then the text we read back is

    ���

and the shortfall keeps growing in this way. My example is given below.

    // ---- Deneme.java ----
    public class Deneme {
        public static void main(String[] args) {
            testDoc deneme = new testDoc("demo.doc", "demo.txt");
            deneme.getText();
        }
    }

    // ---- testDoc.java ----
    // This code writes the text of a .doc file to a .txt file.
    // It needs the HDF libraries from Jakarta POI (in the scratchpad at the moment).
    import org.apache.poi.hdf.extractor.util.*;
    import org.apache.poi.hdf.extractor.data.*;
    import org.apache.poi.hdf.extractor.*;
    import java.util.*;
    import java.io.*;
    import javax.swing.*;
    import java.awt.*;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;
    import org.apache.poi.poifs.filesystem.POIFSDocument;
    import org.apache.poi.poifs.filesystem.DocumentEntry;
    import org.apache.poi.util.LittleEndian;

    class testDoc extends Deneme {
        String origFileName;
        String tempFile;
        WordDocument wd;

        testDoc(String origFileName, String tempFile) {
            this.tempFile = tempFile;
            this.origFileName = origFileName;
        }

        public void getText() {
            try {
                wd = new WordDocument(origFileName);
                // Previous version:
                // Writer out = new BufferedWriter(new FileWriter(tempFile));
                Writer out = new OutputStreamWriter(new FileOutputStream(tempFile), "utf-8");
                wd.writeAllText(out);
                out.flush();
                out.close();
            } catch (Exception eN) {
                System.out.println("Error reading document:" + origFileName + "\n" + eN.toString());
            }
        } // end of getText
    } // end of class

The problem starts in wd.writeAllText(out). Looking at that method, the end integer does not reach the real end of the text when the .doc file contains Unicode.

Thank you for your support.
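
One possible way to picture this symptom, offered as a sketch under assumptions and not as a diagnosis of writeAllText: if an end offset is computed in characters but applied to a stream where each character takes two bytes, reading stops early. The class and method names below are hypothetical, the sample characters are arbitrary stand-ins (the ones from the report did not survive), and the code does not use POI at all.

```java
import java.nio.charset.StandardCharsets;

public class TextRangeSketch {

    // Reads charCount characters of 16-bit text: two bytes per UTF-16 code unit.
    static String readUnicodeRange(byte[] stream, int startByte, int charCount) {
        return new String(stream, startByte, charCount * 2, StandardCharsets.UTF_16LE);
    }

    // Hypothetical mistake: the character count is used directly as a byte count,
    // so only half of the requested text is decoded.
    static String readUnicodeRangeAsBytes(byte[] stream, int startByte, int charCount) {
        return new String(stream, startByte, charCount, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Four arbitrary non-ASCII characters, written as escapes to avoid
        // source-encoding problems.
        String original = "\u015F\u011F\u00FC\u00F6";
        byte[] stream = original.getBytes(StandardCharsets.UTF_16LE); // 8 bytes, no BOM

        System.out.println(readUnicodeRange(stream, 0, 4).length());        // prints 4
        System.out.println(readUnicodeRangeAsBytes(stream, 0, 4).length()); // prints 2
    }
}
```

With plain English text stored one byte per character, both calculations would agree, which would be consistent with the problem appearing only for Unicode documents. Whether HDF actually computes its end offset this way is an assumption that would need checking against the source.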
