http://issues.apache.org/bugzilla/show_bug.cgi?id=38230

Summary: [PATCH] UnicodeString#fillFields invalid read of non US characters >=128 and <=255
Product: POI
Version: 3.0-dev
Platform: All
OS/Version: other
Status: NEW
Keywords: PatchAvailable
Severity: major
Priority: P2
Component: HSSF
AssignedTo: poi-dev@jakarta.apache.org
ReportedBy: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]

I had a problem reading HSSFCell values that contain German-specific letters (umlauts). Most probably the same difficulty applies to all characters with integer values from 128 to 255: every one of them ended up with all bits of the high byte set to 1.

It turned out to be a type-cast problem (found on J2SE 1.4.2(06), although the behavior is defined by the Java language itself). Java's byte type is signed, so casting a byte to a char sign-extends it: the highest bit of the byte is copied into every bit of the char's high byte. The German umlaut ä uses 0xe4, or 11100100 in binary. Converting this value to char results in 1111111111100100 (0xffe4).

See this small code:
----------------------
public class ByteConverterTest {
    public static void main(String[] args) {
        byte umlautChar = (byte) 0xe4; // the German umlaut ä
        // Sign extension: the top bit of 0xe4 fills the whole high byte.
        char badEncoded = (char) umlautChar;
        // Masking with 0xff keeps only the low 8 bits.
        char goodEncoded = (char) (umlautChar & 0xff);
        System.out.println("Badly converted umlaut uses hex value: "
                + Integer.toHexString(badEncoded));
        System.out.println("Good converted umlaut uses hex value: "
                + Integer.toHexString(goodEncoded) + "\n");
    }
}
----------------------
Output is:
----------------------
Badly converted umlaut uses hex value: ffe4
Good converted umlaut uses hex value: e4
----------------------

The attached patch resolves this issue in the class UnicodeString, whose method fillFields uses this kind of improper type cast. Other classes may do so as well.
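For comparison, here is a minimal sketch of reading a run of 8-bit ("compressed") characters without sign extension. It assumes, as for BIFF8 compressed strings, that each stored byte is the low byte of a UTF-16 code unit whose high byte is zero. The class and method names (`CompressedCharsSketch`, `decodeCompressed`) are hypothetical; this is neither POI's API nor the attached patch.

----------------------
public class CompressedCharsSketch {
    // Hypothetical helper: decode 'count' single-byte characters starting at 'offset'.
    // Each byte is treated as an unsigned value 0..255 (ISO-8859-1, i.e. the low byte of UTF-16).
    static String decodeCompressed(byte[] data, int offset, int count) {
        char[] chars = new char[count];
        for (int i = 0; i < count; i++) {
            // '& 0xff' turns the signed byte into an unsigned int before narrowing to char,
            // so 0xe4 becomes U+00E4 instead of U+FFE4.
            chars[i] = (char) (data[offset + i] & 0xff);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0x4d, (byte) 0xe4, (byte) 0x72, (byte) 0x7a }; // "März"
        String s = decodeCompressed(data, 0, data.length);
        System.out.println(Integer.toHexString(s.charAt(1))); // prints "e4"
        System.out.println(s.equals("M\u00e4rz"));           // prints "true"
    }
}
----------------------

The same result can be obtained with new String(data, offset, count, "ISO-8859-1"), since ISO-8859-1 maps each byte 0x00..0xff directly to U+0000..U+00FF.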
Reproducible: Always (see test code)
Platform: Windows 2k, Linux 2.6.x
JVM: J2SE 1.4.2(06) and J2SE 1.4.2(10)

For those who are experiencing the same problem but do not want to wait for this patch to make its way into CVS, you can use the following code to convert your cell value to a proper Java string:
----------------------
String cellValue = cell.getRichStringCellValue().getString();
// clean up the effects of the invalid type casts
if (cellValue != null) {
    char[] buffer = cellValue.toCharArray();
    StringBuffer newValue = new StringBuffer(buffer.length);
    for (int i = 0; i < buffer.length; i++) {
        char charValue = buffer[i];
        // strip the high byte if all of its bits are set to 1
        if ((charValue & 0xff00) == 0xff00) {
            charValue = (char) (charValue & 0xff);
        }
        newValue.append(charValue);
    }
    cellValue = newValue.toString();
}
----------------------

I searched for a previously entered bug report on this subject but did not find one. I am sorry if I have missed it.
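The workaround above can be pulled out into a standalone method so it can be tried without a spreadsheet. This is a minimal sketch; the class and method names (`SignExtensionRepair`, `repair`) are hypothetical. Note the limit of the heuristic: it also rewrites any genuine character in the range U+FF00 to U+FFFF (for example the fullwidth forms starting at U+FF01), so it is only safe for text known not to contain such characters.

----------------------
public class SignExtensionRepair {
    // Hypothetical helper: undo byte-to-char sign extension in an already-decoded string.
    // Any char whose high byte is 0xff is assumed to be a sign-extended byte 0x80..0xff.
    // Caveat: genuine characters U+FF00..U+FFFF are rewritten as well.
    static String repair(String value) {
        if (value == null) {
            return null;
        }
        StringBuffer result = new StringBuffer(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if ((c & 0xff00) == 0xff00) {
                c = (char) (c & 0xff); // keep only the original low byte
            }
            result.append(c);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String broken = "M" + (char) (byte) 0xe4 + "rz"; // simulates the bad cast
        String fixed = repair(broken);
        System.out.println(Integer.toHexString(broken.charAt(1))); // prints "ffe4"
        System.out.println(Integer.toHexString(fixed.charAt(1)));  // prints "e4"
    }
}
----------------------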