Daniel Noll wrote:
I'll submit a patch in a few minutes if I can clean up my code sufficiently, but it won't be a clean patch, so you'll probably have to rearrange it a little. :-/
Patch attached as promised / threatened. ;-)
To compare before and after, construct a test document and add a custom property whose name and value both contain non-Latin characters. The previous code works for the value but not the name; this update should hopefully make it work for the property name as well.
I'm not sure if this works in all cases, but it actually seems to behave for our test UTF-8 custom properties, which should be the most exotic encoding one would come across (fingers crossed.)
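For anyone who wants to see the difference without building a test document, here is a rough standalone sketch of the two decoding approaches. It is not the POI code: the class and method names are made up for illustration, and it uses Charset.forName() instead of VariantSupport.codepageToEncoding() so that it has no POI dependency and no checked exceptions. It assumes, as the patch does, that the stored length counts bytes for single-byte and UTF-8 codepages and characters for UTF-16.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only; not part of POI.
final class DictionaryNameDecoding {

    /** Mirrors the removed non-Unicode branch: every byte becomes one char. */
    static String decodeByteWise(byte[] src, int offset, int length) {
        StringBuffer b = new StringBuffer(length);
        for (int j = 0; j < length; j++) {
            b.append((char) src[offset + j]);
        }
        return b.toString();
    }

    /** Decodes the byte range with a named charset, then strips trailing NULs. */
    static String decodeWithEncoding(byte[] src, int offset, int byteLength,
                                     String encoding) {
        // Charset.forName throws an unchecked exception for unknown names.
        Charset cs = Charset.forName(encoding);
        StringBuffer b = new StringBuffer(new String(src, offset, byteLength, cs));
        while (b.length() > 0 && b.charAt(b.length() - 1) == 0x00) {
            b.setLength(b.length() - 1);
        }
        return b.toString();
    }

    /** UTF-16 case: the stored length is assumed to be in characters, so double it. */
    static String decodeUtf16(byte[] src, int offset, int charLength) {
        return decodeWithEncoding(src, offset, charLength * 2, "UTF-16LE");
    }

    public static void main(String[] args) {
        String name = "\u540d\u524d"; // a two-character non-Latin property name
        byte[] raw = (name + "\u0000").getBytes(StandardCharsets.UTF_8);

        // Byte-wise decoding mangles the multi-byte sequences.
        System.out.println(name.equals(decodeByteWise(raw, 0, raw.length)));
        // Charset-aware decoding recovers the original name.
        System.out.println(name.equals(decodeWithEncoding(raw, 0, raw.length, "UTF-8")));
    }
}
```

Running main should print false for the byte-wise version and true for the charset-aware one, which matches what we see with the property names before and after the patch.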
Daniel
-- Daniel Noll
NUIX Pty Ltd Level 8, 143 York Street, Sydney 2000 Phone: (02) 9283 9010 Fax: (02) 9283 9020
Index: src/java/org/apache/poi/hpsf/Property.java
===================================================================
RCS file: /home/cvspublic/jakarta-poi/src/java/org/apache/poi/hpsf/Property.java,v
retrieving revision 1.20
diff -u -r1.20 Property.java
--- src/java/org/apache/poi/hpsf/Property.java 31 Aug 2004 20:45:00 -0000 1.20
+++ src/java/org/apache/poi/hpsf/Property.java 30 Mar 2005 06:30:44 -0000
@@ -170,9 +170,12 @@
* @param length The dictionary contains at most this many bytes.
* @param codepage The codepage of the string values.
* @return The dictonary
+ * @exception UnsupportedEncodingException if the specified codepage is not
+ * supported.
*/
protected Map readDictionary(final byte[] src, final long offset,
final int length, final int codepage)
+ throws UnsupportedEncodingException
{
/* Check whether "offset" points into the "src" array". */
if (offset < 0 || offset > src.length)
@@ -202,19 +205,23 @@
long sLength = LittleEndian.getUInt(src, o);
o += LittleEndian.INT_SIZE;
- /* Read the bytes or characters depending on whether the
- * character set is Unicode or not. */
- StringBuffer b = new StringBuffer((int) sLength);
- for (int j = 0; j < sLength; j++)
- if (codepage == Constants.CP_UNICODE)
- {
- final int i1 = o + (j * 2);
- final int i2 = i1 + 1;
- b.append((char) ((src[i2] << 8) + src[i1]));
- }
- else
- b.append((char) src[o + j]);
-
+ String value;
+ switch (codepage)
+ {
+ case -1:
+ value = new String(src, o, (int) sLength);
+ break;
+ case Constants.CP_UNICODE:
+ // In the case of UTF-16, the length represents the number of characters.
+ value = new String(src, o, (int) sLength * 2, VariantSupport.codepageToEncoding(codepage));
+ break;
+ default:
+ // TODO: Confirm the behaviour of UTF-8.
+ value = new String(src, o, (int) sLength, VariantSupport.codepageToEncoding(codepage));
+ }
+
+ StringBuffer b = new StringBuffer(value);
+
/* Strip 0x00 characters from the end of the string: */
while (b.length() > 0 && b.charAt(b.length() - 1) == 0x00)
b.setLength(b.length() - 1);
