Daniel Noll wrote:

I'll submit a patch in a few minutes if I can clean up my code sufficiently, but it won't be a clean patch, so you'll probably have to rearrange it a little. :-/

Patch attached as promised / threatened. ;-)

To compare before and after, construct a test document and add a custom property whose name and value both contain non-Latin characters. The previous code works for the value but not for the name; this update should hopefully make it work for the property name as well.
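
The before/after comparison can be sketched outside POI. This is only an illustration under assumptions: the class and method names are hypothetical, the bytes are assumed to be UTF-8, and it uses java.nio.charset.StandardCharsets instead of the encoding name the patch obtains from VariantSupport.codepageToEncoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not part of POI or of the attached patch.
class DictionaryNameDecoding {

    // Mirrors the removed non-Unicode branch: every byte is cast to one char.
    static String decodeByteAsChar(byte[] src, int offset, int length) {
        StringBuilder b = new StringBuilder(length);
        for (int j = 0; j < length; j++) {
            b.append((char) src[offset + j]);
        }
        return b.toString();
    }

    // Mirrors the idea of the patch: let the JDK decode the byte range.
    // UTF-8 is an assumption made for this example only.
    static String decodeWithCharset(byte[] src, int offset, int length) {
        return new String(src, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u540d\u524d"; // a two-character non-Latin property name
        byte[] raw = name.getBytes(StandardCharsets.UTF_8);

        // The byte-by-byte cast does not reproduce the name.
        System.out.println(name.equals(decodeByteAsChar(raw, 0, raw.length)));
        // Decoding the whole range with the charset does.
        System.out.println(name.equals(decodeWithCharset(raw, 0, raw.length)));
    }
}
```

Run as written, the first comparison prints false and the second prints true, which is the difference a test document with a non-Latin property name should show.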

I'm not sure this works in all cases, but it does seem to behave for our test UTF-8 custom properties, which should be the most exotic encoding one would come across (fingers crossed).
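
One reason the UTF-8 case needs confirming is that character count and byte count differ there. The sketch below shows that difference, assuming the Unicode codepage is UTF-16LE; whether the stored length for UTF-8 is a byte count or a character count is not established here, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not part of POI or of the attached patch.
class LengthFieldSemantics {

    // For UTF-16 the patch treats the length as a character count,
    // so the byte range to decode is twice as long.
    static int utf16ByteLength(int charCount) {
        return charCount * 2;
    }

    static String decode(byte[] src, int offset, int byteLength, Charset cs) {
        return new String(src, offset, byteLength, cs);
    }

    public static void main(String[] args) {
        String name = "\u540d\u524d"; // two characters
        byte[] utf16 = name.getBytes(StandardCharsets.UTF_16LE); // 4 bytes
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);     // 6 bytes

        // Character count times two covers the whole UTF-16 string.
        System.out.println(name.equals(
            decode(utf16, 0, utf16ByteLength(name.length()), StandardCharsets.UTF_16LE)));

        // Using the character count as a UTF-8 byte length cuts the string short.
        System.out.println(name.equals(
            decode(utf8, 0, name.length(), StandardCharsets.UTF_8)));

        // Using the real byte count decodes the full string.
        System.out.println(name.equals(
            decode(utf8, 0, utf8.length, StandardCharsets.UTF_8)));
    }
}
```

If the length stored for a UTF-8 string turned out to be a character count, the default branch of the patch would truncate multi-byte names in the way the second comparison shows.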

Daniel

--
Daniel Noll

NUIX Pty Ltd
Level 8, 143 York Street, Sydney 2000
Phone: (02) 9283 9010
Fax:   (02) 9283 9020

Index: src/java/org/apache/poi/hpsf/Property.java
===================================================================
RCS file: /home/cvspublic/jakarta-poi/src/java/org/apache/poi/hpsf/Property.java,v
retrieving revision 1.20
diff -u -r1.20 Property.java
--- src/java/org/apache/poi/hpsf/Property.java  31 Aug 2004 20:45:00 -0000      1.20
+++ src/java/org/apache/poi/hpsf/Property.java  30 Mar 2005 06:30:44 -0000
@@ -170,9 +170,12 @@
      * @param length The dictionary contains at most this many bytes.
      * @param codepage The codepage of the string values.
      * @return The dictonary
+     * @exception UnsupportedEncodingException if the specified codepage is not
+     * supported.
      */
     protected Map readDictionary(final byte[] src, final long offset,
                                  final int length, final int codepage)
+    throws UnsupportedEncodingException
     {
         /* Check whether "offset" points into the "src" array". */
         if (offset < 0 || offset > src.length)
@@ -202,19 +205,23 @@
             long sLength = LittleEndian.getUInt(src, o);
             o += LittleEndian.INT_SIZE;

-            /* Read the bytes or characters depending on whether the
-             * character set is Unicode or not. */
-            StringBuffer b = new StringBuffer((int) sLength);
-            for (int j = 0; j < sLength; j++)
-                if (codepage == Constants.CP_UNICODE)
-                {
-                    final int i1 = o + (j * 2);
-                    final int i2 = i1 + 1;
-                    b.append((char) ((src[i2] << 8) + src[i1]));
-                }
-                else
-                    b.append((char) src[o + j]);
-
+            String value;
+            switch (codepage)
+            {
+                case -1:
+                    value = new String(src, o, (int) sLength);
+                    break;
+                case Constants.CP_UNICODE:
+                    // In the case of UTF-16, the length represents the number of characters.
+                    value = new String(src, o, (int) sLength * 2, VariantSupport.codepageToEncoding(codepage));
+                    break;
+                default:
+                    // TODO: Confirm the behaviour of UTF-8.
+                    value = new String(src, o, (int) sLength, VariantSupport.codepageToEncoding(codepage));
+            }
+
+            StringBuffer b = new StringBuffer(value);
+
             /* Strip 0x00 characters from the end of the string: */
             while (b.length() > 0 && b.charAt(b.length() - 1) == 0x00)
                 b.setLength(b.length() - 1);
