This patch implements more of the missing Unicode-related methods in
java.lang.Character and a missing constructor in java.lang.String.
It also fixes an off-by-one error in the bounds check of
Character.codePointAt(char[], int, int) and makes toChars(int) throw the
IllegalArgumentException that the docs specify for invalid code points.

Note that the String constructor still has issues.  I believe the code
there is correct, and the internal representation appears to be correct
(length() and Character.codePointAt() on the toCharArray() result both
give the expected values), but the String doesn't print correctly when
written to an output stream.  I haven't investigated this very much, but
I suspect the problem is in the printing code: it may loop through the
String one 'char' at a time rather than one 'character' at a time, and a
supplementary character takes two chars.
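
To illustrate the char versus character distinction, here is a minimal
sketch (not part of the patch).  The class name CodePointWalk and its
helper are made up for illustration, and it assumes a complete Java 5
class library.  It walks a String one code point at a time instead of one
char at a time:

```java
public class CodePointWalk {
    // Collects the code points of s, advancing by Character.charCount so
    // that a surrogate pair is consumed as a single character.
    static int[] codePoints(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out[n++] = cp;
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) is a supplementary character,
        // so it is stored as two chars (a surrogate pair).
        String s = "a" + new String(Character.toChars(0x1D11E)) + "b";
        System.out.println(s.length());            // 4 chars
        System.out.println(codePoints(s).length);  // 3 code points
    }
}
```

A loop that does i++ and handles each charAt(i) on its own would see the
two surrogates as separate units, which is the kind of mistake I suspect
in the printing path.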

I have tests for these changes, but they are not yet in Mauve form.  I'll
convert them now.

More to follow.

2006-01-27  Anthony Balkissoon  <[EMAIL PROTECTED]>

        * java/lang/Character.java:
        (offsetByCodePoints(CharSequence, int, int)): New API method.
        (offsetByCodePoints(char[], int, int, int, int)): Likewise.
        (toChars): Throw the Exception that the docs say we throw.
        (codePointAt): Fixed an off-by-one error in the bounds of the if 
        statement.
        * java/lang/String.java:
        (String(int[], int, int)): New API constructor.
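
For reference, the behaviour the entries above are aiming for can be
exercised with a small sketch like the following.  It is not part of the
patch: the class name NewApiDemo and its helpers are hypothetical, and the
expected values in the comments follow the Java 5 API documentation rather
than any test run against this tree.

```java
public class NewApiDemo {
    static final int CLEF = 0x1D11E; // a supplementary code point

    // Builds "a", U+1D11E, "b" with the new String(int[], int, int).
    static String build() {
        int[] cps = { 'a', CLEF, 'b' };
        return new String(cps, 0, cps.length);
    }

    // Returns true if toChars(int) rejects cp with IllegalArgumentException.
    static boolean toCharsRejects(int cp) {
        try {
            Character.toChars(cp);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = build();
        char[] a = s.toCharArray();
        System.out.println(s.length());                             // 4
        // Two code points forward from index 0: 'a', then the pair.
        System.out.println(Character.offsetByCodePoints(s, 0, 2));  // 3
        // Two code points backward from the end: 'b', then the pair.
        System.out.println(Character.offsetByCodePoints(s, 4, -2)); // 1
        // limit == a.length is legal; the old bounds check rejected it.
        System.out.println(Character.codePointAt(a, 3, a.length));  // 98
        // 0x110000 is just past the last valid code point.
        System.out.println(toCharsRejects(0x110000));               // true
    }
}
```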

--Tony
Index: java/lang/Character.java
===================================================================
RCS file: /cvsroot/classpath/classpath/java/lang/Character.java,v
retrieving revision 1.42
diff -u -r1.42 Character.java
--- java/lang/Character.java	25 Jan 2006 22:00:44 -0000	1.42
+++ java/lang/Character.java	27 Jan 2006 19:13:57 -0000
@@ -1741,6 +1741,136 @@
   }
   
   /**
+   * Returns the index into the given CharSequence that is offset
+   * <code>codePointOffset</code> code points from <code>index</code>.
+   * @param seq the CharSequence
+   * @param index the start position in the CharSequence
+   * @param codePointOffset the number of code points offset from the start
+   * position
+   * @return the index into the CharSequence that is codePointOffset code 
+   * points offset from index
+   * 
+   * @throws NullPointerException if seq is null
+   * @throws IndexOutOfBoundsException if index is negative or greater than the
+   * length of the sequence.
+   * @throws IndexOutOfBoundsException if codePointOffset is positive and the 
+   * subsequence from index to the end of seq has fewer than codePointOffset
+   * code points
+   * @throws IndexOutOfBoundsException if codePointOffset is negative and the
+   * subsequence from the start of seq to index has fewer than 
+   * (-codePointOffset) code points
+   * @since 1.5
+   */
+  public static int offsetByCodePoints(CharSequence seq,
+                                       int index,
+                                       int codePointOffset)
+  {
+    int len = seq.length();
+    if (index < 0 || index > len)
+      throw new IndexOutOfBoundsException();
+    
+    int numToGo = codePointOffset;
+    int offset = index;
+    int adjust = 1;
+    if (numToGo >= 0)
+      {
+        for (; numToGo > 0; offset++)
+          {
+            numToGo--;
+            if (Character.isHighSurrogate(seq.charAt(offset))
+                && (offset + 1) < len
+                && Character.isLowSurrogate(seq.charAt(offset + 1)))
+              offset++;
+          }
+        return offset;
+      }
+    else
+      {
+        numToGo *= -1;
+        for (; numToGo > 0;)
+          {
+            numToGo--;
+            offset--;
+            if (Character.isLowSurrogate(seq.charAt(offset))
+                && (offset - 1) >= 0
+                && Character.isHighSurrogate(seq.charAt(offset - 1)))
+              offset--;
+          }
+        return offset;
+      }
+  }
+  
+  /**
+   * Returns the index into the given char subarray that is offset
+   * <code>codePointOffset</code> code points from <code>index</code>.
+   * @param a the char array
+   * @param start the start index of the subarray
+   * @param count the length of the subarray
+   * @param index the index to be offset
+   * @param codePointOffset the number of code points offset from <code>index
+   * </code>
+   * @return the index into the char array
+   * 
+   * @throws NullPointerException if a is null
+   * @throws IndexOutOfBoundsException if start or count is negative or if
+   * start + count is greater than the length of the array
+   * @throws IndexOutOfBoundsException if index is less than start or larger 
+   * than start + count
+   * @throws IndexOutOfBoundsException if codePointOffset is positive and the
+   * subarray from index to start + count - 1 has fewer than codePointOffset
+   * code points.
+   * @throws IndexOutOfBoundsException if codePointOffset is negative and the
+   * subarray from start to index - 1 has fewer than (-codePointOffset) code
+   * points
+   * @since 1.5
+
+   */
+  public static int offsetByCodePoints(char[] a,
+                                       int start,
+                                       int count,
+                                       int index,
+                                       int codePointOffset)
+  {
+    int len = a.length;
+    int end = start + count;
+    if (start < 0 || count < 0 || end > len || index < start || index > end)
+      throw new IndexOutOfBoundsException();
+    
+    int numToGo = codePointOffset;
+    int offset = index;
+    int adjust = 1;
+    if (numToGo >= 0)
+      {
+        for (; numToGo > 0; offset++)
+          {
+            numToGo--;
+            if (Character.isHighSurrogate(a[offset])
+                && (offset + 1) < end
+                && Character.isLowSurrogate(a[offset + 1]))
+              offset++;
+          }
+        return offset;
+      }
+    else
+      {
+        numToGo *= -1;
+        for (; numToGo > 0;)
+          {
+            numToGo--;
+            offset--;
+            if (Character.isLowSurrogate(a[offset])
+                && (offset - 1) >= start
+                && Character.isHighSurrogate(a[offset - 1]))
+              offset--;
+            if (offset < start)
+              throw new IndexOutOfBoundsException();
+          }
+        return offset;
+      }
+
+  }
+  
+  /**
    * Returns the number of Unicode code points in the specified range of the
    * given CharSequence.  The first char in the range is at position
    * beginIndex and the last one is at position endIndex - 1.  Paired 
@@ -2447,6 +2577,9 @@
    */
   public static char[] toChars(int codePoint)
   {
+    if (!isValidCodePoint(codePoint))
+      throw new IllegalArgumentException("Illegal Unicode code point : "
+                                         + codePoint);
     char[] result = new char[charCount(codePoint)];
     int ignore = toChars(codePoint, result, 0);
     return result;
@@ -2664,7 +2797,7 @@
    */
   public static int codePointAt(char[] chars, int index, int limit)
   {
-    if (index < 0 || index >= limit || limit < 0 || limit >= chars.length)
+    if (index < 0 || index >= limit || limit < 0 || limit > chars.length)
       throw new IndexOutOfBoundsException();
     char high = chars[index];
     if (! isHighSurrogate(high) || ++index >= limit)
Index: java/lang/String.java
===================================================================
RCS file: /cvsroot/classpath/classpath/java/lang/String.java,v
retrieving revision 1.79
diff -u -r1.79 String.java
--- java/lang/String.java	6 Jan 2006 22:53:51 -0000	1.79
+++ java/lang/String.java	27 Jan 2006 19:13:58 -0000
@@ -555,6 +555,49 @@
   }
 
   /**
+   * Creates a new String containing the characters represented in the
+   * given subarray of Unicode code points.
+   * @param codePoints the entire array of code points
+   * @param offset the start of the subarray
+   * @param count the length of the subarray
+   * 
+   * @throws IllegalArgumentException if an invalid code point is found
+   * in the codePoints array
+   * @throws IndexOutOfBoundsException if offset is negative or offset + count
+   * is greater than the length of the array.
+   */
+  public String(int[] codePoints, int offset, int count)
+  {
+    // FIXME: This implementation appears to give correct internal
+    // representation of the String because: 
+    //   - length() is correct
+    //   - getting a char[] from toCharArray() and testing 
+    //     Character.codePointAt() on all the characters in that array gives
+    //     the appropriate results
+    // however printing the String gives incorrect results.  This may be 
+    // due to printing method errors (such as incorrectly looping through
+    // the String one char at a time rather than one "character" at a time).
+    
+    if (offset < 0)
+      throw new IndexOutOfBoundsException();
+    int end = offset + count;
+    int pos = 0;
+    // This creates a char array that is long enough for all of the code
+    // points to represent supplementary characters.  This is more than likely
+    // a waste of storage, so we use it only temporarily and then copy the 
+    // used portion into the value array.
+    char[] temp = new char[2 * codePoints.length];
+    for (int i = offset; i < end; i++)
+      {
+        pos += Character.toChars(codePoints[i], temp, pos);        
+      }
+    this.count = pos;
+    this.value = new char[pos];
+    System.arraycopy(temp, 0, value, 0, pos);
+    this.offset = 0;
+  }
+  
+  /**
    * Returns the number of characters contained in this String.
    *
    * @return the length of this String
