I realized there are two parts to this, so here's the first part.

Note that the charset API includes the entire encoding API. This lets the charset intercept whatever code wants to do to the underlying byte buffer. (The Unicode charset will undoubtedly want to make sure those set_codepoint and set_byte calls don't leave bogus Unicode strings, for example.)
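
A rough C sketch of that interception. It assumes a toy string of 32-bit code units and a charset vtable that callers go through instead of calling the encoding directly. Every name here (toy_string, toy_charset, encoding_set_codepoint, unicode_set_codepoint) is hypothetical, not Parrot's actual API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy string: a fixed buffer of 32-bit code units. */
typedef struct {
    uint32_t data[64];
    size_t   len;
} toy_string;

/* Hypothetical encoding layer: stores the value, no questions asked. */
static void encoding_set_codepoint(toy_string *s, size_t i, uint32_t cp) {
    s->data[i] = cp;
}

/* Hypothetical charset vtable: callers go through it, not the encoding. */
typedef struct {
    const char *name;
    int (*set_codepoint)(toy_string *s, size_t i, uint32_t cp);
} toy_charset;

/* Unicode charset: veto writes that would leave a bogus Unicode string.
   Returns 0 on success, -1 (string untouched) if the write is refused. */
static int unicode_set_codepoint(toy_string *s, size_t i, uint32_t cp) {
    assert(s != NULL);
    if (i >= s->len) return -1;                  /* out of range           */
    if (cp > 0x10FFFF) return -1;                /* past the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return -1; /* lone surrogate         */
    encoding_set_codepoint(s, i, cp);
    return 0;
}

static const toy_charset unicode_charset = { "unicode", unicode_set_codepoint };
```

In this sketch, writing U+D800 through unicode_charset.set_codepoint is refused and leaves the string as it was, while an ordinary letter goes straight through to the encoding layer.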

Also note that there's no "get single grapheme" the way there's a "get single byte" or "get single codepoint". That's because a grapheme may be made of multiple codepoints, so it has to be returned as a STRING.

Additionally, note that any function which deals with multiple strings will throw a charset mismatch exception when the strings aren't of compatible types. What counts as compatible depends on the current interpreter error flag settings. (So, if the flags say so, both strings may be upconverted to Unicode or to a common encoding.)

  STRING get_graphemes(STRING, grapheme_offset, count)

    Returns a string of count graphemes starting at grapheme_offset.
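
A sketch of get_graphemes in C. It assumes strings are held as arrays of codepoints and uses a deliberately simplified grapheme rule: a base codepoint plus any following U+0300..U+036F combining marks (real grapheme segmentation follows Unicode's rules and covers far more). It also shows why a single grapheme has to come back as a string: get_graphemes(s, 0, 1) on "a" + COMBINING ACUTE returns two codepoints. The toy_string type and helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t *cp;   /* codepoints */
    size_t    len;  /* number of codepoints (not graphemes) */
} toy_string;

/* Simplifying assumption: a grapheme is one base codepoint plus any
   following marks from U+0300..U+036F (Combining Diacritical Marks). */
static int is_combining(uint32_t c) { return c >= 0x0300 && c <= 0x036F; }

/* Skip n graphemes starting at codepoint index i; return the new index. */
static size_t skip_graphemes(const toy_string *s, size_t i, size_t n) {
    for (; i < s->len && n > 0; n--) {
        i++;                                               /* base codepoint */
        while (i < s->len && is_combining(s->cp[i])) i++;  /* its marks      */
    }
    return i;
}

/* get_graphemes: a new heap string holding `count` graphemes starting at
   grapheme `offset`. Asking past the end just yields fewer graphemes.
   Caller frees result.cp. Sketch only: allocation failure just asserts. */
static toy_string get_graphemes(const toy_string *s, size_t offset, size_t count) {
    assert(s != NULL);
    size_t from = skip_graphemes(s, 0, offset);
    size_t to   = skip_graphemes(s, from, count);
    toy_string out = { malloc((to - from + 1) * sizeof(uint32_t)), to - from };
    assert(out.cp != NULL);
    if (out.len) memcpy(out.cp, s->cp + from, out.len * sizeof(uint32_t));
    return out;
}
```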

  void set_graphemes(STRING, grapheme_offset, count, insertion_string)

    Replace count graphemes, starting at grapheme_offset, with the
    contents of insertion_string.
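
A sketch of set_graphemes under the same hypothetical assumptions (codepoint arrays, the simplified U+0300..U+036F grapheme rule). It builds a fresh buffer from prefix, insertion and suffix. Reporting errors by return code is an assumption; the real API presumably throws:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t *cp;   /* heap-allocated codepoints */
    size_t    len;
} toy_string;

/* Simplifying assumption: only U+0300..U+036F are combining marks. */
static int is_combining(uint32_t c) { return c >= 0x0300 && c <= 0x036F; }

/* Skip n graphemes starting at codepoint index i; return the new index. */
static size_t skip_graphemes(const toy_string *s, size_t i, size_t n) {
    for (; i < s->len && n > 0; n--) {
        i++;                                               /* base codepoint */
        while (i < s->len && is_combining(s->cp[i])) i++;  /* its marks      */
    }
    return i;
}

/* set_graphemes: replace `count` graphemes at grapheme `offset` with `ins`.
   Returns 0 on success, -1 on allocation failure (s unchanged). */
static int set_graphemes(toy_string *s, size_t offset, size_t count,
                         const toy_string *ins) {
    assert(s != NULL && ins != NULL);
    size_t from = skip_graphemes(s, 0, offset);
    size_t to   = skip_graphemes(s, from, count);
    size_t tail = s->len - to;
    size_t len  = from + ins->len + tail;
    uint32_t *buf = malloc((len + 1) * sizeof *buf);   /* +1 avoids malloc(0) */
    if (buf == NULL) return -1;
    if (from) memcpy(buf, s->cp, from * sizeof *buf);                /* prefix */
    if (ins->len) memcpy(buf + from, ins->cp, ins->len * sizeof *buf);
    if (tail) memcpy(buf + from + ins->len, s->cp + to, tail * sizeof *buf);
    free(s->cp);
    s->cp  = buf;
    s->len = len;
    return 0;
}
```

In this sketch, passing a count of 0 makes set_graphemes a pure insert at grapheme_offset.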

  void to_charset(STRING, charset)

    Transform the string to the specified charset in place.

  STRING copy_to_charset(STRING, charset)

    Create a new string from the base string, transforming the data
    to the specified charset.
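
A minimal C sketch of the in-place versus copying distinction. It assumes only two charsets, Latin-1 and Unicode, with each character stored as one 32-bit value. Because Latin-1 maps one-to-one onto U+0000..U+00FF, conversion can only fail going from Unicode to Latin-1. In this sketch to_unicode(s) would simply be to_charset(s, CS_UNICODE). All names are hypothetical, and failures return -1 rather than throwing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { CS_LATIN1, CS_UNICODE } charset_id;

typedef struct {
    charset_id charset;
    uint32_t  *cp;     /* one value per character */
    size_t     len;
} toy_string;

/* Only Unicode -> Latin-1 can fail: anything above U+00FF won't fit. */
static int representable(const toy_string *s, charset_id target) {
    if (target == CS_UNICODE || s->charset == CS_LATIN1) return 1;
    for (size_t i = 0; i < s->len; i++)
        if (s->cp[i] > 0xFF) return 0;
    return 1;
}

/* to_charset: convert in place; -1 (s untouched) if it can't be done.
   The values are identical in both charsets, so only the tag changes. */
static int to_charset(toy_string *s, charset_id target) {
    assert(s != NULL);
    if (!representable(s, target)) return -1;
    s->charset = target;
    return 0;
}

/* copy_to_charset: leave s alone and build a converted copy in *out.
   Caller frees out->cp. */
static int copy_to_charset(const toy_string *s, charset_id target, toy_string *out) {
    assert(s != NULL && out != NULL);
    if (!representable(s, target)) return -1;
    out->cp = malloc((s->len + 1) * sizeof *out->cp);
    if (out->cp == NULL) return -1;
    if (s->len) memcpy(out->cp, s->cp, s->len * sizeof *out->cp);
    out->len = s->len;
    out->charset = target;
    return 0;
}
```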

  void to_unicode(STRING)

    Transform the string to a Unicode string in place.

  void compose(STRING)

    Fully compose the string, if the charset supports it.

  void decompose(STRING)

    Fully decompose the string, if the charset supports it.
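
A toy C sketch of compose and decompose, assuming a tiny hard-coded table of three base+mark pairs. This is not real Unicode normalization (no canonical ordering, no multi-mark sequences, no Hangul); it only shows the in-place shape of the two calls. Names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CAP 64
typedef struct { uint32_t cp[TOY_CAP]; size_t len; } toy_string;

/* Tiny hard-coded table; a real charset would use the Unicode data files. */
static const struct { uint32_t base, mark, composed; } pairs[] = {
    { 0x0061, 0x0301, 0x00E1 },   /* a + COMBINING ACUTE     -> a-acute  */
    { 0x0065, 0x0301, 0x00E9 },   /* e + COMBINING ACUTE     -> e-acute  */
    { 0x006F, 0x0308, 0x00F6 },   /* o + COMBINING DIAERESIS -> o-umlaut */
};
#define NPAIRS (sizeof pairs / sizeof pairs[0])

/* compose: merge base+mark pairs found in the table, in place. */
static void compose(toy_string *s) {
    assert(s != NULL);
    size_t j = 0;
    for (size_t i = 0; i < s->len; i++) {
        uint32_t c = s->cp[i];
        for (size_t k = 0; k < NPAIRS && i + 1 < s->len; k++) {
            if (pairs[k].base == c && pairs[k].mark == s->cp[i + 1]) {
                c = pairs[k].composed;
                i++;                  /* consume the mark too */
                break;
            }
        }
        s->cp[j++] = c;               /* j <= i, so writing in place is safe */
    }
    s->len = j;
}

/* decompose: expand composed codepoints from the table, in place.
   Returns -1 (s unchanged) if the result wouldn't fit in TOY_CAP. */
static int decompose(toy_string *s) {
    assert(s != NULL);
    uint32_t tmp[TOY_CAP];
    size_t j = 0;
    for (size_t i = 0; i < s->len; i++) {
        size_t k = 0;
        while (k < NPAIRS && pairs[k].composed != s->cp[i]) k++;
        if (j + (k < NPAIRS ? 2 : 1) > TOY_CAP) return -1;
        if (k < NPAIRS) {
            tmp[j++] = pairs[k].base;
            tmp[j++] = pairs[k].mark;
        } else {
            tmp[j++] = s->cp[i];
        }
    }
    memcpy(s->cp, tmp, j * sizeof tmp[0]);
    s->len = j;
    return 0;
}
```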

  void upcase(STRING)

    Upcase the entire string in place.

  void upcase_first(STRING)

    Upcase the first grapheme of the string in place.

  void downcase(STRING)

    Downcase the entire string in place.

  void downcase_first(STRING)

    Downcase the first grapheme of the string in place.

  void titlecase(STRING)

    Titlecase the entire string in place.

  void titlecase_first(STRING)

    Titlecase the first grapheme of the string in place.
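
A sketch of the case calls, assuming a toy mapping that covers ASCII plus U+01C4..U+01C6 (DZ with caron), a character whose uppercase, lowercase and titlecase forms are three different codepoints. That difference is why titlecase_first can't just be upcase_first. Whole-string titlecase is left out because its exact rule (per word or per character) isn't spelled out above. All names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t cp[64]; size_t len; } toy_string;

typedef enum { CASE_UPPER, CASE_LOWER, CASE_TITLE } case_kind;

/* Toy mapping: ASCII letters plus the DZ-with-caron trio, where the
   upper, lower and title forms are three different codepoints. */
static uint32_t map_case(uint32_t c, case_kind k) {
    if (c >= 0x01C4 && c <= 0x01C6)
        return k == CASE_UPPER ? 0x01C4 : k == CASE_LOWER ? 0x01C6 : 0x01C5;
    if (k == CASE_LOWER)
        return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
    return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;  /* title == upper for ASCII */
}

/* upcase / downcase: map every codepoint. */
static void recase_all(toy_string *s, case_kind k) {
    assert(s != NULL);
    for (size_t i = 0; i < s->len; i++)
        s->cp[i] = map_case(s->cp[i], k);
}

/* upcase_first / downcase_first / titlecase_first: only the base
   codepoint of the first grapheme changes; any marks after it stay put. */
static void recase_first(toy_string *s, case_kind k) {
    assert(s != NULL);
    if (s->len > 0)
        s->cp[0] = map_case(s->cp[0], k);
}
```

Here recase_first(&s, CASE_TITLE) turns U+01C6 into U+01C5, while CASE_UPPER gives U+01C4.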

  INTVAL compare(STRING, STRING)

    Compare the two strings. Return 1 if the first string is logically
    greater, -1 if it is logically less, and 0 if they are logically
    the same. The composition state of the two strings is irrelevant if
    the charset defines it as such. May transform one or both strings
    to a common encoding.
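
A sketch of compare where composition state doesn't matter, assuming both strings are decomposed with a tiny hard-coded table before comparing. It orders by raw codepoint value, which is an assumption; "logically greater" might well mean something locale-aware. Names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CAP 64
typedef struct { uint32_t cp[TOY_CAP]; size_t len; } toy_string;

/* Tiny decomposition table (assumption; real data comes from Unicode). */
static const struct { uint32_t composed, base, mark; } decomp[] = {
    { 0x00E1, 0x0061, 0x0301 },   /* a-acute -> a + COMBINING ACUTE */
    { 0x00E9, 0x0065, 0x0301 },   /* e-acute -> e + COMBINING ACUTE */
};
#define NDECOMP (sizeof decomp / sizeof decomp[0])

/* Write the decomposed form of s into out (room for 2 * TOY_CAP). */
static size_t decompose_into(const toy_string *s, uint32_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < s->len; i++) {
        size_t k = 0;
        while (k < NDECOMP && decomp[k].composed != s->cp[i]) k++;
        if (k < NDECOMP) {
            out[n++] = decomp[k].base;
            out[n++] = decomp[k].mark;
        } else {
            out[n++] = s->cp[i];
        }
    }
    return n;
}

/* compare: 1, -1 or 0 by codepoint order of the decomposed forms,
   so e-acute and "e" + COMBINING ACUTE compare equal. */
static int compare(const toy_string *a, const toy_string *b) {
    assert(a != NULL && b != NULL);
    uint32_t da[2 * TOY_CAP], db[2 * TOY_CAP];
    size_t na = decompose_into(a, da), nb = decompose_into(b, db);
    for (size_t i = 0; i < na && i < nb; i++)
        if (da[i] != db[i]) return da[i] > db[i] ? 1 : -1;
    return na > nb ? 1 : (na < nb ? -1 : 0);   /* shorter prefix sorts first */
}
```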

  INTVAL index(STRING, STRING, grapheme_offset)

    Return the index of the second string in the first string, searching
    forward from the specified offset. Returns -1 if not found.

  INTVAL rindex(STRING, STRING, grapheme_offset)

    Return the index of the second string in the first string, searching
    backward from the specified offset. Returns -1 if not found.
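
A sketch of index and rindex working in grapheme units. It assumes the returned index is a grapheme offset (like the offset argument) and uses the simplified U+0300..U+036F grapheme rule. A match has to start and end on grapheme boundaries, so searching for "e" does not hit the "e" inside "e" + COMBINING ACUTE. The functions are named str_index and str_rindex only to avoid clashing with the POSIX index()/rindex() from <strings.h>; everything here is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CAP 64
typedef struct { uint32_t cp[TOY_CAP]; size_t len; } toy_string;

/* Simplifying assumption: only U+0300..U+036F are combining marks. */
static int is_combining(uint32_t c) { return c >= 0x0300 && c <= 0x036F; }

/* Number of codepoints in the grapheme that starts at codepoint i. */
static size_t grapheme_len(const toy_string *s, size_t i) {
    size_t j = i + 1;
    while (j < s->len && is_combining(s->cp[j])) j++;
    return j - i;
}

/* Does n occur at codepoint i of h and end on a grapheme boundary? */
static int match_at(const toy_string *h, size_t i, const toy_string *n) {
    if (i + n->len > h->len) return 0;
    if (memcmp(h->cp + i, n->cp, n->len * sizeof n->cp[0]) != 0) return 0;
    return i + n->len == h->len || !is_combining(h->cp[i + n->len]);
}

/* index: first match at grapheme offset >= offset, or -1. */
static long str_index(const toy_string *h, const toy_string *n, size_t offset) {
    assert(h != NULL && n != NULL);
    for (size_t i = 0, g = 0; i <= h->len; g++) {   /* g = grapheme number of i */
        if (g >= offset && match_at(h, i, n)) return (long)g;
        if (i == h->len) break;
        i += grapheme_len(h, i);
    }
    return -1;
}

/* rindex: last match at grapheme offset <= offset, or -1. */
static long str_rindex(const toy_string *h, const toy_string *n, size_t offset) {
    assert(h != NULL && n != NULL);
    size_t starts[TOY_CAP + 1], count = 0;   /* starts[g] = codepoint index of grapheme g */
    for (size_t i = 0;; i += grapheme_len(h, i)) {
        starts[count++] = i;
        if (i == h->len) break;              /* last entry is the end position */
    }
    for (size_t g = offset < count - 1 ? offset : count - 1;; g--) {
        if (match_at(h, starts[g], n)) return (long)g;
        if (g == 0) return -1;
    }
}
```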

  validate(STRING)

    Revalidate the string to make sure it's still acceptable.
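
One guess at what validate could do for a Unicode string stored as UTF-8: walk the raw bytes, which set_byte may have changed, and reject malformed sequences, overlong forms, surrogates and values past U+10FFFF. UTF-8 storage and the name utf8_valid are assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf[0..len) is well-formed UTF-8, else 0. */
static int utf8_valid(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        uint8_t b = buf[i];
        size_t n;            /* continuation bytes expected */
        uint32_t cp, min;    /* decoded value, smallest legal value for n */
        if (b < 0x80) { i++; continue; }                          /* ASCII */
        else if ((b & 0xE0) == 0xC0) { n = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;       /* stray continuation byte or 0xF8..0xFF */
        if (len - i <= n) return 0;                  /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;  /* not 10xxxxxx */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min) return 0;                      /* overlong form */
        if (cp > 0x10FFFF) return 0;                 /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogate */
        i += n + 1;
    }
    return 1;
}
```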

--
                                Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
