I realized there are two parts to this, so here's the first part.
Note that the charset API includes the entire encoding API. This lets the charset intercept whatever code wants to do to the underlying byte buffer. (For example, the Unicode charset will undoubtedly want to make sure that set_codepoint and set_byte calls don't leave behind invalid Unicode strings.)
Also note that there's no "get single grapheme" the way there's a "get single byte/codepoint". This is because a grapheme must be returned as a STRING, since it may be multiple codepoints.
Additionally, note that any function that takes more than one string will throw a charset mismatch exception when the strings aren't of compatible types. How compatible they must be depends on the current interpreter error flag settings. (So the function may upconvert both to Unicode, or to a common encoding, if the flags so indicate.)
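To make the mismatch rule concrete, here's a rough sketch in Python (not Parrot code). The names CharsetMismatchError and resolve_charset, and the upconvert flag, are all made up for illustration. The real check would consult the interpreter's error flags, and the choice of Unicode as the common target is an assumption.

```python
class CharsetMismatchError(Exception):
    """Raised when two strings have incompatible charsets (hypothetical name)."""


def resolve_charset(first_charset, second_charset, upconvert=False):
    """Pick the charset a two-string operation should run in.

    `upconvert` stands in for the interpreter error flag settings.
    """
    if first_charset == second_charset:
        return first_charset
    if upconvert:
        # Assumption: Unicode is the common target when upconversion is allowed.
        return "unicode"
    raise CharsetMismatchError(
        f"incompatible charsets: {first_charset} vs {second_charset}"
    )
```

With the flag off, resolve_charset("latin1", "koi8-r") raises. With upconvert=True it returns "unicode".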
STRING get_graphemes(STRING, grapheme_offset, count)
Returns a string of count graphemes starting at the offset
void set_graphemes(STRING, grapheme_offset, count, insertion_string)
Set count graphemes, starting at offset, to the contents of insertion_string
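As an illustration of why graphemes come back as strings, here's a small Python sketch of the two calls. It uses a deliberately simplified notion of a grapheme (a base codepoint plus any combining marks that follow it). Real grapheme clustering has more rules than this, and split_graphemes is a helper invented for the example. Python strings are immutable, so this set_graphemes returns a new string instead of modifying in place.

```python
import unicodedata


def split_graphemes(text):
    """Approximate grapheme clusters: a base codepoint plus trailing combining marks."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Combining mark: attach it to the cluster it modifies.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def get_graphemes(text, grapheme_offset, count):
    """Return `count` graphemes starting at `grapheme_offset`, as a string."""
    clusters = split_graphemes(text)
    return "".join(clusters[grapheme_offset:grapheme_offset + count])


def set_graphemes(text, grapheme_offset, count, insertion_string):
    """Replace `count` graphemes at `grapheme_offset` with `insertion_string`."""
    clusters = split_graphemes(text)
    clusters[grapheme_offset:grapheme_offset + count] = [insertion_string]
    return "".join(clusters)
```

For "cafe\u0301s" (an e followed by a combining acute accent), get_graphemes(text, 3, 1) returns the two-codepoint string "e\u0301", which is a single grapheme.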
void to_charset(STRING, charset)
Transform the string to the specified charset in place.
STRING copy_to_charset(STRING, charset)
Create a new string from the base string, transforming the data to the specified charset
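For a feel of what the copying transform does to the bytes, here's a hedged Python sketch. Python's codec names fold charset and encoding together, and raw bytes don't carry their charset the way a STRING would, so the from_encoding parameter is an addition for this example only.

```python
def copy_to_charset(data, from_encoding, to_encoding):
    """Return a new byte string holding the same text in another encoding.

    Raises UnicodeEncodeError if the target cannot represent the text.
    """
    return data.decode(from_encoding).encode(to_encoding)
```

Calling copy_to_charset(b"caf\xe9", "latin-1", "utf-8") gives b"caf\xc3\xa9", and the original bytes are left untouched.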
void to_unicode(STRING)
Transform the string to a unicode string in place
void compose(STRING)
Fully compose the string, if the charset supports it
void decompose(STRING)
Fully decompose the string, if the charset supports it
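For the Unicode charset, one plausible reading of "fully compose" and "fully decompose" is the standard NFC and NFD normalization forms. That mapping is an assumption on my part. A minimal Python sketch using the standard unicodedata module (returning new strings, since Python strings are immutable):

```python
import unicodedata


def compose(text):
    """Return the canonically composed (NFC) form of the text."""
    return unicodedata.normalize("NFC", text)


def decompose(text):
    """Return the canonically decomposed (NFD) form of the text."""
    return unicodedata.normalize("NFD", text)
```

Here compose("e\u0301") yields the single codepoint "\u00e9", and decompose("\u00e9") yields the two-codepoint sequence "e\u0301".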
void upcase(STRING)
upcase the entire string in place
void upcase_first(STRING)
upcase the first grapheme of the string in place
void downcase(STRING)
downcase the entire string in place
void downcase_first(STRING)
downcase the first grapheme of the string in place
void titlecase(STRING)
titlecase the entire string in place
void titlecase_first(STRING)
titlecase the first grapheme of the string in place
INTVAL compare(STRING, STRING)
compare the two strings. Return 1 if the first string is logically greater, -1 if it is logically lesser, and 0 if they are logically the same. The composition state of the two strings is irrelevant if the charset defines it as such. May transform one or both strings to a common encoding.
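Here's a small Python sketch of a compare that ignores composition state, assuming both strings are already Unicode. It normalizes both sides to NFC and then compares by codepoint order. Plain codepoint order is a stand-in for whatever "logically greater" means in a given charset, not a real collation.

```python
import unicodedata


def compare(first, second):
    """Return 1, -1, or 0, treating composed and decomposed forms as equal."""
    a = unicodedata.normalize("NFC", first)
    b = unicodedata.normalize("NFC", second)
    if a > b:
        return 1
    if a < b:
        return -1
    return 0
```

So compare("\u00e9", "e\u0301") returns 0 even though the two strings hold different codepoints.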
INTVAL index(STRING, STRING, grapheme_offset)
Return the index of the second string in the first string, with the search starting at the specified offset. Returns -1 if not found.
INTVAL rindex(STRING, STRING, grapheme_offset)
Return the index of the second string in the first string, searching backwards from the specified offset. Returns -1 if not found.
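The forward and backward searches can be sketched in Python with str.find and str.rfind. Two caveats: offsets here count codepoints, not graphemes, and I'm assuming the backward search should return the last match that starts at or before the offset.

```python
def index(haystack, needle, offset):
    """Return the first position of needle at or after offset, or -1."""
    return haystack.find(needle, offset)


def rindex(haystack, needle, offset):
    """Return the last position of needle at or before offset, or -1."""
    # Limit the search window so that any match must start at or before offset.
    return haystack.rfind(needle, 0, offset + len(needle))
```

With "abcabc" and "bc", index from offset 2 finds the match at 4, while rindex from offset 3 finds the earlier match at 1.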
validate(STRING)
revalidate the string, to make sure it's still acceptable.
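As a sketch of what revalidation might amount to for one charset/encoding pair, here's a Python check that a byte buffer still decodes cleanly. The boolean return value and the encoding parameter are assumptions for the example, since the signature above doesn't specify either.

```python
def validate(data, encoding="utf-8"):
    """Return True if the bytes are well-formed in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True
```

This is the kind of check you'd want after raw set_byte calls: validate(b"caf\xe9") is False as UTF-8, because the lone 0xE9 byte starts a multi-byte sequence that never finishes.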
-- Dan
Dan Sugalski