Juliusz Chroboczek wrote:
> I believe that Java strings use UTF-8 internally.

.class files use a _modified_ utf-8. at runtime, strings are always in 16-bit unicode.
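a small sketch of how the modified utf-8 differs from standard utf-8. it assumes that DataOutputStream.writeUTF() produces the same modified encoding that .class files use for string constants, and it relies on apis from later java releases (StandardCharsets, try-with-resources). the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    // Byte length of the modified UTF-8 written by writeUTF(),
    // not counting its 2-byte length prefix.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2;
    }

    // Byte length of the same string in standard UTF-8.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String nul = "\u0000";         // U+0000
        String supp = "\uD801\uDC00";  // U+10400 as a surrogate pair

        // U+0000 takes two bytes in modified UTF-8, one in standard UTF-8.
        System.out.println(modifiedUtf8Length(nul) + " vs " + standardUtf8Length(nul));   // 2 vs 1
        // Each surrogate is encoded separately (3 + 3 bytes) instead of one 4-byte sequence.
        System.out.println(modifiedUtf8Length(supp) + " vs " + standardUtf8Length(supp)); // 6 vs 4
    }
}
```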

>  At any rate the
> internal implementation is not exposed to applications -- note that
> `length' is a method in class String (while it is a field in vector
> classes).

but length() and charAt() are among the apis that expose the internal
representation as 16-bit unicode, at least semantically. length() counts 16-bit
units from ucs-2/utf-16, not bytes from utf-8 or code points from utf-32. charAt(),
substring() etc. all index by the same 16-bit units.
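to make the three ways of counting concrete, here is a minimal sketch. it assumes a later java release than the one under discussion (codePointCount() and StandardCharsets were added afterwards), and the class and method names are only illustrative.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Units {
    // Number of 16-bit code units: this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (the UTF-32 view of the string).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes when the string is encoded as standard UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'a', U+00E9 (e with acute), and U+10400 written as a surrogate pair.
        String s = "a\u00e9\uD801\uDC00";

        System.out.println(codeUnits(s));   // 4
        System.out.println(codePoints(s));  // 3
        System.out.println(utf8Bytes(s));   // 7

        // charAt() indexes by 16-bit unit, so index 2 is only half of U+10400.
        System.out.println(Character.isHighSurrogate(s.charAt(2))); // true
    }
}
```

the counts differ only for characters outside ascii: the surrogate pair is two units for length(), one code point, and four utf-8 bytes.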

markus
