Hi all.
I've read the discussion of internal string representations with some
interest. I have a suggestion that might strike a compromise between
some of the positions.
Why not represent strings internally with UTF-8? As far as I understand,
it encodes the same character set as UTF-16.
This would simplify the A->W->U cases that some people hate (since
you just use A->U, or W->U, or whatever you want).
You can also trivially create an ASCII-only version by NOOP-ing the
A->U conversion routine (one measly ifdef).
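
To make the ifdef idea concrete, here is a rough sketch of what such an
A->U routine might look like. The function name ansi_to_utf8 and the
ASCII_ONLY macro are made up for illustration, and I am assuming the
ANSI code page is Latin-1 only to keep the example short; a real routine
would need a table for whatever code page is in use.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical A->U routine: converts a NUL-terminated "ANSI" string
 * to UTF-8. Assumption: the ANSI code page is Latin-1 (ISO 8859-1).
 * dst must hold at least 2 * strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the NUL. */
size_t ansi_to_utf8(const char *src, char *dst)
{
#ifdef ASCII_ONLY
    /* ASCII-only build: ASCII bytes are already valid UTF-8,
     * so the conversion collapses to a plain copy. */
    size_t len = strlen(src);
    memcpy(dst, src, len + 1);
    return len;
#else
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;

    while (*s) {
        if (*s < 0x80) {
            /* ASCII byte: copied unchanged. */
            *d++ = *s;
        } else {
            /* Latin-1 0x80..0xFF maps to a two-byte UTF-8 sequence. */
            *d++ = (unsigned char)(0xC0 | (*s >> 6));
            *d++ = (unsigned char)(0x80 | (*s & 0x3F));
        }
        s++;
    }
    *d = '\0';
    return (size_t)(d - (unsigned char *)dst);
#endif
}
```

Building with -DASCII_ONLY would give the no-op variant; callers would
not need to change either way.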
The only real drawback I can see is that the variable-width character
routines are probably slower than wide-character routines. However,
we can optimize the routines for ASCII, which is currently the most
common character set.
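
As an illustration of the kind of ASCII optimization I have in mind,
here is a sketch of a character-counting routine with a fast path for
plain ASCII runs. The name utf8_strlen is hypothetical, and the code
assumes its input is valid UTF-8; whether the fast path actually pays
off would have to be measured.

```c
#include <stddef.h>

/* Hypothetical helper: counts characters (code points) in a
 * NUL-terminated UTF-8 string. Assumes the input is valid UTF-8. */
size_t utf8_strlen(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t count = 0;

    while (*s) {
        /* Fast path: in a run of ASCII bytes, each byte is one
         * character, so no decoding is needed. */
        while (*s && *s < 0x80) {
            s++;
            count++;
        }
        /* Slow path: in a run of non-ASCII bytes, count lead bytes
         * and skip continuation bytes (those of the form 10xxxxxx). */
        while (*s >= 0x80) {
            if ((*s & 0xC0) != 0x80)
                count++;
            s++;
        }
    }
    return count;
}
```

For a pure ASCII string this never leaves the first inner loop, so the
cost should be close to that of an ordinary strlen-style scan.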
Mike