-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 6 Sep 2005, Reed Hedges wrote:

> Why use wxConvCurrent instead of wxConvUTF8?  I got the impression from
> the wx docs that wxConvCurrent depends on the (GUI) platform, so on a
> unicode platform you'd be telling it your strings from VOS were unicode,
> when they aren't.  I only read the wx docs with a normal depth, no
> backwards bending, maybe I am wrong :)

Hmm, you may well be right. I think I mistakenly thought that wxConvCurrent converted *to* the "current" native encoding for the window system -- but that doesn't really make sense, because the purpose of the wxString class is precisely to gloss over those details.

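So the converter actually names the source encoding, i.e. the encoding of the bytes being turned into a wxString. If we decide to go with UTF-8, then every place that currently uses wxConvCurrent will need to be changed.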

UTF-8 pros:
 backwards-compatible with ASCII
 most efficient way to encode western scripts
 most common Unicode encoding on Unix (?)

UTF-8 cons:
 variable-length characters
 less efficient than UTF-16 for eastern scripts (Chinese/Japanese), since the marker bits in each byte push most characters to three bytes instead of two
 native encoding in Windows is UTF-16
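
To put rough numbers on the size trade-off in the lists above, here is a minimal sketch that counts how many bytes a single code point needs in each encoding. The function names utf8_bytes and utf16_bytes are made up for illustration, and char32_t assumes a C++11 compiler.

```cpp
#include <cstddef>

// Hypothetical helper: bytes UTF-8 needs to encode one Unicode code point.
std::size_t utf8_bytes(char32_t cp) {
    if (cp < 0x80) return 1;     // ASCII
    if (cp < 0x800) return 2;    // e.g. accented Latin, Greek, Cyrillic
    if (cp < 0x10000) return 3;  // rest of the BMP, including most CJK
    return 4;                    // supplementary planes
}

// Hypothetical helper: bytes UTF-16 needs to encode one Unicode code point.
std::size_t utf16_bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4; // one 16-bit unit, or a surrogate pair
}
```

For U+0041 ('A') UTF-8 takes one byte against two for UTF-16; for U+4E2D (a common Chinese character) it is three bytes against two.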

Variable-length characters are what really burn people. Unless one has a string class that specifically knows about Unicode variable-length encodings, the usual solution is to store the string as more-or-less uncompressed UTF-32. So for example, std::string is really a typedef for std::basic_string<char>, so a Unicode string would be std::basic_string<int32_t>.
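
As a rough illustration of the difference, the sketch below decodes UTF-8 bytes into one 32-bit unit per character. It assumes a C++11 compiler, where std::u32string (std::basic_string<char32_t>) plays the role of the basic_string<int32_t> idea above. The name decode_utf8 is made up, and the function assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Sketch only: decode well-formed UTF-8 into one 32-bit unit per code point.
// Malformed input is not detected.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (; extra > 0 && i < in.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

With this, std::string("na\xC3\xAFve").size() counts six bytes, while the decoded string has five elements, so indexing and length behave per character rather than per byte.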

Making Ter'angreal Unicode friendly (by using the wx Unicode classes _correctly_ :-) shouldn't be too hard. I'm going to convert my install to use Unicode and start primarily developing in that environment.

Making VOS Unicode friendly is a much bigger issue.

- Properties store arbitrary binary data. We probably want to have the property datatype include the encoding, with it defaulting to UTF-8. Reading/writing text with multibyte characters to properties will require an encoding step.

- As noted above, to store Unicode strings so that methods that operate on the string work correctly, it would be necessary to convert everything to use std::basic_string<int32_t> or something similar.

- Various other things, such as mesh's command line parser and the vosapp framework, may need to be made multibyte-aware as well.

- It's not clear what immediate short-term benefit there would be in allowing Chinese characters in child contextual names.
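
To make the first point in the list above a little more concrete, here is a minimal sketch of a value that carries its encoding alongside the raw bytes, defaulting to UTF-8. TextProperty and make_text_property are hypothetical names for illustration only and are not part of the existing VOS API.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical property value: raw bytes plus a label naming their encoding.
struct TextProperty {
    std::vector<unsigned char> bytes;
    std::string encoding;
};

// Store already-encoded text; the caller states the encoding, UTF-8 by default.
TextProperty make_text_property(const std::string& text,
                                std::string encoding = "UTF-8") {
    return TextProperty{
        std::vector<unsigned char>(text.begin(), text.end()),
        std::move(encoding)
    };
}
```

A reader of such a property would check the encoding label before decoding the bytes, which is where the extra encoding step mentioned above would come in.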


My feeling is that there are two main places where we would want to support Unicode: properties with text would be stored in UTF-8 (?) and chat messages would also be sent in UTF-8. This will require a library to provide the conversion support. Does anyone know a C++ library for this offhand?

Overall, these two specific changes wouldn't be a huge amount of work, and they are contained enough that they could probably be phased in without breaking the entire VOS API all at once.

[   Peter Amstutz   ][ [EMAIL PROTECTED] ][ [EMAIL PROTECTED]  ]
[Lead Programmer][Interreality Project][Virtual Reality for the Internet]
[ VOS: Next Generation Internet Communication][ http://interreality.org ]
[ http://interreality.org/~tetron ][ pgpkey:  pgpkeys.mit.edu  18C21DF7 ]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDH8RaaeHUyhjCHfcRAt8gAJ9IMq+XmDUaG50/a3LOJ8+j35pWrACgsNqt
FWQ+yzu4LbtNrzUoicaC+vY=
=9Yrj
-----END PGP SIGNATURE-----


