Re: [vos-d] Re: Terangreal and Unicode?

Peter Amstutz Wed, 07 Sep 2005 21:35:10 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 6 Sep 2005, Reed Hedges wrote:

> Why use wxConvCurrent instead of wxConvUTF8?  I got the impression from
> the wx docs that wxConvCurrent depends on the (GUI) platform, so on a
> unicode platform you'd be telling it your strings from VOS were unicode,
> when they aren't.  I only read the wx docs with a normal depth, no
> backwards bending, maybe I am wrong :)

Hmm, you may well be right. I think I mistakenly thought that wxConvCurrent converted *to* the "current" native encoding for the window system -- but that doesn't really make sense, because the purpose of the wxString class is precisely to gloss over those details.

So actually it's the source encoding. If we decide to go with UTF-8, then it will need to be changed everywhere.


UTF-8 pros:
 backwards-compatible with ASCII
 most efficient way to encode western scripts
 most common unicode encoding on Unix (?)

UTF-8 cons:
 variable-length characters
 least-efficient way to encode eastern scripts (Chinese/Japanese) due to the extra marker bits required in each byte (typically three bytes per character versus two in UTF-16)
 native encoding in Windows is UTF-16
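
To put rough numbers on the size trade-off in the lists above, here is a minimal sketch that computes how many bytes a single code point occupies in each encoding. The function names utf8_bytes and utf16_bytes are made up for illustration, and char32_t assumes a C++11 compiler.

```cpp
#include <cstddef>

// Number of bytes needed to encode one Unicode code point in UTF-8.
// Assumes cp is a valid code point (at most 0x10FFFF).
std::size_t utf8_bytes(char32_t cp) {
    if (cp < 0x80) return 1;      // ASCII
    if (cp < 0x800) return 2;     // Latin supplements, Greek, Cyrillic, ...
    if (cp < 0x10000) return 3;   // rest of the BMP, including most CJK
    return 4;                     // supplementary planes
}

// Number of bytes needed to encode one Unicode code point in UTF-16.
std::size_t utf16_bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;  // one 16-bit unit, or a surrogate pair
}
```

For 'A' (U+0041) this gives 1 byte in UTF-8 against 2 in UTF-16; for a common Chinese character such as U+4E2D it gives 3 bytes in UTF-8 against 2 in UTF-16.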

Variable-length characters are what really burn people. Unless one has a string class that specifically knows about unicode variable-length encodings, the usual solution is to store the string in more-or-less uncompressed UTF-32. So for example, std::string is really a typedef for std::basic_string<char>, so unicode would be std::basic_string<int32_t>.
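
As a rough illustration of the fixed-width approach, the sketch below decodes UTF-8 bytes into one element per code point. It uses std::u32string (std::basic_string<char32_t>, available since C++11) as a stand-in for the basic_string<int32_t> idea; decode_utf8 is a made-up helper name, and the decoder is deliberately simplified: it rejects truncated or malformed sequences but does not check for overlong encodings or surrogate values.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into one code point per element.
// Simplified sketch: no checks for overlong forms or surrogates.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With the UTF-8 bytes for "héllo", std::string::size() reports 6 (bytes), while the decoded std::u32string has size 5, so indexing and length finally agree with what a user would call characters (combining sequences aside).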

Making Ter'angreal Unicode friendly (by using the wx unicode classes _correctly_ :-) shouldn't be too hard. I'm going to convert my install to use unicode and start primarily developing in that environment.


Making VOS unicode friendly is a much bigger issue.

- Properties store arbitrary binary data. We probably want to have the property datatype include the encoding, with it defaulting to UTF-8. Reading/writing text with multibyte characters to properties will require an encoding step.

- As noted above, to store unicode strings so that methods that operate on the string work correctly, it would be necessary to convert everything to use std::basic_string<int32_t> or something similar.

- Various other things, such as mesh's command line parser and the vosapp framework, may need to be made multibyte-aware as well.

- It's not clear what immediate short-term benefit there would be in allowing Chinese characters in the child contextual names.
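
To make the first point above a bit more concrete, here is a minimal sketch of a property value that carries its encoding alongside the raw bytes, together with the encoding step needed when writing text into it. TextProperty, append_utf8 and make_text_property are hypothetical names invented for this example and are not part of the VOS API; std::u32string again assumes C++11.

```cpp
#include <string>

// Hypothetical property value: raw bytes plus a label naming their encoding.
// This is NOT the actual VOS property class.
struct TextProperty {
    std::string bytes;     // arbitrary binary payload
    std::string encoding;  // e.g. "UTF-8"
};

// Append one code point to a byte string as UTF-8.
// Assumes cp is a valid code point (at most 0x10FFFF).
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// The encoding step: store fixed-width text as UTF-8 bytes, tagged as such.
TextProperty make_text_property(const std::u32string& text) {
    TextProperty p;
    p.encoding = "UTF-8";
    for (char32_t cp : text) append_utf8(p.bytes, cp);
    return p;
}
```

A reader of such a property would check the encoding label first and decode accordingly, so existing properties holding plain binary data would be unaffected.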

My feeling is that there are two main places we would want to support unicode: properties with text would be stored in UTF-8 (?) and chat messages would also be sent in UTF-8. This will require a library to provide the conversion support. Does anyone know a C++ library for this offhand?

Overall these two specific changes wouldn't be such a huge amount of work and are contained enough that they could probably be phased in, in such a way that they didn't break the entire VOS API all at once.


[   Peter Amstutz   ][ [EMAIL PROTECTED] ][ [EMAIL PROTECTED]  ]
[Lead Programmer][Interreality Project][Virtual Reality for the Internet]
[ VOS: Next Generation Internet Communication][ http://interreality.org ]
[ http://interreality.org/~tetron ][ pgpkey:  pgpkeys.mit.edu  18C21DF7 ]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDH8RaaeHUyhjCHfcRAt8gAJ9IMq+XmDUaG50/a3LOJ8+j35pWrACgsNqt
FWQ+yzu4LbtNrzUoicaC+vY=
=9Yrj
-----END PGP SIGNATURE-----


