Re: [osg-submissions] Big changes in osgDAE: double precision, UTF8, fixes...

Sukender Mon, 17 Jan 2011 00:33:15 -0800

Hi Robert,

> I now believe the right thing to do would be to break the submission
> into separate parts so that each set of new features are checked in
> separately.  This will make it easier to trace changes and any
> regressions, as well as make it possible for me to merge the less
> controversial changes faster.  Could you break the submission up into
> parts based on functionality for me?


I was afraid you would say that! Well... certainly. I'm not sure *when* but 
I'll try ASAP.


> > Well, about UTF8, I guess those have same encoding as filenames:
> > - nodes names (some readers put the full path as node name for the
> >   root)
> > - geometries (and RigGeometries and MorphGeometries)
> > - animations and channels names
> >
> > Of course, this is only my feeling, and you may disagree.
> > If so, may I suggest to turn the preprocessor test into a standard
> > "if()" testing a readerwriter option? We could have
> > daeGeometriesNamesUseCodepage / daeAnimationsNamesUseCodepage, or
> > such. However, if the same is possible with node names, I suggest to
> > interpret node names the same way we do for filenames.
> >
> > Thoughts?
> 
> This isn't quite the specific question about the code that I
> asked... but asking wider questions...
> 
> Personally I don't have an experience with UTF8.  Pushing changes
> from filenames down on to general OSG names is a much wider issue
> that we can't deal with prior to 3.0 release.

Of course. I personally have to deal with string translations & encoding, but 
you're absolutely right about not trying anything before 3.0.


> As a general note, I really dislike having #ifdef code paths in the
> OSG codebase as it'll make the code much less maintainable and more
> error prone.  If we can do stuff at runtime then this is a better way
> of doing it.

Ok. As said before, I guess I'll use plugin options, or such.
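
A minimal sketch of such a runtime test, assuming the plugin receives its 
options as one whitespace-separated string. The helper name hasOption() is 
hypothetical, and daeGeometriesNamesUseCodepage is only the option name 
proposed above, not an existing osgDAE option:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if 'name' appears as a whole token in a
// whitespace-separated option string. This replaces a compile-time #ifdef
// with a decision taken at runtime.
bool hasOption(const std::string& optionString, const std::string& name)
{
    assert(!name.empty());  // an empty option name is a caller bug
    std::istringstream iss(optionString);
    std::string token;
    while (iss >> token)
    {
        if (token == name) return true;
    }
    return false;
}

// Instead of "#ifdef SOME_MACRO ... #else ... #endif", the reader would do:
//
//   if (hasOption(optionString, "daeGeometriesNamesUseCodepage"))
//       { /* interpret geometry names as code page strings */ }
//   else
//       { /* interpret geometry names as UTF8 */ }
```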

More generally speaking, are you thinking about removing the #ifdefs everywhere 
and having a runtime way of handling encoding?
Even if slightly off-topic, I can offer a few hints from my experience:
- Fixed-length encodings: UCS-2 (16 bits), UTF32 (synonym: UCS-4) and 7/8-bit 
encodings (ASCII, latin-1, etc.)
- The UTF encodings (8/16/32) cover all possible characters, contrary to all 
others (notably UCS-2).
- Thus, you can't have a single FIXED-LENGTH encoding for multiple languages, 
except with 32 bits (UTF32 / UCS-4).
- UTF16 is often misinterpreted as UCS-2 (UCS-2 is a subset).
- An ASCII string is a valid UTF8 string, but not a UTF16 string (of course).
- UTF-8 generally takes 3 bytes per character for some languages (Chinese, 
Japanese or Hindi), where UTF16 takes 2, but "This happens for pure text, but 
rarely for HTML documents." (Source: Wikipedia).
- UTF-8 can be stored in a standard "narrow" std::string.
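
The size claims above can be checked with a small sketch. It assumes nothing 
beyond standard C++11; the function names (appendUtf8, utf16Units, 
countCodePoints) are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to a narrow std::string.
void appendUtf8(std::string& out, std::uint32_t cp)
{
    // Valid code points only: at most U+10FFFF and not a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                       // ASCII: 1 byte, unchanged
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // e.g. accented Latin: 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // e.g. Chinese, Japanese, Hindi: 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // outside the 16-bit range: 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Number of 16-bit units UTF16 needs for the same code point. UCS-2 cannot
// represent the second case at all, which is why it is only a subset.
std::size_t utf16Units(std::uint32_t cp)
{
    return cp < 0x10000 ? 1 : 2;
}

// Count code points in a well-formed UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). size() alone only gives bytes.
std::size_t countCodePoints(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char c : s)
    {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For instance, U+4E2D (a Chinese character) takes 3 bytes here but a single 
UTF16 unit, while U+1F600 takes 4 bytes and two UTF16 units.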

So you have the choice between:
1. Having a 32-bit encoding. Drawback: too heavy.
2. Having a fixed-length encoding (8 or 16 bits). Drawback: need to change 
encoding when switching language.
3. Using UTF16. Drawback: confusion with UCS-2, not compatible with ASCII, and 
generally used with "wide" std::wstring.
4. Using UTF8. Drawback: a bit heavy for some languages.

So my personal choice *FOR MULTI-LANGUAGE* apps (and the choice of those who 
wrote "boost.locale" ;) ) is to use UTF8 everywhere for the general case, or 
use a 7/8-bit encoding for situations where memory really-really-really-matters, 
or for single-language apps.

I would then recommend that OSG keep using "narrow" std::string, and have a 
flag somewhere (Some singleton? Registry? ReaderWriter::Options?) which says 
whether filenames use an (8-bit) code page or UTF8. Having "wide" UTF16 
overloads may be possible too (even if conversion functions are enough). 
Thoughts?
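
A rough sketch of that flag, under stated assumptions: NameEncoding, 
EncodingOptions and toUtf8() are hypothetical names, not existing OSG API, and 
ISO-8859-1 (latin-1) stands in for "the code page" because it is the only one 
that converts without a lookup table. A real implementation would need the 
platform's conversion routines for other code pages.

```cpp
#include <cassert>
#include <string>

// Hypothetical runtime flag saying how narrow strings are encoded.
enum class NameEncoding { CodePage, Utf8 };

// Hypothetical place to keep the flag (could live in plugin options).
struct EncodingOptions
{
    NameEncoding filenames = NameEncoding::Utf8;
};

// Normalise a narrow string to UTF-8 according to the runtime flag.
// ASSUMPTION: the code page is latin-1, whose bytes map one-to-one onto
// Unicode code points U+0000..U+00FF.
std::string toUtf8(const std::string& name, NameEncoding encoding)
{
    if (encoding == NameEncoding::Utf8) return name;  // already UTF-8

    std::string out;
    out.reserve(name.size() * 2);
    for (unsigned char c : name)
    {
        if (c < 0x80)
        {
            out += static_cast<char>(c);              // ASCII stays as-is
        }
        else
        {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    assert(out.size() >= name.size());  // conversion never shrinks the string
    return out;
}
```

Callers would then pass the flag along, e.g. toUtf8(fileName, 
options.filenames), and the rest of the code would only ever see UTF8.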

See http://cppcms.sourceforge.net/boost_locale/html/tutorial.html , section 
"Recommendations and Myths"

Sukender
PVLE - Lightweight cross-platform game engine - http://pvle.sourceforge.net/
_______________________________________________
osg-submissions mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-submissions-openscenegraph.org
