Hi Robert,

> I now believe the right thing to do would be to break the submission
> into separate parts so that each set of new features are checked in
> separately. This will make it easier to trace changes and any
> regressions, as well as make it possible for me to merge the less
> controversial changes faster. Could you break the submission up into
> parts based on functionality for me?

I was afraid you would say that! Well... certainly. I'm not sure *when*, but I'll try ASAP.

> > Well, about UTF8, I guess those have same encoding as filenames:
> > - nodes names (some readers put the full path as node name for the root)
> > - geometries (and RigGeometries and MorphGeometries)
> > - animations and channels names
> >
> > Of course, this is only my feeling, and you may disagree.
> > If so, may I suggest to turn the preprocessor test into a standard
> > "if()" testing a readerwriter option? We could have
> > daeGeometriesNamesUseCodepage / daeAnimationsNamesUseCodepage, or
> > such. However, if the same is possible with node names, I suggest to
> > interpret node names the same way we do for filenames.
> >
> > Thoughts?
>
> This isn't quite the specific question about the code that I asked....
> but asking wider questions...
>
> Personally I don't have an experience with UTF8. Pushing changes from
> filenames down on to general OSG names is a much wider issue that we
> can't deal with prior to 3.0 release.

Of course. I personally have to deal with string translations & encoding, but you're absolutely right about not trying anything before 3.0.

> As a general note, I really dislike having #ifdef code paths in the
> OSG codebase as it'll make the code much less maintainable and more
> error prone. If we can do stuff at runtime then this is a better way
> of doing it.

OK. As said before, I guess I'll use plugin options or something similar. More generally, are you thinking about removing the #ifdef everywhere and having a runtime way of handling encoding?

Even if slightly off-topic, here are a few hints from my experience:
- Fixed-length encodings: UCS-2 (16 bits), UTF32 (synonym: UCS-4) and 7/8-bit encodings (ASCII, Latin-1, etc.).
- UTF (8/16/32) covers all possible characters, unlike all the others (notably UCS-2).
- Thus, you can't have a unique FIXED-LENGTH encoding for multiple languages, except with 32 bits (UTF32 / UCS-4).
- UTF16 is often mistaken for UCS-2 (UCS-2 is a subset).
- An ASCII string is a valid UTF8 string, but not a valid UTF16 string (of course).
- UTF8 generally takes 3 bytes per character for some languages (Chinese, Japanese or Hindi), but "This happens for pure text, but rarely for HTML documents." (Source: Wikipedia).
- UTF8 can be stored in a standard "narrow" std::string.

So you have the choice between:
1. Having a 32-bit encoding. Drawback: too heavy.
2. Having a fixed-length encoding (8 or 16 bits). Drawback: need to change encoding when switching language.
3. Using UTF16. Drawback: confusion with UCS-2, not compatible with ASCII, and generally used with "wide" std::wstring.
4. Using UTF8. Drawback: a bit heavy for some languages.

So my personal choice *FOR MULTI-LANGUAGE* apps (and the choice of those who wrote "boost.locale" ;) ) is to use UTF8 everywhere for the general case, and to use a 7/8-bit encoding only where memory really-really-really matters, or for single-language apps.

I would then recommend keeping OSG on "narrow" std::string, and having a flag somewhere (Some singleton? Registry? ReaderWriter::Options?) which says whether filenames use an (8-bit) code page or UTF8. Having "wide" UTF16 overloads may be possible too (even if conversion functions are enough).

Thoughts?

See http://cppcms.sourceforge.net/boost_locale/html/tutorial.html , section "Recommendations and Myths"

Sukender
PVLE - Lightweight cross-platform game engine - http://pvle.sourceforge.net/
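
For illustration only, here is a minimal C++ sketch of two points from the message: UTF8 text living in a plain "narrow" std::string (one byte per ASCII character, three bytes for a typical Chinese character), and an encoding choice made at runtime from an option string rather than with #ifdef. Everything in it is hypothetical and not part of OSG: the helper names, the NameEncoding enum, the space-separated option format, and the UTF8 default are all assumptions. The option name is simply the one floated earlier in this thread.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not an OSG function): counts Unicode code points in a
// UTF-8 string by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is well-formed UTF-8; no validation is done.
std::size_t utf8CodePointCount(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80) ++count;
    return count;
}

// Hypothetical helper: true if every byte is 7-bit ASCII, in which case the
// bytes are identical whether read as ASCII, Latin-1 or UTF-8.
bool isPureAscii(const std::string& s)
{
    for (unsigned char c : s)
        if (c > 0x7F) return false;
    return true;
}

// Hypothetical runtime flag replacing an #ifdef code path.
enum class NameEncoding { CodePage, Utf8 };

// Picks the encoding from a space-separated option string. The option name
// is only the one suggested in this thread, and defaulting to UTF-8 is an
// assumption of this sketch.
NameEncoding encodingFromOptions(const std::string& optionString)
{
    std::istringstream in(optionString);
    std::string token;
    while (in >> token)
        if (token == "daeGeometriesNamesUseCodepage")
            return NameEncoding::CodePage;
    return NameEncoding::Utf8;
}
```

With these helpers, std::string("\xE4\xB8\xAD") (the UTF8 bytes of U+4E2D) has size() 3 but a code point count of 1, while a pure ASCII name such as "cow.osg" has the same length either way. A plugin could then branch with an ordinary if() on the value returned by encodingFromOptions.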
