Ienup Sung wrote:
> Yes, we have numerous locales with different codesets. Solaris 10,
> as an example, has 165 locales with 23 different codesets.
> In many cases, codesets use quite similar representation forms, and yet
> the mappings between code point values and actual characters/glyphs
> are quite different.
>
> Underlying file systems also have various ways of storing characters,
> although many new file systems are converging on Unicode. (Even then,
> among those newer file systems that use Unicode, some use different
> Unicode encodings that are not byte-for-byte compatible with one
> another.)
>
> To solve the problem of incorrectly displayed non-ASCII characters
> while keeping maximum compatibility with existing applications and
> with our numerous locales and codesets, it appears we must either tag
> each file with its codeset, or adopt Unicode -- in particular, UTF-8 --
> as the file system codeset and add transparent codeset conversion on
> top. The two approaches could also be supported together or separately.
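[Editorial note: a minimal sketch of the "transparent conversion" idea from the quote above. The function name and codeset choices (EUC-JP vs. UTF-8) are hypothetical illustrations, not anything Solaris actually implements; real filename conversion would happen at the VFS or libc layer, not in Python.]

```python
def convert_filename(raw: bytes, fs_codeset: str, locale_codeset: str) -> bytes:
    """Re-encode a filename's raw bytes from the file system's codeset
    into the codeset of the caller's locale (hypothetical helper)."""
    return raw.decode(fs_codeset).encode(locale_codeset)

# A Japanese filename stored on disk as EUC-JP, presented to a
# process running in a UTF-8 locale:
raw = "日本語.txt".encode("euc_jp")
converted = convert_filename(raw, "euc_jp", "utf-8")
assert converted == "日本語.txt".encode("utf-8")
```

Note that this only works when the file system's codeset is known, which is exactly why the quote poses the choice as "tag each file's codeset" versus "standardize on UTF-8": without one of the two, there is nothing reliable to convert from.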
... The various LC_* variables can point to different locales+encodings
(for example ja_JP.UTF-8 vs. ja_JP.PCK) ... isn't there some risk that
transparent translations somehow cause havoc? I assume not (assuming
that only filenames are converted transparently), but has anyone thought
this detail (different LC_* variables pointing to different
locales/encodings) through to the end? (I can't think in a straight line
anymore after ~48h of brain uptime, please excuse me if I start asking
silly questions...)

Another (likely more real-world) problem is: how would such a
"transparent conversion" handle characters which cannot be represented
in the current locale? For example, how should the "C"/"POSIX" locale
handle German umlauts (e.g. "öäü")? Just replace them with '?', use
transliteration (e.g. 'ü' == "ue", 'ö' == "oe", etc.), encode them
URL-encoding style (another, perhaps more portable, kind of
transliteration), or invent some all-new solution for the problem?

Final thought: I guess if Solaris wants to use Unicode (locales) more
widely, some of the tools in /usr/bin/ need to be replaced with the
versions from /usr/xpg4/bin/, since the /usr/bin/ tools suffer from the
widespread "I don't care about multibyte locales" disease (the question
is whether this is considered a "bug" or a "feature"... ;-/ ). How
should this be handled if the "bugfix" (e.g. handling multibyte
characters correctly) collides with something like backwards
compatibility?

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
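[Editorial note: the three fallback strategies Roland lists for characters unrepresentable in the current locale -- '?' replacement, transliteration, and URL-style percent-encoding -- can be sketched as follows. The function names and the small umlaut table are hypothetical; a real implementation would sit behind something like iconv(3), where '?' replacement and transliteration correspond to plain conversion vs. the //TRANSLIT suffix, whose output is locale- and implementation-dependent.]

```python
import urllib.parse

# Hypothetical German-style transliteration table ('ü' -> "ue", etc.)
UMLAUT_MAP = str.maketrans({"ö": "oe", "ä": "ae", "ü": "ue", "ß": "ss"})

def to_ascii_replace(name: str) -> str:
    # Lossy: every unrepresentable character collapses to '?'
    return name.encode("ascii", errors="replace").decode("ascii")

def to_ascii_translit(name: str) -> str:
    # Readable but still lossy (and not reversible: "oe" is ambiguous)
    return name.translate(UMLAUT_MAP)

def to_ascii_urlencode(name: str) -> str:
    # Reversible: percent-encode the UTF-8 bytes, URL style
    return urllib.parse.quote(name)

print(to_ascii_replace("öäü"))    # ???
print(to_ascii_translit("öäü"))   # oeaeue
print(to_ascii_urlencode("öäü"))  # %C3%B6%C3%A4%C3%BC
```

The sketch also makes Roland's trade-off concrete: only the percent-encoded form can be mapped back to the original name, at the cost of being far less readable than transliteration.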