On Friday 2004.01.16 13:38:01 +0100, Philippe Verdy wrote:
> Instead of relying on the support of UTF-8 locales by your C/C++ platform,
> why don't you create your own function which would wrap the calls to
> mbstowcs() and similar calls on Unix, or to WideCharToMultiByte() on Windows
> (yes this works even on Windows 95 which does not support many charsets
> except conversions between the system default OEMCP and ACP codepages and
> UTF-8) depending on the platform and without requiring you to adjust
> locales?
>
> If you really want to support only UTF-8, then don't use locale-related
> functions to perform this job. Create your own wrappers to support the
> string functions you need to work with this encoding. And make sure that all
> your interfaces will perform the necessary conversion between the external
> charsets and the internal UTF-8.
>
> My opinion however would be that it will be more convenient to use UTF-16 as
> the internal encoding of your application, as it really simplifies things.

I just wonder why you say that? I think it depends on the application. I have
an application which originally only handled ASCII: to make it
Unicode-enabled, UTF-8 is the obvious answer, as I only need to add/change
things in a very few places to make it all work. As (extended) UTF-16 is also
a variable-length encoding format (when going beyond the Basic Multilingual
Plane), I don't see it as being "more convenient" than UTF-8. In fact, I see
UTF-8 as being more convenient, since it is completely compatible with ASCII
and the basic C string handling functions.

>
> Each time you identify "standard library" functions that are in fact system
> dependent, it's best to create your own simple wrappers to encapsulate the
> portability logic and remove the system-dependent functions from your main
> applicative code. Using a coherent internal charset will also simplify its
> debugging and enhance the runtime performance. Trying to cope with multiple
> charsets in the middle of your application will always be tricky. So
> consider the standard library as a convenient gateway to create easily your
> own wrappers for external interfaces, not as a general purpose tool used for
> the design of your code.
>
> In large projects, these string handling functions are almost always wrapped
> (this is true for Java, except that the Java core library is normally
> guaranteed to be natively portable as they are already implementing
> internally the system-specific wrappers, so that you can be confident that
> Java Strings will always be UTF-16 encoded without requiring you to handle
> multiple charsets for the internal string handling methods of your
> application).
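For what it's worth, a wrapper of the kind suggested above does not have to call mbstowcs() at all if the application is UTF-8 only. Here is a minimal sketch, assuming C++11; the function name utf8_to_code_points and the policy of replacing malformed input with U+FFFD are my own choices for illustration, not anything from this thread or from a standard library:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points without
// consulting the C locale. Malformed input (stray continuation bytes,
// truncated or overlong sequences, surrogates, values above U+10FFFF) is
// replaced by U+FFFD, one replacement per offending byte.
std::vector<char32_t> utf8_to_code_points(const std::string& in) {
    std::vector<char32_t> out;
    const char32_t kReplacement = 0xFFFD;
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t kMin[5] = {0, 0, 0x80, 0x800, 0x10000};

    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else { out.push_back(kReplacement); ++i; continue; }

        // Collect the continuation bytes (each must look like 10xxxxxx).
        bool ok = (i + len <= in.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }

        if (!ok || cp < kMin[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(kReplacement);
            ++i;  // resynchronise on the next byte
            continue;
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Because nothing here depends on setlocale(), the same code should behave identically on Unix and Windows; conversion to and from other external charsets would still need the platform calls at the boundaries.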
>
> ----- Original Message -----
> From: "Deepak Chand Rathore" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Friday, January 16, 2004 11:37 AM
> Subject: UTF8 locale & shell encoding
>
>
> > i am dealing with utf-8 unicode , using functions mbstowcs( ),wcwidth(
> > ),etc defined in wchar.h
> > for converting wide char to utf8 & other things.
> > For these functions to behave correctly , i need to set locale to
> > xxx.UTF-8
> > As solaris has en_US.UTF8 (w/o installing any extra support) , there is
> > no problem.
> > i don't know about HP, AIX, DEC, other flavours of unix ?? (any good URL
> > where i can get this information ??)
> > in unix i can generate utf8 locales using localedef.
> > But i am having problem especially in windows, as i can't find a locale
> > supporting this.
> > i tried changing windows code page to utf8 using _setmbcp(65001), but it
> > didn't work
> > as the functions i am using is locale dependent.
> > in java, it's really easy, but i am coding in c++
> > What shall i do now????
> >
> > I also want to know the shell encoding in different OS (windows &
> > different flavours of unix)
> > Is the shell encoding same as the default locale encoding
> >
> > Thanks
> >
> > DC
> >
> >
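For the locale-dependent route described in the original question, a program can at least probe at startup whether any UTF-8 locale is installed before it relies on mbstowcs(). A rough sketch, with some loud assumptions: the helper names select_utf8_locale and to_wide are made up for illustration, the candidate locale names are only commonly seen spellings and vary by platform, and calling mbstowcs() with a null destination to obtain the required length is a POSIX (XSI) behaviour that not every C library is obliged to provide:

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: try a few commonly seen UTF-8 locale names and return
// the first one that setlocale() accepts for LC_CTYPE, or "" if none is.
// The candidate names are assumptions; real systems spell them differently.
std::string select_utf8_locale() {
    const char* candidates[] = {"en_US.UTF-8", "C.UTF-8", "en_US.utf8"};
    for (const char* name : candidates) {
        if (std::setlocale(LC_CTYPE, name) != nullptr) {
            return name;
        }
    }
    return "";
}

// Hypothetical helper: convert a multibyte string to wide characters using
// whatever LC_CTYPE locale is currently selected. Returns false if the input
// is not valid in that locale's encoding.
bool to_wide(const std::string& in, std::wstring& out) {
    // With a null destination, POSIX mbstowcs() reports the length needed.
    std::size_t n = std::mbstowcs(nullptr, in.c_str(), 0);
    if (n == static_cast<std::size_t>(-1)) {
        return false;
    }
    out.resize(n);
    std::mbstowcs(&out[0], in.c_str(), n);
    return true;
}
```

If select_utf8_locale() returns an empty string, the program knows it cannot count on mbstowcs() or wcwidth() treating its input as UTF-8 and has to fall back to its own conversion code.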

