Re: UTF8 locale & shell encoding

Edward H. Trager Fri, 16 Jan 2004 09:31:39 -0800

On Friday 2004.01.16 13:38:01 +0100, Philippe Verdy wrote:
> Instead of relying on the support of UTF-8 locales by your C/C++ platform,
> why don't you create your own function which would wrap the calls to
> mbstowcs() and similar calls on Unix, or to WideCharToMultiByte() on Windows
> (yes this works even on Windows 95 which does not support many charsets
> except conversions between the system default OEMCP and ACP codepages and
> UTF-8) depending on the platform and without requiring you to adjust
> locales?
> 
> If you really want to support only UTF-8, then don't use locale-related
> functions to perform this job. Create your own wrappers to support the
> string functions you need to work with this encoding. And make sure that all
> your interfaces will perform the necessary conversion between the external
> charsets and the internal UTF-8.
> 
> My opinion however would be that it will be more convenient to use UTF-16 as
> the internal encoding of your application, as it really simplifies things.


I just wonder why you say that?  I think it depends on the application.  I have
an application which originally handled only ASCII: to make it Unicode-enabled,
UTF-8 is the obvious answer, as I only need to add or change things in a very few
places to make it all work.  Since UTF-16 is also a variable-length encoding
(characters beyond the Basic Multilingual Plane need surrogate pairs), I don't see
it as being "more convenient" than UTF-8.  In fact, I see UTF-8 as being more
convenient, since it is backward compatible with ASCII and works with the basic
C string handling functions, which simply treat it as NUL-terminated bytes.
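
To illustrate, here is a minimal C++ sketch, assuming a C++11 compiler and
valid UTF-8 input. The names utf8_length and utf16_units are made up for this
example and are not part of any library. It shows that the byte-oriented C
functions keep working on UTF-8, that counting characters needs no locale, and
that UTF-16 needs two code units beyond the BMP.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: count code points in a NUL-terminated UTF-8 string
// by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8; no validation is done here.
std::size_t utf8_length(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++n;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return n;
}

// Hypothetical helper: number of UTF-16 code units needed for one code point.
// Anything above U+FFFF is encoded as a surrogate pair.
int utf16_units(char32_t cp) {
    return cp >= 0x10000 ? 2 : 1;
}
```

For the UTF-8 bytes of "naïve" (written as "na\xC3\xAFve" to avoid depending
on the source file encoding), std::strlen() reports 6 bytes while
utf8_length() reports 5 characters, and std::strchr() still finds the 'v'
because ASCII byte values never occur inside a multi-byte sequence.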

> 
> Each time you identify "standard library" functions that are in fact system
> dependent, it's best to create your own simple wrappers to encapsulate the
> portability logic and remove the system-dependent functions from your main
> application code. Using a coherent internal charset will also simplify its
> debugging and enhance the runtime performance. Trying to cope with multiple
> charsets in the middle of your application will always be tricky. So
> consider the standard library as a convenient gateway to easily create your
> own wrappers for external interfaces, not as a general purpose tool used for
> the design of your code.
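
One way such a wrapper could look, as a rough sketch only: the function below
decodes UTF-8 by hand, so it never calls setlocale() or mbstowcs() and behaves
the same on Unix and Windows. The name utf8_to_utf32 is hypothetical, and the
use of std::u32string assumes a C++11 compiler; it is an illustration, not
anything proposed in this thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: decode UTF-8 into UTF-32 code points without
// consulting the C locale. Throws std::invalid_argument on malformed input.
std::u32string utf8_to_utf32(const std::string& in) {
    // Smallest code point that legitimately needs a sequence of each length;
    // used to reject overlong encodings.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;

        // The lead byte determines the sequence length and the first bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, surrogates, and values past U+10FFFF.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Conversions to and from other external charsets would still have to happen at
the program's boundaries; this sketch only covers the UTF-8 side.
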
> 
> In large projects, these string handling functions are almost always wrapped
> (this is true for Java, except that the Java core library is normally
> guaranteed to be natively portable as they are already implementing
> internally the system-specific wrappers, so that you can be confident that
> Java Strings will always be UTF-16 encoded without requiring you to handle
> multiple charsets for the internal string handling methods of your
> application).
> 
> ----- Original Message ----- 
> From: "Deepak Chand Rathore" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Friday, January 16, 2004 11:37 AM
> Subject: UTF8 locale & shell encoding
> 
> 
> > i am dealing with utf-8 unicode, using functions mbstowcs(), wcwidth(),
> > etc defined in wchar.h
> > for converting wide char to utf8 & other things.
> > For these functions to behave correctly, i need to set locale to xxx.UTF-8
> > As solaris has en_US.UTF8 (w/o installing any extra support), there is no
> > problem.
> > i don't know about HP, AIX, DEC, other flavours of unix ?? (any good URL
> > where i can get this information ??)
> > in unix i can generate utf8 locales using localedef.
> > But i am having problem especially in windows, as i can't find a locale
> > supporting this.
> > i tried changing windows code page to utf8 using _setmbcp(65001), but it
> > didn't work
> > as the functions i am using is locale dependent.
> > in java, it's really easy, but i am coding in c++
> > What shall i do now????
> >
> > I also want to know the shell encoding in different OS (windows & different
> > flavours of unix)
> > Is the shell encoding same as the default locale encoding
> >
> > Thanks
> >
> > DC
> >
> 
>
