Yves Arrouye <[EMAIL PROTECTED]> writes on Fri, 6 Apr 2001 15:52:59 -0700:

>> Does anybody know if the C++ standard specifies how many hex digits
>> max this escape can have? And doesn't the standard say something
>> like \u is for wchar_t, which may not be Unicode (I hope I'm wrong
>> here)?

Here is what

        INTERNATIONAL STANDARD ISO/IEC 14882
        First edition 1998-09-01
        Programming languages -- C++
        Langages de programmation -- C++

        http://www.iso.ch/cate/d25845.html
        https://webstore.ansi.org/
        http://webstore.ansi.org/ansidocstore/product.asp?sku=ISO%2FIEC+14882%2D1998
        http://webstore.ansi.org/ansidocstore/product.asp?sku=ISO%2FIEC+14882%3A1998

has to say:

>> ...
>>      The universal-character-name construct provides a way to name other
>>      characters.
>>
>>      hex-quad:
>>              hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
>>
>>      universal-character-name:
>>              \u hex-quad
>>              \U hex-quad hex-quad
>>
>>      The character designated by the universal-character-name \UNNNNNNNN is
>>      that character whose character short name in ISO/IEC 10646 is
>>      NNNNNNNN; the character designated by the universal-character-name
>>      \uNNNN is that character whose character short name in ISO/IEC 10646
>>      is 0000NNNN.  If the hexadecimal value for a universal character name
>>      is less than 0x20 or in the range 0x7F-0x9F (inclusive), or if the
>>      universal character name designates a character in the basic source
>>      character set, then the program is ill-formed.
>> ...
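
As a rough illustration of the last sentence of that excerpt, the rule
can be sketched in C++ as below.  The function name is my own
invention, and the list of characters outside the basic source
character set ($, @, and `) is taken from section 2.2 of C++98, not
from the excerpt itself, so treat it as an assumption to check against
your copy of the standard.

```cpp
// Hypothetical helper (not from the standard or any library): returns
// true if code point cp may be spelled as \uNNNN or \UNNNNNNNN under
// the C++98 rule quoted above, and false if doing so would make the
// program ill-formed.
bool ucn_allowed_cxx98(unsigned long cp) {
    if (cp < 0x20) {
        return false;                 // less than 0x20
    }
    if (cp >= 0x7F && cp <= 0x9F) {
        return false;                 // in the range 0x7F-0x9F
    }
    if (cp < 0x7F) {
        // Assumption from C++98 section 2.2: of the codes 0x20-0x7E,
        // only '$', '@', and '`' are NOT in the basic source
        // character set, so only those three may be named.
        return cp == 0x24 || cp == 0x40 || cp == 0x60;
    }
    return true;                      // everything from 0xA0 upward
}
```

For example, ucn_allowed_cxx98(0x41) is false, because 'A' is in the
basic source character set, while ucn_allowed_cxx98(0xE9) is true.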

Thus, \u and \U both refer to ISO/IEC 10646, not to some other
character set.  However, from a quick skim, it is not clear to me that
wchar_t is necessarily big enough to hold every character in that set.
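
One way to find out what a particular implementation does is simply to
ask it.  The following is a minimal sketch; the helper name is
hypothetical, and which upper bound is the right one to test is my
assumption, not something the standard text above settles.

```cpp
#include <limits>

// Hypothetical helper, not part of any standard library: reports
// whether every code value from 0 through max_code_point fits in a
// wchar_t on the implementation compiling this code.
bool wchar_can_hold(unsigned long max_code_point) {
    // wchar_t may be signed or unsigned; its maximum is nonnegative
    // either way, so converting it to unsigned long is safe.
    const unsigned long wmax =
        static_cast<unsigned long>(std::numeric_limits<wchar_t>::max());
    return max_code_point <= wmax;
}
```

Here wchar_can_hold(0xFFFFUL) asks whether every \uNNNN value fits, and
wchar_can_hold(0x7FFFFFFFUL) asks the same about the 31-bit UCS-4 code
space.  The answer can differ from one compiler and platform to the
next.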

The C99 Standard

        INTERNATIONAL STANDARD ISO/IEC 9899
        Second edition 1999-12-01
        Programming languages -- C
        Langages de programmation -- C

        http://www.iso.ch/cate/d29237.html
        http://webstore.ansi.org/ansidocstore/product.asp?sku=ISO%2FIEC+9899%3A1999

has essentially the same text as the C++98 Standard for the meaning of
\u and \U, and it too is vague about what wchar_t represents.

The C99 Standard then goes on to define:

>> ...
>>      __STDC_ISO_10646__
>>              An integer constant of the form yyyymmL (for example,
>>              199712L), intended to indicate that values of type
>>              wchar_t are the coded representations of the
>>              characters defined by ISO/IEC 10646, along with all
>>              amendments and technical corrigenda as of the
>>              specified year and month.
>> ...

This symbol is not defined in C++98; evidently, it was introduced so
that programmers would have a way of finding out whether or not
wchar_t holds ISO/IEC 10646 values.
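
A program can test for the symbol with the preprocessor.  The sketch
below is written as C++ on the assumption that some C++
implementations pass the C99 macro through from their C library;
C++98 itself does not promise that, and the function name is
hypothetical.

```cpp
#include <cwchar>   // gives the C library a chance to define the macro

// Hypothetical helper: returns the yyyymm value of __STDC_ISO_10646__
// if the implementation defines it, or 0 if it does not.  A result of
// 0 means only "no promise made", not that wchar_t is known to use
// some other encoding.
long iso10646_version() {
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}
```

A nonzero result such as 199712 would indicate that wchar_t values are
ISO/IEC 10646 code values as of that year and month.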


-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- Center for Scientific Computing       FAX: +1 801 585 1640, +1 801 581 4148 -
- University of Utah                    Internet e-mail: [EMAIL PROTECTED]  -
- Department of Mathematics, 322 INSCC      [EMAIL PROTECTED]  [EMAIL PROTECTED] -
- 155 S 1400 E RM 233                       [EMAIL PROTECTED]                    -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe  -
-------------------------------------------------------------------------------
