From: <[EMAIL PROTECTED]>

> > > Still, I stand by saying that \n is defined in C++ as LF and \r as CR,
> > > because that's sitting in front of me in black and white.
> >
> > Yes, true. But that does *not* mean that (int)'\n' can be counted on to
> > be 10.
>
> Of course, given that any of a variety of character encodings could be in
> use, any guarantee that (int)'\n' == 10 would violate the definition of \n
> as LF.
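To make that concrete, here is a quick sketch in plain standard C (the file names are arbitrary, nothing else is assumed): it prints the numeric values that '\n' and '\r' happen to have on the current implementation, then writes the same string through a text-mode and a binary-mode stream so you can compare the bytes that actually end up on disk.

    #include <stdio.h>

    /* Dump the raw bytes of a file (read back in binary mode, so nothing is
       translated on the way in). */
    static void dump(const char *path)
    {
        FILE *f = fopen(path, "rb");
        int c;
        if (!f) { perror(path); return; }
        printf("%s:", path);
        while ((c = fgetc(f)) != EOF)
            printf(" %02x", c);
        printf("\n");
        fclose(f);
    }

    int main(void)
    {
        /* Implementation-defined values: usually 10 and 13, but not guaranteed. */
        printf("(int)'\\n' = %d, (int)'\\r' = %d\n", '\n', '\r');

        FILE *t = fopen("demo_text.txt", "w");  /* text mode: '\n' may be translated */
        FILE *b = fopen("demo_bin.bin", "wb");  /* binary mode: bytes go out as-is   */
        if (!t || !b) { perror("fopen"); return 1; }
        fputs("x\n", t);
        fputs("x\n", b);
        fclose(t);
        fclose(b);

        dump("demo_text.txt");  /* e.g. "78 0d 0a" on Windows, "78 0a" on Unix */
        dump("demo_bin.bin");   /* the raw value of '\n', with no translation  */
        return 0;
    }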
What matters in the standard is that the source author can assume that '\n' has the desired effect of terminating a line in text streams, i.e. the same effect that LF produces in a Unix environment. There is no such requirement for binary streams (those opened with the "b" flag in the standard C library; text mode is the default, and the "t" flag is a common but non-standard extension), and only text streams are required to perform whatever conversions are needed to preserve that effect:

- in CP/M, DOS, OS/2 and Windows, this is done by the standard library linked with the application, not by the OS;
- in MVS and VMS (and in some cases in NT, with its optional support for pluggable foreign filesystems), this may be done by the OS itself;
- on Mac Classic, this is done by the compiler itself, which binds '\n' to the line-terminating function required by the language standard; since the Macintosh line terminator is CR, '\n' ends up with the value 13 in that character set.

In any of these cases, the test "if ('\n' == 10)" will not necessarily be true even with a compiler conforming to C99 or ISO C++: this is the gray area where characters are promoted to integers, and where C and C++ are not very clear, because they use plain integer promotion rules to represent characters as integers instead of separating the two semantically. (This gray area does not exist in Java, where byte and char are distinct datatypes and converting between them requires an explicit cast, even though Java still allows chars to be used in arithmetic, with a defined but limited set of operations on them.)

I just think it is a shame that the legacy use of char to mean a byte in C/C++ was an initial design error, but we have to live with it, given the huge amount of code written on that assumption. It is, however, consistent with the original performance-oriented design of C/C++, where the width of a byte (CHAR_BIT) is implementation-defined rather than fixed. This causes problems on systems such as 4-bit microcontrollers, where the minimum addressable (and allocatable) memory unit is the nibble: there, a C implementation has to assume that a char occupies two nibbles, and thus two memory cells, so an operation like p++, where p is a char*, must advance the physical address by 2. To preserve the usual meaning of char, the compiler handles conversions between integers and char* with a multiplication factor of 2, and differences of char* pointers with a division by 2. The drawback of this scheme is that a single memory nibble can no longer be addressed, except through another compiler-specific native datatype smaller than a char, such as __int4 or __nibble. The same problem occurs on systems where memory or I/O space is addressable in 1-bit units: to support them (most often microcontrollers), the C compiler needs to add a __bit datatype and handle the conversions between char* and __bit* pointers, notably when computing pointer differences.
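The nibble- and bit-addressed cases can only be demonstrated with compiler-specific extensions (__int4, __nibble and __bit are not standard types), so here is just a sketch of the part that standard C does pin down: sizeof counts in chars, char* arithmetic counts in chars, and the width of a char (CHAR_BIT) is implementation-defined with nothing more than a lower bound of 8.

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* Implementation-defined; the standard only requires CHAR_BIT >= 8.
           It is 8 on ordinary hosted platforms, 16 or 32 on some DSPs. */
        printf("CHAR_BIT     = %d\n", CHAR_BIT);

        /* sizeof is measured in chars by definition, so sizeof(char) is 1
           whatever the hardware's real addressable unit happens to be. */
        printf("sizeof(char) = %zu\n", sizeof(char));
        printf("sizeof(int)  = %zu\n", sizeof(int));

        /* char* arithmetic is likewise counted in chars; any scaling needed
           to reach the machine's addressing unit is hidden by the compiler. */
        int x = 0;
        char *lo = (char *)&x;
        char *hi = lo + sizeof x;
        printf("char* span of an int = %td chars\n", hi - lo);
        return 0;
    }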
Whatever you think, all this should have been defined more precisely in the C/C++ standards, by designing two separate sets of datatypes and requiring explicit rather than implicit conversions and promotions between them:

1) one set intended for performance and system integration, which maps directly onto physically addressable memory units but carries no requirement about the supported value range, including the native floating-point numbers (with their full range and precision, even if that is a superset or subset of the standard IEEE formats); for now C and C++ define only this set of datatypes, though not completely, and with various portability issues (the standard C 'char' type is one of them, and so, regrettably, is the ANSI 'wchar_t' type);

2) one set intended for semantics, which maps onto enough addressable memory units to support the standard ranges, and in which a "character" datatype (as defined by Unicode) could be designed, as well as the standard IEEE floating-point numbers with all their expected values, so that code becomes portable across systems. Java includes only this second set of datatypes, but most C/C++ compilers now ship header files (such as <stdint.h> / <cstdint>) that map these standard types onto native datatypes.
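For the record, a tiny sketch of what that standard header mapping looks like in practice, using only <stdint.h>/<inttypes.h>: the fixed-range "semantic" types are typedefs layered over whatever native types the platform provides (the exact-width types are optional in the standard, the least-width ones are mandatory).

    #include <inttypes.h>   /* printf macros; also pulls in <stdint.h> */
    #include <stdio.h>

    int main(void)
    {
        /* Exact-width types: optional in the standard (present only when the
           platform has a matching native type), universal in practice. */
        int32_t exact = -123456789;
        uint8_t small = 0xFF;

        /* Least-width types: always available, may be wider than requested. */
        int_least16_t least16 = 32000;

        printf("int32_t        : %" PRId32 "\n", exact);
        printf("uint8_t        : %" PRIu8  "\n", small);
        printf("int_least16_t  : %" PRIdLEAST16 "\n", least16);

        /* The whole point of these headers is the mapping to native types:
           int32_t is typically just a typedef for int or long, chosen per
           platform, so the guaranteed range comes without adding a new
           fundamental type to the language. */
        printf("sizeof(int32_t) = %zu, sizeof(int_least16_t) = %zu\n",
               sizeof(int32_t), sizeof(int_least16_t));
        return 0;
    }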