On 11/06/2013 03:43 AM, Steffen Daode Nurpmeso wrote:
Philippe Verdy <[email protected]> wrote:
|2013/11/5 Steffen Daode <[email protected]>
|> (The problem i'm facing is that _PRINT and _GRAPH cannot be set
|> for some properties from PropList.txt, say, _PRINT can't be set
|> for U+0009, CHARACTER TABULATION (ht), since it's a Cc, but in
|
|TAB is "printable" (for the isprint() macro in standard C libraries) because
|it has a whitespace property, even if its general category is very weakly
Nope, according to POSIX, Vol. 1: Base Definitions, 7.3.1 LC_CTYPE ([1]):
  print
      Define characters to be classified as printable characters,
      including the <space>.
      In the POSIX locale, all characters in class graph shall be
      included; no characters in class cntrl shall be included.
      In a locale definition file, characters specified for the
      keywords upper, lower, alpha, digit, xdigit, punct, graph, and
      the <space> are automatically included in this class. No
      character specified for the keyword cntrl shall be specified.
  [1] <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03_01>
Verifiable under LC_ALL=en_GB.UTF-8 on Mac OS X Snow Leopard
(which admittedly uses very old Citrus data; i always wonder why all
those Gigabytes of «Software Update»s don't tweak that, not to
talk about GNU make 3.81 and all the other buggy or non-compliant
stuff, but that is a different story):
  #include <stdio.h>
  #include <ctype.h>
  #include <wchar.h>   /* wcwidth() is declared here, not in <wctype.h> */
  int main(void) {
      /* Classification and column width of TAB. */
      printf("%d %d\n", isprint('\t'), wcwidth(L'\t'));
      return 0;
  }
?0[steffen@sherwood tmp]$ cc -o zt t.c && ./zt
0 -1
|The character mapping for the isprint() macro is defined by an expression
|based on existing Unicode properties. Most C libraries optimize this
But i agree that POSIX has to move towards Unicode definitions,
and more byte- than bitwise.
--steffen
The only vendor I'm aware of that makes TAB printable is Microsoft.
Thus Philippe is wrong about this except for MS products.
MS also makes TAB a control, violating the POSIX standard by having it
be both printable and a control. This is true in all locales I've seen
under MS except the C locale. (MS also has other POSIX violations, such
as having isdigit() match superscript numbers.)