Karl Williamson <[email protected]> wrote:
 |On 11/06/2013 03:43 AM, Steffen Daode Nurpmeso wrote:
 |> Philippe Verdy <[email protected]> wrote:
 |>|2013/11/5 Steffen Daode <[email protected]>
 |>|> (The problem i'm facing is that _PRINT and _GRAPH cannot be set
 |>|> for some properties from PropList.txt, say, _PRINT can't be set
 |>|> for U+0009, CHARACTER TABULATION (ht), since it's a Cc, but in
 |>|
 |>|TAB is "printable" (for the isprint() macro in standard C libraries) because
 |>
 |> Nope, according to POSIX, Vol. 1: Base Definitions, 7.3.1. LC_CTYPE ([1]):
 |
 |The only vendor I'm aware of that makes TAB a printable is Microsoft. 
 |Thus Philippe is wrong about this except for MS products.

That made me curious, and it doesn't seem to be right [1].

  isprint returns a nonzero value if c is a printable character—this
  includes the space character (0x20 – 0x7E).

  The behavior of isprint and _isprint_l is undefined if c is not
  EOF or in the range 0 through 0xFF, inclusive. When a debug CRT
  library is used and c is not one of these values, the functions
  raise an assertion.

  [1] <http://msdn.microsoft.com/en-us/library/ewx8s4kw(v=vs.110).aspx>
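  The MSDN text above makes isprint() undefined unless c is EOF or in
  the range 0 through 0xFF.  On platforms where plain char is signed,
  a byte >= 0x80 stored in a char becomes negative, so the portable
  habit is to convert through unsigned char first.  A minimal sketch,
  where byte_isprint() and count_printable() are hypothetical helper
  names chosen for illustration:

```c
#include <ctype.h>
#include <stddef.h>

/* isprint() wants EOF or a value representable as unsigned char.
 * A plain (possibly signed) char holding a byte >= 0x80 would be
 * negative, which is undefined behaviour, so convert it first. */
static int byte_isprint(char ch) {
    return isprint((unsigned char)ch);
}

/* Count the bytes of buf that the current LC_CTYPE locale
 * classifies as printable. */
size_t count_printable(const char *buf, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (byte_isprint(buf[i]))
            n++;
    return n;
}
```

  In the default "C" locale, count_printable("a\tb c", 5) counts a, b,
  the space and c, but not the TAB.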

Well, I hope this is not a crashing assertion but only a loud log
entry...  (I have no idea about M$, but all the code I have ever used
kept debug and release builds completely separate, too.)

 |under MS except the C locale.  (MS also has other Posix violations, such 
 |as having isdigit() match superscript numbers.)

The isdigit() page is silent on this ([2]),

  isdigit returns a nonzero value if c is a decimal digit (0 – 9).
  iswdigit returns a nonzero value if c is a wide character that
  corresponds to a decimal-digit character.

  [2] <http://msdn.microsoft.com/en-us/library/fcc4ksh8(v=vs.110).aspx>
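  ISO C and POSIX restrict isdigit() and iswdigit() to the ten
  characters '0' through '9' in every locale, so the superscript claim
  can be probed directly.  A small sketch, assuming a hosted C99
  library; the helper names are made up for illustration, and
  U+00B2 SUPERSCRIPT TWO is just one convenient test character:

```c
#include <ctype.h>
#include <wctype.h>

/* Count how many byte values the current locale's isdigit() accepts.
 * A conforming library reports exactly 10 ('0'..'9'). */
int count_isdigit_bytes(void) {
    int n = 0;
    for (int c = 0; c <= 255; c++)   /* every unsigned char value */
        if (isdigit(c))
            n++;
    return n;
}

/* Nonzero if the library treats U+00B2 SUPERSCRIPT TWO as a decimal
 * digit, which ISO C and POSIX do not allow. */
int superscript_two_is_digit(void) {
    return iswdigit((wint_t)0x00B2) != 0;
}
```

  Calling both after setlocale(LC_ALL, "") checks whatever locale the
  environment selects; the results for Microsoft's CRT are not
  asserted here.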

but following the links to the equivalents leads to "Character
Classification" [3]

  [3] <http://msdn.microsoft.com/en-us/library/t9zea13t(v=vs.110).aspx>

and finally "Char.IsDigit Method (Char)" [4], where I found:

  This method determines whether a Char is a radix-10 digit. This
  contrasts with IsNumber, which determines whether a Char is of any
  numeric Unicode category. Numbers include characters such as
  fractions, subscripts, superscripts, Roman numerals, currency
  numerators, encircled numbers, and script-specific digits.

  Valid digits are members of the UnicodeCategory.DecimalDigitNumber
  category.

  [4] <http://msdn.microsoft.com/en-us/library/7f0ddtxh(v=vs.110).aspx>

So, whew! Microsoft seems to get the carefully designed isXY(3)
series right.  But I also came across _pipe(), and there you go.

--steffen
--- Begin Message ---
On 11/06/2013 03:43 AM, Steffen Daode Nurpmeso wrote:
Philippe Verdy <[email protected]> wrote:
  |2013/11/5 Steffen Daode <[email protected]>
  |> (The problem i'm facing is that _PRINT and _GRAPH cannot be set
  |> for some properties from PropList.txt, say, _PRINT can't be set
  |> for U+0009, CHARACTER TABULATION (ht), since it's a Cc, but in
  |
  |TAB is "printable" (for the isprint() macro in standard C libraries) because
  |it has a whitespace property, even if its general category is very weakly

Nope, according to POSIX, Vol. 1: Base Definitions, 7.3.1. LC_CTYPE ([1]):

   print
   Define characters to be classified as printable characters,
   including the <space>.

   In the POSIX locale, all characters in class graph shall be
   included; no characters in class cntrl shall be included.

   In a locale definition file, characters specified for the
   keywords upper, lower, alpha, digit, xdigit, punct, graph, and
   the <space> are automatically included in this class. No
   character specified for the keyword cntrl shall be specified.

   [1] <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03_01>
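   The rules quoted above boil down to three checks per byte: <space>
   is printable, every graph character is printable, and no cntrl
   character is printable.  A sketch of a checker for the current
   LC_CTYPE locale (posix_print_violations() is a hypothetical name; it
   only looks at single-byte values 0..255):

```c
#include <ctype.h>

/* Count violations of the POSIX constraints on class "print" for
 * every byte value in the current LC_CTYPE locale:
 *   - <space> must be printable,
 *   - every graph character must be printable,
 *   - no cntrl character may be printable.
 * Returns 0 for a conforming locale. */
int posix_print_violations(void) {
    int bad = 0;
    if (!isprint(' '))
        bad++;
    for (int c = 0; c <= 255; c++) {
        if (isgraph(c) && !isprint(c))
            bad++;
        if (iscntrl(c) && isprint(c))
            bad++;   /* e.g. a TAB classified as both cntrl and print */
    }
    return bad;
}
```

   Run after setlocale(LC_ALL, ""), this would flag a locale that
   makes TAB both a control and printable.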

Verifiable under LC_ALL=en_GB.UTF-8 on Mac OS X Snow Leopard
(which admittedly uses very old Citrus data; I always wonder why all
those gigabytes of «Software Update»s don't refresh that, not to
mention GNU make 3.81 and all the other buggy or non-compliant
stuff, but that is a different story):

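   #define _XOPEN_SOURCE 700   /* declares wcwidth() in <wchar.h> on glibc */
   #include <locale.h>
   #include <stdio.h>
   #include <ctype.h>
   #include <wchar.h>
   int main(void) {
     setlocale(LC_ALL, ""); /* use LC_ALL; otherwise "C" locale applies */
     printf("%d %d\n", isprint('\t'), wcwidth(L'\t'));
     return 0;
   }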

   ?0[steffen@sherwood tmp]$ cc -o zt t.c && ./zt
   0 -1
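   Since wcwidth(L'\t') is -1 (TAB is not a printable character),
   code that measures terminal columns has to handle TAB itself.  A
   minimal sketch, assuming tab stops every `tabstop` columns; the
   helper display_width() and the tab-stop model are assumptions for
   illustration, not part of any standard API:

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() in <wchar.h> on glibc */
#include <wchar.h>

/* Hypothetical helper: column width of a wide string on a terminal,
 * expanding TAB to the next multiple of tabstop by hand because
 * wcwidth() rejects it.  Returns -1 if any other character is not
 * printable according to the current locale. */
int display_width(const wchar_t *s, int tabstop) {
    int col = 0;
    for (; *s != L'\0'; s++) {
        if (*s == L'\t') {
            col += tabstop - col % tabstop;   /* advance to next tab stop */
            continue;
        }
        int w = wcwidth(*s);
        if (w < 0)
            return -1;   /* control or otherwise unprintable */
        col += w;
    }
    return col;
}
```

   With tabstop 8, L"ab\tc" occupies 9 columns: "ab" ends at column 2,
   the TAB jumps to column 8 and "c" adds one more.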

  |The character mapping for the isprint() macro is defined by an expression
  |based on existing Unicode properties. Most C libraries optimize this

But I agree that POSIX has to move towards Unicode definitions,
and more byte- than bitwise.

--steffen


The only vendor I'm aware of that makes TAB a printable is Microsoft. Thus Philippe is wrong about this except for MS products.

MS also makes TAB a control, violating the Posix standard by having it be both printable and a control. This is true in all locales I've seen under MS except the C locale. (MS also has other Posix violations, such as having isdigit() match superscript numbers.)
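If only the C locale gives Posix behaviour, one way to get it regardless of the global locale is to classify through a private "C" locale object. A sketch assuming POSIX.1-2008 newlocale() and isprint_l() (Microsoft's CRT has the differently shaped _create_locale() and _isprint_l() instead, and some platforms may need extra headers such as <xlocale.h>); c_locale_isprint() is a hypothetical name, and the lazy initialisation is not thread-safe:

```c
#define _XOPEN_SOURCE 700   /* newlocale(), isprint_l() are POSIX.1-2008 */
#include <ctype.h>
#include <locale.h>

/* Lazily create a locale object whose LC_CTYPE is "C".
 * Not thread-safe; returns (locale_t)0 if newlocale() fails. */
static locale_t c_ctype_locale(void) {
    static locale_t loc = (locale_t)0;
    if (loc == (locale_t)0)
        loc = newlocale(LC_CTYPE_MASK, "C", (locale_t)0);
    return loc;
}

/* isprint() with "C" locale semantics, independent of setlocale().
 * c must be EOF or an unsigned char value, as for isprint(). */
int c_locale_isprint(int c) {
    locale_t loc = c_ctype_locale();
    if (loc == (locale_t)0)
        return c >= 0x20 && c <= 0x7E;   /* assumed fallback: ASCII printables */
    return isprint_l(c, loc);
}
```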


--- End Message ---
