Re: New AWK bug with collating

2002-12-14 Thread Bruce Evans
On Fri, 13 Dec 2002, Ruslan Ermilov wrote:

 On Fri, Dec 13, 2002 at 04:41:06PM +0300, Andrey A. Chernov wrote:
  On Fri, Dec 13, 2002 at 14:32:40 +0200, Ruslan Ermilov wrote:
   Pardon my ignorance here, but the following fragment
   returns -1, doesn't it?
  
   #include <stdio.h>
   void
   main(void)
   {
       int i;

       i = (unsigned char)1 - (unsigned char)2;
       printf("%d\n", i);
   }
 
  It depends very much on the compiler, i.e. whether it implements
  value-preserving or unsigned-preserving rules for 'char' type
  conversions, or on ANSI C vs. common C mode. Better to be safe for both.

  Read section 6.10.1.1 here:
  http://wwwrsphysse.anu.edu.au/doc/DUhelp/AQTLTBTE/DOCU_067.HTM

For ANSI C, the result of the subtraction only depends on the width
of unsigned char.  If unsigned char has the same width as int, then
the result is UINT_MAX; otherwise the result is -1.  This is an example
of the brokenness of value preserving conversions -- the value is
as far as possible from being preserved.

Then assignment to int i may cause overflow.  There is no overflow if
the RHS is -1.  If the RHS is UINT_MAX, then the result of the assignment
is implementation-defined.  The value is preserved even less than before.
I think it is usually -0 on 1's complement machines.
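
A minimal sketch of the ANSI C case described above.  The helper names
are hypothetical, and the expected result of -1 assumes unsigned char is
narrower than int, as on i386:

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if every unsigned char value fits in
 * int, i.e. unsigned char is promoted to (signed) int under ANSI C. */
int
uchar_promotes_to_int(void)
{
    return UCHAR_MAX <= INT_MAX;
}

/* Hypothetical helper: the subtraction from the quoted fragment.
 * Both operands are promoted before the subtraction; when they are
 * promoted to int, 1 - 2 is simply the int value -1. */
int
uchar_diff(unsigned char a, unsigned char b)
{
    return a - b;
}
```

On such a platform uchar_diff(1, 2) is -1 and the assignment to int
involves no overflow at all.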

So ache's change is basically a fix for 1's complement machines.  I don't
see much point in it, since we assume 2's complement in most places in
libc/string (except strcoll() :-).  E.g., memcmp() just subtracts the
unsigned chars and assumes that all the conversions turn out like they
do on 2's complement machines.  We actually use an assembler version of
memcmp on most arches but...
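
For illustration, a byte comparison in the style described above might
look like the following.  This is a sketch with a hypothetical name, not
the libc source, and it relies on unsigned char being promoted to int:

```c
#include <stddef.h>

/* Hypothetical memcmp-like routine: compares n bytes and returns the
 * difference of the first pair of bytes that differ. */
int
bytewise_compare(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1;
    const unsigned char *p2 = s2;

    for (; n != 0; n--, p1++, p2++) {
        if (*p1 != *p2)
            return *p1 - *p2;   /* operands are promoted to int here */
    }
    return 0;
}
```

The plain subtraction gives a correctly signed result wherever the
promoted type is int, which is the assumption being discussed.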

 This is handled by the -traditional flag of gcc(1):

 : `-traditional'
 :
 :  Attempt to support some aspects of traditional C compilers.
 :  Specifically:
 :
 [...]
 :
 : * Integer types `unsigned short' and `unsigned char' promote to
 :   `unsigned int'.

 With -traditional, the code I quoted still produces -1.

It produces overflow which normally gives -1 on 2's complement machines.

 In any case, this section doesn't apply to this case because
 no conversion described in section 6.10 is ever done here,
 since both operands are of the same type, unsigned char.

Yes it does.  The common type (for arithmetic operators like subtraction)
is never smaller than int.  Both of the unsigned char operands get
converted to int in the simplest case where unsigned char is smaller
than int.  See 6.10.1 (5) and 6.10.1.1 about integral promotions.
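
One way to observe the promotion, sketched with hypothetical helper
names.  The size check shows the operands are no longer char-sized; the
sign check separates int from unsigned int, since both have the same
size.  Both results assume unsigned char is narrower than int:

```c
/* Returns 1 if the difference of two unsigned chars has the size of
 * int, i.e. the subtraction was not carried out in a char-sized type. */
int
diff_has_int_size(void)
{
    unsigned char c1 = 1, c2 = 2;

    return sizeof(c1 - c2) == sizeof(int);
}

/* Returns 1 if the difference is negative.  That is only possible when
 * the promoted type is signed, so it tells int from unsigned int. */
int
diff_is_negative(void)
{
    unsigned char c1 = 1, c2 = 2;

    return (c1 - c2) < 0;
}
```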

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: New AWK bug with collating

2002-12-14 Thread Ruslan Ermilov
On Sat, Dec 14, 2002 at 09:02:40PM +1100, Bruce Evans wrote:
 On Fri, 13 Dec 2002, Ruslan Ermilov wrote:
 
  On Fri, Dec 13, 2002 at 04:41:06PM +0300, Andrey A. Chernov wrote:
   On Fri, Dec 13, 2002 at 14:32:40 +0200, Ruslan Ermilov wrote:
    Pardon my ignorance here, but the following fragment
    returns -1, doesn't it?

    #include <stdio.h>
    void
    main(void)
    {
        int i;

        i = (unsigned char)1 - (unsigned char)2;
        printf("%d\n", i);
    }
  
   It depends very much on the compiler, i.e. whether it implements
   value-preserving or unsigned-preserving rules for 'char' type
   conversions, or on ANSI C vs. common C mode. Better to be safe for both.

   Read section 6.10.1.1 here:
   http://wwwrsphysse.anu.edu.au/doc/DUhelp/AQTLTBTE/DOCU_067.HTM
 
 For ANSI C, the result of the subtraction only depends on the width
 of unsigned char.  If unsigned char has the same width as int, then
 the result is UINT_MAX; otherwise the result is -1.  This is an example
 of the brokenness of value preserving conversions -- the value is
 as far as possible from being preserved.
 
 Then assignment to int i may cause overflow.  There is no overflow if
 the RHS is -1.  If the RHS is UINT_MAX, then the result of the assignment
 is implementation-defined.  The value is preserved even less than before.
 I think it is usually -0 on 1's complement machines.
 
 So ache's change is basically a fix for 1's complement machines.  I don't
 see much point in it, since we assume 2's complement in most places in
 libc/string (except strcoll() :-).  E.g., memcmp() just subtracts the
 unsigned chars and assumes that all the conversions turn out like they
 do on 2's complement machines.  We actually use an assembler version of
 memcmp on most arches but...
 
Hmm, then how could you explain the difference between -traditional
and -ansi outputs for the following fragment on i386:

int printf(char *, ...);

int
main(void)
{
    long long l;
    unsigned char c1 = 1;
    unsigned char c2 = 2;

    l = c1 - c2;
    printf("%lld\n", l);
    l = -1;
    printf("%lld\n", l);
}

Or the same code but with `long' on sparc64.

  This is handled by the -traditional flag of gcc(1):
 
  : `-traditional'
  :
  :  Attempt to support some aspects of traditional C compilers.
  :  Specifically:
  :
  [...]
  :
  : * Integer types `unsigned short' and `unsigned char' promote to
  :   `unsigned int'.
 
  With -traditional, the code I quoted still produces -1.
 
 It produces overflow which normally gives -1 on 2's complement machines.
 
  In any case, this section doesn't apply to this case because
  no conversion described in section 6.10 is ever done here,
  since both operands are of the same type, unsigned char.
 
 Yes it does.  The common type (for arithmetic operators like subtraction)
 is never smaller than int.  Both of the unsigned char operands get
 converted to int in the simplest case where unsigned char is smaller
 than int.  See 6.10.1 (5) and 6.10.1.1 about integral promotions.
 
I stand corrected.  Thanks for the explanation; now I see that they do.


Cheers,
-- 
Ruslan Ermilov          Sysadmin and DBA,
[EMAIL PROTECTED]       Sunbay Software AG,
[EMAIL PROTECTED]       FreeBSD committer,
+380.652.512.251        Simferopol, Ukraine

http://www.FreeBSD.org  The Power To Serve
http://www.oracle.com   Enabling The Information Age





Re: New AWK bug with collating

2002-12-14 Thread Bruce Evans
On Sat, 14 Dec 2002, Ruslan Ermilov wrote:

 On Sat, Dec 14, 2002 at 09:02:40PM +1100, Bruce Evans wrote:
  For ANSI C, the result of the subtraction only depends on the width
  of unsigned char.  If unsigned char has the same width as int, then
  the result is UINT_MAX; otherwise the result is -1.  This is an example
  of the brokenness of value preserving conversions -- the value is
  as far as possible from being preserved.

 Hmm, then how could you explain the difference between -traditional
 and -ansi outputs for the following fragment on i386:

 int printf(char *, ...);

 int
 main(void)
 {
     long long l;
     unsigned char c1 = 1;
     unsigned char c2 = 2;

     l = c1 - c2;
     printf("%lld\n", l);
     l = -1;
     printf("%lld\n", l);
 }

 Or the same code but with `long' on sparc64.

The first paragraph above is all about the ANSI C case.  -traditional
gives signedness-preserving conversions, so c1 is promoted to 1U and
c2 is promoted to 2U.  1U - 2U is UINT_MAX on all machines.  The
difference between UINT_MAX and -1 can be seen by converting these
values to a common wider type as in your example.  UINT_MAX < LLONG_MAX
on all machines supported by FreeBSD although not in general, so
assigning it to `l' doesn't change its value.
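
The two rules can be compared side by side in one ANSI C program by
writing the unsigned-preserving promotion out as explicit casts.  This
is only a sketch: the function names are hypothetical, and the casts
emulate what -traditional is described as doing rather than invoke it.
It assumes UINT_MAX <= LLONG_MAX, as stated above:

```c
#include <limits.h>

/* Value-preserving (ANSI) rule: the operands are promoted to int, so
 * the difference is -1 before it is widened to long long. */
long long
diff_value_preserving(unsigned char c1, unsigned char c2)
{
    return c1 - c2;
}

/* Unsigned-preserving rule, emulated with explicit casts: the
 * subtraction wraps in unsigned int, so 1U - 2U is UINT_MAX, and that
 * value is kept when it is widened to long long. */
long long
diff_unsigned_preserving(unsigned char c1, unsigned char c2)
{
    return (unsigned int)c1 - (unsigned int)c2;
}
```

With c1 = 1 and c2 = 2 the first function returns -1 and the second
returns UINT_MAX, which is the difference visible when the values are
printed with %lld.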

Bruce





Re: New AWK bug with collating

2002-12-13 Thread Ruslan Ermilov
On Fri, Dec 13, 2002 at 03:26:54PM +0300, Andrey A. Chernov wrote:
 Since both operands are unsigned, the result can't be negative, but it is
 supposed to be. Here is the fix:
 
 --- b.c.bak   Fri Dec 13 14:54:12 2002
 +++ b.c   Fri Dec 13 15:20:15 2002
 @@ -292,7 +292,7 @@
   s[0][0] = a;
   s[1][0] = b;
   if ((r = strcoll(s[0], s[1])) == 0)
 - r = (uschar)a - (uschar)b;
 + r = (int)((uschar)a) - (int)((uschar)b);
   return r;
  }
  
Pardon my ignorance here, but the following fragment
returns -1, doesn't it?

#include <stdio.h>
void
main(void)
{
    int i;

    i = (unsigned char)1 - (unsigned char)2;
    printf("%d\n", i);
}
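
For context, the patched comparison in the diff quoted above can be
sketched as a standalone function.  The function name, the local buffer,
and the uschar typedef are assumptions made for illustration; only the
strcoll() call and the fallback subtraction come from the diff:

```c
#include <string.h>

typedef unsigned char uschar;   /* assumed to match the typedef in awk */

/* Hypothetical standalone version of the comparison: collate two single
 * characters with strcoll() and fall back to the byte difference when
 * the locale reports them as equal. */
int
collate_compare(int a, int b)
{
    char s[2][2];
    int r;

    s[0][0] = (char)a;
    s[0][1] = '\0';
    s[1][0] = (char)b;
    s[1][1] = '\0';
    if ((r = strcoll(s[0], s[1])) == 0)
        r = (int)((uschar)a) - (int)((uschar)b);
    return r;
}
```

The explicit (int) casts make the signed subtraction independent of
which promotion rule the compiler applies to unsigned char.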


Cheers,





Re: New AWK bug with collating

2002-12-13 Thread Andrey A. Chernov
On Fri, Dec 13, 2002 at 14:32:40 +0200, Ruslan Ermilov wrote:
 Pardon my ignorance here, but the following fragment
 returns -1, doesn't it?
 
 #include <stdio.h>
 void
 main(void)
 {
     int i;

     i = (unsigned char)1 - (unsigned char)2;
     printf("%d\n", i);
 }

It depends very much on the compiler, i.e. whether it implements
value-preserving or unsigned-preserving rules for 'char' type
conversions, or on ANSI C vs. common C mode. Better to be safe for both.

Read section 6.10.1.1 here:
http://wwwrsphysse.anu.edu.au/doc/DUhelp/AQTLTBTE/DOCU_067.HTM


-- 
Andrey A. Chernov
http://ache.pp.ru/





Re: New AWK bug with collating

2002-12-13 Thread Ruslan Ermilov
On Fri, Dec 13, 2002 at 04:41:06PM +0300, Andrey A. Chernov wrote:
 On Fri, Dec 13, 2002 at 14:32:40 +0200, Ruslan Ermilov wrote:
  Pardon my ignorance here, but the following fragment
  returns -1, doesn't it?
  
  #include <stdio.h>
  void
  main(void)
  {
      int i;

      i = (unsigned char)1 - (unsigned char)2;
      printf("%d\n", i);
  }
 
 It depends very much on the compiler, i.e. whether it implements
 value-preserving or unsigned-preserving rules for 'char' type
 conversions, or on ANSI C vs. common C mode. Better to be safe for both.

 Read section 6.10.1.1 here:
 http://wwwrsphysse.anu.edu.au/doc/DUhelp/AQTLTBTE/DOCU_067.HTM
 
This is handled by the -traditional flag of gcc(1):

: `-traditional'
: 
:  Attempt to support some aspects of traditional C compilers.
:  Specifically:
: 
[...]
: 
: * Integer types `unsigned short' and `unsigned char' promote to
:   `unsigned int'.

With -traditional, the code I quoted still produces -1.

In any case, this section doesn't apply to this case because
no conversion described in section 6.10 is ever done here,
since both operands are of the same type, unsigned char.


Cheers,





Re: New AWK bug with collating

2002-12-13 Thread Andrey A. Chernov
On Fri, Dec 13, 2002 at 17:09:42 +0200, Ruslan Ermilov wrote:
 : 
 : * Integer types `unsigned short' and `unsigned char' promote to
 :   `unsigned int'.
 
 With -traditional, the code I quoted still produces -1.

Probably because of machine-specific overflow handling or printf overflow.
Use this safe example instead (with -traditional):

main()
{
    long long l;
    unsigned char a = 1;
    unsigned char b = 2;

    l = a - b;
    printf("%04x %04x\n", (int)((l >> 32) & 0x), (int)(l & 0x));
    l = -1;
    printf("%04x %04x\n", (int)((l >> 32) & 0x), (int)(l & 0x));
}
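
The mask constants in the printf calls above are truncated in the
archive.  As a rough sketch of the same idea, the helpers below split a
long long into its upper and lower 32 bits.  The helper names and the
32-bit masks are assumptions, not a reconstruction of the original code:

```c
/* Hypothetical helper: upper 32 bits of v as an unsigned value. */
unsigned long
high_half(long long v)
{
    return (unsigned long)(((unsigned long long)v >> 32) & 0xffffffffUL);
}

/* Hypothetical helper: lower 32 bits of v as an unsigned value. */
unsigned long
low_half(long long v)
{
    return (unsigned long)((unsigned long long)v & 0xffffffffUL);
}
```

Printing both halves with a format such as "%08lx %08lx" shows the
difference directly: for -1 both halves are ffffffff, while for a value
equal to a 32-bit UINT_MAX the upper half is 0 and only the lower half
is ffffffff.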


 In any case, this section doesn't apply to this case because
 no conversion described in section 6.10 is ever done here,
 since both operands are of the same type, unsigned char.

No, any char type is converted to int (or to unsigned int with -traditional)
prior to any operation on it.



