Re: New AWK bug with collating
On Fri, 13 Dec 2002, Ruslan Ermilov wrote:

> On Fri, Dec 13, 2002 at 04:41:06PM +0300, Andrey A. Chernov wrote:
> > On Fri, Dec 13, 2002 at 14:32:40 +0200, Ruslan Ermilov wrote:
> > > Pardon my ignorance here, but the following fragment returns -1,
> > > doesn't it?
> > >
> > >     #include <stdio.h>
> > >
> > >     int main(void)
> > >     {
> > >         int i;
> > >
> > >         i = (unsigned char)1 - (unsigned char)2;
> > >         printf("%d\n", i);
> > >         return 0;
> > >     }
> >
> > It depends very much on the compiler, i.e. whether it implements
> > value-preserving or unsigned-preserving conversions for the 'char'
> > types. Or ANSI C vs. common C mode. Better to be safe for both.
> > Read section 6.10.1.1 here:
> > http://wwwrsphysse.anu.edu.au/doc/DUhelp/AQTLTBTE/DOCU_067.HTM

For ANSI C, the result of the subtraction depends only on the width of
unsigned char. If unsigned char has the same width as int, then the
result is UINT_MAX; otherwise the result is -1. This is an example of
the brokenness of value-preserving conversions -- the value is as far
as possible from being preserved.

Then assignment to int i may cause overflow. There is no overflow if
the RHS is -1. If the RHS is UINT_MAX, then the result of the assignment
is implementation-defined. The value is preserved even less than before.
I think it is usually -0 on 1's complement machines.

So ache's change is basically a fix for 1's complement machines. I don't
see much point in it, since we assume 2's complement in most places in
libc/string (except strcoll() :-). E.g., memcmp() just subtracts the
unsigned chars and assumes that all the conversions turn out like they
do on 2's complement machines. We actually use an assembler version of
memcmp on most arches but...

> This is handled by the -traditional flag of gcc(1):
>
> : `-traditional'
> :      Attempt to support some aspects of traditional C compilers.
> :      Specifically:
> : [...]
> :
> :    * Integer types `unsigned short' and `unsigned char' promote to
> :      `unsigned int'.
>
> With -traditional, the code I quoted still produces -1.

It produces overflow which normally gives -1 on 2's complement machines.

> In any case, this section doesn't apply to this case because no
> conversion described in section 6.10 is ever done here, since both
> operands are of the same type, unsigned char.

Yes it does. The common type (for arithmetic operators like subtraction)
is never smaller than int. Both of the unsigned char operands get
converted to int in the simplest case where unsigned char is smaller
than int. See 6.10.1 (5) and 6.10.1.1 about integral promotions.

Bruce
Re: New AWK bug with collating
On Sat, Dec 14, 2002 at 09:02:40PM +1100, Bruce Evans wrote:
> On Fri, 13 Dec 2002, Ruslan Ermilov wrote:
> > On Fri, Dec 13, 2002 at 04:41:06PM +0300, Andrey A. Chernov wrote:
> > > On Fri, Dec 13, 2002 at 14:32:40 +0200, Ruslan Ermilov wrote:
> > > > Pardon my ignorance here, but the following fragment returns -1,
> > > > doesn't it?
> > > >
> > > >     #include <stdio.h>
> > > >
> > > >     int main(void)
> > > >     {
> > > >         int i;
> > > >
> > > >         i = (unsigned char)1 - (unsigned char)2;
> > > >         printf("%d\n", i);
> > > >         return 0;
> > > >     }
> > >
> > > It depends very much on the compiler, i.e. whether it implements
> > > value-preserving or unsigned-preserving conversions for the 'char'
> > > types. Or ANSI C vs. common C mode. Better to be safe for both.
> > > Read section 6.10.1.1 here:
> > > http://wwwrsphysse.anu.edu.au/doc/DUhelp/AQTLTBTE/DOCU_067.HTM
>
> For ANSI C, the result of the subtraction depends only on the width of
> unsigned char. If unsigned char has the same width as int, then the
> result is UINT_MAX; otherwise the result is -1. This is an example of
> the brokenness of value-preserving conversions -- the value is as far
> as possible from being preserved.
>
> Then assignment to int i may cause overflow. There is no overflow if
> the RHS is -1. If the RHS is UINT_MAX, then the result of the
> assignment is implementation-defined. The value is preserved even less
> than before. I think it is usually -0 on 1's complement machines.
>
> So ache's change is basically a fix for 1's complement machines. I
> don't see much point in it, since we assume 2's complement in most
> places in libc/string (except strcoll() :-). E.g., memcmp() just
> subtracts the unsigned chars and assumes that all the conversions turn
> out like they do on 2's complement machines. We actually use an
> assembler version of memcmp on most arches but...

Hmm, then how could you explain the difference between the -traditional
and -ansi outputs for the following fragment on i386:

    int printf(char *, ...);

    int main(void)
    {
        long long l;
        unsigned char c1 = 1;
        unsigned char c2 = 2;

        l = c1 - c2;
        printf("%lld\n", l);
        l = -1;
        printf("%lld\n", l);
    }

Or the same code but with `long' on sparc64.

> > This is handled by the -traditional flag of gcc(1):
> >
> > : `-traditional'
> > :      Attempt to support some aspects of traditional C compilers.
> > :      Specifically:
> > : [...]
> > :
> > :    * Integer types `unsigned short' and `unsigned char' promote to
> > :      `unsigned int'.
> >
> > With -traditional, the code I quoted still produces -1.
>
> It produces overflow which normally gives -1 on 2's complement
> machines.
>
> > In any case, this section doesn't apply to this case because no
> > conversion described in section 6.10 is ever done here, since both
> > operands are of the same type, unsigned char.
>
> Yes it does. The common type (for arithmetic operators like
> subtraction) is never smaller than int. Both of the unsigned char
> operands get converted to int in the simplest case where unsigned char
> is smaller than int. See 6.10.1 (5) and 6.10.1.1 about integral
> promotions.

I stand corrected, thanks for the explanations; now I see they do.

Cheers,
-- 
Ruslan Ermilov          Sysadmin and DBA,
[EMAIL PROTECTED]       Sunbay Software AG,
[EMAIL PROTECTED]       FreeBSD committer,
+380.652.512.251        Simferopol, Ukraine

http://www.FreeBSD.org  The Power To Serve
http://www.oracle.com   Enabling The Information Age
Re: New AWK bug with collating
On Sat, 14 Dec 2002, Ruslan Ermilov wrote:

> On Sat, Dec 14, 2002 at 09:02:40PM +1100, Bruce Evans wrote:
> > For ANSI C, the result of the subtraction depends only on the width
> > of unsigned char. If unsigned char has the same width as int, then
> > the result is UINT_MAX; otherwise the result is -1. This is an
> > example of the brokenness of value-preserving conversions -- the
> > value is as far as possible from being preserved.
>
> Hmm, then how could you explain the difference between the
> -traditional and -ansi outputs for the following fragment on i386:
>
>     int printf(char *, ...);
>
>     int main(void)
>     {
>         long long l;
>         unsigned char c1 = 1;
>         unsigned char c2 = 2;
>
>         l = c1 - c2;
>         printf("%lld\n", l);
>         l = -1;
>         printf("%lld\n", l);
>     }
>
> Or the same code but with `long' on sparc64.

The first paragraph above is all about the ANSI C case. -traditional
gives signedness-preserving conversions, so c1 is promoted to 1U and c2
is promoted to 2U. 1U - 2U is UINT_MAX on all machines. The difference
between UINT_MAX and -1 can be seen by converting these values to a
common wider type as in your example. UINT_MAX < LLONG_MAX on all
machines supported by FreeBSD although not in general, so assigning it
to `l' doesn't change its value.

Bruce
Re: New AWK bug with collating
On Fri, Dec 13, 2002 at 03:26:54PM +0300, Andrey A. Chernov wrote:
> Since both operands are unsigned, the result can't be negative, but it
> is supposed to be. Here is the fix:
>
> --- b.c.bak   Fri Dec 13 14:54:12 2002
> +++ b.c       Fri Dec 13 15:20:15 2002
> @@ -292,7 +292,7 @@
>       s[0][0] = a;
>       s[1][0] = b;
>       if ((r = strcoll(s[0], s[1])) == 0)
> -             r = (uschar)a - (uschar)b;
> +             r = (int)((uschar)a) - (int)((uschar)b);
>       return r;
>  }

Pardon my ignorance here, but the following fragment returns -1,
doesn't it?

    #include <stdio.h>

    int main(void)
    {
        int i;

        i = (unsigned char)1 - (unsigned char)2;
        printf("%d\n", i);
        return 0;
    }

Cheers,
-- 
Ruslan Ermilov
Re: New AWK bug with collating
On Fri, Dec 13, 2002 at 14:32:40 +0200, Ruslan Ermilov wrote:
> Pardon my ignorance here, but the following fragment returns -1,
> doesn't it?
>
>     #include <stdio.h>
>
>     int main(void)
>     {
>         int i;
>
>         i = (unsigned char)1 - (unsigned char)2;
>         printf("%d\n", i);
>         return 0;
>     }

It depends very much on the compiler, i.e. whether it implements
value-preserving or unsigned-preserving conversions for the 'char'
types. Or ANSI C vs. common C mode. Better to be safe for both. Read
section 6.10.1.1 here:
http://wwwrsphysse.anu.edu.au/doc/DUhelp/AQTLTBTE/DOCU_067.HTM

-- 
Andrey A. Chernov
http://ache.pp.ru/
Re: New AWK bug with collating
On Fri, Dec 13, 2002 at 04:41:06PM +0300, Andrey A. Chernov wrote:
> On Fri, Dec 13, 2002 at 14:32:40 +0200, Ruslan Ermilov wrote:
> > Pardon my ignorance here, but the following fragment returns -1,
> > doesn't it?
> >
> >     #include <stdio.h>
> >
> >     int main(void)
> >     {
> >         int i;
> >
> >         i = (unsigned char)1 - (unsigned char)2;
> >         printf("%d\n", i);
> >         return 0;
> >     }
>
> It depends very much on the compiler, i.e. whether it implements
> value-preserving or unsigned-preserving conversions for the 'char'
> types. Or ANSI C vs. common C mode. Better to be safe for both. Read
> section 6.10.1.1 here:
> http://wwwrsphysse.anu.edu.au/doc/DUhelp/AQTLTBTE/DOCU_067.HTM

This is handled by the -traditional flag of gcc(1):

: `-traditional'
:      Attempt to support some aspects of traditional C compilers.
:      Specifically:
: [...]
:
:    * Integer types `unsigned short' and `unsigned char' promote to
:      `unsigned int'.

With -traditional, the code I quoted still produces -1.

In any case, this section doesn't apply to this case because no
conversion described in section 6.10 is ever done here, since both
operands are of the same type, unsigned char.

Cheers,
-- 
Ruslan Ermilov
Re: New AWK bug with collating
On Fri, Dec 13, 2002 at 17:09:42 +0200, Ruslan Ermilov wrote:
> :    * Integer types `unsigned short' and `unsigned char' promote to
> :      `unsigned int'.
>
> With -traditional, the code I quoted still produces -1.

Probably because of machine-specific overflow handling or printf
overflow. Use this safe example instead (with -traditional):

    main()
    {
        long long l;
        unsigned char a = 1;
        unsigned char b = 2;

        l = a - b;
        printf("%04x %04x\n", (int)((l >> 32) & 0x), (int)(l & 0x));
        l = -1;
        printf("%04x %04x\n", (int)((l >> 32) & 0x), (int)(l & 0x));
    }

> In any case, this section doesn't apply to this case because no
> conversion described in section 6.10 is ever done here, since both
> operands are of the same type, unsigned char.

No, any char type is converted to int (or to unsigned int with
-traditional) prior to any operation on it.

-- 
Andrey A. Chernov
http://ache.pp.ru/