[hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)(-1) || Laslo Hunhold
commit 5ef98fdec958130a92aa40006ddd9daba36b4d12
Author:     Laslo Hunhold
AuthorDate: Tue Aug 16 16:51:17 2022 +0200
Commit:     Laslo Hunhold
CommitDate: Tue Aug 16 16:51:17 2022 +0200

    Use SIZE_MAX instead of (size_t)(-1)

    Thanks Mattias for pointing this out!

    Signed-off-by: Laslo Hunhold

diff --git a/gen/util.c b/gen/util.c
index bfe0dbf..012b04a 100644
--- a/gen/util.c
+++ b/gen/util.c
@@ -34,7 +34,7 @@ struct break_test_payload
 static void *
 reallocate_array(void *p, size_t len, size_t size)
 {
-	if (len > 0 && size > (size_t)(-1) / len) {
+	if (len > 0 && size > SIZE_MAX / len) {
 		errno = ENOMEM;
 		return NULL;
 	}
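For readers who want to see the guard in context, here is a minimal sketch of an overflow-checked array reallocation. The name checked_reallocarray and the trailing realloc() call are assumptions for illustration; the commit only shows the check itself, not the rest of reallocate_array.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: resize p to hold len elements of size bytes
 * each. Returns NULL and sets errno to ENOMEM if len * size would
 * not fit in a size_t; otherwise defers to realloc().
 */
static void *
checked_reallocarray(void *p, size_t len, size_t size)
{
	/* for len > 0: len * size > SIZE_MAX  <=>  size > SIZE_MAX / len */
	if (len > 0 && size > SIZE_MAX / len) {
		errno = ENOMEM;
		return NULL;
	}

	/* the product cannot wrap around at this point */
	return realloc(p, len * size);
}
```

The division form avoids computing len * size before it is known to be safe. A call such as checked_reallocarray(NULL, SIZE_MAX, 2) is rejected with ENOMEM instead of silently allocating a wrapped-around size.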
Re: [hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold
On Sat, 18 Dec 2021 15:07:30 -0500
Ethan Sommer wrote:

Dear Ethan,

> > (size_t)-1 is also undefined behaviour.
>
> It isn't, wrap-around with unsigned types is defined, it's only signed
> overflow that isn't.

yes, exactly. For posterity, the standard specifies that in 6.3.1.3p2:

"Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type."

With best regards

Laslo
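The quoted rule can be checked with a small sketch. The function name minus_one_is_size_max is made up for this illustration; only size_t and SIZE_MAX come from the standard headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if converting -1 to size_t yields SIZE_MAX, 0 otherwise. */
static int
minus_one_is_size_max(void)
{
	/*
	 * -1 is out of range for size_t, so "one more than the maximum
	 * value" (SIZE_MAX + 1) is added once:
	 * -1 + (SIZE_MAX + 1) == SIZE_MAX.
	 */
	size_t converted = (size_t)-1;

	return converted == SIZE_MAX;
}
```

On a conforming implementation this returns 1, so both spellings denote the same value and the change to SIZE_MAX is about readability rather than behaviour.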
Re: [hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold
It appears you are correct, I've been tricked by some tool that checked
for undefined behaviour during runtime (don't remember which, it was a
website that forced it upon the user).

Casting a signed value X to an N-bit unsigned integer shall result in
the number that is congruent with X modulo 2^N. So, 2^N + X for
negative numbers.

And yes, signed overflow is undefined, despite some LinkedIn Learning
course I took claiming otherwise (it even claimed that C always used
two's complement). (And no, LinkedIn Learning is not worth your money,
whatever it may cost; my employer pays for it.)

On Sat, 18 Dec 2021 15:07:30 -0500
Ethan Sommer wrote:

> On Sat, Dec 18, 2021 at 3:02 PM Mattias Andrée wrote:
>
> > (size_t)-1 is also undefined behaviour.
>
> It isn't, wrap-around with unsigned types is defined, it's only signed
> overflow that isn't.
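To make the modulo 2^N statement concrete, here is a small sketch using fixed-width types. The helpers to_u8 and to_u16 are hypothetical names, and the sketch assumes the optional exact-width types uint8_t and uint16_t exist on the platform.

```c
#include <stdint.h>

/* Convert an int to an 8-bit unsigned value (congruent to x modulo 2^8). */
static uint8_t
to_u8(int x)
{
	return (uint8_t)x;
}

/* Convert an int to a 16-bit unsigned value (congruent to x modulo 2^16). */
static uint16_t
to_u16(int x)
{
	return (uint16_t)x;
}
```

With these, to_u8(-3) gives 253, which is 2^8 - 3, and to_u16(-3) gives 65533, which is 2^16 - 3. Positive values out of range wrap the same way, so to_u8(256) gives 0.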
Re: [hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold
On Sat, Dec 18, 2021 at 3:02 PM Mattias Andrée wrote:

> (size_t)-1 is also undefined behaviour.

It isn't, wrap-around with unsigned types is defined, it's only signed
overflow that isn't.
Re: [hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold
(size_t)-1 is also undefined behaviour.

On Sat, 18 Dec 2021 20:21:42 +0100
wrote:

> commit cb7e9c00899ae0ed57a84991308b7f880f4ddef6
> Author:     Laslo Hunhold
> AuthorDate: Sat Dec 18 20:21:04 2021 +0100
> Commit:     Laslo Hunhold
> CommitDate: Sat Dec 18 20:21:04 2021 +0100
>
>     Use SIZE_MAX instead of (size_t)-1
>
>     This makes a bit clearer what we mean, and given the library is C99
>     we can rely on this constant to exist.
>
>     Signed-off-by: Laslo Hunhold
[hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold
commit cb7e9c00899ae0ed57a84991308b7f880f4ddef6
Author:     Laslo Hunhold
AuthorDate: Sat Dec 18 20:21:04 2021 +0100
Commit:     Laslo Hunhold
CommitDate: Sat Dec 18 20:21:04 2021 +0100

    Use SIZE_MAX instead of (size_t)-1

    This makes a bit clearer what we mean, and given the library is C99
    we can rely on this constant to exist.

    Signed-off-by: Laslo Hunhold

diff --git a/man/grapheme_decode_utf8.3 b/man/grapheme_decode_utf8.3
index 26e3afb..d5c7c9d 100644
--- a/man/grapheme_decode_utf8.3
+++ b/man/grapheme_decode_utf8.3
@@ -31,8 +31,8 @@ Given NUL has a unique 1 byte representation, it is safe to operate on
 NUL-terminated strings by setting
 .Va len
 to
-.Dv (size_t)-1
-and terminating when
+.Dv SIZE_MAX
+(stdint.h is already included by grapheme.h) and terminating when
 .Va cp
 is 0 (see
 .Sx EXAMPLES
@@ -87,7 +87,7 @@ print_cps_nul_terminated(const char *str)
 	uint_least32_t cp;
 
 	for (off = 0; (ret = grapheme_decode_utf8(str + off,
-	                     (size_t)-1, &cp)) > 0 &&
+	                     SIZE_MAX, &cp)) > 0 &&
 	     cp != 0; off += ret) {
 		printf("%"PRIxLEAST32"\\n", cp);
 	}
diff --git a/src/character.c b/src/character.c
index 015b4e0..8f1143f 100644
--- a/src/character.c
+++ b/src/character.c
@@ -197,19 +197,19 @@ grapheme_next_character_break(const char *str)
 	 * miss it, even if the previous UTF-8 sequence terminates
 	 * unexpectedly, as it would either act as an unexpected byte,
 	 * saved for later, or as a null byte itself, that we can catch.
-	 * We pass (size_t)-1 to the length, as we will never read beyond
+	 * We pass SIZE_MAX to the length, as we will never read beyond
 	 * the null byte for the reasons given above.
 	 */
 
 	/* get first codepoint */
-	len += grapheme_decode_utf8(str, (size_t)-1, &cp0);
+	len += grapheme_decode_utf8(str, SIZE_MAX, &cp0);
 	if (cp0 == GRAPHEME_INVALID_CODEPOINT) {
 		return len;
 	}
 
 	while (cp0 != 0) {
 		/* get next codepoint */
-		ret = grapheme_decode_utf8(str + len, (size_t)-1, &cp1);
+		ret = grapheme_decode_utf8(str + len, SIZE_MAX, &cp1);
 
 		if (cp1 == GRAPHEME_INVALID_CODEPOINT ||
 		    grapheme_is_character_break(cp0, cp1, &state)) {
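The idea of passing SIZE_MAX as the length and stopping at the NUL byte can be sketched without the library. Everything below is a stand-in for illustration: toy_decode is a hypothetical single-byte decoder, not grapheme_decode_utf8, and count_until_nul is an invented name that mirrors the loop shape from the man page example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a length-bounded decoder: reads at most
 * one byte from str, stores it in *cp and returns the number of bytes
 * consumed (0 if len is 0). It never looks past str[0].
 */
static size_t
toy_decode(const char *str, size_t len, uint_least32_t *cp)
{
	if (len == 0) {
		return 0;
	}

	*cp = (unsigned char)str[0];

	return 1;
}

/* Count the bytes before the terminating NUL without knowing the length. */
static size_t
count_until_nul(const char *str)
{
	uint_least32_t cp;
	size_t off, ret;

	/* SIZE_MAX means "no length limit"; the NUL check ends the loop */
	for (off = 0; (ret = toy_decode(str + off, SIZE_MAX, &cp)) > 0 &&
	     cp != 0; off += ret)
		;

	return off;
}
```

This is only safe because the decoder stops consuming input at the NUL byte, which is the same argument the comment in src/character.c makes. For example, count_until_nul("abc") returns 3.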