[hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)(-1) || Laslo Hunhold

2022-08-16 Thread git
commit 5ef98fdec958130a92aa40006ddd9daba36b4d12
Author: Laslo Hunhold 
AuthorDate: Tue Aug 16 16:51:17 2022 +0200
Commit: Laslo Hunhold 
CommitDate: Tue Aug 16 16:51:17 2022 +0200

Use SIZE_MAX instead of (size_t)(-1)

Thanks Mattias for pointing this out!

Signed-off-by: Laslo Hunhold 

diff --git a/gen/util.c b/gen/util.c
index bfe0dbf..012b04a 100644
--- a/gen/util.c
+++ b/gen/util.c
@@ -34,7 +34,7 @@ struct break_test_payload
 static void *
 reallocate_array(void *p, size_t len, size_t size)
 {
-	if (len > 0 && size > (size_t)(-1) / len) {
+	if (len > 0 && size > SIZE_MAX / len) {
 		errno = ENOMEM;
 		return NULL;
 	}
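
For readers following the hunk above, here is a minimal sketch of how such a guard can sit in front of realloc(). Only the overflow check is taken from the commit. The function name checked_reallocarray and the call to realloc() are assumptions made for illustration, not the actual body of reallocate_array() in gen/util.c.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper modelled on the guard in the hunk above: resize
 * p to hold len elements of size bytes each, refusing any request
 * whose total byte count would not fit in a size_t.
 */
static void *
checked_reallocarray(void *p, size_t len, size_t size)
{
	/* len * size exceeds SIZE_MAX exactly when size > SIZE_MAX / len */
	if (len > 0 && size > SIZE_MAX / len) {
		errno = ENOMEM;
		return NULL;
	}

	/* the product is now known to fit; a zero-byte request is
	 * passed through to realloc() unchanged */
	return realloc(p, len * size);
}
```

Dividing SIZE_MAX by len avoids ever computing a product that has already wrapped around, which is why the check is written as a division and not as a comparison after multiplying.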



Re: [hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold

2021-12-18 Thread Laslo Hunhold
On Sat, 18 Dec 2021 15:07:30 -0500
Ethan Sommer  wrote:

Dear Ethan,

> > (size_t)-1 is also undefined behaviour.  
> 
> It isn't, wrap-around with unsigned types is defined, it's only signed
> overflow that isn't.

yes, exactly. For posterity, the standard specifies that in 6.3.1.3p2:

  "Otherwise, if the new type is unsigned, the value is converted by
  repeatedly adding or subtracting one more than the maximum value that
  can be represented in the new type until the value is in the range of
  the new type."

With best regards

Laslo
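
As a small illustration of the quoted paragraph, the sketch below checks that converting -1 to size_t gives SIZE_MAX. The function name is made up for this example. The arithmetic is the one the standard describes: -1 plus (SIZE_MAX + 1) equals SIZE_MAX.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if both ways of producing "all bits set" agree with
 * SIZE_MAX, 0 otherwise.
 */
static int
minus_one_is_size_max(void)
{
	/* conversion of an out-of-range value, defined by 6.3.1.3p2 */
	size_t converted = (size_t)-1;
	/* unsigned arithmetic, which is reduced modulo SIZE_MAX + 1 */
	size_t wrapped = (size_t)0 - 1;

	return converted == SIZE_MAX && wrapped == SIZE_MAX;
}
```

Both spellings are well defined, so the change to SIZE_MAX is about readability and not about correctness.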



Re: [hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold

2021-12-18 Thread Mattias Andrée
It appears you are correct. I was misled by some tool that
checked for undefined behaviour at runtime (I don't remember
which; it was a website that forced it upon the user). Casting
a signed value X to an N-bit unsigned integer type results in
the number in that type's range that is congruent to X modulo
2^N. So, 2^N + X for negative X no smaller than -2^N.

And yes, signed overflow is undefined, despite some LinkedIn
Learning course I took claiming otherwise (it even claimed that
C always used two's complement). (And no, LinkedIn Learning is
not worth your money, whatever it may cost; my employer pays
for it.)
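
The congruence rule described above can be checked by hand for a concrete width. The sketch below uses N = 16 and assumes the platform provides uint16_t (it is optional in C99, but almost universally available). The function name is hypothetical.

```c
#include <stdint.h>

/*
 * For x in the range [-65536, -1], returns 1 if converting x to
 * uint16_t gives the same value as computing 2^16 + x by hand.
 */
static int
conversion_matches_formula(long x)
{
	uint16_t converted = (uint16_t)x; /* reduced modulo 2^16 */
	long by_hand = 65536L + x;        /* 2^N + X with N = 16 */

	return converted == by_hand;
}
```

For example, -5 becomes 65536 - 5 = 65531, and -65536 becomes 0.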


On Sat, 18 Dec 2021 15:07:30 -0500
Ethan Sommer  wrote:

> On Sat, Dec 18, 2021 at 3:02 PM Mattias Andrée  wrote:
> 
> > (size_t)-1 is also undefined behaviour.  
> 
> 
> It isn't, wrap-around with unsigned types is defined, it's only signed
> overflow that isn't.




Re: [hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold

2021-12-18 Thread Ethan Sommer
On Sat, Dec 18, 2021 at 3:02 PM Mattias Andrée  wrote:

> (size_t)-1 is also undefined behaviour.


It isn't, wrap-around with unsigned types is defined, it's only signed
overflow that isn't.




Re: [hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold

2021-12-18 Thread Mattias Andrée
(size_t)-1 is also undefined behaviour.





[hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold

2021-12-18 Thread git
commit cb7e9c00899ae0ed57a84991308b7f880f4ddef6
Author: Laslo Hunhold 
AuthorDate: Sat Dec 18 20:21:04 2021 +0100
Commit: Laslo Hunhold 
CommitDate: Sat Dec 18 20:21:04 2021 +0100

Use SIZE_MAX instead of (size_t)-1

This makes a bit clearer what we mean, and given the library is C99
we can rely on this constant to exist.

Signed-off-by: Laslo Hunhold 

diff --git a/man/grapheme_decode_utf8.3 b/man/grapheme_decode_utf8.3
index 26e3afb..d5c7c9d 100644
--- a/man/grapheme_decode_utf8.3
+++ b/man/grapheme_decode_utf8.3
@@ -31,8 +31,8 @@ Given NUL has a unique 1 byte representation, it is safe to operate on
 NUL-terminated strings by setting
 .Va len
 to
-.Dv (size_t)-1
-and terminating when
+.Dv SIZE_MAX
+(stdint.h is already included by grapheme.h) and terminating when
 .Va cp
 is 0 (see
 .Sx EXAMPLES
@@ -87,7 +87,7 @@ print_cps_nul_terminated(const char *str)
 	uint_least32_t cp;
 
 	for (off = 0; (ret = grapheme_decode_utf8(str + off,
-	                    (size_t)-1, &cp)) > 0 &&
+	                    SIZE_MAX, &cp)) > 0 &&
 	     cp != 0; off += ret) {
 		printf("%"PRIxLEAST32"\\n", cp);
 	}
diff --git a/src/character.c b/src/character.c
index 015b4e0..8f1143f 100644
--- a/src/character.c
+++ b/src/character.c
@@ -197,19 +197,19 @@ grapheme_next_character_break(const char *str)
 	 * miss it, even if the previous UTF-8 sequence terminates
 	 * unexpectedly, as it would either act as an unexpected byte,
 	 * saved for later, or as a null byte itself, that we can catch.
-	 * We pass (size_t)-1 to the length, as we will never read beyond
+	 * We pass SIZE_MAX to the length, as we will never read beyond
 	 * the null byte for the reasons given above.
 	 */
 
 	/* get first codepoint */
-	len += grapheme_decode_utf8(str, (size_t)-1, &cp0);
+	len += grapheme_decode_utf8(str, SIZE_MAX, &cp0);
 	if (cp0 == GRAPHEME_INVALID_CODEPOINT) {
 		return len;
 	}
 
 	while (cp0 != 0) {
 		/* get next codepoint */
-		ret = grapheme_decode_utf8(str + len, (size_t)-1, &cp1);
+		ret = grapheme_decode_utf8(str + len, SIZE_MAX, &cp1);
 
 		if (cp1 == GRAPHEME_INVALID_CODEPOINT ||
 		    grapheme_is_character_break(cp0, cp1, &state)) {
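
The pattern used in both hunks, passing SIZE_MAX as the length and stopping at the NUL byte, can be tried without the library. In the sketch below, toy_decode() is a hypothetical one-byte stand-in invented for this example. It only mimics the call shape seen in the diff (pointer, length, output code point, number of bytes consumed as return value) and is not libgrapheme's decoder.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a decoder: reads at most len bytes from
 * str, stores the first byte in *cp and returns the number of bytes
 * consumed (0 if len is 0). Real UTF-8 decoding is not attempted.
 */
static size_t
toy_decode(const char *str, size_t len, uint_least32_t *cp)
{
	if (len == 0) {
		return 0;
	}

	*cp = (unsigned char)str[0];
	return 1;
}

/*
 * Walk a NUL-terminated string without calling strlen() first: the
 * length limit is SIZE_MAX, and the loop ends once a 0 is decoded.
 * Returns the offset of the terminating NUL byte.
 */
static size_t
count_until_nul(const char *str)
{
	uint_least32_t cp;
	size_t off, ret;

	for (off = 0; (ret = toy_decode(str + off, SIZE_MAX, &cp)) > 0 &&
	     cp != 0; off += ret)
		;

	return off;
}
```

The length argument never limits the loop here. What keeps the reads in bounds is that the decoder consumes the NUL byte as its own unit and the caller stops there, which is the argument made in the comment of src/character.c.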