Re: misc/uniutils fix %n

2021-09-15 Thread Nicholas Marriott
Looks good to me, ok nicm

I would probably have used snprintf first myself, to be fair.

The tempstr stuff is hilarious though...


On Tue, 14 Sept 2021 at 17:42, Ingo Schwarze  wrote:

> Hi,
>
> Stefan Hagen wrote on Sat, Sep 11, 2021 at 09:00:18AM +0200:
>
> > Is this better?
>
> No, it is no good whatsoever.
>
>  * It introduces yet another pointless static buffer.
>  * It pointlessly uses snprintf(3) where printf(3)
>is obviously sufficient.
>  * It introduces error checking for operations
>that obviously cannot fail when done properly.
>  * It needlessly does pointer arithmetic.
>
> So far, so bad, but the worst part is that it apparently wasn't
> tested on a 64-bit platform (a platform where sizeof(long) == 64,
> for example amd64).  On such a platform, there is another bug on
> the very same line.  The function binfmtl() has platform-dependent
> behaviour and does not behave as intended on a 64-bit platform and
> prints 32 spurious zero-digit characters, breaking the desired
> output alignment and making the content of the output
> self-contradictory on top of that.
>
> Rather than try to change binfmtl(), which is mostly pointless
> in the first place, let's just trivially inline the desired
> functionality, it's just two lines of very simple C code.
>
> OK?
>   Ingo
>
> P.S.
> Usually, we would want to upstream such a bugfix.  But the overall
> quality of this code is so abysmal that i'm not sure upstreaming
> anything is worth the hassle.  Doing minimal cleanup of the most
> glaring style issues in this code would require changing 99% of it,
> and given how it is written throughout, my suspicion is that it is
> likely full of bugs, and having two bugs on this one line may not
> be an exception.  Just look at the totally ridiculous tempstr
> gymnastics right below, which is incredibly stupid.  Or look just
> a few lines above and admire the hand-rolled code handling six-byte
> UTF-8 characters (which are invalid according to the standard).
> The code is just very, very bad throughout in large numbers of
> respects.
>
> I'm ashamed to admit that i have occasionally been using this port
> myself, and after looking at the code, i'm no longer convinced i
> should do that.  Not sure yet what to do about that.  Cleaning this
> up looks like a project of significant size, and quite a painful one.
>
> I just looked up the author of this code:
>   https://en.wikipedia.org/wiki/William_Poser
>   https://billposer.org/
> It looks like he is probably an eminent linguist, but he is clearly
> not a software developer by any stretch of the imagination.
>
>
> Index: Makefile
> ===
> RCS file: /cvs/ports/misc/uniutils/Makefile,v
> retrieving revision 1.9
> diff -u -p -r1.9 Makefile
> --- Makefile28 Jun 2021 21:34:19 -  1.9
> +++ Makefile14 Sep 2021 15:52:32 -
> @@ -3,7 +3,7 @@
>  COMMENT=   Unicode utilities
>
>  DISTNAME=  uniutils-2.27
> -REVISION=  3
> +REVISION=  4
>  CATEGORIES=misc
>
>  HOMEPAGE=  http://billposer.org/Software/unidesc.html
> Index: patches/patch-ExplicateUTF8_c
> ===
> RCS file: patches/patch-ExplicateUTF8_c
> diff -N patches/patch-ExplicateUTF8_c
> --- /dev/null   1 Jan 1970 00:00:00 -
> +++ patches/patch-ExplicateUTF8_c   14 Sep 2021 15:52:32 -
> @@ -0,0 +1,30 @@
> +$OpenBSD$
> +
> +Remove %n format specifier
> +and stop using the platform-dependent function binfmtl().
> +We want 32 bits here, not whatever sizeof(long) might be.
> +
> +Index: ExplicateUTF8.c
> +--- ExplicateUTF8.c.orig
>  ExplicateUTF8.c
> +@@ -87,7 +87,7 @@ main(int ac, char **av){
> +   int UsefulBits;
> +   unsigned char c[6];
> +   int i;
> +-  UTF32 ch;
> ++  UTF32 ch, mask;
> +   unsigned char *cptr;
> +   unsigned char ShiftedByte;
> +   char tempstr[33];
> +@@ -214,7 +214,10 @@ main(int ac, char **av){
> + printf("%s ",tempstr);
> +   }
> +   printf("\n");
> +-  printf("This is padded to 32 places with %d zeros:
> %n%s\n",(32-GotBits),,binfmtl(ch));
> ++  spaces = printf("This is padded to 32 places with %d zeros:
> ",(32-GotBits));
> ++  for (mask = 1UL << 31; mask; mask >>= 1)
> ++putchar(ch & mask ? '1' : '0');
> ++  putchar('\n');
> +   sprintf(tempstr,"");
> +   sprintf(tempstr,"%08lX",ch);
> +   tempstr[28] = tempstr[7];
>


Re: misc/uniutils fix %n

2021-09-14 Thread Ingo Schwarze
Hi Stefan,

Stefan Hagen wrote on Tue, Sep 14, 2021 at 07:37:20PM +0200:

> I really appreciate you (and Theo) taking the time to explain
> these things.

You are welcome, and thank you for trying to help fix stuff (i think
i forgot saying that in my previous mail because i got a bit worked
up about the horrible code in that file, which has nothing to do
with you).

> I noticed the weird output,

In such a case, it is useful to add a sentence like this:

  With my patch, it no longer crashes for me, but i'm not completely
  sure whether it is right because the output looks weird.  It prints
  far too many '0' characters (32 too many if i count correctly) -
  compare the size of the 0-padding printed to the desired size
  mentioned in the text "..." printed in the first half of the
  line before the base-2 number.

In general, if you have specific doubts, mentioning them specifically
is better than just asking "is it better?"

> but couldn't make sense of it as the
> upstream version crashes (abort trap) here.
> 
> Regarding my terrible C skills, I think I'll step back from trying to
> fix C code for a bit. I'm wasting peoples time here to educate me on
> topics I could as well learn elsewhere. (Except you tell me to continue
> submitting bad fixes and learn here...).

Well, your patch was still half-useful in so far as it correctly
spotted the one place in the code that needs to be fixed (i did
extensive grepping anyway to make sure, but it is reassuring that
we both came to the same conclusion).  There is a saying here "bad
patches trigger good ones".  Also, bug reports that contain *only*
a verbal description of the problem or *only* a patch often leave
me wondering what the sender is driving at.  When *both* a verbal
description and a patch are supplied together, even if the patch is bad,
it very rarely happens that the intention of the sender remains nebulous.

> I feel embarrassed, because I can't even explain why I used
> snprintf there.

I guess you looked at the rest of the code in the file, which uses lots
and lots of superfluous static buffers and lots of pointless snprintf(3).
Existence determines consciousness: what code one reads influences what
kind of code one writes (though not deterministically, and of course
it isn't the only factor).  So make sure you read sufficient amounts of
good code.

Either way, learning from feedback is better than feeling too embarassed.

> Your fix looks good to me. But my opinion doesn't matter at this point.

Well, a test report from a real-world user is always useful, can make
the difference between committing quickly and waiting for more feedback
from developers, and is usually acknowledged by something like

  OK some_developer@, and also tested by <...@...>

in the commit message.  So thanks for testing and reviewing.

Yours,
  Ingo



Re: misc/uniutils fix %n

2021-09-14 Thread Stefan Hagen
Ingo Schwarze wrote:
> Hi,
> 
> Stefan Hagen wrote on Sat, Sep 11, 2021 at 09:00:18AM +0200:
> 
> > Is this better?
> 
> No, it is no good whatsoever.
> 
>  * It introduces yet another pointless static buffer.
>  * It pointlessly uses snprintf(3) where printf(3)
>is obviously sufficient.
>  * It introduces error checking for operations
>that obviously cannot fail when done properly.
>  * It needlessly does pointer arithmetic.
> 
> So far, so bad, but the worst part is that it apparently wasn't
> tested on a 64-bit platform (a platform where sizeof(long) == 64,
> for example amd64).  On such a platform, there is another bug on
> the very same line.  The function binfmtl() has platform-dependent
> behaviour and does not behave as intended on a 64-bit platform and
> prints 32 spurious zero-digit characters, breaking the desired
> output alignment and making the content of the output
> self-contradictory on top of that.
> 
> Rather than try to change binfmtl(), which is mostly pointless
> in the first place, let's just trivially inline the desired
> functionality, it's just two lines of very simple C code.

I really appreciate you (and Theo) taking the time to explain these
things. I noticed the weird output, but couldn't make sense of it as the
upstream version crashes (abort trap) here.

Regarding my terrible C skills, I think I'll step back from trying to
fix C code for a bit. I'm wasting peoples time here to educate me on
topics I could as well learn elsewhere. (Except you tell me to continue
submitting bad fixes and learn here...).

I feel embarrassed, because I can't even explain why I used snprintf 
there.

Your fix looks good to me. But my opinion doesn't matter at this point.

Thanks,
Stefan



Re: misc/uniutils fix %n

2021-09-14 Thread Ingo Schwarze
Hi,

Stefan Hagen wrote on Sat, Sep 11, 2021 at 09:00:18AM +0200:

> Is this better?

No, it is no good whatsoever.

 * It introduces yet another pointless static buffer.
 * It pointlessly uses snprintf(3) where printf(3)
   is obviously sufficient.
 * It introduces error checking for operations
   that obviously cannot fail when done properly.
 * It needlessly does pointer arithmetic.

So far, so bad, but the worst part is that it apparently wasn't
tested on a 64-bit platform (a platform where sizeof(long) == 64,
for example amd64).  On such a platform, there is another bug on
the very same line.  The function binfmtl() has platform-dependent
behaviour and does not behave as intended on a 64-bit platform and
prints 32 spurious zero-digit characters, breaking the desired
output alignment and making the content of the output
self-contradictory on top of that.

Rather than try to change binfmtl(), which is mostly pointless
in the first place, let's just trivially inline the desired
functionality, it's just two lines of very simple C code.

OK?
  Ingo

P.S.
Usually, we would want to upstream such a bugfix.  But the overall
quality of this code is so abysmal that i'm not sure upstreaming
anything is worth the hassle.  Doing minimal cleanup of the most
glaring style issues in this code would require changing 99% of it,
and given how it is written throughout, my suspicion is that it is
likely full of bugs, and having two bugs on this one line may not
be an exception.  Just look at the totally ridiculous tempstr
gymnastics right below, which is incredibly stupid.  Or look just
a few lines above and admire the hand-rolled code handling six-byte
UTF-8 characters (which are invalid according to the standard).
The code is just very, very bad throughout in large numbers of
respects.

I'm ashamed to admit that i have occasionally been using this port
myself, and after looking at the code, i'm no longer convinced i
should do that.  Not sure yet what to do about that.  Cleaning this
up looks like a project of significant size, and quite a painful one.

I just looked up the author of this code:
  https://en.wikipedia.org/wiki/William_Poser
  https://billposer.org/
It looks like he is probably an eminent linguist, but he is clearly
not a software developer by any stretch of the imagination.


Index: Makefile
===
RCS file: /cvs/ports/misc/uniutils/Makefile,v
retrieving revision 1.9
diff -u -p -r1.9 Makefile
--- Makefile28 Jun 2021 21:34:19 -  1.9
+++ Makefile14 Sep 2021 15:52:32 -
@@ -3,7 +3,7 @@
 COMMENT=   Unicode utilities
 
 DISTNAME=  uniutils-2.27
-REVISION=  3
+REVISION=  4
 CATEGORIES=misc
 
 HOMEPAGE=  http://billposer.org/Software/unidesc.html
Index: patches/patch-ExplicateUTF8_c
===
RCS file: patches/patch-ExplicateUTF8_c
diff -N patches/patch-ExplicateUTF8_c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ patches/patch-ExplicateUTF8_c   14 Sep 2021 15:52:32 -
@@ -0,0 +1,30 @@
+$OpenBSD$
+
+Remove %n format specifier
+and stop using the platform-dependent function binfmtl().
+We want 32 bits here, not whatever sizeof(long) might be.
+
+Index: ExplicateUTF8.c
+--- ExplicateUTF8.c.orig
 ExplicateUTF8.c
+@@ -87,7 +87,7 @@ main(int ac, char **av){
+   int UsefulBits;
+   unsigned char c[6];
+   int i;
+-  UTF32 ch;
++  UTF32 ch, mask;
+   unsigned char *cptr;
+   unsigned char ShiftedByte;
+   char tempstr[33];
+@@ -214,7 +214,10 @@ main(int ac, char **av){
+ printf("%s ",tempstr); 
+   }
+   printf("\n");
+-  printf("This is padded to 32 places with %d zeros: 
%n%s\n",(32-GotBits),,binfmtl(ch));
++  spaces = printf("This is padded to 32 places with %d zeros: ",(32-GotBits));
++  for (mask = 1UL << 31; mask; mask >>= 1)
++putchar(ch & mask ? '1' : '0');
++  putchar('\n');
+   sprintf(tempstr,"");
+   sprintf(tempstr,"%08lX",ch);
+   tempstr[28] = tempstr[7];



Re: misc/uniutils fix %n

2021-09-11 Thread Stefan Hagen
Theo de Raadt wrote:
> I do not like your style of adjusting the counter with -=
> 
> the character counting approach looks fragile.
> 
> I suspect some of these refactorings are better if MULTIPLE *printf
> calls are used, instead of one call.
> 
> Cut the call into two at the %n

Ok, same approach as with the brltty port. It looks better indeed and 
avoids forgetting \0.

Is this better?

Index: misc/uniutils/Makefile
===
RCS file: /cvs/ports/misc/uniutils/Makefile,v
retrieving revision 1.9
diff -u -p -u -p -r1.9 Makefile
--- misc/uniutils/Makefile  28 Jun 2021 21:34:19 -  1.9
+++ misc/uniutils/Makefile  11 Sep 2021 06:58:02 -
@@ -3,7 +3,7 @@
 COMMENT=   Unicode utilities
 
 DISTNAME=  uniutils-2.27
-REVISION=  3
+REVISION=  4
 CATEGORIES=misc
 
 HOMEPAGE=  http://billposer.org/Software/unidesc.html
Index: misc/uniutils/patches/patch-ExplicateUTF8_c
===
RCS file: misc/uniutils/patches/patch-ExplicateUTF8_c
diff -N misc/uniutils/patches/patch-ExplicateUTF8_c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ misc/uniutils/patches/patch-ExplicateUTF8_c 11 Sep 2021 06:58:02 -
@@ -0,0 +1,21 @@
+$OpenBSD$
+
+Remove %n format-specifier from snprintf
+
+Index: ExplicateUTF8.c
+--- ExplicateUTF8.c.orig
 ExplicateUTF8.c
+@@ -214,7 +214,12 @@ main(int ac, char **av){
+ printf("%s ",tempstr); 
+   }
+   printf("\n");
+-  printf("This is padded to 32 places with %d zeros: 
%n%s\n",(32-GotBits),,binfmtl(ch));
++  char buf[128];
++  int buf_size = sizeof(buf);
++  spaces = snprintf(buf,buf_size,"This is padded to 32 places with %d zeros: 
",(32-GotBits));
++  if(spaces>0)
++snprintf(buf + spaces,buf_size - spaces,"%s",binfmtl(ch));
++  printf("%s\n",buf);
+   sprintf(tempstr,"");
+   sprintf(tempstr,"%08lX",ch);
+   tempstr[28] = tempstr[7];



Re: misc/uniutils fix %n

2021-09-10 Thread Theo de Raadt
I do not like your style of adjusting the counter with -=

the character counting approach looks fragile.

I suspect some of these refactorings are better if MULTIPLE *printf calls
are used, instead of one call.

Cut the call into two at the %n

Stefan Hagen  wrote:

> Hi,
> 
> Another %n fix.
> 
> There's actually a change in behavior, which I can't explain:
> 
> BEFORE:
> $ ExplicateUTF8 CanterburyPieces.txt
> The sequence 0xEF 0xBB 0xBF
>  1110 10111011 1011
> is a valid UTF-8 character encoding equivalent to UTF32 0xFEFF.
> The first byte tells us that there should be 2
> continuation bytes since it begins with 3 contiguous 1s.
> There are 2 following bytes and all are valid
> continuation bytes since they all have high bits 10.
> The first byte contributes its low 4 bits.
> The remaining bytes each contribute their low 6 bits,
> for a total of 16 bits:  111011 11
> Abort trap
> 
> AFTER:
> $ ExplicateUTF8 CanterburyPieces.txt
> The sequence 0xEF 0xBB 0xBF
>  1110 10111011 1011
> is a valid UTF-8 character encoding equivalent to UTF32 0xFEFF.
> The first byte tells us that there should be 2
> continuation bytes since it begins with 3 contiguous 1s.
> There are 2 following bytes and all are valid
> continuation bytes since they all have high bits 10.
> The first byte contributes its low 4 bits.
> The remaining bytes each contribute their low 6 bits,
> for a total of 16 bits:  111011 11
> This is padded to 32 places with 16 zeros: 
> 1110
> 0   0   0   0   F   E   F   F
> 
> 
> If anyone wants to try it, my test file is here:
> https://codevoid.de/0/p/CanterburyPieces.txt
> 
> OK?
> 
> Best regards,
> Stefan
> 
> Index: misc/uniutils/Makefile
> ===
> RCS file: /cvs/ports/misc/uniutils/Makefile,v
> retrieving revision 1.9
> diff -u -p -u -p -r1.9 Makefile
> --- misc/uniutils/Makefile28 Jun 2021 21:34:19 -  1.9
> +++ misc/uniutils/Makefile10 Sep 2021 21:09:48 -
> @@ -3,7 +3,7 @@
>  COMMENT= Unicode utilities
>  
>  DISTNAME=uniutils-2.27
> -REVISION=3
> +REVISION=4
>  CATEGORIES=  misc
>  
>  HOMEPAGE=http://billposer.org/Software/unidesc.html
> Index: misc/uniutils/patches/patch-ExplicateUTF8_c
> ===
> RCS file: misc/uniutils/patches/patch-ExplicateUTF8_c
> diff -N misc/uniutils/patches/patch-ExplicateUTF8_c
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ misc/uniutils/patches/patch-ExplicateUTF8_c   10 Sep 2021 21:09:48 
> -
> @@ -0,0 +1,17 @@
> +$OpenBSD$
> +
> +Remove %n format specifier
> +
> +Index: ExplicateUTF8.c
> +--- ExplicateUTF8.c.orig
>  ExplicateUTF8.c
> +@@ -214,7 +214,8 @@ main(int ac, char **av){
> + printf("%s ",tempstr); 
> +   }
> +   printf("\n");
> +-  printf("This is padded to 32 places with %d zeros: 
> %n%s\n",(32-GotBits),,binfmtl(ch));
> ++  spaces = printf("This is padded to 32 places with %d zeros: 
> %s\n",(32-GotBits),binfmtl(ch));
> ++  spaces -= strlen(binfmtl(ch));
> +   sprintf(tempstr,"");
> +   sprintf(tempstr,"%08lX",ch);
> +   tempstr[28] = tempstr[7];
> 



misc/uniutils fix %n

2021-09-10 Thread Stefan Hagen
Hi,

Another %n fix.

There's actually a change in behavior, which I can't explain:

BEFORE:
$ ExplicateUTF8 CanterburyPieces.txt
The sequence 0xEF 0xBB 0xBF
 1110 10111011 1011
is a valid UTF-8 character encoding equivalent to UTF32 0xFEFF.
The first byte tells us that there should be 2
continuation bytes since it begins with 3 contiguous 1s.
There are 2 following bytes and all are valid
continuation bytes since they all have high bits 10.
The first byte contributes its low 4 bits.
The remaining bytes each contribute their low 6 bits,
for a total of 16 bits:  111011 11
Abort trap

AFTER:
$ ExplicateUTF8 CanterburyPieces.txt
The sequence 0xEF 0xBB 0xBF
 1110 10111011 1011
is a valid UTF-8 character encoding equivalent to UTF32 0xFEFF.
The first byte tells us that there should be 2
continuation bytes since it begins with 3 contiguous 1s.
There are 2 following bytes and all are valid
continuation bytes since they all have high bits 10.
The first byte contributes its low 4 bits.
The remaining bytes each contribute their low 6 bits,
for a total of 16 bits:  111011 11
This is padded to 32 places with 16 zeros: 
1110
0   0   0   0   F   E   F   F


If anyone wants to try it, my test file is here:
https://codevoid.de/0/p/CanterburyPieces.txt

OK?

Best regards,
Stefan

Index: misc/uniutils/Makefile
===
RCS file: /cvs/ports/misc/uniutils/Makefile,v
retrieving revision 1.9
diff -u -p -u -p -r1.9 Makefile
--- misc/uniutils/Makefile  28 Jun 2021 21:34:19 -  1.9
+++ misc/uniutils/Makefile  10 Sep 2021 21:09:48 -
@@ -3,7 +3,7 @@
 COMMENT=   Unicode utilities
 
 DISTNAME=  uniutils-2.27
-REVISION=  3
+REVISION=  4
 CATEGORIES=misc
 
 HOMEPAGE=  http://billposer.org/Software/unidesc.html
Index: misc/uniutils/patches/patch-ExplicateUTF8_c
===
RCS file: misc/uniutils/patches/patch-ExplicateUTF8_c
diff -N misc/uniutils/patches/patch-ExplicateUTF8_c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ misc/uniutils/patches/patch-ExplicateUTF8_c 10 Sep 2021 21:09:48 -
@@ -0,0 +1,17 @@
+$OpenBSD$
+
+Remove %n format specifier
+
+Index: ExplicateUTF8.c
+--- ExplicateUTF8.c.orig
 ExplicateUTF8.c
+@@ -214,7 +214,8 @@ main(int ac, char **av){
+ printf("%s ",tempstr); 
+   }
+   printf("\n");
+-  printf("This is padded to 32 places with %d zeros: 
%n%s\n",(32-GotBits),,binfmtl(ch));
++  spaces = printf("This is padded to 32 places with %d zeros: 
%s\n",(32-GotBits),binfmtl(ch));
++  spaces -= strlen(binfmtl(ch));
+   sprintf(tempstr,"");
+   sprintf(tempstr,"%08lX",ch);
+   tempstr[28] = tempstr[7];