Re: misc/uniutils fix %n
Looks good to me, ok nicm I would probably have used snprintf first myself, to be fair. The tempstr stuff is hilarious though... On Tue, 14 Sept 2021 at 17:42, Ingo Schwarze wrote: > Hi, > > Stefan Hagen wrote on Sat, Sep 11, 2021 at 09:00:18AM +0200: > > > Is this better? > > No, it is no good whatsoever. > > * It introduces yet another pointless static buffer. > * It pointlessly uses snprintf(3) where printf(3) >is obviously sufficient. > * It introduces error checking for operations >that obviously cannot fail when done properly. > * It needlessly does pointer arithmetic. > > So far, so bad, but the worst part is that it apparently wasn't > tested on a 64-bit platform (a platform where sizeof(long) == 64, > for example amd64). On such a platform, there is another bug on > the very same line. The function binfmtl() has platform-dependent > behaviour and does not behave as intended on a 64-bit platform and > prints 32 spurious zero-digit characters, breaking the desired > output alignment and making the content of the output > self-contradictory on top of that. > > Rather than try to change binfmtl(), which is mostly pointless > in the first place, let's just trivially inline the desired > functionality, it's just two lines of very simple C code. > > OK? > Ingo > > P.S. > Usually, we would want to upstream such a bugfix. But the overall > quality of this code is so abysmal that i'm not sure upstreaming > anything is worth the hassle. Doing minimal cleanup of the most > glaring style issues in this code would require changing 99% of it, > and given how it is written throughout, my suspicion is that it is > likely full of bugs, and having two bugs on this one line may not > be an exception. Just look at the totally ridiculous tempstr > gymnastics right below, which is incredibly stupid. Or look just > a few lines above and admire the hand-rolled code handling six-byte > UTF-8 characters (which are invalid according to the standard). > The code is just very, very bad throughout in large numbers of > respects. > > I'm ashamed to admit that i have occasionally been using this port > myself, and after looking at the code, i'm no longer convinced i > should do that. Not sure yet what to do about that. Cleaning this > up looks like a project of significant size, and quite a painful one. > > I just looked up the author of this code: > https://en.wikipedia.org/wiki/William_Poser > https://billposer.org/ > It looks like he is probably an eminent linguist, but he is clearly > not a software developer by any stretch of the imagination. > > > Index: Makefile > === > RCS file: /cvs/ports/misc/uniutils/Makefile,v > retrieving revision 1.9 > diff -u -p -r1.9 Makefile > --- Makefile28 Jun 2021 21:34:19 - 1.9 > +++ Makefile14 Sep 2021 15:52:32 - > @@ -3,7 +3,7 @@ > COMMENT= Unicode utilities > > DISTNAME= uniutils-2.27 > -REVISION= 3 > +REVISION= 4 > CATEGORIES=misc > > HOMEPAGE= http://billposer.org/Software/unidesc.html > Index: patches/patch-ExplicateUTF8_c > === > RCS file: patches/patch-ExplicateUTF8_c > diff -N patches/patch-ExplicateUTF8_c > --- /dev/null 1 Jan 1970 00:00:00 - > +++ patches/patch-ExplicateUTF8_c 14 Sep 2021 15:52:32 - > @@ -0,0 +1,30 @@ > +$OpenBSD$ > + > +Remove %n format specifier > +and stop using the platform-dependent function binfmtl(). > +We want 32 bits here, not whatever sizeof(long) might be. > + > +Index: ExplicateUTF8.c > +--- ExplicateUTF8.c.orig > ExplicateUTF8.c > +@@ -87,7 +87,7 @@ main(int ac, char **av){ > + int UsefulBits; > + unsigned char c[6]; > + int i; > +- UTF32 ch; > ++ UTF32 ch, mask; > + unsigned char *cptr; > + unsigned char ShiftedByte; > + char tempstr[33]; > +@@ -214,7 +214,10 @@ main(int ac, char **av){ > + printf("%s ",tempstr); > + } > + printf("\n"); > +- printf("This is padded to 32 places with %d zeros: > %n%s\n",(32-GotBits),,binfmtl(ch)); > ++ spaces = printf("This is padded to 32 places with %d zeros: > ",(32-GotBits)); > ++ for (mask = 1UL << 31; mask; mask >>= 1) > ++putchar(ch & mask ? '1' : '0'); > ++ putchar('\n'); > + sprintf(tempstr,""); > + sprintf(tempstr,"%08lX",ch); > + tempstr[28] = tempstr[7]; >
Re: misc/uniutils fix %n
Hi Stefan, Stefan Hagen wrote on Tue, Sep 14, 2021 at 07:37:20PM +0200: > I really appreciate you (and Theo) taking the time to explain > these things. You are welcome, and thank you for trying to help fix stuff (i think i forgot saying that in my previous mail because i got a bit worked up about the horrible code in that file, which has nothing to do with you). > I noticed the weird output, In such a case, it is useful to add a sentence like this: With my patch, it no longer crashes for me, but i'm not completely sure whether it is right because the output looks weird. It prints far too many '0' characters (32 too many if i count correctly) - compare the size of the 0-padding printed to the desired size mentioned in the text "..." printed in the first half of the line before the base-2 number. In general, if you have specific doubts, mentioning them specifically is better than just asking "is it better?" > but couldn't make sense of it as the > upstream version crashes (abort trap) here. > > Regarding my terrible C skills, I think I'll step back from trying to > fix C code for a bit. I'm wasting peoples time here to educate me on > topics I could as well learn elsewhere. (Except you tell me to continue > submitting bad fixes and learn here...). Well, your patch was still half-useful in so far as it correctly spotted the one place in the code that needs to be fixed (i did extensive grepping anyway to make sure, but it is reassuring that we both came to the same conclusion). There is a saying here "bad patches trigger good ones". Also, bug reports that contain *only* a verbal description of the problem or *only* a patch often leave me wondering what the sender is driving at. When *both* a verbal description and a patch are supplied together, even if the patch is bad, it very rarely happens that the intention of the sender remains nebulous. > I feel embarrassed, because I can't even explain why I used > snprintf there. I guess you looked at the rest of the code in the file, which uses lots and lots of superfluous static buffers and lots of pointless snprintf(3). Existence determines consciousness: what code one reads influences what kind of code one writes (though not deterministically, and of course it isn't the only factor). So make sure you read sufficient amounts of good code. Either way, learning from feedback is better than feeling too embarassed. > Your fix looks good to me. But my opinion doesn't matter at this point. Well, a test report from a real-world user is always useful, can make the difference between committing quickly and waiting for more feedback from developers, and is usually acknowledged by something like OK some_developer@, and also tested by <...@...> in the commit message. So thanks for testing and reviewing. Yours, Ingo
Re: misc/uniutils fix %n
Ingo Schwarze wrote: > Hi, > > Stefan Hagen wrote on Sat, Sep 11, 2021 at 09:00:18AM +0200: > > > Is this better? > > No, it is no good whatsoever. > > * It introduces yet another pointless static buffer. > * It pointlessly uses snprintf(3) where printf(3) >is obviously sufficient. > * It introduces error checking for operations >that obviously cannot fail when done properly. > * It needlessly does pointer arithmetic. > > So far, so bad, but the worst part is that it apparently wasn't > tested on a 64-bit platform (a platform where sizeof(long) == 64, > for example amd64). On such a platform, there is another bug on > the very same line. The function binfmtl() has platform-dependent > behaviour and does not behave as intended on a 64-bit platform and > prints 32 spurious zero-digit characters, breaking the desired > output alignment and making the content of the output > self-contradictory on top of that. > > Rather than try to change binfmtl(), which is mostly pointless > in the first place, let's just trivially inline the desired > functionality, it's just two lines of very simple C code. I really appreciate you (and Theo) taking the time to explain these things. I noticed the weird output, but couldn't make sense of it as the upstream version crashes (abort trap) here. Regarding my terrible C skills, I think I'll step back from trying to fix C code for a bit. I'm wasting peoples time here to educate me on topics I could as well learn elsewhere. (Except you tell me to continue submitting bad fixes and learn here...). I feel embarrassed, because I can't even explain why I used snprintf there. Your fix looks good to me. But my opinion doesn't matter at this point. Thanks, Stefan
Re: misc/uniutils fix %n
Hi, Stefan Hagen wrote on Sat, Sep 11, 2021 at 09:00:18AM +0200: > Is this better? No, it is no good whatsoever. * It introduces yet another pointless static buffer. * It pointlessly uses snprintf(3) where printf(3) is obviously sufficient. * It introduces error checking for operations that obviously cannot fail when done properly. * It needlessly does pointer arithmetic. So far, so bad, but the worst part is that it apparently wasn't tested on a 64-bit platform (a platform where sizeof(long) == 64, for example amd64). On such a platform, there is another bug on the very same line. The function binfmtl() has platform-dependent behaviour and does not behave as intended on a 64-bit platform and prints 32 spurious zero-digit characters, breaking the desired output alignment and making the content of the output self-contradictory on top of that. Rather than try to change binfmtl(), which is mostly pointless in the first place, let's just trivially inline the desired functionality, it's just two lines of very simple C code. OK? Ingo P.S. Usually, we would want to upstream such a bugfix. But the overall quality of this code is so abysmal that i'm not sure upstreaming anything is worth the hassle. Doing minimal cleanup of the most glaring style issues in this code would require changing 99% of it, and given how it is written throughout, my suspicion is that it is likely full of bugs, and having two bugs on this one line may not be an exception. Just look at the totally ridiculous tempstr gymnastics right below, which is incredibly stupid. Or look just a few lines above and admire the hand-rolled code handling six-byte UTF-8 characters (which are invalid according to the standard). The code is just very, very bad throughout in large numbers of respects. I'm ashamed to admit that i have occasionally been using this port myself, and after looking at the code, i'm no longer convinced i should do that. Not sure yet what to do about that. Cleaning this up looks like a project of significant size, and quite a painful one. I just looked up the author of this code: https://en.wikipedia.org/wiki/William_Poser https://billposer.org/ It looks like he is probably an eminent linguist, but he is clearly not a software developer by any stretch of the imagination. Index: Makefile === RCS file: /cvs/ports/misc/uniutils/Makefile,v retrieving revision 1.9 diff -u -p -r1.9 Makefile --- Makefile28 Jun 2021 21:34:19 - 1.9 +++ Makefile14 Sep 2021 15:52:32 - @@ -3,7 +3,7 @@ COMMENT= Unicode utilities DISTNAME= uniutils-2.27 -REVISION= 3 +REVISION= 4 CATEGORIES=misc HOMEPAGE= http://billposer.org/Software/unidesc.html Index: patches/patch-ExplicateUTF8_c === RCS file: patches/patch-ExplicateUTF8_c diff -N patches/patch-ExplicateUTF8_c --- /dev/null 1 Jan 1970 00:00:00 - +++ patches/patch-ExplicateUTF8_c 14 Sep 2021 15:52:32 - @@ -0,0 +1,30 @@ +$OpenBSD$ + +Remove %n format specifier +and stop using the platform-dependent function binfmtl(). +We want 32 bits here, not whatever sizeof(long) might be. + +Index: ExplicateUTF8.c +--- ExplicateUTF8.c.orig ExplicateUTF8.c +@@ -87,7 +87,7 @@ main(int ac, char **av){ + int UsefulBits; + unsigned char c[6]; + int i; +- UTF32 ch; ++ UTF32 ch, mask; + unsigned char *cptr; + unsigned char ShiftedByte; + char tempstr[33]; +@@ -214,7 +214,10 @@ main(int ac, char **av){ + printf("%s ",tempstr); + } + printf("\n"); +- printf("This is padded to 32 places with %d zeros: %n%s\n",(32-GotBits),,binfmtl(ch)); ++ spaces = printf("This is padded to 32 places with %d zeros: ",(32-GotBits)); ++ for (mask = 1UL << 31; mask; mask >>= 1) ++putchar(ch & mask ? '1' : '0'); ++ putchar('\n'); + sprintf(tempstr,""); + sprintf(tempstr,"%08lX",ch); + tempstr[28] = tempstr[7];
Re: misc/uniutils fix %n
Theo de Raadt wrote: > I do not like your style of adjusting the counter with -= > > the character counting approach looks fragile. > > I suspect some of these refactorings are better if MULTIPLE *printf > calls are used, instead of one call. > > Cut the call into two at the %n Ok, same approach as with the brltty port. It looks better indeed and avoids forgetting \0. Is this better? Index: misc/uniutils/Makefile === RCS file: /cvs/ports/misc/uniutils/Makefile,v retrieving revision 1.9 diff -u -p -u -p -r1.9 Makefile --- misc/uniutils/Makefile 28 Jun 2021 21:34:19 - 1.9 +++ misc/uniutils/Makefile 11 Sep 2021 06:58:02 - @@ -3,7 +3,7 @@ COMMENT= Unicode utilities DISTNAME= uniutils-2.27 -REVISION= 3 +REVISION= 4 CATEGORIES=misc HOMEPAGE= http://billposer.org/Software/unidesc.html Index: misc/uniutils/patches/patch-ExplicateUTF8_c === RCS file: misc/uniutils/patches/patch-ExplicateUTF8_c diff -N misc/uniutils/patches/patch-ExplicateUTF8_c --- /dev/null 1 Jan 1970 00:00:00 - +++ misc/uniutils/patches/patch-ExplicateUTF8_c 11 Sep 2021 06:58:02 - @@ -0,0 +1,21 @@ +$OpenBSD$ + +Remove %n format-specifier from snprintf + +Index: ExplicateUTF8.c +--- ExplicateUTF8.c.orig ExplicateUTF8.c +@@ -214,7 +214,12 @@ main(int ac, char **av){ + printf("%s ",tempstr); + } + printf("\n"); +- printf("This is padded to 32 places with %d zeros: %n%s\n",(32-GotBits),,binfmtl(ch)); ++ char buf[128]; ++ int buf_size = sizeof(buf); ++ spaces = snprintf(buf,buf_size,"This is padded to 32 places with %d zeros: ",(32-GotBits)); ++ if(spaces>0) ++snprintf(buf + spaces,buf_size - spaces,"%s",binfmtl(ch)); ++ printf("%s\n",buf); + sprintf(tempstr,""); + sprintf(tempstr,"%08lX",ch); + tempstr[28] = tempstr[7];
Re: misc/uniutils fix %n
I do not like your style of adjusting the counter with -= the character counting approach looks fragile. I suspect some of these refactorings are better if MULTIPLE *printf calls are used, instead of one call. Cut the call into two at the %n Stefan Hagen wrote: > Hi, > > Another %n fix. > > There's actually a change in behavior, which I can't explain: > > BEFORE: > $ ExplicateUTF8 CanterburyPieces.txt > The sequence 0xEF 0xBB 0xBF > 1110 10111011 1011 > is a valid UTF-8 character encoding equivalent to UTF32 0xFEFF. > The first byte tells us that there should be 2 > continuation bytes since it begins with 3 contiguous 1s. > There are 2 following bytes and all are valid > continuation bytes since they all have high bits 10. > The first byte contributes its low 4 bits. > The remaining bytes each contribute their low 6 bits, > for a total of 16 bits: 111011 11 > Abort trap > > AFTER: > $ ExplicateUTF8 CanterburyPieces.txt > The sequence 0xEF 0xBB 0xBF > 1110 10111011 1011 > is a valid UTF-8 character encoding equivalent to UTF32 0xFEFF. > The first byte tells us that there should be 2 > continuation bytes since it begins with 3 contiguous 1s. > There are 2 following bytes and all are valid > continuation bytes since they all have high bits 10. > The first byte contributes its low 4 bits. > The remaining bytes each contribute their low 6 bits, > for a total of 16 bits: 111011 11 > This is padded to 32 places with 16 zeros: > 1110 > 0 0 0 0 F E F F > > > If anyone wants to try it, my test file is here: > https://codevoid.de/0/p/CanterburyPieces.txt > > OK? > > Best regards, > Stefan > > Index: misc/uniutils/Makefile > === > RCS file: /cvs/ports/misc/uniutils/Makefile,v > retrieving revision 1.9 > diff -u -p -u -p -r1.9 Makefile > --- misc/uniutils/Makefile28 Jun 2021 21:34:19 - 1.9 > +++ misc/uniutils/Makefile10 Sep 2021 21:09:48 - > @@ -3,7 +3,7 @@ > COMMENT= Unicode utilities > > DISTNAME=uniutils-2.27 > -REVISION=3 > +REVISION=4 > CATEGORIES= misc > > HOMEPAGE=http://billposer.org/Software/unidesc.html > Index: misc/uniutils/patches/patch-ExplicateUTF8_c > === > RCS file: misc/uniutils/patches/patch-ExplicateUTF8_c > diff -N misc/uniutils/patches/patch-ExplicateUTF8_c > --- /dev/null 1 Jan 1970 00:00:00 - > +++ misc/uniutils/patches/patch-ExplicateUTF8_c 10 Sep 2021 21:09:48 > - > @@ -0,0 +1,17 @@ > +$OpenBSD$ > + > +Remove %n format specifier > + > +Index: ExplicateUTF8.c > +--- ExplicateUTF8.c.orig > ExplicateUTF8.c > +@@ -214,7 +214,8 @@ main(int ac, char **av){ > + printf("%s ",tempstr); > + } > + printf("\n"); > +- printf("This is padded to 32 places with %d zeros: > %n%s\n",(32-GotBits),,binfmtl(ch)); > ++ spaces = printf("This is padded to 32 places with %d zeros: > %s\n",(32-GotBits),binfmtl(ch)); > ++ spaces -= strlen(binfmtl(ch)); > + sprintf(tempstr,""); > + sprintf(tempstr,"%08lX",ch); > + tempstr[28] = tempstr[7]; >
misc/uniutils fix %n
Hi, Another %n fix. There's actually a change in behavior, which I can't explain: BEFORE: $ ExplicateUTF8 CanterburyPieces.txt The sequence 0xEF 0xBB 0xBF 1110 10111011 1011 is a valid UTF-8 character encoding equivalent to UTF32 0xFEFF. The first byte tells us that there should be 2 continuation bytes since it begins with 3 contiguous 1s. There are 2 following bytes and all are valid continuation bytes since they all have high bits 10. The first byte contributes its low 4 bits. The remaining bytes each contribute their low 6 bits, for a total of 16 bits: 111011 11 Abort trap AFTER: $ ExplicateUTF8 CanterburyPieces.txt The sequence 0xEF 0xBB 0xBF 1110 10111011 1011 is a valid UTF-8 character encoding equivalent to UTF32 0xFEFF. The first byte tells us that there should be 2 continuation bytes since it begins with 3 contiguous 1s. There are 2 following bytes and all are valid continuation bytes since they all have high bits 10. The first byte contributes its low 4 bits. The remaining bytes each contribute their low 6 bits, for a total of 16 bits: 111011 11 This is padded to 32 places with 16 zeros: 1110 0 0 0 0 F E F F If anyone wants to try it, my test file is here: https://codevoid.de/0/p/CanterburyPieces.txt OK? Best regards, Stefan Index: misc/uniutils/Makefile === RCS file: /cvs/ports/misc/uniutils/Makefile,v retrieving revision 1.9 diff -u -p -u -p -r1.9 Makefile --- misc/uniutils/Makefile 28 Jun 2021 21:34:19 - 1.9 +++ misc/uniutils/Makefile 10 Sep 2021 21:09:48 - @@ -3,7 +3,7 @@ COMMENT= Unicode utilities DISTNAME= uniutils-2.27 -REVISION= 3 +REVISION= 4 CATEGORIES=misc HOMEPAGE= http://billposer.org/Software/unidesc.html Index: misc/uniutils/patches/patch-ExplicateUTF8_c === RCS file: misc/uniutils/patches/patch-ExplicateUTF8_c diff -N misc/uniutils/patches/patch-ExplicateUTF8_c --- /dev/null 1 Jan 1970 00:00:00 - +++ misc/uniutils/patches/patch-ExplicateUTF8_c 10 Sep 2021 21:09:48 - @@ -0,0 +1,17 @@ +$OpenBSD$ + +Remove %n format specifier + +Index: ExplicateUTF8.c +--- ExplicateUTF8.c.orig ExplicateUTF8.c +@@ -214,7 +214,8 @@ main(int ac, char **av){ + printf("%s ",tempstr); + } + printf("\n"); +- printf("This is padded to 32 places with %d zeros: %n%s\n",(32-GotBits),,binfmtl(ch)); ++ spaces = printf("This is padded to 32 places with %d zeros: %s\n",(32-GotBits),binfmtl(ch)); ++ spaces -= strlen(binfmtl(ch)); + sprintf(tempstr,""); + sprintf(tempstr,"%08lX",ch); + tempstr[28] = tempstr[7];