On Mon, Feb 13, 2023 at 1:07 AM Rob Landley <r...@landley.net> wrote:
>
> On 2/9/23 19:25, enh wrote:
> > On Thu, Feb 9, 2023 at 5:08 PM Rob Landley <r...@landley.net> wrote:
> >>
> >> On 2/9/23 07:01, Rob Landley wrote:
> >> > On 2/9/23 03:51, Patrick Lauer wrote:
> >> >> On 2/5/23 12:59, Rob Landley wrote:
> >> >>> Doing my irregular trawl to see if distro repos have any interesting
> >> >>> patches or bug reports that haven't made it upstream, and... at the
> >> >>> risk of opening a can of worms:
> >> >>>
> >> >>> https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-apps/toybox/toybox-0.8.8.ebuild#n52
> >> >>>
> >> >>> You probably want "make tests" (plural), because "make test" builds
> >> >>> the "test" command as a standalone executable. (Which should usually
> >> >>> succeed?)
> >> >>
> >> >> Aye. That makes sense. Fixed.
> >> >>
> >> >> Now I'm reliably running into a test failure:
> >> >>
> >> >> FAIL: cut -C test1.txt
> >> >> echo -ne '' |
> >> >> "/var/tmp/portage/sys-apps/toybox-0.8.9/work/toybox-0.8.9/generated/testdir/cut"
> >> >> -C -1 "$FILES/utf8/test1.txt"
> >> >> --- expected 2023-02-09 09:49:21.525159648 -0000
> >> >> +++ actual 2023-02-09 09:49:21.525159648 -0000
> >> >> @@ -1 +1 @@
> >> >> -l̴̗̠
> >> >> +l
> >> >> make: *** [Makefile:77: tests] Error 1
> >> >>
> >> >> No idea yet what's triggering it, maybe you have some insight.
> >> >
> >> > Sigh, I hit something similar on bionic with the NDK build (because even
> >> > a static build of bionic wanted to read files out of /System in order to
> >> > tell me what is and isn't a combining character):
> >> >
> >> > http://lists.landley.net/pipermail/toybox-landley.net/2021-October/028766.html
> >> >
> >> > I do my own utf8 parsing, but _unicode_ is a bear to do yourself (just
> >> > answering the question "is this a combining character" involves
> >> > http://lists.landley.net/pipermail/toybox-landley.net/2021-October/028753.html
> >> > and
> >> > http://lists.landley.net/pipermail/toybox-landley.net/2021-October/028758.html
> >> > and I decided it was just all out of scope), but the dance to get glibc
> >> > to admit unicode exists is nontrivial. (And if the state isn't set, ze
> >> > functions: zey do nothink.)
> >> >
> >> > Lemme see what I can do with livegui-amd64 under qemu to reproduce this
> >> > here...
> >>
> >> Reproduced. Haven't really root caused, but I was reminded of:
> >>
> >> https://github.com/landley/toybox/issues/300
> >>
> >> Which boils down to "the locale we're trying to use is not installed".
> >>
> >> Toybox is doing:
> >>
> >>   setlocale(LC_CTYPE, "");
> >>   if (strcmp("UTF-8", nl_langinfo(CODESET)))
> >>     uselocale(newlocale(LC_CTYPE_MASK, "en_US.UTF-8", NULL));
> >>
> >> And it looks like gentoo has "C.utf8" instead (no dash), which... yeah, it
> >> works if I tell it to uselocale() that instead. I probably need multiple
> >> fallbacks in a loop. (Does it care about the dash? Is it case sensitive?
> >> How many iterations here...)
>
> Ok, "man 7 locale" says that C.UTF-8 should fall back to loading C.utf8...
>
> >> Oh goddess why is it doing uselocale(newlocale()), I think it was a macos
> >> thing? Yeah, git annotate says commit 4786fd610 which was Elliott. (Do you
> >> remember why it was doing that?)
> >
> > because there isn't a C.UTF-8 (no matter how you try to spell it!) on
> > macOS, so we need to "merge" utf-8-ness into the current locale. (i'd
>
> That isn't what the man page for newlocale() says we're doing, though?
> newlocale(FLAG, "NAME", 0) is creating a new locale that's a subset of the
> "NAME" locale, and the 0 means locale elements we don't give a flag for are
> copied from the "POSIX" locale. (Which should be a synonym for "C".)

i don't think so? that's not how i read
https://pubs.opengroup.org/onlinepubs/9699919799/functions/newlocale.html
anyway. i think it comes down to how you interpret "default locale"? i
read it as equivalent to "", but you think it means "POSIX". i think
POSIX agrees with me though? (search for "default locale" on
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html
for their definition. this is why we call setlocale() in that code,
fwiw.)

> So if the "" locale was french or something, this will switch it to "C". (Did
> I mention I'm not a fan of the locale plumbing's design?)
>
> I _want_ to say we don't use any elements of locale other than the utf8
> character stuff (I hope WHICH locale it is doesn't affect toupper() and
> tolower() but microsoft WAS on the unicode committee...)... but date
> advertises %x (which is handled for us by libc).
>
> I honestly don't remember what issue 52422388520e was fixing but I'm not gonna
> argue locale issues with a guy with three umlauts in his name. But looking
> back at it, commit 67ddade3373d replaced all uses of mbrtowc() out of libc
> with utf8towc() out of lib precisely so we WOULDN'T have to care about locale,
> so why are we still using libc's wcrtomb()? (Looking at the man page... what
> on earth is "shift state"?)
>
> Sigh, the real problem is towupper() and towlower() which if I recall did not
> work if a locale wasn't loaded first. I still kind of want to do an mbtoutf8()
> to make it symmetrical. (I THINK I did this before, but alas Google search
> continues to deteriorate: even though I got dreamhost to fix the robots.txt
> file on lists.landley.net back on Jan 23 Google STILL has the old one cached 3
> weeks later.)
>
> Right, fix the thing in front of us for now...
>
> > argue _that's_ not the ugly part --- the ugly part is that we merge
> > "en_US.UTF-8" in.
> > but i thought i'd wait until someone was actually
> > hurt by it before trying to construct the exact right locale for
> > them.)
>
> The man page says we're merging it into the posix locale, not the current
> locale.
>
> Is it too late to tell the gentoo guys to go back to running "make test"?
>
> Rob
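
For reference, the "multiple fallbacks in a loop" idea from the quoted message could look roughly like the sketch below. This is not toybox's actual code: the helper name use_utf8_ctype() and the list of candidate spellings are made up for illustration, and it assumes a libc with the POSIX.1-2008 locale functions. It passes (locale_t)0 as the base, so the question of what "default locale" means for the categories outside the mask applies here unchanged.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not toybox code): make sure LC_CTYPE is UTF-8.
// Returns 0 if the environment's locale already is UTF-8 or a fallback
// loaded, -1 if none of the candidates is available.
int use_utf8_ctype(void)
{
  // Candidate spellings are assumptions; extend the list as needed.
  static const char *names[] = {"C.UTF-8", "C.utf8", "en_US.UTF-8",
                                "en_US.utf8"};
  size_t i;

  // Honor the user's environment first, as the snippet in the thread does.
  setlocale(LC_CTYPE, "");
  if (!strcmp("UTF-8", nl_langinfo(CODESET))) return 0;

  for (i = 0; i < sizeof(names)/sizeof(*names); i++) {
    // newlocale() returns (locale_t)0 when the named locale can't be loaded.
    locale_t loc = newlocale(LC_CTYPE_MASK, names[i], (locale_t)0);

    if (loc == (locale_t)0) continue;
    // Only accept the candidate if it really reports a UTF-8 codeset.
    if (!strcmp("UTF-8", nl_langinfo_l(CODESET, loc))) {
      uselocale(loc);
      return 0;
    }
    freelocale(loc);
  }

  return -1;
}
```

A caller would run this once at startup. On -1 it would carry on in whatever locale it already has, so a missing locale does not turn into a NULL handed to uselocale().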