macOS tests seem to be broken since this commit? FAIL: find strlower edge case echo -ne '' | touch aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ; find . -iname aaaaaȺȺȺȺȺȺȺȺȺ --- expected 2024-05-10 17:32:56.000000000 +0000 +++ actual 2024-05-10 17:32:56.000000000 +0000 @@ -1 +0,0 @@ -./aaaaaⱥⱥⱥⱥⱥⱥⱥⱥⱥ
On Wed, May 8, 2024 at 9:38 AM Rob Landley <[email protected]> wrote: > > On 5/6/24 17:12, Ray Gardner wrote: > > While working on an awk implementation for toybox, I found a bug in > > strlower(), which is used only in find.c. I've attached some tests to > > put in find.test to reveal it. I can't put them here directly because > > I don't think the UTF-8 names will come through. (I modelled my awk > > tolower()/toupper() code on your strlower().) > > Your test doesn't create the files you're finding, so find is supposed to > fail? > Your first test doesn't barf under ASAN, and then the second one's going to > fail > because echo -n | wc says it's 258 bytes and the VFS file length limit is 255 > bytes, so there CAN'T be a file named that on Linux. (Path length != path > component length, there's no slashes in there.) > > > The problem is in the test if the output string needs to be enlarged > > to take an expanded lowercase: > > // Case conversion can expand utf8 representation, but with extra mlen > > // space above we should basically never need to realloc > > if (mlen+4 > (len = new-try)) continue; > > > > The mlen+4 needs to be mlen-4 to leave at least 4 bytes for the next > > character. > > Hmmm, possibly. I still don't understand what your test case is testing. (Just > trying to trigger an ASAN violation with an otherwise nonsense test?) > > > As the comment indicates, it should "never" need to realloc; > > No, the first comment is "never" because triggering probably indicates a libc > bug (we converted it from valid utf8 to a unicode code point, ran it through > libc's towlower(), and are now trying to convert the result _back_ to utf8, an > encoding hiccup at this point seems unlikely? But I don't trust locale > plumbing > ever, so...) > > The second is "basically never" because it requires an insane input string, > but > that's user controlled and users do crazy things, sometimes even > intentionally. > > > it takes > > a very long name of uppercase characters that do expand when made > > lowercase. But the code is there to handle that very case. > > The first malloc rounds the allocation up to next 8 byte boundary _after_ what > it's actually using, so 9-16 bytes of zeroes at the end, and assuming the > conversion only ever grows 1 byte (I don't remember the pathological expansion > case, it's in my blog somewhere, but your test is turning c8 ba into e2 b1 a5 > which is 1 byte of expansion) then you need at least 8 expanding unicode code > points to burn through the padding, so your first test string is too short to > trigger a problem. And your second is too long to produce a valid filename, so > the test can't _succeed_... > > Sigh, lemme come up with a test that demonstrates the fix working... the > minimal > one seems to be ./find . -iname aaaaaȺȺȺȺȺȺȺȺȺ > > And then, of course, TEST_HOST fails because I need to enable a utf8 locale, > but > I made plumbing for that recent-ish-ly... > > commit 6800a95ef328 > > > BTW, when I run those tests, they "PASS", but show as aborted: > > corrupted size vs. prev_size > > scripts/runtest.sh: line 137: 265983 Aborted find . > > -iname > > AȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺȺC > > PASS: find utf8 uppercase long name > > Odd. > > > The test echos and checks the $? return code and the abort apparently > > leaves that as 0. > > That could be anything from a bash issue to your distro's libc. The only trap > in > tests.sh is for SIGINT, and that handler isn't inherited by child processes. > The > return code of a process killed by a signal should be 128+signum, which the > test > plumbing would notice if it was the actual exit code of your shell snippet. > > I checked in a test that should actually succeed, but would fail with ASAN > enabled before the bug was fixed. > > > Is there a way to fix the test system so it can > > force the exit code to be something else? > > Not if the signal/exit isn't allowed to propagate back to it by the test. You > ran a child process and then unconditionally did an ;echo $? meaning test.sh > doesn't get notified of the child process getting killed by a signal, it > unconditionally (because ;) went on to run a second command, "echo" which is > returning whatever your bash recorded. > > Some distros have horrible fault interceptors that log crap into syslog or > dmesg > or some such, AND THEN RETURN SUCCESS. (Which is doubly insane: A) a program > faulting does not need to be globally logged on a development system, B) > returning success when that happens is very sad, but their "logic" was that > some > scripts would otherwise misbehave.) > > > When I run the test from a > > command line directly in bash, it gets a code of 134 (SIGABRT). > > Without ASAN I'm getting 139 (128+11 = SIGSEGV). There would appear to be a > difference in our environments. > > Rob > _______________________________________________ > Toybox mailing list > [email protected] > http://lists.landley.net/listinfo.cgi/toybox-landley.net _______________________________________________ Toybox mailing list [email protected] http://lists.landley.net/listinfo.cgi/toybox-landley.net
