On 03/11/2016 05:12 PM, Andy Chu wrote:
> What is the best way to run the toybox tests? If I just run "make
> test", I get a lot of failures, some of which are probably because I'm
> not running as root, while some I don't understand, like:
>
> PASS: pgrep -o pattern
> pgrep: bad -s '0'
> ^
> FAIL: pgrep -s
Unfortunately, the test suite needs as much work as the command
implementations do. :(

Ok, backstory! When I started toybox I had a big todo list, and was
filling it in. Some entries got completed and some (like sh.c, mdev.c,
or mke2fs.c) got partially finished and put on hold for a long time.
Then I started getting contributions of new commands from other
developers, some of which were easy to verify, polish up, and declare
done, and some of which required extensive review (and in several cases
an outright rewrite).

I used to just merge stuff and track the state of it in a local text
file, but that didn't scale, and I got overwhelmed. So I created
toys/pending and moved all the unfinished command implementations there.
(And a lib/pending.c for shared infrastructure used by toys/pending,
which needs its own review/cleanup pass.)

After a while I wrote a page (http://landley.net/toybox/cleanup.html)
explaining the "pending" directory and the work I do to promote stuff
_out_ of it, in hopes other people would be interested in doing some of
the cleanup for me. But people kept asking how they could help other
than implementing new commands (which would go into the giant
toys/pending pile) or doing cleanup, and the next logical thing was
"test suite". So I suggested that.

And got a lot of test suite entries full of tests that don't pass, tests
that don't actually test anything interesting in toybox (some test the
kernel, most don't test the interesting edge cases, none of them were
written with a thorough reading of the relevant standards document
and/or man page...). Really, I need a tests/pending. :(

There's a missing layer of test suite infrastructure, which isn't just
"this has to be tested as root" but "this has to be tested on a known
system with a known environment".
Preferably a synthetic one running under an emulator, which makes it a
good fit for my aboriginal linux project with its build control images:

  http://landley.net/aboriginal/about.html
  http://landley.net/aboriginal/control-images

Unfortunately, when I tried to do this, the first one I did was "ps",
and making the process list "ps -a" sees reproducible is hard, because
the kernel launches a bunch of kernel threads based on driver
configuration and kernel version, so getting stable behavior out of that
was enough of a head-scratcher that it went back on the todo list. I
should try again with "mount" or something...

Anyway, I've done a few minor cleanup passes over the test suite, but an
awful lot of it is still tests that fail because the test is wrong, or
lack of test coverage. One example of a test I did some cleanup on is
tests/chmod.test; a "git log" of that might be instructive? That said,
the result isn't remotely _complete_. (Endless cut and paste of "u+r"
style tests checking ls output, without even using a loop, but no tests
for the sticky bit? Nothing sets the executable bit on a script and then
tests we can run it? Removes exec permission from a directory and checks
we can't ls it? Removes read permission from a file and checks we can't
read it? No, all it tests is ls output over and over...)

> I'm on an Ubuntu 14.04 machine, running against the master branch. I
> didn't try running as root since it seems like there is a non-zero
> chance that it will mess up my machine.

Very much so! That's why I need to do an aboriginal linux test harness
that boots under qemu and runs tests in a known chroot.

> I saw in the ELC YouTube talk that test infrastructure is a TODO.
>
> http://landley.net/talks/celf-2015.txt
>
> Is this something I can help with?

If you could just triage the test suite and tell me the status of the
tests, that would be great. (I've been meaning to do that forever, but
every time I try I get distracted by fixing up a specific test and the
related command...)
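To make those chmod coverage gaps concrete, the missing checks would look
roughly like this, sketched as standalone shell rather than the suite's
testing() helper (all file and directory names here are made up for
illustration):

```shell
#!/bin/sh
# Sketch of chmod checks that test behavior, not just ls output.
cd "$(mktemp -d)" || exit 1

# Sticky bit: mode 1755 should show up as 't' in the mode field.
mkdir stickydir
chmod 1755 stickydir
mode=$(ls -ld stickydir | cut -d' ' -f1)
case "$mode" in *t*) echo "sticky shown" ;; esac

# Executable bit: after chmod +x we should actually be able to run it.
printf '#!/bin/sh\necho script ran\n' > script.sh
chmod 644 script.sh
chmod +x script.sh
./script.sh
```

(The directory read/exec-permission cases from the list above need a
non-root user to be meaningful, since root bypasses those checks, which
is another argument for the known-environment harness.)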
First pass, you could sort the tests into:

1) This command is hard to test due to butterfly effects (run it twice,
get different output, so even a known emulated environment won't help:
top, ps, bootchartd, vmstat...)

2) This command could produce reliable output under an emulated
environment. This includes everything requiring root access. (Properly
testing oneit probably requires containers _within_ an emulator, but
let's burn that bridge when we come to it.)

3) This command can have a good test now. (Whether it _does_ is
separate.)

Then let's put #1 and #2 aside for the moment and concentrate on filling
out #3.

> I guess if you can tell me what
> environment you use to get all tests to pass, it shouldn't be too hard
> to make a shell script to create that environment, probably with
> Aboriginal Linux.

Unfortunately, there isn't one. The test suite has bitrotted ever since
I started getting significant contributions to it without having a
"pending" directory to separate curated from wild tests. :(

> I have built Aboriginal Linux before (like a year ago).
>
> One of the reasons I ran into this was because I wanted to distill a
> test corpus for fuzzing from the shell test cases. afl-fuzz has a
> utility to minimize a test corpus based on code path coverage. So
> getting a stable test environment seems like a prerequisite for that.

Looking at the tests, I suspect my recent changes to the dirtree
infrastructure broke "mv". (Something did, anyway...)

There's also the issue that "make test_mv" and "make tests" actually
test slightly different things. The first builds the command standalone,
and not all commands build correctly standalone. (That might be why
"make test_mv" didn't work, if it's not building standalone...)
Sometimes the command needs fixing, sometimes the build infrastructure
needs fixing, sometimes the test needs fixing...
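As a data point for triaging the mv breakage, the basic behaviors a
tests/mv.test pass would need to cover boil down to something like this,
written as plain-shell equivalents of the suite's entries (file names
invented for the example):

```shell
#!/bin/sh
# Minimal mv sanity checks: rename in place, then move into a directory.
cd "$(mktemp -d)" || exit 1

echo hello > src
mv src dst
[ ! -e src ] && [ -e dst ] && echo "rename ok"

mkdir sub
mv dst sub/
[ -e sub/dst ] && echo "move into directory ok"

# Same filesystem here, so this exercises the rename path; a cross-
# filesystem move would exercise the copy-then-unlink (dirtree) path,
# which is the code the recent changes touched.
cat sub/dst
```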
> FWIW, I had a different approach for fuzzing each arg:
>
> https://github.com/andychu/toybox/commit/ff937e97881bfdf4b1221618c38857b75c9534e0
>
> This seems to be a little laborious, because I have to manually write
> shell scripts to fuzz individual inputs (and I didn't find anything
> beyond that one crash yet). I think the mass fuzzing thing might work
> better, but I'm not sure.

Building scripts to test each individual input is what the test suite is
all about. Figuring out what those inputs should _be_ (and the results
to expect) is, alas, work.

There's also the fact that often either the correct output or the input
to use is non-obvious. It's really easy for me to test things like grep
by going "grep -r xopen toys/pending": there's a lot of data for it to
bite on, and I can trivially compare ubuntu's version against mine and
see where they diverge. But to put that in the test suite, I need to
come up with a fixed set of test files (the source changes each commit,
and source changes shouldn't cause test case regressions). I've made a
start on tests/files with some utf8 content in there, but it hasn't got
nearly enough complexity yet, and there's a tension between "standard
test load that doesn't change" and "I thought of a new utf8 torture test
and added it, but that broke the ls -lR test."

Or take testing "top": the output is based on the current system load.
Even in a controlled environment, it's butterfly effects all the way
down. I can look at the source files under /proc I calculated the values
from, but A) that's hugely complex, B) it's a giant race condition, C)
is implementing two parallel code paths that do the same thing a valid
test? If I'm calculating the wrong value because I didn't understand
what a field should mean, my test would also be wrong...

In theory testing "ps" is easier: "ps" with no arguments is the same as
"ps -o pid,tty,time,cmd". But if you run it twice, the pid of the "ps"
binary changes, and the "TIME" of the shell might tick over to the next
second.
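You don't even need ps to see the instability; two identical invocations
of anything that reports a pid give different answers (a minimal
demonstration of the problem, not a fix):

```shell
#!/bin/sh
# Each sh -c invocation is a new process, so its $$ differs: any test
# that byte-compares full ps output is comparing moving targets.
a=$(sh -c 'echo $$')
b=$(sh -c 'echo $$')
if [ "$a" != "$b" ]; then
  echo "pids differ between runs"
fi

# A test can still check the stable parts, e.g. that our own process
# shows up at all, by asking only for the non-volatile columns:
ps -o comm= -p $$ 2>/dev/null || echo "(no ps available here)"
```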
You can't "head -n 2" it either, because the output is sorted by pid,
which wraps, so if your ps pid is lower than your bash pid it would come
first. Oh, and there's no guarantee the shell you're running is "bash"
unless you're in a controlled environment... And that's just testing the
output with no arguments.

> thanks,
> Andy

Rob
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
