>>> So if you have a pending todo item that you think I should be working
>>> on, giving me a poke to remind me where I was wouldn't go amiss.
>>
>> I think we have a few open threads:
>>
>> (1) expr has memory safety issues on master. I posted a message
>> demonstrating this with the ASAN patches:
>>
>> http://lists.landley.net/pipermail/toybox-landley.net/2016-March/004884.html
>
> Hmmm...
>
> # Case that triggers ASAN error
> testing "regex" "expr 3 : '\(.\)' = 4 : '\(.\)'" "0\n" "" ""
>
> Ran it and it works fine for me, how do I enable the address
> sanitization you're using to see the complaint? (It's giving me three
> different stack dumps and referring to threads in a non-threaded
> program, so my approach to that would just be to fling printf()s around
> until I tracked down the line of code it was complaining about and then
> walk through that...)
>
> Is this an external program ala lint? An llvm mode? Something valgrind
> does? Some new gcc feature added in 5.x?
It's runtime instrumentation of the toybox binary when you pass
-fsanitize=address to Clang (gcc has it too, but I think the original is
Clang, and hence that's the one I use). This is the toybox_asan binary I
sent patches to build.

It's literally a different binary, because every time your code uses
memory, the compiler inserts instructions to consult "shadow memory".
The shadow memory enables a bounds checking algorithm. Think of it like
code coverage -- that is also runtime instrumentation. The compiler
labels all the code blocks and literally inserts instructions to
increment counters for the particular code block being hit (roughly).

You can try it like this:

git clone [email protected]:andychu/toybox.git
git checkout dev/fix-tests

./test.sh  # This shows help

# You probably need to download an up to date version of Clang here.
# (You can rm it when you're done)
# http://llvm.org/releases/download.html#3.8.0

export CLANG_DIR=~/install/clang+llvm-3.8.0-x86_64-linux-gnu-ubuntu-14.04

./test.sh single -asan expr

# This builds the toybox_asan binary and runs the tests, which shows
# the failure given in the message.

I just tried it again and it works. If you have trouble with this, start
a new thread and I'll try to respond quickly. I fixed a lot of the build
stuff so you don't have to "make clean" all the time...

>> A lot of this has fallen out of my cache,
>
> Mine too, 's why I asked. :)
>
> I didn't write this command, so I don't have as good a mental model of
> it as I should. I'm chipping away to try to GET a good mental model,
> without which I can't properly maintain it...

As far as memory allocation goes, the thing to know is that expr
allocates memory in two places:

1) When you're matching a regex, e.g. foo : \(.*\) -- the result of the
capture is a newly allocated string.

2) When the type coercion rules change integers to strings, e.g.
3 + 2 = foo -- the result of 3+2 was an integer, but it has to be
converted to a string to compare with foo. So that is an allocation too.
I mentioned in a previous message that I considered the possibility of
freeing the strings as soon as possible. I don't believe that is a good
strategy, because it will be intertwined with the evaluation logic, and
thus long and fragile.

If you think of the operations like ret = lhs OP rhs, then the code is
written so that "lhs" is destructively overwritten by "ret" (which means
we don't have to care about allocations of 'struct value', which is
good). So you CANNOT free the string where you're doing it, because it
can be used by subsequent operations, and whether it is depends on the
operation.

I think the only way it's realistic is if you put an extra flag in
"struct value" indicating whether the string needs to be freed, or
whether someone else owns it, like the **argv array. Alternatively, you
could ALWAYS allocate and copy no matter what, which means you can free
with less logic. But obviously that is a pessimization of the common
case.

Anyway, my suspicion is that if you manage to write the correct code to
free as soon as possible, you'll probably go back to what I have,
because it will be 10x shorter. And honestly it will not make any
difference at runtime. If anything, I would just put a "TODO: free as
soon as possible" -- there will be 1000 other things on your TODO list
more important than that, and literally nobody will ever remind us of it
in the next 10 years...

> The shell has $(( blah )) and I'd like to extend this to be able to
> handle that, and/or factor out common code to handle both. So I care
> about more than just getting one use case right.

So I've thought about $(( )), and I don't think any code from expr can
be reused easily. What might make sense is the reverse -- implementing
$(( )) from scratch, and then subsuming expr with it (however, expr also
has regex matching, which $(( )) doesn't have).
The problem is things like this:

$ a=3; echo $(( $(seq $((3-2))) + 2 + ${a:0:1} ))
6

This adds 1 + 2 + 3 using a command substitution and a variable
substitution. The command substitution has another arithmetic
substitution in it. If you think about how to implement that with the
shell, the reuse with expr looks doubtful.

> (I tend to apply fixes to pending verbatim on the theory that it's
> unlikely to make it worse. With expr I'm reviewing it to promote out of
> pending next release, so I'm trying to take ownership of that code. When
> I get around to doing that with diff.c, the _t suffixes are 100% going
> away again at the very least, FYI. Since struct is its own namespace,
> nothing would ever NOT have a _t in there so adding it is silly.)

I think monotonic improvement is a good standard... there is stuff in
there that needs *multiple* passes of fixup, especially the tests, so if
you wait for it to be 100% fixed, it won't get better very quickly. And
it is harder to review that way too.

> Indeed. I have a todo item to go through the help text of each command
> and make tests for every corner case of the usage documented in there.
>
> Then go through the posix specs for each posix command and make tests
> for every relevant statement of _that_.

Well, that is what my big patch set of 5 was starting... the first step
to adding new tests for every command is to actually make the existing
tests pass reliably!

>> I'm sure that building Aboriginal with toybox_asan, etc. would
>> shake out tons of issues.
>
> Indeed. (Not necessarily ever externally visible on single-threaded
> processes, but yay fixing stuff.)

Those bugs aren't related to threading. There is a separate
ThreadSanitizer that detects data races with the same shadow memory
technique. But what I showed you were real bugs -- it says "thread T0"
because that's the main thread. If your program has no threads, it will
just show thread T0, I guess.
But the bugs have nothing to do with threads; it's whether you're using
freed memory, uninitialized memory, etc.

> I need to do a proper review pass on expr to promote it out of pending.
> I just need a spare solid 8 hour block of time with no other demands on
> it. :)

OK, I understand this viewpoint... but also I would say that it's
probably done, and we won't hear about it for 10 years :) It doesn't
interact with the OS at all, so it's easy to get pretty much 100% test
coverage. (And the test coverage patches I worked on would help with
that :) )

Whatever you end up with as far as expr is fine with me -- just don't
expect that I think your shell is gonna get done any time soon :)

> C++ and Perl share the characteristic that the true horror of the
> language doesn't crop up until the project's being maintained by
> multiple people over a long period of time.

As mentioned, it's in the Google C++ style... which you're guaranteed to
see more of if you're interested in Ninja (Chrome and Android build
system) and Kati (the make parser/converter Android uses). There's a
strong family resemblance there in terms of code style. There are
thousands of people maintaining a big codebase in this same style, so it
demonstrably works. And a lot of people are choosing it for their open
source projects, where they're not constrained by company guidelines.

The language has less of an effect on maintainability than the
architecture of the program. In looking through all the shell
implementations, I could conclude that C is horrible for writing shells.
I need to write a rant about the horrors I've seen in the last few
weeks, but seriously look at the bona fide gotos, globals, macros, and
long functions in all existing shells...

But classes are really useful, and do change the architecture of the
program... I'll publish the code and you can see.

> Could, but why? (If you want to test it yourself that way, have fun?
> Really making a Linux From Scratch chapter 5 with your shell instead of
> bash and then building linux from scratch under the result should get
> you about as good testing...)

As far as I remember, Aboriginal was a lot more self-contained and
faster to build than LFS. But yeah, the basic idea is to test the shell
by building packages. Although I wonder if autoconf purposely restricts
the shell dialect so much that it's not a great test.

>> Most implementations use
>> gotos, global variables, and macros a lot -- and not in a good way.
>> For example, bash's
>
> FSF code is never a good example of anything. I intentionally don't read
> the FSF implementations, ever. Didn't when I was doing busybox, still
> don't. I'll run them under strace, that's as far as I go.

Yeah, but as far as I'm concerned *none* of the shells are a good
example of anything. I think bash might be the most modern one! :)
Though I did notice that mksh is more principled for certain cases I
looked at, but still dense to the point of being hard to maintain. I am
continuing to look as I implement my shell.

Andy

_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
