On 07/17/2016 02:08 PM, Andy Chu wrote:
> On Sun, Jul 10, 2016 at 10:28 AM, Rob Landley <[email protected]> wrote:
>> Awk's better than bc.
>
> That's interesting... I had no idea bc was a language with functions and
> loops!
Neither did I until I tried to implement it.

> https://github.com/jck/822kernel/blob/master/kernel/time/timeconst.bc
>
> This is the problem with DSLs... shell, make, awk, and presumably bc
> all started out as very specific languages, for different purposes.
> Over time, they all grew a C-like imperative language. And nobody
> wants to remember 3 or 4 different syntaxes for that:
>
> f() { echo hi $1; }
> f "bob"
>
> function f(name) { print "hi" name }
> f("bob")
>
> define f
> hi %1
> end
> $(call f,"bob")
>
> (And repeat this mess for every other construct in a language...)
>
> It does seem that if you rule out Python/Perl,

And ruby, php, java, javascript, tcl, lithp, go, swift, rust...

A friend of mine was doing a job programming haskell a few years ago. I follow somebody on twitter maintaining a snobol compiler. Microsoft created C# because Java hadn't been invented there (and visual basic got taken out by internal politics during http://www.joelonsoftware.com/articles/APIWar.html). Back under OS/2 there was something called Rexx, Apple had AppleScript...

Awk is in posix and actually gets used. Heck, the linux kernel top level Makefile has:

  $(Q)$(AWK) '!x[$$0]++' $(vmlinux-dirs:%=$(objtree)/%/modules.order) > $(objtree)/modules.order
  $(Q)$(AWK) '!x[$$0]++' $^ > $(objtree)/modules.builtin

And of course:

  $ find . -name "*.awk"
  ./tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
  ./tools/perf/arch/x86/tests/gen-insn-x86-dat.awk
  ./tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk
  ./arch/x86/tools/gen-insn-attr-x86.awk
  ./arch/x86/tools/distill.awk
  ./arch/x86/tools/chkobjdump.awk
  ./Documentation/arm/Samsung/clksrc-change-registers.awk
  ./lib/raid6/unroll.awk
  ./net/wireless/genregdb.awk

> awk is the winner with
> respect to code generation, based on the fact that a lot of C code
> uses it (many shells, Android, FreeBSD, etc.)

And half of autoconf, apparently.

> And I agree with the idea of minimizing build dependencies.
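For anyone who hasn't met it, '!x[$$0]++' in the kernel Makefile lines above is the classic awk idiom for dropping duplicate lines while keeping first-seen order (sort -u would reorder them). Make rewrites $$ to $, so awk actually runs !x[$0]++. A minimal shell sketch; the dedup function name and the sample .ko paths are invented for illustration:

```shell
# Print each input line only on its first occurrence, keeping input order.
# This is the kernel Makefile idiom '!x[$$0]++' with Make's $$ escape
# undone; the array name "seen" is arbitrary (the Makefile uses x).
# seen[$0]++ yields the count *before* incrementing: 0 (false) the first
# time a line appears, so !seen[$0]++ is true only then, and a pattern
# with no action prints the current line.
dedup() {
  awk '!seen[$0]++'
}

printf 'kernel/a.ko\nfs/b.ko\nkernel/a.ko\nnet/c.ko\n' | dedup
# prints kernel/a.ko, fs/b.ko, net/c.ko (one per line)
```

The whole "algorithm" is one hash lookup per line, which is why it shows up in build files: no temp files, no sorting, and it streams.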
>
> However, I did a bunch of research and hacking on Kernighan's Awk. I
> was trying to morph it into a "proper" modern language.

Another one? Why? Presumably you wouldn't remove anything significant from the base language, since that would break compatibility with existing awk scripts, so your reaction to awk was "how could I fork this to make it bigger"? This feels like a variant of https://xkcd.com/927/ somehow.

> For example,
> you could imagine writing "ls" or "xargs" or even a shell in Awk, sort
> of like the idea to write tools in Lua.

Can awk call readlink()? (ls -l needs it.)

The lua thing fell apart trying to write mount, ifconfig, netcat, losetup, nsenter, ionice, chroot, swapon, setsid, insmod, taskset, dmesg... The language just didn't have the bindings. (Then again java 1.1 didn't have any way to truncate a file until I reported the lack to a guy named Mark English and he added it to 1.2. Languages get usable when they get used. Most code has to be broken in.)

> But then I ran into some big limitations, like you can't return
> associative arrays from a function, or pass or return functions
> to/from functions. Awk looks very similar to JavaScript -- C syntax
> with associative arrays, but is semantically much more limited.

There are an awful lot of scripting languages.

> I lost interest in awk because of these limitations. awk is used, but
> seems to be waning, and it's not really evolving. (But I haven't lost
> interest in the shell.)

Linus Torvalds recently said (https://lwn.net/Articles/687916/):

  Yeah, I know, I should have used 'awk' for this. Sue me. It's been
  too long since I did awk state machines.

There's a reason there's a "git grep" but not a "git awk" command.

One of the reasons some of these tools fall out of use is that the documentation for them is terrible. (GNU man pages often point to "info" pages nobody will EVER read and which sometimes _still_ aren't online.)
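On the point about not being able to return associative arrays from an awk function: the usual workaround is that awk passes arrays to functions by reference, so the caller supplies an array and the function fills it in, while the return value stays a scalar. This does nothing for the other complaint (passing or returning functions). A minimal shell sketch; kv_sum, parse_kv and the "key=value,key=value" input format are hypothetical:

```shell
# awk functions cannot return an array, but awk passes arrays by
# reference, so parse_kv fills the array "out" supplied by its caller.
# The extra parameters after the gap in parse_kv (n, i, parts, pair) are
# the usual awk convention for declaring function-local variables.
# split("", cfg) empties cfg before each record.
kv_sum() {
  awk '
    function parse_kv(text, out,    n, i, parts, pair) {
      n = split(text, parts, ",")
      for (i = 1; i <= n; i++) {
        split(parts[i], pair, "=")
        out[pair[1]] = pair[2]
      }
      return n
    }
    {
      split("", cfg)
      n = parse_kv($0, cfg)
      print n, cfg["a"] + cfg["b"]
    }'
}

echo 'a=1,b=2,c=9' | kv_sum    # prints "3 3"
```

split("", cfg) is the traditional portable way to empty an array between records.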
I'm trying to write --help text for each command that's sufficient to learn to use the command just from that. It's not easy and I'm not happy with a lot of the results (terse vs complete vs easy to read, pick 2). Working on it...

It would be nice if there were some youtube clips on "an introduction to sed", "an introduction to awk", and so on. I might wind up doing them someday if nobody else beats me to it.

> I did however automate and slightly rewrite Kernighan's EXTENSIVE test
> suite here, which AFAICT is not in the other Git reconstructions:
>
> https://www.cs.princeton.edu/~bwk/btl.mirror/
>
> https://github.com/onetrueawk/awk

The git account isn't the test suite; it seems to be the same https://www.cs.princeton.edu/~bwk/btl.mirror/awk.tar.gz source file from the first page. Commit history is two commits, last touched 2012. Did you post the automated version anyway?

> I think you mentioned you were looking for an awk test suite. Well
> there it is -- there are hundreds or thousands of test cases,
> including for the regex language.

Which is provided by libc.

Let's see... https://www.cs.princeton.edu/~bwk/btl.mirror/awktest.a is an ar archive, ar x awktest.a gives a directory full of files, README.TESTS says REGRESS controls the testing process, running that does...

  $ sh ./REGRESS
  Linux driftwood 4.2.0-38-generic #45~14.04.1-Ubuntu SMP Thu Jun 9 09:27:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  echo compiled
  oldawk=awk, awk=../a.out
  ./REGRESS: 11: ./REGRESS: Compare.t: not found
  167 tests
  ./REGRESS: 14: ./REGRESS: Compare.p: not found
  58 tests
  ./REGRESS: 17: ./REGRESS: Compare.T: not found
  252 tests
  ./REGRESS: 20: ./REGRESS: Compare.tt: not found
  21 tests

Right, maybe I'll dig into this later but it's not obvious to me how to get it to work.

> I actually ran it under LLVM
> sanitizers (ASAN/MSAN/etc.), just as I did for toybox, and it revealed
> the expected C coding bugs, in this code being maintained by one
> person for 30 years...
> (BTW you never responded to my last message
> about that)

My laptop rebooted during txlf and I lost my open windows. I have a todo item to look at your test suite suggestions, but when I glanced at the start of it, it was things like adding "only run these tests as root" guards to some files, which is part of any testing triage, so I just started doing test suite triage until I ran out of time that day, and haven't gotten back to it yet...

> I will publish the combined repo at some point, and if anyone has a
> burning need I can accelerate that.

I'm interested.

> I should make a blog post at some
> point, demonstrating the sanitizers on old C code... though
> unfortunately writing about code takes just as much time as coding
> itself.

Often more. :)

> And I didn't actually fix any of the memory problems that I
> found, as I did for toybox, since I don't have any plans for that code
> in the future.

I tried to link to your March 7 2016 email about the sed -f segfault and found out the mailing list archive is down again. Has it been a year already? (Answer: no, it's just been 7 months since http://landley.net/toybox/#12-21-2015 .) I wonder if Dreamhost will delete another chunk of history restoring from a stale backup? (My actual WEBSITE is still up. But of course they won't let me run mailman on that.)

> The bottom line is that LLVM sanitizers are mandatory if you care
> about bugs...

You said the sed -f thing was "literally the first thing you tried" and was found with a fuzzer. The other thing you found outside of pending was commit c73947814aab (which was a thinko on my part: I was trusting the -1/2 to be zero, but was testing <= not = so it still went through the loop body then), which I can't find your submission email for (might have been on irc?) so I dunno how you found it. The other stuff you patched was in pending so hadn't BEEN reviewed.

> nobody is careful enough (even Kernighan, with his
> astoundingly thorough tests, much more thorough than toybox!).
> toybox wget, tar, and crypto/compression libraries especially need this,
> because they process untrusted input.

Feel free to run it. I've never had much interest in false positive generators myself.

> The other point about Kernighan's Awk is that if I were building
> something like Aboriginal Linux, I would just use that for now, and
> put toybox awk at low priority. Elliot showed me that the Android NDK
> actually uses a copy of Kernighan's Awk and not the system awk for its
> builds.

Understood. Busybox had awk back when I maintained it, so I'd get comparisons if I didn't have one in toybox, and I need it for dependency reduction for my mythical "four packages" goal, but I need make for that too and that's not even in the 1.0 goal list. :)

> I get why you don't like GNU stuff. But Kernighan's Awk is like 7
> files of pure ANSI C, POSIX yacc, POSIX makefile, etc.

Closing the circle means you have yacc as a dependency. (Although he mitigates it by shipping the yacc output...)

> that builds
> anywhere. Kernighan also expanded the yacc to C, so you don't need
> yacc as a build dependency... that is a little "unprincipled" but I
> think fine given that awk is changing so slowly and will likely not
> need any maintenance.

It's pretty common to ship generated files for prerequisites you don't want to demand from the end user. Everybody and their dog has a ./configure produced by autoconf, the Linux kernel has scripts/kconfig/zconfig*_shipped and so on. And of course Android's toybox git has the generated/ directory checked in. :)

> Andy

Rob

_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
