On Sun, Jul 10, 2016 at 10:28 AM, Rob Landley <[email protected]> wrote:
> Awk's better than bc.

That's interesting... I had no idea bc was a language with functions and loops!

https://github.com/jck/822kernel/blob/master/kernel/time/timeconst.bc

This is the problem with DSLs... shell, make, awk, and presumably bc
all started out as very specific languages, for different purposes.
Over time, they all grew a C-like imperative language.  And nobody
wants to remember 3 or 4 different syntaxes for that:

f() { echo hi $1; }
f "bob"

function f(name) { print "hi", name }
BEGIN { f("bob") }

define f
hi $(1)
endef
$(info $(call f,bob))

(And repeat this mess for every other construct in a language...)

It does seem that if you rule out Python/Perl, awk is the winner with
respect to code generation, based on the fact that a lot of C code
uses it (many shells, Android, FreeBSD, etc.).  And I agree with the
idea of minimizing build dependencies.

However, I did a bunch of research and hacking on Kernighan's Awk.  I
was trying to morph it into a "proper" modern language.  For example,
you could imagine writing "ls" or "xargs" or even a shell in Awk, sort
of like the idea to write tools in Lua.

But then I ran into some big limitations, like you can't return
associative arrays from a function, or pass or return functions
to/from functions.  Awk looks very similar to JavaScript -- C syntax
with associative arrays, but is semantically much more limited.
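To make the first limitation concrete: since an awk function cannot
return an associative array, the usual workaround is for the caller to
pass an array in and let the function fill it (arrays are passed by
reference).  A minimal sketch, assuming a POSIX awk on the PATH; the
names count_words and count are hypothetical, made up for illustration:

```shell
#!/bin/sh
# Sketch only: count_words and count are hypothetical names.
# The awk function fills the caller-supplied array "out" instead of
# returning an array, which awk does not allow.
count_words() {
  awk '
    # Extra parameters after the gap (n, i, words, distinct) act as locals.
    function count(line, out,    n, i, words, distinct) {
      n = split(line, words, " ")
      distinct = 0
      for (i = 1; i <= n; i++) {
        if (!(words[i] in out)) distinct++
        out[words[i]]++
      }
      return distinct          # only scalars can be returned
    }
    # Print the number of distinct words and the count for "a".
    # The array c persists across records, so this is meant for one line.
    { d = count($0, c); print d, c["a"] }
  '
}
```

Usage would look like `printf 'a b a\n' | count_words`.  The caller has
to know about and own the output array, which is the kind of plumbing a
language with first-class values would not need.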

I lost interest in awk because of these limitations.  awk is used, but
seems to be waning, and it's not really evolving.  (But I haven't lost
interest in the shell.)

I did however automate and slightly rewrite Kernighan's EXTENSIVE test
suite here, which AFAICT is not in the other Git reconstructions:

https://www.cs.princeton.edu/~bwk/btl.mirror/

https://github.com/onetrueawk/awk

I think you mentioned you were looking for an awk test suite.  Well
there it is -- there are hundreds or thousands of test cases,
including for the regex language.  I actually ran it under LLVM
sanitizers (ASAN/MSAN/etc.), just as I did for toybox, and it revealed
the expected C coding bugs, in this code being maintained by one
person for 30 years... (BTW you never responded to my last message
about that)

I will publish the combined repo at some point, and if anyone has a
burning need I can accelerate that.  I should make a blog post at some
point, demonstrating the sanitizers on old C code... though
unfortunately writing about code takes just as much time as coding
itself.  And I didn't actually fix any of the memory problems that I
found, as I did for toybox, since I don't have any plans for that code
in the future.

The bottom line is that LLVM sanitizers are mandatory if you care
about bugs... nobody is careful enough (even Kernighan, with his
astoundingly thorough tests, much more thorough than toybox!).  toybox
wget, tar, and crypto/compression libraries especially need this,
because they process untrusted input.

The other point about Kernighan's Awk is that if I were building
something like Aboriginal Linux, I would just use that for now, and
put toybox awk at low priority.  Elliot showed me that the Android NDK
actually uses a copy of Kernighan's Awk and not the system awk for its
builds.

I get why you don't like GNU stuff.  But Kernighan's Awk is like 7
files of pure ANSI C, POSIX yacc, POSIX makefile, etc. that builds
anywhere.  Kernighan also expanded the yacc to C, so you don't need
yacc as a build dependency... that is a little "unprincipled" but I
think fine given that awk is changing so slowly and will likely not
need any maintenance.

Andy
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
