On 07/21/2016 12:25 AM, Andy Chu wrote:
>>> However, I did a bunch of research and hacking on Kernighan's Awk. I
>>> was trying to morph it into a "proper" modern language.
>>
>> Another one?
>>
>> why?
>
> Because if you're going to rule out Python/Perl/etc. on a minimal
> Unix, which I mostly agree with, then you still need a decent
> scripting language at that level of abstraction. Awk is by far the
> closest out of any "classic Unix" language.
I'm fairly certain awk wasn't intended to be Turing complete, but I'd
have to dig into the computerphile youtube interviews with Kernighan to
try to find his actual quote.

>> Presumably you wouldn't remove anything significant from the base
>> language, since that would break compatibility with existing awk
>> scripts, so your reaction to awk was "how could I fork this to make it
>> bigger"?
>
> The overall system would be smaller if you expanded awk (add
> readlink(), etc.) and then wrote the core utilities in it.

You are aware that perl started life as an attempt to combine awk, sed,
and shell into a single tool, right? Is this really a model you want to
emulate?

(I say this as someone who happily used C++ back when it was "C with
classes" before templates went into the language. Doesn't mean I think
another "C with classes" fork would be worth doing, which is why I never
tried to learn objective C.)

> But as mentioned in my previous message, through my hacking I
> determined that both Awk and Make have bad semantics (while shell has
> good semantics). So my current idea is to have "extreme
> compatibility" for my shell, but add some awk and make features to it,
> so you don't have to remember 3 different syntaxes for loops,
> conditionals, and function calls.

So instead of awk+sed+sh you're doing awk+make+sh.

> In other words, the awk and make parts aren't compatible with actual
> awk and make -- they just share the same architecture (row-wise
> streaming of data and data-oriented parallel builds.) But the shell
> part is compatible.

You're creating yet another programming language (because clearly we
haven't got enough of those yet) and if a system needs a script written
in your tool they'll install yet another language alongside ruby and lua
and python and so on.

People write programs in a language. If it's not a compiled language,
then the runtime of that language becomes a runtime dependency of that
thing.
If it is compiled, it's still a build-time dependency, and knowledge
of it becomes a maintenance dependency. It can only be
modified/extended/ported/fixed by people who understand that language.

Everybody who creates a new language dreams that their language will
cause a net simplification of the world by displacing OTHER languages
and causing them to die out, but just about the only time this has EVER
happened was C and it took several decades to do it, and by that time C
itself was under embrace-and-extend attack by C++ (which still hasn't
got any sort of boundaries to stop its endless feature creep), not to
mention also-rans like objective C, or the modern crop of go/rust/swift
that is each sure it will become the new C and somehow get all the
kernels and other language runtimes rewritten in it.

I ran "apropos interpreter" on my ubuntu 14.04 netbook (reasonably
stock, I try not to install extra build dependencies on it) and it found
erb (ruby), perl, dash, and tclsh. This DIDN'T pull up bash or python,
which I know are on here, so it clearly isn't a complete list.

Your new thing, if _wildly_ successful, would displace none of those.
Python 3 can't even displace python 2.

> To remind everyone, the basic beef is that Unix has a good
> architecture, but horrible syntax. And there are too many languages.
> Nobody younger than me learns awk or make anymore.

I didn't learn make until I had to (after graduating from college).

> Or even shell. It just feels old.

I sat through a couple decades being vastly outnumbered by MCSEs doing
visual basic. They also pointed and laughed at shell scripts. Guess
which outlived which?

> The average engineer at Google doesn't know any of
> those things if they started their career in the last 10 years.

If their job is making chrome render faster on windows, I'm not
surprised?

You guys do web infrastructure through self driving cars on systems that
got installed for you.
You're writing apps in Python and C and such, doing cluster load
balancing and AI research into semantic web analysis. Banging on the OS
is not their domain expertise.

(Do you know how hard it is to google for android _system_ information?
You try to find information about Android programming, it's apps in java
all the way down. Clearly this means there IS no part of the system
written in C, it's just java, as javaos demonstrated the feasibility of
20 years ago...)

Also, from 2005 to 2012 Guido Van Rossum worked at Google, during which
Google made a big deal about writing stuff in Python. Then Guido left
for Dropbox and Google stopped talking about python so much (at least
externally), but Ken Thompson and Rob Pike took over providing an
in-house language (Go) for Google to have Invented Here. (Apple's is
Swift, apparently because Objective C was so totally overshadowed by C++
they wanted to try again.)

As I said, a few years ago a friend worked for a company called Basho
that did everything in Haskell. A decade back I worked a contract at a
company that did everything in Perl 4. (Yes, significantly post-y2k!)
The culture of a specific company, even Google, isn't necessarily
representative of the world at large. "We use this set of technologies
here" != "this is the universally important set of technologies".

Years ago I noticed a corollary to Moore's Law, which is that 50% of
what you know about software becomes obsolete every 18 months. The nice
thing about unix is it's mostly been the same 50% cycling out over and
over for many decades now.

I've learned lots of things that only lasted 5 years, and decided to
just wait for others to go away. I waited out AOL. I'm currently waiting
out Facebook. It took a LONG TIME to wait out Windows but at this point
I no longer have to care about it. I can wait out systemd.
I'm not sure what parts of Android's infrastructure will cycle out in my
lifetime and which new generations will embrace, but "it builds under
itself" is necessary for the long-term health of any platform. Right now
it builds under a semi-posix environment that's enumerable and can be
bounded, and I'm trying to transplant that.

> The xkcd somewhat applies, but is mitigated by 2 things:
>
> 1) You can replace an entrenched technology/language if you make your
> new thing a superset of the old thing, i.e. retaining a high degree of
> compatibility.

The way C++ included the whole of C and was therefore clearly superior
to C, you mean? Or the way perl combined awk and sed and shell?

Who was it who said any problem can be solved by adding another layer of
indirection, except too many layers of indirection? I'm not sure "let's
use a bigger tool to solve the same problems" is a rallying cry I'm
really comfortable getting behind.

A couple core ideas of unix were small tools connected by pipes
(communicating via mostly textual interfaces because humans can read
that), and "do one thing and do it well". That's part of the reason it's
survived so well, it's made from decoupled parts you can individually
swap out. (more->less and so on.)

The point of a shell script is to easily call lots of external commands
which are not, strictly speaking, part of the shell. Yes busybox and
toybox blur those lines, but not execing itself again out of the $PATH
is _mostly_ a performance hack, I.E. entirely optional and can be
disabled. There is a set of shell builtin commands that can't be
implemented externally (cd/exit/export/read are process-local because
they modify process attributes like environment variables and cwd), but
when the kernel guys gave me the tools to do "ulimit" as a standalone
command, I did.

> After my research on ksh, it's clear that this is how bash gained
> popularity.
> ksh was the most popular implementation at the time of
> the POSIX standard, and was probably the biggest influence on the
> standard. bash was playing catch up -- it aggressively implemented
> POSIX *and* the non-POSIX parts of ksh. So eventually people ported
> their ksh scripts to bash.

Yes and no. Bash was the first program Linux ever ran. Linus created the
Linux kernel by extending his boot-from-floppy terminal program to
handle bash system calls so he didn't have to keep rebooting into minix
to list/rename/move/delete files and directories. (He had a tiny hard
drive and was constantly clearing off space to download more files from
usenet via the university microvax).

The initial release of bash was June 8, 1989, meaning when Linus posted
the first Linux announcement on August 25, 1991 bash was 2 years old.

Bash did not become the default shell of solaris (tcsh), freebsd (also
tcsh), or aix (korn). As far as I can tell, its popularity was largely
driven by the fact that it was the default shell of every single Linux
installation for fifteen years, until Ubuntu decided in 2006 that its
init scripts ran too slow. No really, that's the reason they gave for
the switch:

https://wiki.ubuntu.com/DashAsBinSh

Even then diversifying away from bash was gradual, even Debian (which
Ubuntu was basically carrying at that point, hiring full-time people to
work on what was otherwise a badly struggling distro) took 3 years to
bow to Ubuntu's will here:

https://lwn.net/Articles/343924/

> It's basically embrace-and-extend in the open source world... there's
> a reason that Microsoft used that strategy -- it works. You implement
> something bug-for-bug and then you extend it with useful features.

No, their strategy was bundling. There were two entire antitrust trials
about this, and a quote about a ham sandwich.
From windows coming free with every copy of DOS (and per-motherboard
licensing so you couldn't buy the hardware without getting their OS) to
Office (can't use their spreadsheet without their word processor and
powerpoint) to making their browser un-removable from their OS (which
still didn't let them change HTML much, no matter how they tried).

They've embraced and extended all sorts of stuff people utterly ignored
or which got traction and then lost it again, from the Zune embracing
and extending MP3, Microsoft's "J" language (then C# when they got sued)
embracing and extending Java... Outlook tried to embrace and extend
email but their SUCCESS in that area was bundling calendaring with email
so you needed one to use the other (both tied to an exchange server
using protocols they tried to make very hard to reverse engineer).

Bundling can't really be forced in the open source world, the closest
you get is de-facto standards, such as bash and gcc were for linux. As
Linux succeeded, bash succeeded, and got installed on other systems
because people wanted their familiar Linux environment there too. Then
ubuntu bundled dash instead (because stupid) and pushed the other way,
but it still took a while even with Ubuntu having the same 50%
workstation marketshare that Red Hat gave up a few years earlier...

> 2) awk has a much smaller user base than shell. You do see big awk
> scripts, but you see MANY more big shell scripts. And there are more
> shell scripts altogether.
>
> awk and make are also at least 5x smaller and 5x easier to implement
> than the shell (if you look at bash/zsh vs GNU awk/make, as well as
> other implementations)

Which is why I plan to implement awk and make as their own commands?

> If you can manage to fold some awk functionality into shell, then you
> could possibly decrease the total number of languages (at least in a
> given system).
No, seriously, this is how Larry Wall created perl:

http://www.shlomifish.org/lecture/Perl/Newbies/lecture1/intro/history.html

That way lies the Emperor of all Cosmos having a drunken bender and
making you push a ball around Japanese living rooms to surprisingly
catchy theme music in order to create replacement stars.

> As I said, nobody needs 3 different syntaxes for
> loops, function calls, and expressions. (And you cannot avoid them,
> at least if you are looking at real systems...)

Those who do not know the history of loop syntaxes are doomed to repeat.

>> The lua thing fell apart trying to write mount, ifconfig, netcat,
>> losetup, nsenter, ionice, chroot, swapon, setsid, insmod, taskset,
>> dmesg... The language just didn't have the bindings.
>
> Sure but you can find bindings or write them yourself. That's the
> whole point of Lua!

If I have to write code in C and cross-compile it to every supported
target in order to bootstrap a system, what am I bothering with Lua for?
It's just another unnecessary prerequisite package, I might as well just
write the whole thing in C. (So I did.)

(And no, "You can install 7 externally maintained prerequisite packages
instead" is not an improvement.)

>>> I think you mentioned you were looking for an awk test suite. Well
>>> there it is -- there are hundreds or thousands of test cases,
>>> including for the regex language.
>>
>> Which is provided by libc.
>
> Kernighan Awk has its own regex implementation "b.c" in 958 lines, and
> there is an argument to keep it. It uses the Thompson linear-time
> NFA/DFA algorithm rather than exponential backtracking.

If musl or bionic should have a better regex implementation, fine. But I
see no need to reinvent this particular wheel.

(And I've reinvented a lot of wheels. In fact I wrote my own regex
engine for OS/2 feature install in 1996, although that used glob syntax
rather than regex syntax because I did way more DOS than Unix back then.
But I have NOT written one for toybox.
Libc exists and posix says it should have this.)

> See the note here:
>
> https://github.com/andychu/bwk
>
> Coincidentally StackOverflow was down today for a related reason...
> matching regexes on user input can blow up CPU on your servers:
> https://news.ycombinator.com/item?id=12131909 (I linked to my bwk repo
> there). And I have seen this bug before elsewhere.

I collated identical * runs even in my old glob implementation because
otherwise it was obvious N^X complexity dealing with them. I assume Rich
has decent stuff in musl and bionic can rip anything they haven't
already got from there. If not, I'm aware of a couple smart guys
maintaining those things who can handle this so it's not _my_ problem.
:)

> It matters what algorithm you use, and awk/sed/grep are all used with
> big input data and (I think?) big regexes. I use them on gigabytes of
> text. It probably doesn't matter for bash [[, because there you are
> just matching a short string against a regex.

Don't assume what your inputs look like. Modern Linux removed the 128k
environment space limitation almost a decade ago (commit b6a2fea3931
which went into 2.6.22 released July 2007) and it was never there for
local shell variables anyway. So there's no reason shell can't read X
and "$X" =~ blah and churn through as big an input as anything else.

> I think GNU awk/sed/grep all have their own regex implementation and
> don't use libc, but I could be wrong (?).

Gnu bash 2.x has its own malloc implementation, which is why I had to
say --without-bash-malloc in configure. (Dunno what more current bash is
doing, I haven't built any of the gplv3 versions from source.) The epic
"not invented here" of the gnu/dammit brigades is kind of impressive.
Me, I've consistently said if your libc is broken fix your libc.

(I'm also aware of the performance hacks their grep does with block
reads instead of line reads, and then backing up to find line context
after the match.
In theory I could do that with libc's regex stuff too. In practice, I
haven't gone there. My big todo item is making it work with embedded NUL
bytes, which is delaying my current round of grep replumbing...)

(So you can "grep string /bin/blah" out of executables, of course. Has
to find the string after the NUL byte. No, I can't use libc's regex
engine to cross null bytes, but I can't cross newlines either.)

> I thought busybox had some of its own regex support too.

Not that I recall, but it's been 10 years. The sed I wrote for busybox
way back when used libc's regex though. (There was a wrapper but I
believe it was just the standard "turn regcomp() failure into exit()"
sort of thing.)

> Andy

Rob
_______________________________________________
Toybox mailing list
http://lists.landley.net/listinfo.cgi/toybox-landley.net
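
For anyone who hasn't seen the "turn regcomp() failure into exit()" sort of wrapper mentioned above, here is a minimal sketch of the idea using only the POSIX regex calls libc provides. The names xregcomp_sketch() and sketch_match() are made up for illustration; this is an assumption about the general shape of such a wrapper, not the actual busybox or toybox code.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical name: compile a POSIX regex via libc, or print the
 * error and exit. Callers never have to check a return value. */
void xregcomp_sketch(regex_t *preg, const char *pattern, int cflags)
{
    int rc = regcomp(preg, pattern, cflags);

    if (rc != 0) {
        char msg[256];

        /* regerror() turns the regcomp() error code into readable text. */
        regerror(rc, preg, msg, sizeof(msg));
        fprintf(stderr, "bad regex '%s': %s\n", pattern, msg);
        exit(1);
    }
}

/* Hypothetical convenience helper: returns 1 if line matches pattern
 * (extended regex syntax), 0 otherwise. Exits on an invalid pattern. */
int sketch_match(const char *pattern, const char *line)
{
    regex_t re;
    int hit;

    xregcomp_sketch(&re, pattern, REG_EXTENDED | REG_NOSUB);
    hit = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);
    return hit;
}
```

A caller would use it as sketch_match("^ab+c$", line) per input line. A real tool would compile the pattern once and reuse the regex_t across lines instead of recompiling each time.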
