Half the reason for the ongoing gzip work is I want to move the local deflate implementation into lib/deflate.c, because it's shared by gzip and zip, and might be of interest to some other stuff. (It's the only one I plan to do the compression-side for, the rest are just decompressors.)
The _reason_ for that last parenthetical is toybox's goal of enabling what google calls "hermetic builds", I.E. package builds that provide all their own prerequisites, and thus build reliably, portably, and even bit-for-bit reproducibly on as many different systems as possible. (Portability applies to "the future" as much as to the variety of today's systems.)

The complete-the-circle case of hermetic builds is the minimal source bootstrap, I.E. the smallest system that can rebuild itself under itself from source code. A contemporary solution to this problem was demonstrated by https://landley.net/aboriginal/about.html, and https://github.com/landley/mkroot is a simpler, more modern version currently in development. My earlier time working on busybox was aimed at making aboriginal linux self-hosting, and toybox's roadmap leading to the 1.0 release is organized around solving this problem using new code unencumbered by problematic licensing.

Toybox aims to provide a complete hermetic build environment in a minimal number of packages. This minimum gives boundaries to a full-system security audit, allows students to read the code of a complete working system in a finite amount of time, minimizes the amount to port to new contexts, minimizes the number of programming languages you need to learn to understand "the system"... Conceptually this minimum is 4 packages*: kernel (linux), command line (toybox), libc (musl**), and C compiler*** toolchain (qcc). My old aboriginal linux project provided a working example of a self-contained minimal build using seven packages: linux, busybox/make/bash, uClibc, and binutils/gcc. Then, as a proof of concept, it built Linux From Scratch 6.3 under the result. (Presumably that's enough to build anything else under: the programming version of "reducing to an earlier solved problem".)
The saga of me doing that (and accidentally becoming busybox maintainer on the way) is detailed in http://landley.net/aboriginal/history.html

But getting a real build cycle down to 4 packages (which the "minimal" and "self-contained" goals strive towards) means toybox has to include its own implementations of things like zlib and curses if it wants to provide that functionality.

To download and build source packages, you need to be able to parse incoming tarballs in all three popular formats (tar.gz, tar.bz2, tar.xz), but you only really need to be able to _create_ one. Gzip is useful both as an archiver (it's the 80/20 solution for archiving) and as a streaming protocol. (Plus the zip file format has been used for all sorts of things, from java jar files to the archives you reimage an android phone with. And see my recent post about wanting to loopback mount 'em, basically a simpler squashfs.) So "deflate" is functionality toybox probably needs to provide, which means toybox should include its own inflate and deflate implementations as part of its 1.0 release.

tl;dr: That's why I fiddled with gzip.

Rob

* Each of these packages has good reasons to be separate. First, there are multiple implementations you can swap out (llvm for gcc, bsd for linux, busybox for toybox, bionic for musl). Second, each one deals with its own problem domain: the kernel is full of drivers for specific hardware and runs in a different context (ring 0), providing a defined API (system calls and such) to the rest of the system. The C library provides generic functionality every userspace program needs, and acts as a glue layer between the portable-ish c99+posix and a given kernel's system calls. The C compiler has a lot of processor-specific logic for assembly language parsing and code generation, and implements the C99 standard. And the command line utilities are all called from main(argc, argv[]) with environment variables, and run in a mostly architecture-independent context.
Projects like bsd and xv6 have (historically) merged these together into one package, and it was a bad idea. http://dsbscience.com/freepubs/linuxoverwindows/node9.html

** This was bionic until they started rewriting it in C++. The other packages are all written in C, which is a much simpler language providing minimal abstraction between what the programmer wrote and the machine language the hardware interprets. Ten years ago tinycc was an example of a c99 compiler in 100k lines of code that didn't even provide a nontrivial optimizer, but which booted the linux kernel, ala https://bellard.org/tcc/tccboot.html

*** Bootstrapping this up to a full modern distro**** would involve writing a modern version of cfront that tinycc/qcc***** could build, and then building llvm with the result. But this shouldn't be necessary in the base system. A simpler bootstrap kernel than Linux might be a good idea too; mit's xv6 and google fuchsia are playing in that space, as are a number of bootloaders and embedded systems.

**** Bootstrapping this up to android involves writing a read-only git engine capable of being driven from repo to download repositories from git servers, check out working directories, and probably handle subtrees. I expect "git bisect" would be in scope too. Checking anything _in_ (merging and commits) would not be in the 1.0 of that. Note the distinction between 'build environment' and 'development environment', which I covered in https://speakerdeck.com/landley/developing-for-non-x86-targets-using-qemu back in 2008.

***** qcc is https://landley.net/qcc and somebody PLEASE steal that idea so I don't have to do it after toybox and mkroot and breaking down AOSP into orthogonal build stages...
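On the point above about reading tar.gz, tar.bz2 and tar.xz while only needing to create one format: the reading side mostly comes down to recognizing which decompressor to hand the stream to. Below is a minimal sketch, assuming detection by each format's leading magic bytes (gzip's 1f 8b from RFC 1952, bzip2's "BZh", and xz's FD 37 7A 58 5A 00). The enum and function names are hypothetical illustrations, not toybox's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a compressed tarball's outer layer. */
enum comp_type { COMP_UNKNOWN, COMP_GZIP, COMP_BZIP2, COMP_XZ };

/* Hypothetical helper: inspect the first few bytes of a stream and report
 * which decompressor (inflate, bunzip2, unxz) it should be piped through.
 * Returns COMP_UNKNOWN for uncompressed or unrecognized data. */
static enum comp_type detect_compression(const unsigned char *buf, size_t len)
{
  /* xz stream header magic: 0xFD '7' 'z' 'X' 'Z' 0x00 */
  static const unsigned char xz_magic[6] = {0xFD, '7', 'z', 'X', 'Z', 0x00};

  /* gzip member header: ID1 = 0x1f, ID2 = 0x8b (RFC 1952) */
  if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b) return COMP_GZIP;
  /* bzip2 stream header: "BZh" followed by a block size digit */
  if (len >= 3 && !memcmp(buf, "BZh", 3)) return COMP_BZIP2;
  if (len >= 6 && !memcmp(buf, xz_magic, 6)) return COMP_XZ;
  return COMP_UNKNOWN;
}
```

Only the gzip branch would ever need a matching compressor under the "create just one" plan; the other two are decode-only.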
