Re: [Toybox] [PATCH] fix newline on stdin for

enh Mon, 29 Dec 2014 22:30:56 -0800

On Sat, Dec 27, 2014 at 5:02 PM, Rob Landley <[email protected]> wrote:
>
>
> On 12/26/14 12:46, enh wrote:
>> On Thu, Dec 25, 2014 at 4:04 PM, Rob Landley <[email protected]> wrote:
>>> On 12/25/2014 01:46 AM, enh wrote:
>>>> On Wed, Dec 24, 2014 at 7:47 PM, Rob Landley <[email protected]> wrote:
>>>>> The version I checked in won't error out for 'factor ""' or 'factor "36 "'
>>>>> the way Ubuntu's will, but I think I'm ok with that...?
>>>>
>>>> out of curiosity, what practical use is there for factor? even the
>>>> coreutils version gives up around 38 decimal digits, and it's pretty
>>>> slow even with numbers that small.
>>>
>>> I was reading http://www.muppetlabs.com/~breadbox/txt/rsa.html#14 on a
>>> long bus ride, because I probably have to implement TLS someday (by
>>> which I mean https not thread local storage) because wget can't talk to
>>> the world without encryption anymore (thanks NSA), and the section I
>>> linked to above used "factor", and I went "that's a command? Apparently
>>> so. This is probably like a dozen lines to implement"... and had it
>>> working before the end of the bus ride.
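
For a sense of scale, here is a rough sketch of a "dozen lines" factor using plain trial division. It is not toybox's code (which is C); the names factor and format_line are made up for illustration, and the "N: p1 p2 ..." output shape is assumed from the coreutils command.

```python
def factor(n: int) -> list[int]:
    """Return the prime factors of n in ascending order (empty for n < 2)."""
    if n < 2:
        return []
    factors = []
    # Strip out factors of 2 so the main loop only tries odd candidates.
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    candidate = 3
    # A composite n always has a prime factor no larger than its square root.
    while candidate * candidate <= n:
        while n % candidate == 0:
            factors.append(candidate)
            n //= candidate
        candidate += 2
    # Anything left above 1 is itself prime.
    if n > 1:
        factors.append(n)
    return factors


def format_line(n: int) -> str:
    """Format one output line as 'N: p1 p2 ...' (assumed output shape)."""
    return " ".join([f"{n}:"] + [str(p) for p in factor(n)])
```

Trial division does work proportional to the square root of the largest prime factor, which is one reason a simple factor gets slow on big inputs.
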
>>
>> speaking of which (and going back to "simple is complex"), i have an
>> openssl- (or boringssl-)based md5sum/sha1sum implementation that adds
>> all the other shas too. (a toybox built with all these is actually a
>> couple of hundred bytes larger than the one with just md5/sha1sum, but
>> that's because of the duplicated help strings.)
>
> Actually the main reason I don't include external code is licensing.
>
> Last year I gave two talks about how I went from GPL fanboy to advocate
> of the public domain. I didn't quite fit either of them in the assigned
> timeslot, but the more coherent of the two is probably:
>
> https://archive.org/download/OhioLinuxfest2013/24-Rob_Landley-The_Rise_and_Fall_of_Copyleft.mp3
>
> The current toybox license places the code into the public domain. It
> _looks_ like a BSD license, and I sometimes call it "zero clause BSD"
> because of this, but the requirement to copy this specific license text
> into derivative works is absent. This means it's a permission grant that
> allows reusing the code without even attributing it. (Attribution is
> _polite_, but it's possible to plagiarize Shakespeare. That's not a
> licensing issue, and these days Google makes it pretty easy for teachers
> to catch all those recycled term papers anyway.)
>
> The problem with BSD-style licenses is that there are a lot of them (2
> clause BSD, 3 clause BSD, 4 clause BSD, ISC, MIT, Apache, and so on)
> that all try to do the same thing but all of them say "you must copy
> this specific wording into your derived work", so if you combine code
> from two sources under different BSD variants you wind up concatenating
> the licenses, and this can get epically silly (the kindle paperwhite's
> about->licenses thing is over 300 pages of concatenated license
> boilerplate.)


tell me about it.
https://android.googlesource.com/platform/bionic/+/master/libc/NOTICE

> I respect BSD/ISC/Apache license terms enough _not_ to treat them as
> public domain. I would like toybox to provide a source of reusable
> public domain code, it's one of the goals of the project.
>
> Toybox has included explicitly public domain code from external sources
> (such as the xz implementation for toys/pending/xzcat.c), and I've
> looked at the libtom bignum library for implementing bc (haven't managed
> to make much sense of it, to be honest). But I recently turned down a
> ping.c submission that was based on BSD ping, in favor of writing my own.
>
>> i know one of your goals is to minimize dependencies,
>
> I'm juggling an awful lot of conflicting goals. (Most of them listed on
> the roadmap or design pages.)
>
> Because of this, toybox is probably going to implement more than a lot
> of users need, but as long as the commands are self-contained you can
> switch off any command in your config that you don't want to ship.
>
> If, for auditing reasons, you don't want to use toybox's sha1sum but
> instead want to use an openssl derived version that shares code with
> other sha1sum instances you've already cleared and are using elsewhere,
> then that's what makes sense for your deployment. (If you grow to trust
> toybox's version later, you can switch to it then after everybody else
> has looked at it longer.)

yeah, my question really is whether you want me to send patches like
that to the list, or just keep them downstream.

>> but for us the
>> goal of minimizing duplication (and thus amount of code to audit) is
>> probably stronger. i suspect no one really cares that the toybox
>> hashes are slower than the openssl ones, but the security folks
>> probably will care about having another TLS implementation.
>
> Indeed, and I agree. I don't _want_ to write TLS, I think it's out of
> scope for toybox... except that I need the functionality to do basic web
> transactions that _are_ in scope. (The internet's changing out from
> under me. Two years ago you could talk to github, kernel.org, and
> twitter without https. Now if you try they redirect.)
>
> What I really want is an "stunnel" variant that works, so I can pipe an
> https session through something that encrypts it for me.
>
>   https://www.stunnel.org/index.html
>
> I tried to convince dropbear to add one years ago, but their reply was
> more or less "patches welcome".
>
>   http://lists.ucc.gu.uwa.edu.au/pipermail/dropbear/2007q1/000506.html
>   http://lists.ucc.gu.uwa.edu.au/pipermail/dropbear/2008q4/000859.html
>
> I prefer not to link toybox against external libraries (I could give a
> long talk about why, but not here), and sucking in nontrivial amounts of
> external code to maintain a local copy has its own large downsides. But
> calling reasonably standardized external commands and piping stuff
> through them? I'm all for it.
>
> In fact toybox commands are designed to be able to call external
> versions of commands even when toybox has its own implementation. That's
> why mount.c doesn't check if CFG_LOSETUP is enabled before trying to
> xpopen("losetup"), if it's there in the $PATH but not in this binary, ok
> then.
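
The lookup described above can be sketched roughly as follows, assuming a built-in is preferred when one is compiled in and $PATH is searched otherwise. Only xpopen() and losetup come from the message; run_command and the builtins table are hypothetical stand-ins, and this is Python rather than toybox's C.

```python
import shutil
import subprocess


def run_command(argv, builtins):
    """Run argv with a built-in implementation if present, else search $PATH.

    `builtins` maps command names to callables taking the argument list; it
    stands in for the commands enabled in the binary (hypothetical).
    Returns the command's exit status.
    """
    name = argv[0]
    if name in builtins:
        return builtins[name](argv[1:])
    path = shutil.which(name)
    if path is None:
        # Neither built in nor installed: use the shell's "not found" status.
        return 127
    return subprocess.run([path, *argv[1:]]).returncode
```

The caller never checks the configuration: whether losetup is built in or merely installed, the call site looks the same.
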
>
> As for the md5/sha1/sha256/sha3, they're easy to test (their failures
> tend to be really obvious), and the two I implemented are inherently
> timing invariant and don't have obvious sidechannel attacks. And I _can_
> find existing public domain implementations of these to start from, such as:
>
> http://cpansearch.perl.org/src/BJOERN/Compress-Deflate7-1.0/7zip/C/Sha256.c
>
> So adding the other hashing functions to toybox makes sense to me,
> especially since I need them for a traditional /etc/shadow login.c. (I
> need to research android's user database and how to access it.)
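
On "easy to test": a hash implementation can be checked against published digests of a short message, and any mistake shows up as a completely different digest. A minimal sketch, using Python's hashlib as the implementation under test; check_digests and hashlib_digest are illustrative names, not anything in toybox.

```python
import hashlib

# Published digests of the three-byte message b"abc".
KNOWN_ABC = {
    "md5": "900150983cd24fb0d6963f7d28e17f72",
    "sha1": "a9993e364706816aba3e25717850c26c9cd0d89d",
    "sha256": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
}


def check_digests(digest_fn):
    """Return the names of algorithms whose digest of b"abc" is wrong.

    `digest_fn(name, data)` must return a lowercase hex string.
    """
    return [name for name, expected in KNOWN_ABC.items()
            if digest_fn(name, b"abc") != expected]


def hashlib_digest(name, data):
    """Adapter so hashlib can be checked like any other implementation."""
    return hashlib.new(name, data).hexdigest()
```

Any other implementation, OpenSSL-based or hand-written, can be plugged in as digest_fn and compared the same way.
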

i can save you some time there: there isn't one. bionic's getpwnam and
friends will do the right thing, though, so toybox's id works fine.
(the patch i sent you fixes bugs that affect id on the desktop too,
nothing Android-specific.)

> That said, I _do_ care that they're slower than other implementations.
> That's a simple vs fast balance that's... I took the first speedup
> patch, didn't take the second speedup patch, and I need to go back and
> look at it...
>
>> (and
>> things like reimplementing zlib and bunzip2 probably fall somewhere in
>> between.)
>
> One of the goals I'm juggling is "busybox replacement", and they have
> this stuff. But again that's just a weighting, busybox already contains a
> lot of stuff we're _not_ implementing.
>
> If I was starting from scratch today I might leave them out, but I have
> a history with both bzip2 and gzip which makes it easier for me to keep
> both of them in scope.
>
> The one we really _need_ is deflate/inflate, because we should have a
> compression algorithm and that's the simplest and most lightweight one.
> The extract side of the other two are there because tarballs come in
> that format and a build environment needs to be able to extract them
> (another goal I'm juggling. The strace source is _only_ available as .xz
> these days, for example.)
>
> But I probably won't bother with the compression side of bzip2 or xz. If
> you want to create a new tarball we support gzip and if you want it in
> those other formats you can install the other package.
>
> To explain my "history with bzip2 and gzip" above:
>
> I reimplemented bunzip2 years ago because the original was horrid, and
> my implementation got sucked up into a bunch of places. (I think the
> kernel uses it if you select bzip compression, although these days gzip
> or xz are the dominant ones.)
>
> I also wrote 90% of bzip2 compression side support a decade back for
> busybox, but got distracted near the end and never got back to it
> because the bzip2 compression algorithm is WEIRD:
>
> http://lists.busybox.net/pipermail/busybox/2004-February/010859.html
>
> Even _with_ most of the work done I probably won't bother with bzip2
> compression side unless somebody really wants it, both because it's
> semi-obsolete these days and because its compression is based on weird
> heuristics for the string sorting that I've never managed to clean up
> into something understandable. (The "crap.c" above, which is a series of
> fallbacks between different sorting algorithms with no explanation of
> _why_.) I _can't_ simplify this into something easy to understand that
> somebody might want to use as example code in a middle school
> programming class, the algorithm is just inherently nuts.
>
> I already did gunzip a few months ago, and I'm working on gzip
> compression side support now. I wrote a java implementation of that back
> when Java 1.0 didn't include it in the base library. (Java 1.1 came out
> before I did the decompression side that time, so I moved on to other
> things.) I took info-zip apart back when I was programming for OS/2, I
> actually know that one pretty well. So that's probably the only
> compressor I'll implement, when it's done it should be less than 500
> lines of code. Also, Ashwini Sharma asked me to prioritize that so they
> can use it in a product.
>
> As for xz: I received an external contribution based on the public
> domain decompressor. The "fetch tarball, extract, configure, make,
> install" codepath needs to be able to extract them, and the code's
> already in. (And is horrible, there's built-in knowledge of various
> processor machine language formats, which strongly implies upgrades will
> need more of this filigree for new processor variants.)
>
> But I don't particularly want to do the compression side for that.
>
> If you decide to switch off our bunzip2 and use the external version
> instead, toybox "tar" should call out to it and pipe stuff through it
> just fine. (I dunno if it currently _does_, but once I've cleaned it up...)
>
>> in this specific case the openssl API is reasonable enough your
>> implementations could be a drop-in replacement, but i suspect in other
>> cases part of your motivation for writing your own will have been the
>> awful API.
>
> Part, yes. But only part. I mentioned licensing above. There's also the
> fact that I can often come up with objectively better code.
>
> In the case of bunzip2, back in 2003 I replaced this:
>
> http://git.busybox.net/busybox/tree/archival/libunarchive/decompress_bunzip2.c?id=6fe55ae93983
>
> With this:
>
> http://git.busybox.net/busybox/tree/archival/libunarchive/decompress_bunzip2.c?id=0d6d88a2058d
>
> That's not just replacing 1658 lines with 531 lines: try actually reading
> the old code. Contemplate the "save state" and the big switch/case in
> the main function starting at line 395. (They copy all the local
> variables out of a structure, each call, and copy them back before
> returning. They use a switch/case with labels covering the whole
> function so they can jump back into the middle of nested loops.
> That's so it could return when it ran out of data and be called to
> resume decompressing once the buffer was filled. I replaced that with a
> get_bits() call that had the filehandle stored away and could read more
> data if it needed to.)
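
The get_bits() idea can be sketched like this: the reader keeps the file handle and refills its own buffer, so the decoder never has to return to its caller in the middle of a loop and resume later. This is an illustration in Python, not the busybox or toybox code; the class name, field names, and chunk size are assumptions.

```python
import io


class BitReader:
    """Read bit fields from a binary stream, refilling the buffer on demand."""

    def __init__(self, stream, chunk_size=4096):
        self.stream = stream        # file handle stored here, not passed per call
        self.chunk_size = chunk_size
        self.buffer = b""           # bytes read from the stream
        self.pos = 0                # index of the next unread byte in buffer
        self.bit_acc = 0            # bits taken from buffer but not yet returned
        self.bit_count = 0          # number of valid bits in bit_acc

    def get_bits(self, count):
        """Return the next `count` bits as an integer, most significant first."""
        while self.bit_count < count:
            if self.pos == len(self.buffer):
                # Out of buffered input: read more right here instead of
                # saving state and returning to the caller.
                self.buffer = self.stream.read(self.chunk_size)
                self.pos = 0
                if not self.buffer:
                    raise EOFError("unexpected end of input")
            self.bit_acc = (self.bit_acc << 8) | self.buffer[self.pos]
            self.pos += 1
            self.bit_count += 8
        self.bit_count -= count
        value = self.bit_acc >> self.bit_count
        self.bit_acc &= (1 << self.bit_count) - 1
        return value


# A 12-bit field that straddles a refill, forced by a one-byte chunk size.
reader = BitReader(io.BytesIO(b"\xab\xcd"), chunk_size=1)
first = reader.get_bits(12)
rest = reader.get_bits(4)
```

Because the refill happens inside get_bits(), the calling code can stay as ordinary nested loops with ordinary local variables.
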
>
> Yes, that's Julian Seward's bunzip2 code. That wasn't something toybox
> did to it, that's what the upstream package they copied had always been
> like.
>
> A more recent case where I shrank a codebase to 1/3 of its original
> size/complexity was ifconfig. I described what I did at length here:
>
>   http://landley.net/toybox/cleanup.html#ifconfig
>
> The "old" and "new" lines with the totals are links to the original and
> changed file. I described each change on the mailing list, and collected
> links to all the descriptions on that page. You might want to read just
> the first description here:
>
> http://lists.landley.net/pipermail/toybox-landley.net/2013-April/000882.html
>
> Note: the ifconfig I received was a professional contribution from a
> team of experienced coders, and what they sent me did work. I'm just...
> picky.
>
>> also in this specific case there's almost no sharing
>> between the implementations anyway because 99% of the code is the hash
>> implementation itself. but if you can, keeping API compatibility with
>> the library you're trying to replace would be good.
>
> I've pondered adding zlib bindings for deflate/inflate when I get them
> done, but that's a post-1.0 thing.
>
> I note that one of my first interactions with Rich Felker (the musl
> maintainer) was him explaining to me what would be involved in making an
> executable also be a shared library (so you can have libz.so be a
> symlink to busybox so -lz was satisfied with the busybox code). Google
> finds the old thread at:
>
> http://lists.uclibc.org/pipermail/busybox/2006-April/054373.html
>
> Busybox never did that, but toybox might. Not in the 1.0 release,
> though. (I _think_ it's worth the complexity? Obviously only if there's
> a config option to not do that...)
>
> However, when researching deflate I read the zlib source, and the
> info-zip source, and the plan 9 source, and three different "tiny"
> implementations (the _least_ useful of which was miniz.c, classic
> example of the kind of code shrinkage tricks I'm trying to _avoid_...)
>
>> anyway, let me know whether you'd like to merge stuff like this into
>> the main codebase. otherwise i can just "git rm" locally and add the
>> alternative version to toys/android.
>
> Toybox commands can all be switched off. Any command you've got a better
> implementation of (for any metric of better), feel free to switch them
> off. I'd very much like to _improve_ toybox's version until you feel
> it's the better one, but "we audited this other codebase already"
>
>> i'll get a delete/merge conflict
>> if you change anything in your version so i'll be able to track
>> changes, so it's only really a loss if you think you have other users
>> who'd prefer to use openssl.
>
> Um, issue to be aware of: the subdirectories are just a developer
> convenience, the command namespace is actually flat. So if you have a
> NEWTOY(sha1sum) in toys/lsb and another NEWTOY(sha1sum) in toys/android,
> the build will break when it hits the duplicate command name.
>
> (Actually since you're not using our build infrastructure you can
> probably just ignore that, and point your .mk files at the right .c
> files for what you're building... :)
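
The flat-namespace problem is easy to check for ahead of a build. Below is a minimal sketch that scans source text for NEWTOY() declarations and reports names declared more than once. The regular expression, the function name, and the example file contents are assumptions for illustration; the real toybox build scripts work differently.

```python
import re
from collections import defaultdict

# Matches the command name in a declaration such as NEWTOY(sha1sum, ...).
NEWTOY_RE = re.compile(r"\bNEWTOY\(\s*(\w+)")


def find_duplicate_toys(sources):
    """Map each command name declared more than once to the declaring files.

    `sources` maps a file path to that file's text.
    """
    declared = defaultdict(list)
    for path, text in sources.items():
        for name in NEWTOY_RE.findall(text):
            declared[name].append(path)
    return {name: paths for name, paths in declared.items() if len(paths) > 1}


# Hypothetical file contents; only the directory names come from the thread.
EXAMPLE_SOURCES = {
    "toys/lsb/sha1sum.c": 'USE_SHA1SUM(NEWTOY(sha1sum, "b", TOYFLAG_USR))',
    "toys/android/sha1sum.c": 'USE_SHA1SUM(NEWTOY(sha1sum, "b", TOYFLAG_USR))',
    "toys/other/factor.c": "USE_FACTOR(NEWTOY(factor, 0, TOYFLAG_USR))",
}
```

Here find_duplicate_toys(EXAMPLE_SOURCES) reports sha1sum with both paths, which is the clash that would break the build.
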

yes, but i think some of the scripts get confused, plus it's good for
me to be able to build and test the desktop version of toybox too if
i'm sending you patches. plus removing it locally means git will
complain loudly if something changes upstream, so i'll be able to keep
track of what's going on.

>>> P.S. At a design level I thought about defaulting it "n" but the
>>> defconfig y/n signalling primarily indicates "is this done or not" and
>>> it was finished and worked fine, so... (Well, the examples directory
>>> also has stuff that defaults to "n" but factor isn't really a
>>> demonstration of how to use the toybox infrastructure either.) And
>>> defaulting "n" for other reasons is editorializing, where does it stop?
>>> rev and tac? fallocate? makedevs? freeramdisk? partprobe? People _asked_
>>> me to add most of those, because they needed them. If somebody wants to
>>> make a .config file selecting a subset of the commands, you can do that.
>>> It's not my job to guess how people will use generic tools.
>>
>> yeah, i was hoping to abdicate responsibility for subsetting and was
>> disappointed to find that 'default' didn't mean "you probably want
>> this". but it makes sense, and the subset that one project needs isn't
>> necessarily going to be the same as any other project.
>
> Indeed. Something I learned back when I maintained busybox: don't try to
> guess how people will use a hammer. You'll only get in the way.
>
>> it sucks to be me though. the best i can aim for is to try to ensure
>> that there are roughly the same number of people complaining i put too
>> much in as people complaining i left too much out :-)
>
> Oh I've still got that, just at a different level. "Should include this
> command or not". (You're entirely right "factor" was a questionable call
> there. It was sort of on the line even after I wrote it. I just cleaned
> up "mix.c" which is another one. Deciding whether to merge that I was
> looking at the aumix man page and going "this is simpler, but that's
> more standard, but nobody's _asked_ for the bigger one yet and that's
> mostly about curses mode instead of command line, and this seems to do
> the minimum you need...")
>
> So much easier when there's a standards document to blame. (Of course I
> vetoed like 1/3 of the posix command list anyway. Nobody needs sccs in
> 2014.)

(i initially included uuencode/uudecode until one of the other guys on
the team asked what they were. turned out only me and the second
oldest guy had even heard of them...)

>> it's a pity the debian popularity contest only has per-package data
>> (https://qa.debian.org/popcon.php?package=coreutils). if you ask
>> people they always tell you they use everything "all the time". even
>> if you broke it two releases ago and removed it one release ago.
>
> Have you read toybox's roadmap.html page? It may not meet your needs but
> at least I have _reason_ for listing the commands I did. :)

one thing i was curious about was what busybox configurations tend to
get used in the wild. your roadmap implies that you looked at that,
but i didn't see where.

> Always happy to have another viewpoint to rejuggle the weightings...

i'm assuming we (Android) will get some feedback eventually.

> When I get my darn server reinstalled and get AOSP on it, I want to run
> the AOSP build with the toybox commands. (Aboriginal Linux is using an
> old version of linux from scratch as a bootstrapping test, but android's
> build needs more commands than that. And may use command line options
> that toybox doesn't implement yet. I know _you_ aren't trying to get
> android self-hosting anytime soon, but I still am. :)

yeah, my builds are more than slow enough even on the fastest desktop
hardware :-)

>> to work out which options are important for the commands that toolbox
>> and toybox have in common, i've been relying on my command-line
>> history, what i can find in scripts, and whether someone cared enough
>> to add/fix something. but i don't yet have a plan for all the stuff in
>> toys/pending.
>
> My plan is to clean them up (the way I did the other cleanup.html
> things) and get them out of pending.
>
> It's surprisingly time consuming, but if you read through the history of
> one of the cleanups I documented there, you can see why...
>
>> i also haven't thought much about "in the binary" versus
>> "gets a symlink"; i suspect that the "too much" camp will be further
>> subdivided into those who're offended by the binary size and those
>> who're offended by the number of symlinks in /system/bin.
>
> I don't understand the distinction here? (Is your build making
> standalone binaries for the toybox commands ala scripts/single.sh? It
> didn't look like it was but I have to stare at makefiles a lot to beat
> any sense out of 'em...)

no, there's one binary that contains n toys, and then m symbolic
links, where n != m. think of the ones that don't have links as my
level of "pending"ness :-) though it's ill-defined what not having a
link means. sometimes it's "i think we'll want this, but i haven't
checked it works for us yet", sometimes "i think this probably doesn't
make any sense on Android, but it's in the default set and i haven't
yet been convinced to kick it out".
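
The "n toys, m links" arrangement can be sketched as below: a multicall binary picks a command from the name it was invoked under, so a toy with no symlink is still reachable as "toybox NAME". The command table, the return value 127 for an unknown name, and the helper names are illustrative assumptions, not the Android build's actual setup.

```python
import os


def cmd_true(args):
    return 0


def cmd_false(args):
    return 1


def cmd_echo(args):
    print(" ".join(args))
    return 0


# Commands compiled into the single binary (the "n toys").
TOYS = {"true": cmd_true, "false": cmd_false, "echo": cmd_echo}
MULTICALL_NAME = "toybox"


def dispatch(argv):
    """Choose a command from the invoked name, as a multicall binary does."""
    name = os.path.basename(argv[0])
    args = argv[1:]
    if name == MULTICALL_NAME and args:
        # Invoked as "toybox echo hi": the command name is the first argument.
        name, args = args[0], args[1:]
    handler = TOYS.get(name)
    if handler is None:
        return 127
    return handler(args)


def unlinked(toys, symlinks):
    """Commands present in the binary that have no symlink (the n != m gap)."""
    return sorted(set(toys) - set(symlinks))
```

With symlinks for only true and echo, unlinked(TOYS, ["true", "echo"]) lists false: still in the binary, but not visible in /system/bin.
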

but my earlier point was that no one is likely to look too closely at
what's in the toybox binary, especially as long as it's roughly the
same size as the toolbox one used to be. but if they start stumbling
across symbolic links for things they think are a waste of space,
they're more likely to get on my case about it.

>> and getting back to factor, i can't decide whether having it paints a
>> target on my back or gives me something i don't care about to throw
>> under the bus as a gesture of goodwill :-)
>
> Politics, I can't help you with. :)
>
> Rob
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
