On 2/27/24 21:46, Oliver Webb wrote: > On Tuesday, February 27th, 2024 at 19:19, Rob Landley <[email protected]> > wrote: >> On 2/27/24 13:46, Oliver Webb via Toybox wrote: >> >> > Since we build toybox with -funsigned-char, there is no reason to have a >> > type for unnsigned chars in bc.c, >> > since that is the default for all chars. And removing it gets rid of a >> > typedef >> >> >> Sigh, the bc in the tree is "fraught". The guy who submitted it said he was >> escaping from a cloud of drama at the time (I didn't care), and then was >> surrounded by a fresh cloud of drama after "escaping" and tried to suck me >> into >> it (I declined). He wandered off to busybox and also maintaining his command >> as >> a separate project. >> >> According to "wc toys//.c | sort -n" it is the largest command in the entire >> toybox tree. Larger than the shell despite not having a dozen subcommands nor >> needing to manage terminal control or child processes. It's 5 times the size >> or >> sed or tar, and at least 3 times the size it NEEDS to be. I'd far rather >> clean >> up xzcat or implement awk (each of which is a similar effort ballpark) than >> spend months cleaning it up. > > Just a idea, but could we clean up the lexxer for bc, rip some of it out and > make it it's > own separate thing (lib/lex.c?), then base awk off of that?
Toybox needs a "lex.c" that works like flex (it's a posix command) and a yacc that works like bison (because linux screwed up kconfig in 2018). But I haven't seen anything salvageable in bc yet, and I've written my own parser multiple times: https://github.com/landley/toybox/blob/master/toys/pending/sh.c#L2976 > I have barely looked through > the bc.c code and so far have only gotten the _really_ obvious things cleaned > up. But since > there is a lexxer somewhere in there it is theoretically possible. My understanding is lex is just parallel grep: search for these patterns in parallel and tell me which one occurred next. I tend to use if/else staircases because it's easy to do, cache local, and doesn't leave you with the problem of marshalling data between two disconnected lists. (Here's a table of string to magic number value, then here's a case statement on the magic numbers. The table and the case statement have two halves of the same data separated from each other, as we were complaining about earlier today...) Whereas: if (strstart(&x, "potato")) { } else if (strstart(&x, "walrus)) { } else if (strstart(&x, "nictitating") { } ... Is all together in one place, reasonably cache local, and mostly checking the first character of the string on failure anyway. (No point getting fancy to check ~30 strings, the overhead of one system call is probably more work.) > I know the syntax differs in > a _lot_ of places, but they both share C style functions, control flow > statements, logical operators, > etc. So they could share at least some common base? The first user is not "reuse". > When skimming through bc.c, I've found a lot of code for the -w (warn if GNU > extension) > and -s (no GNU extensions) options. Since they only make sure the program > does NOT provide > functionality, and enforce stupid things like single letter variable names > (because bc was created > when someone in the 70s duct taped a lexxer to dc to give it C like syntax > instead of rpn. and dc could > only store 255 variables as one character "registers") A whole lot of work went into this, I agree. "Is that what we should be doing" does not appear to have been asked much. > I have never seen -w or -s used by anything, and mktimeconst.bc (the bc code > in the linux-kernel) > uses plently of GNU extensions in it's code. https://landley.net/bin/mkroot/0.8.10/linux-patches/0004-Replace-timeconst.bc-with-mktimeconst.c.patch I note that my C replacement could be way smaller, I wrote it that way intentionally to mirror the (perl) code it was replacing. And my FIRST attempt at replacing it was written in shell, it just required 64 bit math which wasn't 100% guaranteed at the time. (I had to poke Denys to update busybox ash to always use long long for $((math)) even on 32 bit.) > Would you mind if I stripped the logic of these options out of bc.c? Nothing > USES them from my knowledge > and that would free up at least a few hundred lines of code (Probably more). > And we would lose no > functionality from doing so as long as we keep the options there I have no issue, but I'm still not convinced bc is a command we should keep? Rob _______________________________________________ Toybox mailing list [email protected] http://lists.landley.net/listinfo.cgi/toybox-landley.net
