On Thu, Apr 25, 2019 at 3:03 PM Rob Landley <[email protected]> wrote: > > On 4/24/19 5:50 PM, enh wrote: > > On Wed, Apr 24, 2019 at 3:40 PM Rob Landley <[email protected]> wrote: > >> > >> On 4/23/19 8:06 PM, enh wrote: > >>> (sorry. busy.) > >> > >> Sigh. I'll add a config symbol you can switch on, with help text > >> complaining > >> about it. > > > > well, it's not necessarily grep i'm worried about. i think if all the > > other toys are now going to be line-buffered, it doesn't make much > > sense to single out grep like this. > > Neither do I. > > That said, after writing up a point by point rebuttal to your message, I > benchmarked the code (silly to argue about something we haven't tested the > speed > of) and found out that devuan's grep takes 8 seconds to do something mine > takes > 20 to do _before_ the output buffering changed. Which is unacceptable and made > the argument we were having kinda moot.
(one thing i was going to say is that line buffering has a few weird side-effects. C11's 7.21.3.3 implies that you have to flush line-buffered streams any time you read from an unbuffered or line-buffered stream, for example. which would affect your original code.) > So now I'm staring at do_lines() and seeing if I can get it to do large (at > least 64k) block reads where it doesn't memcpy() the data out into individual > lines but passes a pointer into the buffer where possible. (And this lets me > replace get_line(fd) which has been on the todo list forever, although moving > the all to do_lines() and a callback is probably the best way...) /me shudders having just fixed a bug in NetBSD's code to do this. but, yeah, it's probably worth the cost. and we have ASan to keep us honest. > I might even be able to get it to mmap() and mremap() the data traversing > down a > file without too intrusive a second path. (Does madvise(MADV_WILLNEED) persist > across mremaps of the same mapping, or do I need to call it again?) > > I _also_ have a todo item WAY down my list that if toysh has a "find | grep" > pipeline and both are toybox commands it can automatically -print0 -0 on it, > but > I'm not sure the proper semantics for triggering that. (Or saying it > _shouldn't_ > do that, which is why it's way down on the list...) > > >> Does the attached work/suffice? (I _think_ setlinebuf(stdout) followed by > >> setvbuf() something else is legal as long as we haven't output anything to > >> the > >> stream yet? But the man page is unclear. Presumably the admonition is > >> because > >> pending unflushed data in the buffer would get discarded, but they don't > >> _say_...) > > > > (POSIX explicitly says that it's only valid before any output, but > > bionic [and BSDs] flush output data and discard input data if you call > > setvbuf [and the others are implemented in terms of setvbuf].) > > musl doesn't appear bothered either. I read that as saying we can do it twice > before any output, but would be amazed if glibc ever tested that, yet care > less > and less about breaking anything with "gnu" on it each year. It comes > pre-broken > by the gnu all over it. > > Rob _______________________________________________ Toybox mailing list [email protected] http://lists.landley.net/listinfo.cgi/toybox-landley.net
