On 8/9/21 4:20 PM, enh wrote:
>> P.S. One of my most unsolvable todo items is what to do about readline()
>> on /dev/zero. If it's looking for \n it's just gonna allocate a bigger
>> and bigger buffer until it triggers the OOM killer. If a single line IS
>> a gigabyte long, what am I supposed to _do_ about it?
>
> nothing? seems like toybox should just do what was asked, not least
> because whether a gigabyte is large or small is a matter of opinion?

Sure, but "I'm about to dirty enough memory to trigger the OOM killer"
seems measurable reasonably cheaply when sysinfo() works?

> (he says, still deeply scarred by all the Bell Labs boys' fixed-length
> buffers... i had to use _perl_ because of them! PERL!)

Currently that _is_ what it's doing, and I said it's probably unsolvable.
:( But if you "toybox tac /dev/zero" it will trigger the OOM killer almost
immediately. (In large part because realloc() only cares about _virtual_
address space exhaustion and cares naught for physical memory. Back on 32
bit systems a sufficiently outlandish allocation might hit a useful limit
before OOM. On 64 bit systems it won't.)
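The pattern in question is basically this (a sketch typed into email, byte
at a time for brevity, NOT toybox's actual readline()):

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  // Read until \n or EOF, doubling the buffer as needed. On /dev/zero the
  // \n never arrives, so the doubling never stops.
  char *naive_readline(int fd)
  {
    size_t len = 0, size = 64;
    char *buf = malloc(size), *newbuf, c;

    if (!buf) return 0;
    while (read(fd, &c, 1) == 1) {
      if (len+2 > size) {
        // Under 64 bit overcommit this keeps "succeeding" long after the
        // total passes physical RAM: it's only reserving address space,
        // and the writes below are what dirty the pages.
        if (!(newbuf = realloc(buf, size *= 2))) return free(buf), (char *)0;
        buf = newbuf;
      }
      buf[len++] = c;
      if (c == '\n') break;
    }
    if (!len) return free(buf), (char *)0;
    buf[len] = 0;

    return buf;
  }

  int main(void)
  {
    char *line = naive_readline(0);  // try: ./a.out < /dev/zero

    if (line) printf("%s", line);
    free(line);

    return 0;
  }

Feed that /dev/zero and about the only thing that stops it short of the
OOM killer is an explicit address space cap like ulimit -v making the
realloc() actually fail.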
If you want to properly support unlimited length lines, the command should
use a different strategy than reading the entire line into memory before
processing it. (Which is what gnu grep does, by the way.) But each such
strategy has its own downsides, and I've generally chosen "simple" over
trying to handle such corner cases. This is just my TODO note about maybe
failing more gracefully in such corner cases than the OOM killer.
(Although it's really sort of libc's problem? But previous attempts to
solve that gave us "strict overcommit", which was worse than the issue it
was trying to solve...)

This is why tr doesn't read lines. Even the one in pending is built around
a while (read(0, toybuf, sizeof(toybuf))>0) loop, because running tr on
/dev/zero to map it to an endless stream of another character is a common
use case. That doesn't mean the fixed length buffer is user visible in the
command's behavior, that would be MORE wrong.

This problem is specific to readline(), and to a lesser extent things like
vi that read a whole file into memory without mmap()ing it, but I mostly
try not to do that. That's why toybox patch is implemented like sed and
not like vi: I wanted it to be able to work on the largest files it could.

It's easy to come up with cheap heuristics like saying "toybox readline()
will perror_exit() if it tries to expand an individual line over 1/4 of
total system memory as returned by sysinfo" but... that's not RIGHT?
Because there isn't really a right thing to do here. Any restriction that
avoids triggering the OOM killer also avoids cases where MAYBE it would
have succeeded? (If nothing else the OOM killer might have killed
something _else_... :) Truncating or splitting the line is corrupting the
data.
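Spelled out, the sysinfo heuristic above would be something like (again a
sketch typed into email, made up name, perror/exit standing in for
perror_exit()):

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/sysinfo.h>

  // Refuse to grow a single line past 1/4 of total RAM. Note that
  // sysinfo() counts totalram in mem_unit-byte blocks.
  void *grow_line_capped(void *buf, size_t size)
  {
    struct sysinfo si;
    void *new;

    if (!sysinfo(&si) && size > si.totalram/4*(size_t)si.mem_unit) {
      fprintf(stderr, "line over 1/4 of system memory\n");
      exit(1);
    }
    if (!(new = realloc(buf, size))) {
      perror("realloc");
      exit(1);
    }

    return new;
  }

And the problem with it is exactly the above: on a box with most of its
memory free, a line slightly over totalram/4 would have worked fine, and
this kills the job anyway.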
But then somebody could have an .xz attachment in an email that expands
into something that sed gets run against, to intentionally trigger the OOM
killer on the receiving system in a way that's known to kill the wrong
process... I worry about that sort of thing and try to avoid it. I haven't
managed to avoid it here, which is why it's still on the todo list, but I
dunno how to fix it either.

Rob

P.S. I don't understand what gnu "tac /dev/zero" is doing? Top has it
eating 100% cpu but not increasing its memory allocation? "Not supporting
embedded NUL bytes" explains part of it, but not all of it. Even if it's
trying to mmap() the file and just storing indexes, you'd think the
indexes would add up? Eh, I'm still too sick to think through that sort of
thing...

_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net