Re: Modify buffering of standard streams via environment variables (not LD_PRELOAD)?
> On Sun, 21 Apr 2024, wrotycz wrote:
> >
> > It seems that it's 'interleaved' when buffer is written to a file or
> > pipe, and because stdout is buffered it waits until buffer is full or
> > flushed, while stderr is not and it doesn't wait and write immediately.
>
> Right; my point was just that stdout and stderr are still separate streams
> (with distinct buffers & buffering modes), even if fd 1 & 2 refer to the
> same pipe.

As I guess I should've expected, the behavior differs between a bash
script and a compiled program.

$ cat ./abc123
#!/bin/bash
printf '%s' 'a' >&2
printf '%s' '1'
printf '%s' 'b' >&2
printf '%s' '2'
printf '%s' 'c' >&2
printf '%s' '3'
printf '\n' >&2
printf '\n'
exit 0;
$ ./abc123
a1b2c3

$ ./abc123 2>&1 | cat
a1b2c3

$ cat ./abc123.c
#include <stdio.h>

int main() {
    putc('a', stderr);
    putc('1', stdout);
    putc('b', stderr);
    putc('2', stdout);
    putc('c', stderr);
    putc('3', stdout);
    putc('\n', stderr);
    putc('\n', stdout);
    return 0;
}
$ gcc -o abc123.exe abc123.c
$ ./abc123.exe
a1b2c3

$ ./abc123.exe 2>&1 | cat
123
abc

$ stdbuf --output=0 --error=0 -- ./abc123.exe 2>&1 | cat
123
abc

$

I probably shouldn't go around assuming that things are smart. I'll
accept that adding logic to glibc to test whether any given set of file
descriptors point to the same file or pipe, and then ensuring that
anything written to any one of those file descriptors actually goes
through the stream for the first one, would probably be overkill.
Re: Modify buffering of standard streams via environment variables (not LD_PRELOAD)?
On Sat, Apr 20, 2024 at 11:58 AM Carl Edquist wrote:
>
> On Thu, 18 Apr 2024, Zachary Santer wrote:
> >
> > Finally had a chance to try to build with 'stdbuf --output=L --error=L
> > --' in front of the build script, and it caused some crazy problems.
>
> For what it's worth, when I was trying that out on msys2 (since that's
> what you said you were using), I also ran into some very weird errors
> when just trying to export LD_PRELOAD and _STDBUF_O to what stdbuf -oL
> sets. It was weird because I didn't see issues when just running a
> command (including bash) directly under stdbuf. I didn't get to the
> bottom of it though and I don't have access to a windows laptop any more
> to experiment.

This was actually in RHEL 7.

stdbuf --output=L --error=L -- "${@}" 2>&1 |
  tee log-file |
  while IFS='' read -r line; do
    # do stuff
  done

And then obviously the arguments to this script give the command I want
it to run.

> Also I might ask, why are you setting "--error=L" ?
>
> Not that this is the problem you're seeing, but in any case stderr is
> unbuffered by default, and you might mess up the output a bit by line
> buffering it, if it's expecting to output partial lines for progress or
> whatever.

I don't know how buffering works when stdout and stderr are redirected
to the same pipe. You'd think, whatever it is, it would have to be smart
enough to keep them interleaved in the same order they were printed in.
With that in mind, I would assume they both get placed into the same
block buffer by default.
Re: Modify buffering of standard streams via environment variables (not LD_PRELOAD)?
On Fri, Apr 19, 2024 at 8:26 AM Pádraig Brady wrote:
>
> Perhaps at this stage we should consider stdbuf ubiquitous enough to
> suffice, noting that it's also supported on FreeBSD.

Alternatively, if glibc were modified to act on these hypothetical
environment variables, it would be trivial to have stdbuf simply set
those, to ensure backwards compatibility.

> I'm surprised that the LD_PRELOAD setting is breaking your ada build,
> and it would be interesting to determine the reason for that.

If I had that kind of time...
Re: Modify buffering of standard streams via environment variables (not LD_PRELOAD)?
On Fri, Apr 19, 2024 at 5:32 AM Pádraig Brady wrote:
>
> env variables are what I proposed 18 years ago now:
> https://sourceware.org/bugzilla/show_bug.cgi?id=2457

And the "resistance to that" from the Red Hat people 24 years ago is
listed on a website that doesn't exist anymore. If I'm to argue with a
guy from 18 years ago...

Ulrich Drepper wrote:
> Hell, no. Programs expect a certain buffer mode and perhaps would work
> unexpectedly if this changes. By setting a mode to unbuffered, for
> instance, you can easily DoS a system. I can think about enough other
> reasons why this is a terrible idea. Programs explicitly must request a
> buffering scheme so that it matches the way the program uses the stream.

If buffering were set according to the env vars before the program gets
the chance to configure its own buffering, then any program that chooses
to set a buffer mode itself overrides the env vars, and they have no
effect on it. This is how the stdbuf util works right now. Wouldn't
programs that expect a certain buffer mode set that mode explicitly
themselves?

Are you allowing untrusted users to set env vars for important daemons
or something? How is this a valid concern? This is specific to the
standard streams, 0-2.

Buffering of stdout is already configured dynamically by libc: if it's
going to a terminal, it's line-buffered; if it's not, it's fully
buffered.
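For anyone following along, this is the mechanism in question. GNU
stdbuf doesn't patch the target program at all; it exports a couple of
magic environment variables and preloads a shim that calls setvbuf()
before main() runs. A quick way to see that (GNU coreutils assumed; the
LD_PRELOAD path varies by distro):

```shell
# Inspect what stdbuf exports, without running a real command under it.
env -i stdbuf -oL env
# prints something like:
#   _STDBUF_O=L
#   LD_PRELOAD=/usr/libexec/coreutils/libstdbuf.so
```

A hypothetical glibc-side version would just read vars like these itself
at startup, making the LD_PRELOAD half unnecessary.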
Modify buffering of standard streams via environment variables (not LD_PRELOAD)?
Was "RFE: enable buffering on null-terminated data"

On Wed, Mar 20, 2024 at 4:54 AM Carl Edquist wrote:
>
> However, if stdbuf's magic env vars are exported in your shell (either by
> doing a trick like 'export $(env -i stdbuf -oL env)', or else more simply
> by first starting a new shell with 'stdbuf -oL bash'), then every command
> in your pipelines will start with the new default line-buffered stdout.
> That way your line-items from build.sh should get passed all the way
> through the pipeline as they are produced.

Finally had a chance to try to build with 'stdbuf --output=L --error=L
--' in front of the build script, and it caused some crazy problems. I
was building Ada, though, so there's a pretty good chance that part of
the build chain doesn't link against libc at all. I got a bunch of

ERROR: ld.so: object '/usr/libexec/coreutils/libstdbuf.so' from
LD_PRELOAD cannot be preloaded: ignored.

And then it somehow caused compiler errors relating to the size of what
would be pointer types. Cleared out all the build products and tried
again without stdbuf, and everything was fine.

From the original thread just within the coreutils email list, "stdbuf
feature request - line buffering but for null-terminated data":

On Tue, Mar 12, 2024 at 12:42 PM Kaz Kylheku wrote:
>
> I would say that if it is implemented, the programs which require
> it should all make provisions to set it up themselves.
>
> stdbuf is a hack/workaround for programs that ignore the
> issue of buffering. Specifically, programs which send information
> to one of the three standard streams, such that the information
> is required in a timely way. Those streams become fully buffered
> when not connected to a terminal.

I think I've partially come around to this point of view. However,
instead of expecting all sorts of individual programs to implement their
own buffering-mode command-line options, could this be handled with
environment variables, but without LD_PRELOAD?
I don't know if libc itself can check for those environment variables
and adjust each program's buffering on its own, but if so, that would be
a much simpler solution. You could compare this to the various locale
environment variables, though I think a lot of commands whose behavior
differs from locale to locale do have to implement their own handling of
that internally, at least to some extent.

This seems like somewhat less of a hack, and if no part of a program
looks for those environment variables, it isn't going to find itself
getting broken by the dynamic linker. It's just not going to change its
buffering. Additionally, things that don't link against libc could still
honor these environment variables, if the developers behind them care to
put in the effort.

Zack
Re: RFE: enable buffering on null-terminated data
On Tue, Mar 19, 2024 at 1:24 AM Kaz Kylheku wrote:
>
> But what tee does is set up _IONBF on its output streams,
> including stdout. So it doesn't buffer at all.

Awesome. Nevermind.
Re: RFE: enable buffering on null-terminated data
On Thu, Mar 14, 2024 at 11:14 AM Carl Edquist wrote:
>
> Where things get sloppy is if you add some stuff in a pipeline after
> your build script, which results in things getting block-buffered along
> the way:
>
> $ ./build.sh | sed s/what/ever/ | tee build.log
>
> And there you will definitely see a difference.

Sadly, the man page for stdbuf specifically calls out tee as being
unaffected by stdbuf, because it adjusts the buffering of its standard
streams itself. The script I mentioned pipes everything through tee, and
I don't think I'm willing to refactor it not to. Ah well.

> Oh, I imagine "undefined operation" means something more like
> "unspecified" here. stdbuf(1) uses setbuf(3), so the behavior you'll
> get should be whatever the setbuf(3) from the libc on your system does.
>
> I think all this means is that the C/POSIX standards are a bit loose
> about what is required of setbuf(3) when a buffer size is specified,
> and there is room in the standard for it to be interpreted as only a
> hint. Works for me (on glibc-2.23)

Thanks for setting me straight here.

> What may not be obvious is that the shell does not need to get involved
> with writing input for a coprocess or reading its output - the shell
> can start other (very fast) programs with input/output redirected
> to/from the coprocess pipes to do that processing.

Gosh, I'd like to see an example of that, too.

> My point though earlier was that a null-terminated record buffering
> mode, as useful as it sounds on the surface (for null-terminated
> paths), may actually be something _nobody_ has ever actually needed for
> an actual (not contrived) workflow.

I considered how it seemed like something people could need years ago
and only thought to email in about it last weekend. Maybe there are all
sorts of people out there who have been using 'stdbuf --output=0' on
null-terminated data for years and never thought to raise the issue. I
know that's not a very strong argument, though.
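Not Carl's actual library, but here's my guess at the shape of what he's
describing: the shell only wires up the coprocess, and external programs
do all the reading and writing. tr stands in for the filter.

```shell
#!/bin/bash
# The shell sets up the coproc; external programs move the data.
coproc FILTER { tr '[:lower:]' '[:upper:]'; }

# Dup the coproc fds to plain fds right away: the FILTER array can
# vanish when the coproc exits, and coproc fds aren't visible in
# subshells.
exec 3<&"${FILTER[0]}" 4>&"${FILTER[1]}"

# External producer writes into the filter; the shell never touches
# the individual lines.
printf '%s\n' alpha beta >&4
exec 4>&-                      # close write end so tr sees EOF

# External consumer drains the filter's output.
output=$(cat <&3)
exec 3<&-
printf '%s\n' "$output"        # ALPHA and BETA, one per line
```

Here printf and cat are trivial stand-ins; the point is that the
producer and consumer could each be arbitrary fast programs, with the
shell only doing the plumbing.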
Re: stdbuf feature request - line buffering but for null-terminated data
On Tue, Mar 12, 2024 at 12:42 PM Kaz Kylheku wrote:
>
> stdbuf is a hack/workaround for programs that ignore the
> issue of buffering. Specifically, programs which send information
> to one of the three standard streams, such that the information
> is required in a timely way. Those streams become fully buffered
> when not connected to a terminal.

When we're talking about very simple programs, like expand, stdbuf is
probably the best solution we're ever going to actually get.

> There can be a performance issue also, though! Suppose
> we run "find" to find certain files over a large file tree.
> It finds only a small number of files: all the file paths
> identified fit into a single buffer, which is not flushed
> until the program terminates (when sent to a pipe).
>
> We pipe this to some program which does some processing
> on those files. We would like the processing to start as
> soon as the first file has been identified, not when find is done!
> It could be that find discovers all the relevant files
> early in its execution and then spends a minute finding
> nothing else. That minute is added to the processing time
> of the files that were found.
>
> That is the compelling reason for wanting file names to
> be flushed individually, whether they are newline terminated
> or null terminated.

An ideal solution for this situation, from the perspective of a relative
layperson, would be a sized buffer that is also flushed once its
contents have sat unflushed for some length of time. If the buffer fills
quickly, it gets flushed as soon as it's full; if data sits in it for a
few too many processor cycles or what have you, it gets flushed right
then. I imagine there would be some overhead to implementing that, which
I don't have a good feel for.
Re: stdbuf feature request - line buffering but for null-terminated data
On Tue, Mar 12, 2024 at 2:58 PM Kaz Kylheku wrote:
>
> What if there existed an alternative delimiting mode: a format where
> the character strings are delimited by the two byte sequence \0\n.

How long did it take for the major command-line utilities to initially
implement handling null-terminated data? I submitted a feature request
to the pcre2 maintainer to implement printing null-terminated filenames
from pcre2grep just back in July of 2022. To his credit, that got done
quickly, but that version of the library still missed getting into RHEL
9, unless they've updated it since I've looked. Furthermore, there's
little consistency from utility to utility in the flag that specifies
null-delimited data. Now you're asking for a whole lot more of that.

> 1. It now works with line buffering.
>
> 2. Assuming \0 is just an invisible character consumed by terminals
>    with no effect, this format can be dumped to a TTY where it turns
>    into lines, as if the nulls were not there.

"tr '\0' '\n'" at the end of a pipeline isn't the end of the world.

> 3. Portability: doesn't require a new buffering mode that would only
>    be initially supported in Glibc, and likely never spread beyond
>    a handful of freeware C libraries.

I've got a conversation going with the glibc people, which this list is
cc'd on, but who knows if it goes anywhere. In any case, if it's a
choice between unbuffered stdout and a whole new data-delimiting
sequence that every utility now has to support, unbuffered stdout is
going to be the answer.
Re: RFE: enable buffering on null-terminated data
On Mon, Mar 11, 2024 at 7:54 AM Carl Edquist wrote:
>
> (In my coprocess management library, I effectively run every coproc with
> --output=L by default, by eval'ing the output of 'env -i stdbuf -oL env',
> because most of the time for a coprocess, that's whats wanted/necessary.)

Surrounded by 'set -a' and 'set +a', I guess? Now that's interesting. I
just added that to a script I have that prints lines output by another
command that it runs, generally a build script, to the command line,
updating the same line over and over again. I want to see if it updates
more continuously like that.

> ... Although, for your example coprocess use, where the shell both
> produces the input for the coproc and consumes its output, you might be
> able to simplify things by making the producer and consumer separate
> processes. Then you could do a simpler 'producer | filter | consumer'
> without having to worry about buffering at all. But if the producer and
> consumer need to be in the same process (eg they share state and are
> logically interdependent), then yeah that's where you need a coprocess
> for the filter.

Yeah, there's really no way to break what I'm doing into a standard
pipeline.

> (Although given your time output, you might say the performance hit for
> unbuffered is not that huge.)

We see a somewhat bigger difference, at least proportionally, if we get
bash more or less out of the way. See command-buffering, attached.

Standard:
real    0m0.202s
user    0m0.280s
sys     0m0.076s

Line-buffered:
real    0m0.497s
user    0m0.374s
sys     0m0.545s

Unbuffered:
real    0m0.648s
user    0m0.544s
sys     0m0.702s

In coproc-buffering, unbuffered output was 21.7% slower than
line-buffered output, whereas here it's 30.4% slower. Of course, using
line-buffered or unbuffered output in this situation makes no sense.
Where it might be useful in a pipeline is when an earlier command might
only print things occasionally, and you want those things transformed
and printed to the command line immediately.
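For the record, the 'set -a' arrangement I have in mind looks like this.
It assumes GNU coreutils' stdbuf and that the libstdbuf.so path contains
no characters the eval would mangle:

```shell
# Export stdbuf's magic env vars into the current shell, so every
# child process starts with a line-buffered stdout by default.
set -a                              # auto-export all assignments
eval "$(env -i stdbuf -oL env)"     # assigns LD_PRELOAD and _STDBUF_O
set +a

echo "_STDBUF_O=${_STDBUF_O}"       # _STDBUF_O=L
```

From here on, anything the script starts that links against libc picks
up the preloaded libstdbuf.so and honors _STDBUF_O.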
> So ... again in theory I also feel like a null-terminated record
> buffering mode for stdbuf(1) (and setbuf(3)) is kind of a missing
> feature.

My assumption is that line buffering through setbuf(3) was implemented
for printing to the command line, so its availability to stdbuf(1) is
just a useful side effect. In the BUGS section of the man page for
stdbuf(1), we see:

  On GLIBC platforms, specifying a buffer size, i.e., using fully
  buffered mode will result in undefined operation.

If I'm not mistaken, buffer modes other than 0 and L don't actually
work. Maybe I should count my blessings here. I don't know what's going
on in the background that would explain glibc not supporting any of
that, or stdbuf(1) implementing features that aren't supported on the
vast majority of systems where it will be installed.

> It may just
> be that nobody has actually had a real need for it. (Yet?)

I imagine if anybody has, they just set --output=0 and moved on. Bash
scripts aren't the fastest thing in the world, anyway.

Attachment: command-buffering (binary data)
Re: RFE: enable buffering on null-terminated data
On Sun, Mar 10, 2024 at 4:36 PM Carl Edquist wrote:
>
> Hi Zack,
>
> This sounds like a potentially useful feature (it'd probably belong
> with a corresponding new buffer mode in setbuf(3)) ...
>
> > Filenames should be passed between utilities in a null-terminated
> > fashion, because the null byte is the only byte that can't appear
> > within one.
>
> Out of curiosity, do you have an example command line for your use case?

My use for 'stdbuf --output=L' is to be able to run a command within a
bash coprocess. (Really, a background process communicating with the
parent process through FIFOs, since Bash prints a warning message if you
try to run more than one coprocess at a time. Shouldn't make a
difference here.)

See coproc-buffering, attached. Without making the command's output
either line-buffered or unbuffered, what I'm doing there would deadlock.
I feed one line in and then expect to be able to read a transformed line
immediately. If that transformed line is stuck in a buffer that's still
waiting to be filled, then nothing happens. I swear doing this actually
makes sense in my application.

$ ./coproc-buffering 10
Line-buffered:
real    0m17.795s
user    0m6.234s
sys     0m11.469s
Unbuffered:
real    0m21.656s
user    0m6.609s
sys     0m14.906s

When I initially implemented this thing, I felt lucky that the data I
was passing in were lines ending in newlines, and not null-terminated,
since my script gets to benefit from 'stdbuf --output=L'. Truth be told,
I don't currently have a need for --output=N. Of course, sed and all
sorts of other Linux command-line tools can produce or handle
null-terminated data.

> > If I want to buffer output data on null bytes, the closest I can get
> > is 'stdbuf --output=0', which doesn't buffer at all. This is pretty
> > inefficient.
> I'm just thinking that find(1), for instance, will end up calling
> write(2) exactly once per filename (-print or -print0) if run under
> stdbuf unbuffered, which is the same as you'd get with a corresponding
> stdbuf line-buffered mode (newline or null-terminated).
>
> It seems that where line buffering improves performance over unbuffered
> is when there are several calls to (for example) printf(3) in
> constructing a single line. find(1), and some filters like grep(1),
> will write a line at a time in unbuffered mode, and thus don't seem to
> benefit at all from line buffering. On the other hand, cut(1) appears
> to putchar(3) a byte at a time, which in unbuffered mode will (like you
> say) be pretty inefficient.
>
> So, depending on your use case, a new null-terminated line buffered
> option may or may not actually improve efficiency over unbuffered mode.

I hadn't considered that.

> You can run your commands under strace like
>
> stdbuf --output=X strace -c -ewrite command ... | ...
>
> to count the number of actual writes for each buffering mode.

I'm running bash in MSYS2 on a Windows machine, so hopefully that
doesn't invalidate any assumptions. Now setting up strace around the
things within the coprocess, and only passing in one line, I now have
coproc-buffering-strace, attached. Giving the argument 'L', both sed and
expand call write() once. Giving the argument 0, sed calls write() twice
and expand calls it a bunch of times, seemingly once for each character
it outputs. So I guess that's it.

$ ./coproc-buffering-strace L
|Line with tabs why?|
$ grep -c -F 'write:' sed-trace.txt expand-trace.txt
sed-trace.txt:1
expand-trace.txt:1
$ ./coproc-buffering-strace 0
|Line with tabs why?|
$ grep -c -F 'write:' sed-trace.txt expand-trace.txt
sed-trace.txt:2
expand-trace.txt:30

> Carl
>
> PS, "find -printf" recognizes a '\c' escape to flush the output, in
> case that helps.
> So "find -printf '%p\0\c'" would, for instance, already
> behave the same as "stdbuf --output=N find -print0" with the new stdbuf
> output mode you're suggesting.
>
> (Though again, this doesn't actually seem to be any more efficient than
> running "stdbuf --output=0 find -print0")
>
> On Sun, 10 Mar 2024, Zachary Santer wrote:
>
> > Was "stdbuf feature request - line buffering but for null-terminated
> > data"
> >
> > See below.
> >
> > On Sun, Mar 10, 2024 at 5:38 AM Pádraig Brady wrote:
> >>
> >> On 09/03/2024 16:30, Zachary Santer wrote:
> >>> 'stdbuf --output=L' will line-buffer the command's output stream.
> >>> Pretty useful, but that's looking for newlines. Filenames should be
> >>> passed between utilities in a null-terminated fashion, because the
> >>> null byte is the only byte that can't appear within one.
> >>>
> >>> If I want to buffer output data on null bytes, the closest I can get
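The deadlock described above can be reproduced in a few lines. This is
my own sketch, not the attached coproc-buffering script; it assumes GNU
sed, whose stdout is fully buffered when writing to a pipe:

```shell
#!/bin/bash
# Feed one line to a coproc'd sed and try to read the reply back.
coproc SED { sed 's/tabs/TABS/'; }
exec 3<&"${SED[0]}" 4>&"${SED[1]}"   # dup coproc fds to plain fds

printf 'line with tabs\n' >&4

# sed has read and transformed the line, but the result is sitting in
# its ~4K stdout buffer, so this read times out.
if read -r -t 2 reply <&3; then
    result="got: $reply"
else
    result="stuck: the transformed line is in sed's buffer"
fi
printf '%s\n' "$result"
# Swapping in 'stdbuf --output=L sed ...' above makes the read
# succeed immediately.
```

Closing fd 4 would also flush things (sed flushes at EOF), but that
defeats the point of a coprocess you want to keep feeding.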
RFE: enable buffering on null-terminated data
Was "stdbuf feature request - line buffering but for null-terminated
data"

See below.

On Sun, Mar 10, 2024 at 5:38 AM Pádraig Brady wrote:
>
> On 09/03/2024 16:30, Zachary Santer wrote:
> > 'stdbuf --output=L' will line-buffer the command's output stream.
> > Pretty useful, but that's looking for newlines. Filenames should be
> > passed between utilities in a null-terminated fashion, because the
> > null byte is the only byte that can't appear within one.
> >
> > If I want to buffer output data on null bytes, the closest I can get
> > is 'stdbuf --output=0', which doesn't buffer at all. This is pretty
> > inefficient.
> >
> > 0 means unbuffered, and Z is already taken for, I guess, zebibytes.
> > --output=N, then?
> >
> > Would this require a change to libc implementations, or is it
> > possible now?
>
> This does seem like useful functionality,
> but it would require support for libc implementations first.
>
> cheers,
> Pádraig
stdbuf feature request - line buffering but for null-terminated data
'stdbuf --output=L' will line-buffer the command's output stream. Pretty
useful, but that's looking for newlines. Filenames should be passed
between utilities in a null-terminated fashion, because the null byte is
the only byte that can't appear within one.

If I want to buffer output data on null bytes, the closest I can get is
'stdbuf --output=0', which doesn't buffer at all. This is pretty
inefficient.

0 means unbuffered, and Z is already taken for, I guess, zebibytes.
--output=N, then?

Would this require a change to libc implementations, or is it possible
now?

- Zack
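In the meantime, a crude userspace stand-in for the proposed --output=N
can be written in the shell itself: re-emit input record by record on
NUL delimiters. Each record still costs one write(2), so it's no more
efficient than unbuffered, but it shows the record boundary the option
would buffer on. The function name nulbuf is made up:

```shell
# nulbuf: pass NUL-terminated records through one at a time.
# A trailing record missing its final NUL is dropped, as read does.
nulbuf() {
    local record
    while IFS= read -r -d '' record; do
        printf '%s\0' "$record"
    done
}

printf 'one\0two\0three\0' | nulbuf | tr '\0' '\n'
# prints:
#   one
#   two
#   three
```

Usage would be, e.g., 'find . -print0 | nulbuf | consumer', so the
consumer sees each filename as soon as find emits it.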
Re: [PATCH] printf: add %#s alias to %b
On Thu, Sep 7, 2023 at 12:55 PM Robert Elz wrote:
>
> There are none, printf(3) belongs to the C committee, and they can make
> use of anything they like, at any time they like.
>
> The best we can do is use formats that make no sense for printf(1) to
> support

That's still assuming the goal of minimizing the discrepancies between
printf(1) and printf(3) format specifiers. As you point out, that isn't
particularly useful, and these things diverging further is now a
foregone conclusion. The only benefit, from my perspective, is allowing
the printf(1) man page to simply reference the printf(3) man page for
everything that printf(1) attempts to replicate.

Zack
Re: [PATCH] printf: add %#s alias to %b
The trouble with using an option flag to printf(1) to toggle the meaning
of %b is that you can't then mix format specifiers for binary literals
and backslash-escape expansion within the same format string. You'd have
to call printf(1) multiple times, which largely defeats the purpose of a
format string.

I don't know which uppercase/lowercase pairs of format specifiers are
still unused in every existing POSIX-like shell, but my suggestion would
be settling on one of those to take on the meaning of C2x's %b. It would
still print '0b' or '0B' in the resulting binary literal when given the
# flag, which might be a little confusing, but this seems like the
safest way to go. It obviously still represents a divergence from C2x's
printf(3), but I think the consensus is that that's going to happen
regardless.

ksh's format specifiers for arbitrary-base integer representation sound
really slick, and I'd love to see that in Bash, too, actually.

Zack
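To put the two colliding meanings of %b side by side: the first line
below is today's printf(1) %b, which both bash's builtin printf and
coreutils' printf implement; the ksh93 line is from memory and is not
valid bash.

```shell
# printf(1)'s current %b: expand backslash escapes in the argument.
printf '%b\n' 'col1\tcol2'     # col1 and col2 separated by a tab

# C2x's %b instead formats an integer as a binary literal, e.g.
# printf("%b", 5) in C2x yields 101; printf(1) has no specifier
# for that today.

# ksh93's arbitrary-base form mentioned above (not valid in bash):
#   printf '%..2d\n' 5         # 101 in ksh93, as I understand it
```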