Re: O_DIRECT "packet mode" pipes on Linux
On Thu, Sep 24, 2020 at 12:48:14PM +0700, Robert Elz wrote:
>     Date:        Wed, 23 Sep 2020 21:47:10 -0700
>     From:        Vito Caputo
>     Message-ID:  <20200924044710.xpltp22bpxoxi...@shells.gnugeneration.com>
>
>   | It's useful if you're doing something like say, aggregating data from
>   | multiple piped sources into a single bytestream.  With the default
>   | pipe behavior, you'd have the output interleaved at random boundaries.
>
> If that's happening, then either the pipe implementation is badly broken,
> or the applications using it aren't doing what you'd like them to do.
>
> Writes (<= the pipe buffer size) have always (since ancient unix, probably
> since pipes were first created) been atomic - nothing will randomly split
> the data.
>
> What the new option is offering (as best I can tell from the discussion
> here, I am not a linux user) is passing those boundaries through the pipe
> to the reader - that hasn't been a pipe feature, but it is exactly what a
> unix domain datagram socket provides (these days pipes are sometimes
> implemented using unix domain connection oriented sockets ... I'm guessing
> that the option simply changes the transport protocol used with an
> implementation that works that way).
>

Apparently I was incomplete in describing my conjured example.

The aggregator in this case is a process connected to multiple pipes,
not a pipe with multiple writer processes.  What you describe is
correct WRT multiple writers to a shared pipe.

In my example, the aggregator can trivially read the separate records
at the write boundaries from each of the connected packetized pipes.
The reads return at the write boundaries.  Without packetized pipes
you'd need to parse the contents to search for record boundaries.

Imagine it's like an inverted `tee` for input instead of output.
Without packetized pipes, this hypothetical program couldn't interleave
the collected inputs at record boundaries without parsing the contents.
Presumably this is *why* we don't already have an input version of
`tee`.  I'd like to work towards changing that.

>   | With packetized pipes, if your sources write say, newline-delimited
>   | text records, kept under PIPE_BUF length, the aggregated output would
>   | always interleave between the lines, never in the middle of them.
>
> That happens with regular pipes.
>

See above, the aggregator is a process, not a shared pipe.

>   | If we added this to the shell, I suppose the next thing to explore
>   | would be how to get all the existing core shell utilities to detect a
>   | packetized pipe on stdout and switch to a line-buffered mode instead
>   | of block-buffered, assuming they're using stdio.
>
> I suspect that is really all you need - a mechanism to request line
> buffered output rather than blocksize buffered.   You don't need to
> go fiddling with pipes for that, and abusing the pipe interface as a
> way to pass a "line buffer this output please" request to the application
> seems like the wrong way to achieve that to me.
>

This is probably true, though if a packetized pipe were introspectable
we could request the behavior via the ||| construction, while
simultaneously enabling record boundaries regardless of how the
contents are delimited.  If consumers knew about packetized pipes, they
could treat the separately returned reads as records independently of
what's inside.

> This isn't a criticism of the datagram packet pipe idea - there are
> applications for that (pipe is easier to use than manually setting up
> a pair of unix domain datagram sockets) but that is for specialised
> applications, where for whatever reason the receiver needs to read just
> one packet at a time (usually because of a desire to have multiple
> reading applications, each taking the next request, and then processing
> it ... if there is just one receiving process all that is needed is
> to stick a record length before each packet sent to a normal pipe, and
> let the receiver process the records from the aggregations it receives).
>

Thanks for the thoughtful response,
Vito Caputo
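For the single-reader case, the length-prefix framing mentioned in the quoted text can be sketched in bash. This is only a sketch under stated assumptions: `send_record` and `recv_record` are hypothetical helper names, the 4-digit decimal length header is an arbitrary choice (so payloads are capped at 9999 bytes), and payloads must not contain NUL bytes because bash variables cannot hold them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; neither is a bash builtin.

# send_record: write one record as a 4-digit byte count followed by the payload.
send_record() {
    local LC_ALL=C                      # make ${#1} count bytes, not characters
    (( ${#1} <= 9999 )) || return 1     # payload must fit the 4-digit header
    printf '%04d%s' "${#1}" "$1"
}

# recv_record: read one record from stdin into the global variable REC.
# Returns non-zero at end of input or on a truncated record.
recv_record() {
    local LC_ALL=C len
    REC=
    IFS= read -r -N 4 len || return 1   # EOF or short header
    (( 10#$len > 0 )) || return 0       # zero-length record: REC stays empty
    IFS= read -r -N "$((10#$len))" REC  # read exactly that many bytes
}

# Example use over an ordinary pipe:
#   { send_record 'first'; send_record 'second'; } |
#       while recv_record; do printf 'got: %s\n' "$REC"; done
```

Because the reader recovers boundaries from the header, the payload may contain newlines or any other delimiter. The cost is that both ends have to agree on the framing, which is the point being argued in the message above.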
Re: O_DIRECT "packet mode" pipes on Linux
    Date:        Wed, 23 Sep 2020 21:47:10 -0700
    From:        Vito Caputo
    Message-ID:  <20200924044710.xpltp22bpxoxi...@shells.gnugeneration.com>

  | It's useful if you're doing something like say, aggregating data from
  | multiple piped sources into a single bytestream.  With the default
  | pipe behavior, you'd have the output interleaved at random boundaries.

If that's happening, then either the pipe implementation is badly broken,
or the applications using it aren't doing what you'd like them to do.

Writes (<= the pipe buffer size) have always (since ancient unix, probably
since pipes were first created) been atomic - nothing will randomly split
the data.

What the new option is offering (as best I can tell from the discussion
here, I am not a linux user) is passing those boundaries through the pipe
to the reader - that hasn't been a pipe feature, but it is exactly what a
unix domain datagram socket provides (these days pipes are sometimes
implemented using unix domain connection oriented sockets ... I'm guessing
that the option simply changes the transport protocol used with an
implementation that works that way).

  | With packetized pipes, if your sources write say, newline-delimited
  | text records, kept under PIPE_BUF length, the aggregated output would
  | always interleave between the lines, never in the middle of them.

That happens with regular pipes.

  | If we added this to the shell, I suppose the next thing to explore
  | would be how to get all the existing core shell utilities to detect a
  | packetized pipe on stdout and switch to a line-buffered mode instead
  | of block-buffered, assuming they're using stdio.

I suspect that is really all you need - a mechanism to request line
buffered output rather than blocksize buffered.   You don't need to
go fiddling with pipes for that, and abusing the pipe interface as a
way to pass a "line buffer this output please" request to the application
seems like the wrong way to achieve that to me.

This isn't a criticism of the datagram packet pipe idea - there are
applications for that (pipe is easier to use than manually setting up
a pair of unix domain datagram sockets) but that is for specialised
applications, where for whatever reason the receiver needs to read just
one packet at a time (usually because of a desire to have multiple
reading applications, each taking the next request, and then processing
it ... if there is just one receiving process all that is needed is
to stick a record length before each packet sent to a normal pipe, and
let the receiver process the records from the aggregations it receives).

kre
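The shared-pipe case described above (several writers, one ordinary pipe, short writes) can be tried out with a small bash sketch. The names `producer` and `merge` are hypothetical, and the sketch assumes that each `printf` call reaches the pipe as a single write(2), which, as far as I know, is how the bash builtin behaves for lines this short.

```shell
#!/usr/bin/env bash
# producer: hypothetical stand-in for any program that emits short text records.
producer() {
    local tag=$1 count=$2 i
    for (( i = 0; i < count; i++ )); do
        # One printf call per record; each line is far shorter than PIPE_BUF,
        # so a single write of it to a pipe is atomic.
        printf '%s record %d\n' "$tag" "$i"
    done
}

# merge: run two producers concurrently, both writing to the pipe on stdout.
merge() {
    producer AAAA "$1" &
    producer BBBB "$1" &
    wait
}

# Example: list any line that is not a whole record from exactly one producer.
#   merge 1000 | grep -Ev '^(AAAA|BBBB) record [0-9]+$'
```

The order in which the two producers' lines arrive is not deterministic, but under the assumption above no line should contain fragments of both. This only covers writers sharing one pipe; it does not help a process that reads from several separate pipes, which is the case raised in the reply.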
Re: O_DIRECT "packet mode" pipes on Linux
On Wed, Sep 23, 2020 at 11:53:10PM -0400, Lawrence Velázquez wrote:
> > On Sep 23, 2020, at 11:41 PM, Vito Caputo wrote:
> >
> > On Wed, Sep 23, 2020 at 09:12:40AM -0400, Chet Ramey wrote:
> >> On 9/22/20 11:23 PM, Vito Caputo wrote:
> >>> Hello list,
> >>>
> >>> Is there any chance we could get a | modifier for enabling O_DIRECT on the
> >>> created pipe?  "Packet" style pipes have some interesting and potentially
> >>> useful properties, it would be nice if bash made them more accessible.
> >>
> >> Is there a general need, especially since they're Linux-specific?
> >>
> >
> > I'm not sure, but as far as GNU/Linux distros go bash, is kind of the
> > canonical shell, and this functionality is kind of inaccessible
> > without the shell wiring it up.
>
> What functionality? I (and I'm sure some others) am not familiar
> with packet-style pipes and their benefits. You haven't actually
> described *how* exposing them would be useful, and why that would
> justify introducing new syntax that only matters/works on Linux.
>

Packetized pipes establish well-defined boundaries between writes
reproduced at the read side.  If the write sizes are kept within
PIPE_BUF bounds, then you can be certain what's read is an atomic
record including nothing from a subsequent or previous write, with no
possibility for partial records.

It's useful if you're doing something like say, aggregating data from
multiple piped sources into a single bytestream.  With the default
pipe behavior, you'd have the output interleaved at random boundaries.

With packetized pipes, if your sources write say, newline-delimited
text records, kept under PIPE_BUF length, the aggregated output would
always interleave between the lines, never in the middle of them.

If we added this to the shell, I suppose the next thing to explore
would be how to get all the existing core shell utilities to detect a
packetized pipe on stdout and switch to a line-buffered mode instead
of block-buffered, assuming they're using stdio.  That should turn
their lines into packets on the pipe, and it all becomes generally
relevant across the existing shell utils landscape.  This heuristic
echoes the terminal output detection for stdout line-buffering
already performed according to setvbuf(3).

Thanks,
Vito Caputo
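For comparison, line-buffered stdio output can already be requested per command today, without any pipe introspection, using stdbuf(1) from GNU coreutils. The sketch below assumes stdbuf is installed and that the wrapped program uses stdio without setting its own buffering; `line_buffered` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# line_buffered: hypothetical wrapper; runs a command with its stdout
# switched to line-buffered mode through stdbuf(1) from GNU coreutils.
line_buffered() {
    stdbuf -oL "$@"
}

# With line buffering, each line from cut should be flushed to the pipe
# as soon as it is complete rather than when a stdio block fills up.
printf 'a 1\nb 2\n' | line_buffered cut -d ' ' -f 1
```

On a packet-mode pipe, each flushed line would then arrive as its own packet, which is the effect the proposed heuristic is after. Unlike the heuristic, the caller has to opt in for every command.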
Re: O_DIRECT "packet mode" pipes on Linux
> On Sep 23, 2020, at 11:41 PM, Vito Caputo wrote:
>
> On Wed, Sep 23, 2020 at 09:12:40AM -0400, Chet Ramey wrote:
>> On 9/22/20 11:23 PM, Vito Caputo wrote:
>>> Hello list,
>>>
>>> Is there any chance we could get a | modifier for enabling O_DIRECT on the
>>> created pipe?  "Packet" style pipes have some interesting and potentially
>>> useful properties, it would be nice if bash made them more accessible.
>>
>> Is there a general need, especially since they're Linux-specific?
>>
>
> I'm not sure, but as far as GNU/Linux distros go bash, is kind of the
> canonical shell, and this functionality is kind of inaccessible
> without the shell wiring it up.

What functionality? I (and I'm sure some others) am not familiar
with packet-style pipes and their benefits. You haven't actually
described *how* exposing them would be useful, and why that would
justify introducing new syntax that only matters/works on Linux.

vq
Re: O_DIRECT "packet mode" pipes on Linux
On Wed, Sep 23, 2020 at 09:12:40AM -0400, Chet Ramey wrote:
> On 9/22/20 11:23 PM, Vito Caputo wrote:
> > Hello list,
> >
> > Is there any chance we could get a | modifier for enabling O_DIRECT on the
> > created pipe?  "Packet" style pipes have some interesting and potentially
> > useful properties, it would be nice if bash made them more accessible.
>
> Is there a general need, especially since they're Linux-specific?
>

I'm not sure, but as far as GNU/Linux distros go, bash is kind of the
canonical shell, and this functionality is kind of inaccessible
without the shell wiring it up.

If I'm not mistaken this pipe flavor exists as the default behavior in
Plan 9, so it's not entirely unique to Linux conceptually.

> What kind of modifier would you suggest?
>

Maybe triple pipe could be packetized pipe?  It visually expresses
being sliced up somewhat; `foo ||| bar`

> Does anyone want to take a shot at implementing this idea?
>

It's possible I could find time to take a stab at it, if nobody else
wants to.

Cheers,
Vito Caputo
Re: Feature request: Enable possibility of colored stderr output
A M writes:
> It would be really neat to have functionality that stderr could be
> output in a different color compared to stdout.

It seems like the right way to implement this would be to build a
program that reads from two inputs at once and interleaves what it gets
from the two inputs, and outputs that on its stdout.  In addition, it
would ensure that text from each input was labeled with the right
colorization codes (whatever that might be).  Writing that has some
complexity but is mostly careful programming.  It also involves nailing
down exactly what colorization scheme you want to use.

Then, in bash, you could execute something like:

    exec >&$xxx 2>&$yyy

to redirect all future stderr and stdout into the post-processor.

The part I can't figure out is how to start and background the
postprocessor leaving its two input fd's (xxx and yyy in the above
commands) open and accessible to later bash commands.  That isn't
difficult to do with one fd; you can do something like

    exec 3> >( postprocessor )
    command >&3

I wonder if you could do something like this, using a process
substitution to a dummy "sleep" process to create an additional
connection:

    exec 3> >( sleep 60 )
    exec 4> >( postprocessor /dev/fd/3 )
    exec >&3 2>&4
    exec 3>&- 4>&-

I'll leave that question to the experts.

Dale
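A simpler variant avoids the two-input question altogether by giving stderr its own filter and leaving stdout untouched. The sketch below is not the two-input postprocessor described above; `colorize` and `run_colored` are hypothetical names, and red (SGR code 31) is an arbitrary choice.

```shell
#!/usr/bin/env bash
# colorize: hypothetical filter that wraps every input line in an ANSI SGR
# color code (for example 31 = red, 32 = green).
colorize() {
    local color=$1 line
    while IFS= read -r line || [[ -n $line ]]; do
        printf '\033[%sm%s\033[0m\n' "$color" "$line"
    done
}

# run_colored: run a command with its stderr passed through colorize.
# The process substitution is started before fd 2 is redirected, so the
# filter's own ">&2" still refers to the caller's original stderr.
run_colored() {
    "$@" 2> >(colorize 31 >&2)
}

# Example: run_colored ls / /nonexistent
```

The filter runs asynchronously, so colored stderr lines can appear out of order relative to stdout and may arrive slightly after the command returns. Keeping the two streams strictly ordered is the part that would need the single two-input program.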
Re: Confusing documentation of `ENV` in section "Bash variables"
On Wed, 23 Sep 2020 at 14:24, Chet Ramey wrote:

> "Expanded and executed similarly to BASH_ENV when an interactive shell is
> invoked in POSIX Mode."
>

Yes, that's better than my suggestions, thanks!

--
https://rrt.sc3d.org
Re: Confusing documentation of `ENV` in section "Bash variables"
On 9/22/20 6:27 PM, Reuben Thomas via Bug reports for the GNU Bourne Again SHell wrote:
> The documentation says:
>
> 'ENV'
>      Similar to 'BASH_ENV'; used when the shell is invoked in POSIX Mode
>      (*note Bash POSIX Mode::).

Hmmm, I can see that. How about, as you suggest, something like

"Expanded and executed similarly to BASH_ENV when an interactive shell is
invoked in POSIX Mode."

The behavior is described pretty completely in the INVOCATION section.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: O_DIRECT "packet mode" pipes on Linux
On 9/22/20 11:23 PM, Vito Caputo wrote:
> Hello list,
>
> Is there any chance we could get a | modifier for enabling O_DIRECT on the
> created pipe?  "Packet" style pipes have some interesting and potentially
> useful properties, it would be nice if bash made them more accessible.

Is there a general need, especially since they're Linux-specific?

What kind of modifier would you suggest?

Does anyone want to take a shot at implementing this idea?

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
O_DIRECT "packet mode" pipes on Linux
Hello list,

Is there any chance we could get a | modifier for enabling O_DIRECT on the
created pipe?  "Packet" style pipes have some interesting and potentially
useful properties, it would be nice if bash made them more accessible.

pipe(2) O_DIRECT excerpt:

       O_DIRECT (since Linux 3.4)
              Create a pipe that performs I/O in "packet" mode.  Each
              write(2) to the pipe is dealt with as a separate packet, and
              read(2)s from the pipe will read one packet at a time.  Note
              the following points:

              *  Writes of greater than PIPE_BUF bytes (see pipe(7)) will
                 be split into multiple packets.  The constant PIPE_BUF is
                 defined in <limits.h>.

              *  If a read(2) specifies a buffer size that is smaller than
                 the next packet, then the requested number of bytes are
                 read, and the excess bytes in the packet are discarded.
                 Specifying a buffer size of PIPE_BUF will be sufficient to
                 read the largest possible packets (see the previous
                 point).

              *  Zero-length packets are not supported.  (A read(2) that
                 specifies a buffer size of zero is a no-op, and returns
                 0.)

              Older kernels that do not support this flag will indicate
              this via an EINVAL error.

Regards,
Vito Caputo
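Since the excerpt ties packet size to PIPE_BUF, a script can look the limit up at run time and check whether a record would travel as a single packet. This is a sketch only: `fits_in_one_packet` is a hypothetical helper, `/` is used as an arbitrary path because getconf needs one for this variable, and the record length is counted in bytes including any trailing newline the caller passes in.

```shell
#!/usr/bin/env bash
# PIPE_BUF as reported for "/" (POSIX guarantees at least 512 bytes;
# Linux uses 4096).
pipe_buf=$(getconf PIPE_BUF /)

# fits_in_one_packet: hypothetical helper; succeeds if the given record,
# measured in bytes, is no larger than PIPE_BUF.
fits_in_one_packet() {
    local LC_ALL=C          # make ${#1} count bytes, not characters
    (( ${#1} <= pipe_buf ))
}

# Example:
#   fits_in_one_packet "$record" || echo "record would be split" >&2
```

A record that fails the check would, per the excerpt, be split into multiple packets, so a reader could no longer treat one read(2) as one record.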
Re: Bash-5.1-beta available
Hi,

On Mon, Sep 21, 2020 at 09:13:13PM -0400, Dale R. Worley wrote:
> Andreas Schwab writes:
> I assume that if you really want the old effect, you can still do
>
>     exec {dup}<&1

You mean:

      exec {dup}<&0

>
>     ... <( ... <$dup ) ...

and:

      ... <( ... <&$dup ) ...

>
>     exec {dup}<&-
>
> Dale

Sample:

This worked under previous bash versions (I use 5.0.3(1)-release):

    testForkInput () {
        local line
        while read line ;do
            echo "$line"
        done < <(
            sed 's/^/> /'
        )
    }

But with 5.1.0(1)-beta, I have to replace this with:

    testForkInput () {
        local dup line
        exec {dup}<&0
        while read line ;do
            echo "$line"
        done < <(
            sed 's/^/> /' <&$dup
        )
        exec {dup}>&-
    }

    testForkInput < <(seq 1 5)
    > 1
    > 2
    > 3
    > 4
    > 5

On Tue, Sep 22, 2020 at 08:57:57AM +0200, Andreas Schwab wrote:
> That point is that it silently breaks existing scripts.

I agree.

--
 Félix Hauri -- http://www.f-hauri.ch